Encoding warning use PEP 597 env var PYTHONWARNDEFAULTENCODING #733

Status: Open, 10 commits to be merged into base: master


DanielYang59
Contributor

@DanielYang59 DanielYang59 commented Dec 20, 2024

Summary

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced warning handling in tests related to encoding practices.
  • New Features

    • Improved test suite for encoding warnings, allowing dynamic environment manipulation.
  • Refactor

    • Updated type hints across various functions for clarity and modern syntax.
    • Simplified parameter types in multiple methods to enhance readability.
    • Clarified argument passing in function calls for improved code clarity.


coderabbitai bot commented Dec 20, 2024

Walkthrough

This pull request focuses on modernizing type hints across multiple files in the monty library. The changes primarily involve updating type annotations from older Union and Optional types to the more concise | union syntax introduced in Python 3.10. These modifications enhance type clarity and readability while maintaining the existing functionality of the codebase. The changes span multiple modules, including bisect.py, dev.py, functools.py, io.py, and others, consistently applying the new type hinting approach.

Changes

File: Change Summary
.github/workflows/test.yml: Updated matrix variable from matrix.python to matrix.python-version
src/monty/bisect.py: Updated index function type hint from Optional[float] to float | None
src/monty/dev.py: Updated type hints for deprecated decorator parameters
src/monty/functools.py: Replaced Union[list, tuple] with the PEP 604 | union syntax
src/monty/io.py: Updated type hints for zopen and reverse_readfile functions
src/monty/os/*: Updated type hints using str | Path
src/monty/serialization.py: Enhanced type hints for loadfn and dumpfn functions
src/monty/shutil.py: Updated type hints for compress_file and decompress_file
src/monty/string.py: Updated list_strings function type hint
src/monty/subprocess.py: Updated run method type hint
src/monty/tempfile.py: Updated ScratchDir class __init__ method type hint
tests/test_io.py: Added monkeypatch parameter to test_warnings method

Assessment against linked issues

Objective Addressed Explanation
Handling MAGMOM settings from MP API This PR does not address the MAGMOM handling issue mentioned in the linked issue.

Poem

🐰 Type hints dancing light and free,
From Union to |, a syntax spree!
Modernizing code with rabbit's grace,
Clarity blooms in each type's embrace.
Monty's library, now sleek and bright! 🌟

@DanielYang59 DanielYang59 force-pushed the encoding-warning-follow-pep597 branch from 8fc671b to 3e1ffc5 Compare December 20, 2024 05:44
@DanielYang59 DanielYang59 force-pushed the encoding-warning-follow-pep597 branch from e7d9c76 to 2cd39ce Compare December 20, 2024 09:36

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.18%. Comparing base (26acf0b) to head (8bc0763).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #733      +/-   ##
==========================================
- Coverage   84.19%   84.18%   -0.01%     
==========================================
  Files          27       27              
  Lines        1670     1669       -1     
  Branches      314      314              
==========================================
- Hits         1406     1405       -1     
  Misses        202      202              
  Partials       62       62              


@DanielYang59 DanielYang59 marked this pull request as ready for review December 20, 2024 09:57

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
.github/workflows/test.yml (1)

Windows testing should be re-enabled for path handling verification

The review comment is valid. The evidence shows that:

  1. Windows testing is explicitly commented out in the workflow (#, windows-latest)
  2. The codebase handles path operations in monty.os module, including path manipulation and directory operations
  3. While the test suite includes path-related tests, they are not being verified on Windows platform
  4. The path handling code uses OS-agnostic methods (os.path), but Windows-specific edge cases should still be verified

Recommendations:

  • Uncomment windows-latest in the test matrix to ensure cross-platform compatibility
  • Consider adding Windows-specific test cases for path handling edge cases (backslashes, drive letters)
🔗 Analysis chain

Line range hint 9-10: Verify Windows compatibility

Windows testing appears to be intentionally disabled. Since the changes involve path handling in monty.os, it would be beneficial to verify Windows compatibility.


Let me check the specific Windows-related path handling in the relevant files, particularly monty.os.


Let me try a different approach to check for Windows-specific handling and tests.


Let me check for test files in a different way and look for any Windows-specific test cases.


Let me check the specific test cases for OS-related functionality and the current workflow configuration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for Windows-specific path handling code
rg -l 'os\.path\.sep|ntpath|\\\\|/|os\.name.*nt' src/monty/

Length of output: 467


Script:

#!/bin/bash
# Check Windows-specific path handling in monty.os
rg -A 5 'os\.name.*nt|ntpath|os\.path\.sep' src/monty/os/__init__.py

# Also check for any Windows-specific tests
fd test_ src/monty/os/ --exec rg -l 'windows|nt|os\.name'

Length of output: 130


Script:

#!/bin/bash
# Check content of monty.os for Windows handling
cat src/monty/os/__init__.py

# Look for test files in monty/os directory
fd . src/monty/os/ --type f --full-path | grep test

Length of output: 1511


Script:

#!/bin/bash
# Look for test files in the entire project related to monty.os
fd test.*\.py --exec rg -l "os\.(name|path|sep)|windows|nt" {}

# Check if there are any existing GitHub workflow files that include Windows
rg -l "windows|Windows" .github/workflows/

Length of output: 725


Script:

#!/bin/bash
# Check the OS-specific test cases
cat tests/test_os.py

# Check the current workflow configuration for Windows
cat .github/workflows/test.yml

Length of output: 4093

🧹 Nitpick comments (5)
src/monty/bisect.py (1)

Line range hint 22-31: Consider enhancing error messages

The ValueError could be more descriptive by including the value that wasn't found and the tolerance used.

-    raise ValueError
+    raise ValueError(f"Value {x} not found in list{f' within tolerance {atol}' if atol is not None else ''}")
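
As an illustration of the suggested message, the following is a minimal sketch of a tolerance-aware lookup on a sorted list. `find_index` is a hypothetical stand-in written for this example, not monty's `index` implementation; only the error message format is taken from the suggestion above.

```python
from __future__ import annotations

import bisect


def find_index(values: list[float], x: float, atol: float | None = None) -> int:
    """Return an index of a value in sorted `values` equal to x (or within atol)."""
    i = bisect.bisect_left(values, x)
    tol = 0.0 if atol is None else atol
    # The nearest candidates are the neighbours on either side of the insertion point.
    for j in (i - 1, i):
        if 0 <= j < len(values) and abs(values[j] - x) <= tol:
            return j
    detail = "" if atol is None else f" within tolerance {atol}"
    raise ValueError(f"Value {x} not found in list{detail}")
```

Building the optional suffix in a separate variable avoids the nested f-string and keeps the message readable in tracebacks.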
src/monty/serialization.py (1)

47-48: Verify docstring format parameter description

The docstring format parameter description uses quotes around the literal values, which might be confusing as it differs from the type hint syntax. Consider updating for consistency.

-        fmt ("json" | "yaml" | "mpk"): If specified, the fmt specified would
+        fmt (Literal["json", "yaml", "mpk"]): If specified, the fmt specified would

Also applies to: 108-109

src/monty/shutil.py (1)

79-81: Consider adding type hints for return values

While updating the type hints, consider adding explicit return type annotations to the functions for better type safety.

def compress_file(
    filepath: str | Path,
    compression: Literal["gz", "bz2"] = "gz",
    target_dir: str | Path | None = None,
-) -> None:
+) -> None:  # explicitly document that function doesn't return anything

Also applies to: 133-135

src/monty/io.py (1)

79-85: Simplify kwargs.get() call and consider adding docstring note

The implementation correctly follows PEP 597 for encoding warnings. However, there are two suggestions:

  1. The kwargs.get() call can be simplified
  2. Consider documenting the PYTHONWARNDEFAULTENCODING behavior in the function's docstring
-    if (
-        "t" in mode
-        and kwargs.get("encoding", None) is None
-        and os.getenv("PYTHONWARNDEFAULTENCODING", False)
-    ):
+    if (
+        "t" in mode
+        and kwargs.get("encoding") is None
+        and os.getenv("PYTHONWARNDEFAULTENCODING", False)
+    ):

Also, consider adding this note to the docstring:

    Notes:
        When PYTHONWARNDEFAULTENCODING environment variable is set (PEP 597),
        a warning will be issued for text mode operations without explicit encoding.
🧰 Tools
🪛 Ruff (0.8.2)

83-83: Use kwargs.get("encoding") instead of kwargs.get("encoding", None)

Replace kwargs.get("encoding", None) with kwargs.get("encoding")

(SIM910)
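
The gating logic discussed in this comment can be sketched in isolation as follows. `pick_encoding` and `ENCODING_WARNING` are hypothetical names introduced for this example and are not part of monty; the fallback to `UserWarning` is an assumption made so the sketch also runs on interpreters older than Python 3.10, where `EncodingWarning` does not exist.

```python
from __future__ import annotations

import builtins
import os
import warnings

# EncodingWarning is a builtin on Python 3.10+; fall back to UserWarning otherwise.
ENCODING_WARNING = getattr(builtins, "EncodingWarning", UserWarning)


def pick_encoding(mode: str, encoding: str | None = None) -> str | None:
    """Return the encoding to use for `mode`, defaulting text mode to UTF-8."""
    if "t" not in mode:
        return encoding  # not an explicit text mode: leave the value untouched
    if encoding is None:
        # Opt-in warning: only emitted when the PEP 597 variable is non-empty.
        if os.getenv("PYTHONWARNDEFAULTENCODING"):
            warnings.warn(
                "No explicit `encoding` given; falling back to UTF-8",
                category=ENCODING_WARNING,
                stacklevel=2,
            )
        return "utf-8"
    return encoding
```

One design point worth noting: the interpreter reads this variable once at startup and exposes the result as `sys.flags.warn_default_encoding`, whereas an `os.getenv` check like the one above is evaluated on every call, so its behaviour can be toggled at runtime (for example from a test).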

tests/test_io.py (1)

435-461: LGTM: Well-structured test for PEP 597 encoding warnings

The test properly validates the behavior of encoding warnings with and without the PYTHONWARNDEFAULTENCODING environment variable. The warning filters are correctly configured to catch specific warning types.

Consider adding a comment explaining the test cases to improve maintainability:

    def test_warnings(self, extension, monkeypatch):
+       """Test encoding warnings based on PEP 597:
+       1. Verify warning when PYTHONWARNDEFAULTENCODING is set
+       2. Verify no warning when PYTHONWARNDEFAULTENCODING is not set
+       3. Verify implicit mode warnings
+       """
🧰 Tools
🪛 Ruff (0.8.2)

444-444: Undefined name EncodingWarning. Consider specifying requires-python = ">= 3.10" or tool.ruff.target-version = "py310" in your pyproject.toml file.

(F821)


456-456: Undefined name EncodingWarning. Consider specifying requires-python = ">= 3.10" or tool.ruff.target-version = "py310" in your pyproject.toml file.

(F821)
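
For readers who want to try the test pattern outside pytest, here is a rough equivalent using only the standard library. `warn_on_default_encoding` and `count_warnings` are hypothetical helpers written for this sketch, and `UserWarning` stands in for `EncodingWarning` so the example does not depend on Python 3.10.

```python
import os
import warnings
from unittest import mock


def warn_on_default_encoding(encoding=None):
    """Stand-in for the code under test: warn only when the env var is set."""
    if encoding is None and os.getenv("PYTHONWARNDEFAULTENCODING"):
        warnings.warn("explicit `encoding` is encouraged", UserWarning, stacklevel=2)


def count_warnings(env_value):
    """Call the stand-in with the env var set to env_value (None means unset)."""
    with mock.patch.dict(os.environ):  # os.environ is restored on exit
        if env_value is None:
            os.environ.pop("PYTHONWARNDEFAULTENCODING", None)
        else:
            os.environ["PYTHONWARNDEFAULTENCODING"] = env_value
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # make sure nothing is filtered out
            warn_on_default_encoding()
        return len(caught)
```

In a pytest test the same environment manipulation would typically be written with `monkeypatch.setenv` and `monkeypatch.delenv(..., raising=False)`, which also undo their changes when the test finishes.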

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26acf0b and c4bedb8.

📒 Files selected for processing (13)
  • .github/workflows/test.yml (1 hunks)
  • src/monty/bisect.py (1 hunks)
  • src/monty/dev.py (2 hunks)
  • src/monty/functools.py (2 hunks)
  • src/monty/io.py (3 hunks)
  • src/monty/os/__init__.py (3 hunks)
  • src/monty/os/path.py (2 hunks)
  • src/monty/serialization.py (4 hunks)
  • src/monty/shutil.py (3 hunks)
  • src/monty/string.py (2 hunks)
  • src/monty/subprocess.py (1 hunks)
  • src/monty/tempfile.py (1 hunks)
  • tests/test_io.py (1 hunks)

🔇 Additional comments (15)
.github/workflows/test.yml (1)

19-22: LGTM: Variable naming follows GitHub Actions best practices

The change from matrix.python to matrix.python-version aligns with GitHub Actions' conventional naming patterns, improving clarity and maintainability.

src/monty/os/__init__.py (1)

25-25: LGTM: Type hints modernized per PEP 604

The update from Union[str, Path] to str | Path aligns with PEP 604 and Python 3.10+ compatibility requirements. The changes maintain the same type safety while improving readability.

Also applies to: 45-45

src/monty/bisect.py (1)

22-22: LGTM: Type hints modernized per PEP 604

The update from Optional[float] to float | None aligns with PEP 604 and improves code readability while maintaining the same type safety.

src/monty/subprocess.py (1)

64-64: LGTM! Type hint modernization looks good.

The change from Optional[float] to float | None aligns with PEP 604's union type syntax. The implementation remains thread-safe with proper timeout handling.

src/monty/string.py (2)

10-10: LGTM! Good practice with TYPE_CHECKING import.

The import of Any under TYPE_CHECKING condition helps reduce runtime overhead.


37-37: LGTM! Type hint modernization looks good.

The change from Union[str, Iterable[str]] to str | Iterable[str] aligns with PEP 604's union type syntax. The implementation correctly handles both input types with proper type casting.
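
The two points above (imports guarded by `TYPE_CHECKING` and `|` unions in signatures) combine as in the sketch below. `as_string_list` is a hypothetical function written for this example and is not monty's `list_strings`; the sketch relies on `from __future__ import annotations` so that the annotation is never evaluated at runtime.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers, never imported at runtime.
    from collections.abc import Iterable


def as_string_list(arg: str | Iterable[str]) -> list[str]:
    """Wrap a single string in a list; materialize any other iterable of strings."""
    if isinstance(arg, str):
        return [arg]
    return list(arg)
```

Because annotations are stored as strings under the future import, `Iterable` does not need to exist at runtime, which is what makes the guarded import safe.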

src/monty/os/path.py (2)

15-15: LGTM! Good practice with TYPE_CHECKING import.

The import of Callable and Literal under TYPE_CHECKING condition helps reduce runtime overhead.


44-46: LGTM! Type hint modernization looks good.

The changes align with PEP 604's union type syntax:

  • exts: Union[str, list[str]] → str | list[str]
  • exclude_dirs: Optional[str] → str | None
  • include_dirs: Optional[str] → str | None

The implementation correctly handles all input types with proper wildcard pattern matching.

src/monty/serialization.py (1)

25-25: LGTM: Type hint modernization properly implemented

The update to modern type hint syntax (using | instead of Union) and the introduction of Literal type for fmt parameter improves type safety by restricting the allowed values to specific strings.

Also applies to: 28-33
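
A small sketch of how a `Literal`-typed format parameter restricts the accepted values, both for type checkers and at runtime. `Format` and `guess_format` are hypothetical names introduced here, and the extension-matching rules are assumptions for illustration rather than monty's actual `loadfn`/`dumpfn` logic.

```python
from __future__ import annotations

from pathlib import Path
from typing import Literal, get_args

Format = Literal["json", "yaml", "mpk"]


def guess_format(fn: str | Path, fmt: Format | None = None) -> Format:
    """Return fmt if given, else infer it from the file name (default: json)."""
    if fmt is not None:
        # get_args recovers the allowed strings from the Literal type.
        if fmt not in get_args(Format):
            raise ValueError(f"Unsupported format: {fmt!r}")
        return fmt
    name = Path(fn).name.lower()
    if ".mpk" in name:
        return "mpk"
    if ".yaml" in name or ".yml" in name:
        return "yaml"
    return "json"
```

A type checker would flag `guess_format("x", fmt="xml")` statically, while the `get_args` check gives the same protection to callers who do not run one.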

src/monty/tempfile.py (1)

42-42: LGTM: Type hint updated to modern syntax

The update from Union[str, Path, None] to str | Path | None aligns with PEP 604 while maintaining the same functionality.

src/monty/shutil.py (1)

15-15: LGTM: Type hints consistently updated

The changes properly implement modern type hint syntax across the file:

  • Removed unnecessary Optional import
  • Updated type hints from Optional[str | Path] to str | Path | None
  • Maintained consistency across similar parameters

Also applies to: 79-79, 133-133

src/monty/dev.py (1)

20-20: LGTM: Type hint modernization

The changes correctly implement PEP 604 union types, replacing Optional[T] with T | None and Union[T1, T2] with T1 | T2. This improves code readability while maintaining the same type safety.

Also applies to: 26-26, 28-28

src/monty/functools.py (1)

16-16: LGTM: Type hint modernization

The changes correctly implement PEP 604 union types, improving code readability while maintaining type safety.

Also applies to: 133-133

src/monty/io.py (1)

22-22: LGTM: Type hint modernization

The changes correctly implement PEP 604 union types, improving code readability while maintaining type safety.

Also applies to: 26-26, 174-174

tests/test_io.py (1)

429-429: LGTM: Warning check properly added

The FutureWarning check for LZW compression is correctly placed within the context manager block.

@DanielYang59 DanielYang59 marked this pull request as draft December 20, 2024 10:03
@DanielYang59
Contributor Author

@esoteric-ephemera Does the current change look good to you? That is, it uses the PYTHONWARNDEFAULTENCODING environment variable from PEP 597 to enable the optional EncodingWarning, which is otherwise disabled by default.

Meanwhile, I couldn't find where loadfn would emit an encoding warning, since an explicit encoding has already been added. Could you share a code snippet that reproduces it?

if fmt == "mpk":
    if msgpack is None:
        raise RuntimeError(
            "Loading of message pack files is not possible as msgpack-python is not installed."
        )
    if "object_hook" not in kwargs:
        kwargs["object_hook"] = object_hook
    with zopen(fn, "rb") as fp:
        return msgpack.load(fp, *args, **kwargs)  # pylint: disable=E1101
else:
    with zopen(fn, "rt", encoding="utf-8") as fp:
        if fmt == "yaml":
            if YAML is None:
                raise RuntimeError("Loading of YAML files requires ruamel.yaml.")
            yaml = YAML()
            return yaml.load(fp, *args, **kwargs)
        if fmt == "json":
            if "cls" not in kwargs:
                kwargs["cls"] = MontyDecoder
            return json.load(fp, *args, **kwargs)

@esoteric-ephemera
Contributor

Could you set the default encoding for zopen to be utf-8 and nix the warning altogether?

@DanielYang59
Contributor Author

DanielYang59 commented Dec 21, 2024

Could you set the default encoding for zopen to be utf-8 and nix the warning altogether?

I believe this is almost what we're doing:

monty/src/monty/io.py

Lines 82 to 90 in 26acf0b

# Warn against default `encoding` in text mode
if "t" in mode and kwargs.get("encoding", None) is None:
    warnings.warn(
        "We strongly encourage explicit `encoding`, "
        "and we would use UTF-8 by default as per PEP 686",
        category=EncodingWarning,
        stacklevel=2,
    )
    kwargs["encoding"] = "utf-8"

The difference is that after this patch, the encoding warning is only emitted when PYTHONWARNDEFAULTENCODING is set (I believe this would not be disruptive, as no warning would be emitted by default).

I personally prefer to give users the option to turn on this warning.

Does this sound good to you?
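
For context on what "turning on" the warning looks like from the user's side, the sketch below starts a child interpreter with and without the variable and reads back `sys.flags.warn_default_encoding`. It assumes Python 3.10 or later (where that flag exists); `child_warn_flag` is a hypothetical helper written for this example.

```python
import os
import subprocess
import sys


def child_warn_flag(enabled: bool) -> int:
    """Start a child interpreter and report its sys.flags.warn_default_encoding."""
    env = dict(os.environ)
    env.pop("PYTHONWARNDEFAULTENCODING", None)
    if enabled:
        env["PYTHONWARNDEFAULTENCODING"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.warn_default_encoding)"],
        env=env,
        capture_output=True,
        encoding="utf-8",  # explicit encoding for the captured output
        check=True,
    )
    return int(result.stdout.strip())
```

With the variable unset, the flag stays off, which matches the point made above that no warning is emitted by default.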

@DanielYang59 DanielYang59 force-pushed the encoding-warning-follow-pep597 branch from 837a6c1 to 8bc0763 Compare December 21, 2024 09:45
@DanielYang59 DanielYang59 marked this pull request as ready for review December 24, 2024 02:14

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/monty/io.py (1)

80-89: LGTM: PEP 597 encoding warning implementation

The implementation correctly follows PEP 597 by checking the PYTHONWARNDEFAULTENCODING environment variable before emitting the encoding warning.

Consider simplifying kwargs.get("encoding", None) to kwargs.get("encoding") since None is the default return value of the get method:

-    if "t" in mode and kwargs.get("encoding", None) is None:
+    if "t" in mode and kwargs.get("encoding") is None:
🧰 Tools
🪛 Ruff (0.8.2)

82-82: Use kwargs.get("encoding") instead of kwargs.get("encoding", None)

Replace kwargs.get("encoding", None) with kwargs.get("encoding")

(SIM910)


87-87: Undefined name EncodingWarning. Consider specifying requires-python = ">= 3.10" or tool.ruff.target-version = "py310" in your pyproject.toml file.

(F821)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4bedb8 and 8bc0763.

📒 Files selected for processing (4)
  • src/monty/io.py (6 hunks)
  • src/monty/re.py (1 hunks)
  • src/monty/serialization.py (6 hunks)
  • src/monty/shutil.py (5 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/monty/re.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/monty/serialization.py
  • src/monty/shutil.py
🔇 Additional comments (3)
src/monty/io.py (3)

22-22: LGTM: Type hint updates follow PEP 604

The changes to use the union operator (|) instead of Union are consistent with modern Python type hinting practices and improve code readability.

Also applies to: 26-26, 172-172


144-144: LGTM: Consistent use of explicit modes

The updates to use explicit mode="rb" in zopen calls align with the function's documentation about requiring explicit binary/text modes in future versions.

Also applies to: 190-190
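
To show the difference between explicit binary and text modes on a compressed file, here is a minimal sketch using the standard library's `gzip.open` as a stand-in; it does not call `zopen`, and `roundtrip` is a hypothetical helper written for this example.

```python
import gzip
import tempfile
from pathlib import Path


def roundtrip(text: str) -> tuple[bytes, str]:
    """Write text to a gzip file, then read it back in binary and in text mode."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.txt.gz"
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.write(text)
        with gzip.open(path, "rb") as f:  # explicit binary mode: returns bytes
            raw = f.read()
        with gzip.open(path, "rt", encoding="utf-8") as f:  # explicit text mode
            decoded = f.read()
    return raw, decoded
```

Spelling out `"rb"` or `"rt"` makes it unambiguous whether the caller gets bytes or decoded text, and in text mode it gives a natural place to pass the encoding explicitly.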


43-44: LGTM: Clear documentation of encoding behavior

The docstring clearly communicates that UTF-8 will be used as the default encoding when none is specified.


Successfully merging this pull request may close these issues.

What's the correct way to use the MAGMOM's value when using MP API data?
2 participants