Fix _legalize_path types #5224
base: master
@wisp3rwind this is ready for a review - trying to revive this work
Nice seeing some progress here!
I think it's a great observation that there's only really one caller to `legalize_path` (i.e. `Item.destination`), which makes it straightforward to adapt the signature of `legalize_path` to get rid of all its weirdness. This seems like the right thing to do!
It seems correct that the input to `legalize_path` should always be Unicode: the source is always a template evaluation, which is performed using Unicode strings, rather than something filesystem-specific or bytes. The output is most naturally `bytes`, since it's our canonical representation for things supposed to be passed to the filesystem.
At a relatively superficial glance, this PR looks reasonable to me (the one inline comment that I left is probably out-of-scope).
One thing I'm unsure about is the use of `truncate` (but I'm not sure it made sense before, either): if the purpose of truncation is to deal with filesystem limitations, this should model as closely as possible what matters to the underlying filesystem. Previously, truncation could be performed on `str` or `bytes` depending on the `fragment` flag (notably, not depending on platform or filesystem). Now, it's always based on Unicode code points. In one sense, this is an improvement, since the result will always be valid UTF-8. On the other hand, the byte length of the result is not bounded by the specified length. Ideally, I feel like this should attempt to truncate at Unicode grapheme boundaries, but measure length in terms of encoded bytes or Unicode code points depending on platform/filesystem.
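To make the byte-length concern concrete, here is a minimal sketch. The helper `truncate_to_bytes` is hypothetical and not part of beets. It assumes UTF-8 and cuts at code-point boundaries only, so it does not implement the grapheme-aware truncation suggested above (a combining mark could still be separated from its base character).

```python
def truncate_to_bytes(text: str, max_bytes: int, encoding: str = "utf-8") -> str:
    """Return the longest prefix of `text` whose encoded form fits in `max_bytes`.

    Hypothetical helper for illustration; cuts at code-point boundaries only.
    """
    encoded = text.encode(encoding)
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode(encoding, errors="ignore")


name = "é" * 10  # 10 code points, 20 bytes in UTF-8

# Truncating by code points keeps 8 characters but 16 bytes.
by_code_points = name[:8]
print(len(by_code_points), len(by_code_points.encode("utf-8")))

# Truncating by encoded length stays within the byte budget.
by_bytes = truncate_to_bytes(name, 8)
print(len(by_bytes), len(by_bytes.encode("utf-8")))
```

Which of the two limits applies in practice depends on the filesystem, which is why the choice probably belongs in a platform-aware layer rather than in a fixed rule.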
It's also unexpected that the previous implementation used `displayable_path` in the legalization procedure: that is meant to remove non-printable characters, but does that align with what makes sense to be part of a filename? Maybe it would be interesting to check the `git blame` to see whether it has some rationale for its use here?
In any case, the previous solution doesn't seem to be ideal either (or, at least, it lacks sufficient documentation to make sense of it for me), so perfecting the truncation might be worth opening a tracking issue for, but out-of-scope in this PR. Do we have any bug reports that might be related to path truncation and legalization?
I followed your suggestion and had a look at the history of this functionality: this PR and this issue may be the most informative. Adrian nicely summarised the point of having to use bytes in this comment:
See the rest of the thread around this comment. It seems like this may be less relevant in Python 3 ;)
All of this functionality was written back in the day when some useful libraries did not exist yet. Take pathvalidate, for example, which could be helpful for us here.
Background

The `_legalize_stage` function was causing issues with Mypy due to inconsistent type usage between the `path` and `extension` parameters. This inconsistency stemmed from the `fragment` parameter influencing the types of these variables.

Key issues

1. `path` was defined as `str`, while `extension` was `bytes`.
2. Depending on `fragment`, `extension` could be either `str` or `bytes`.
3. `path` was sometimes converted to `bytes` within `_legalize_stage`.

`Item.destination` method

- The `fragment` parameter determined the output format:
  - `False`: Returned absolute path as bytes (default)
  - `True`: Returned path relative to library directory as str

Thus

- Rename `fragment` parameter to `relative_to_libdir` for clarity
- Ensure `Item.destination` returns `bytes` in all cases
  - Code expecting strings now converts the output to `str`
- Use only `str` type in `_legalize_stage` and `_legalize_path` functions
  - These functions are no longer dependent on `relative_to_libdir`
Description

Part 2 of the work fixing types in `beets.util.__init__` #5215.

Mypy was not happy here because the `_legalize_stage` function implementation concatenates the `path` and `extension` parameters, implying that their types need to match.

You can see that initially the `path` parameter was defined as a `str` while `extension` was `bytes`.

In reality, depending on the `fragment` parameter value, `extension` was sometimes provided as a `str` and sometimes as `bytes`. The same parameter decided whether `path` gets converted into `bytes` within the `_legalize_stage` implementation. No surprise that mypy was confused here.

`_legalize_stage` is only used within the `Item.destination` method implementation, which is where `fragment` is defined. I determined that the `fragment` parameter controls the form of the output path:

- `fragment=False` returned absolute path as bytes (default)
- `fragment=True` returned path relative to the library directory as str.

Given the above, the change
- Renames the `fragment` parameter to `relative_to_libdir` for clarity
- Makes `Item.destination` return the same type in both cases. I picked `bytes` since that's the type that the majority of the code using this method expects.
  - I converted the output path to `str` for the code that has been expecting a string there.
- Decouples the `_legalize_stage` and `_legalize_path` implementations from `relative_to_libdir`. The logic now uses the `str` type only.
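
The shape of the change can be sketched as follows. This is a simplified illustration, not the code from this PR: the functions `legalize_stage` and `destination`, their signatures, the `/` separator and the UTF-8 encoding are all assumptions made for the example. It only shows the idea of keeping the internals on `str` and encoding to `bytes` once at the boundary.

```python
from __future__ import annotations


def legalize_stage(path: str, extension: str, max_length: int) -> tuple[str, bool]:
    """Hypothetical str-only stage: truncate components, then append the extension.

    Returns the resulting path and whether truncation changed it.
    """
    extension = extension.lower()
    *dirs, base = path.split("/")
    dirs = [d[:max_length] for d in dirs]
    # Leave room for the extension in the final component.
    base = base[: max(max_length - len(extension), 0)]
    # Both operands are str, so the concatenation type-checks.
    result = "/".join([*dirs, base]) + extension
    return result, result != path + extension


def destination(path: str, extension: str, max_length: int = 255) -> bytes:
    """Hypothetical caller: always returns bytes, whatever happens internally."""
    legal, _ = legalize_stage(path, extension, max_length)
    return legal.encode("utf-8")  # assumption: UTF-8 as the on-disk encoding


print(destination("Artist/Album/Track", ".MP3"))
# Code that needs a string decodes explicitly at the call site.
print(destination("Artist/Album/Track", ".MP3").decode("utf-8"))
```

With a single return type, callers no longer need to know which flag produced the value, and the type checker can verify the concatenation inside the stage.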