-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update signature open_dataset for API v2 #4547
Conversation
Add necessary imports for this function.
# Conflicts: # xarray/backends/api.py
…ad-refactor # Conflicts: # xarray/backends/apiv2.py
- to be used in apiv2 without instantiate the object
- modify signature - move default setting inside backends
…2.dataset_from_backend_dataset`
…ay into change-signature-open_dataset � Conflicts: � xarray/backends/apiv2.py
…iour is unchanged.
- add plugins.py cotaining backneds info
xarray/backends/apiv2.py
Outdated
signature = inspect.signature(ENGINES[engine]).parameters | ||
if decode_cf is False: | ||
for d in decoders: | ||
if d in signature and d != "use_cftime": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this special case d != "use_cftime"
? Does it break any tests if we simply remove it?
(My guess is that the existing code may not bother to set use_cftime = False
, but only because the value of use_cftime
is ignored if decode_times = False
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have forgotten to check it. You are right, we can remove it.
decoders = resolve_decoders_kwargs( | ||
decode_cf, | ||
engine=engine, | ||
mask_and_scale=mask_and_scale, | ||
decode_times=decode_times, | ||
decode_timedelta=decode_timedelta, | ||
concat_characters=concat_characters, | ||
use_cftime=use_cftime, | ||
decode_coords=decode_coords, | ||
) | ||
|
||
backend_kwargs = backend_kwargs.copy() | ||
overwrite_encoded_chunks = backend_kwargs.pop("overwrite_encoded_chunks", None) | ||
|
||
open_backend_dataset = _get_backend_cls(engine, engines=ENGINES) | ||
open_backend_dataset = _get_backend_cls(engine, engines=plugins.ENGINES)[ | ||
"open_dataset" | ||
] | ||
backend_ds = open_backend_dataset( | ||
filename_or_obj, | ||
drop_variables=drop_variables, | ||
**decoders, | ||
**backend_kwargs, | ||
**{k: v for k, v in kwargs.items() if v is not None}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness, let me mention an alternative design that would not need inspect.signature
. Instead of using a single function open_backend_dataset
, we could use a "Loader" class with two separate methods, one for decoding a dataset and another for returning a raw dataset, e.g.,
backend_cls = _get_backend_cls(engine)
loader = backend_loader(filename_or_obj, **backend_kwargs)
if decode:
ds = loader.load_decoded(**decoders)
else:
ds = loader.load_raw()
Is this better than the current approach in this PR (using inspect), or the current approach (on master) of calling the single open_backend_dataset()
function with decode_cf=False
? Honestly I'm not sure. It is most explicit, but also probably a little more annoying to write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that in this way we are going to complicate the interface without a real advantage. But I'm not sure about it. @alexamici, @jhamman what do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer the main reason we proposed the use inspect
is to raise an appropriate error message when a backend doesn't support a specific decoding option.
Your proposal simplifies the mangling of the decode options, but I'd still use inspect
in the same as we do now before calling load_decode
.
Personally I'd favour keeping the current implementation.
raise TypeError( | ||
f'All the parameters in {engine["open_dataset"]!r} signature should be explicit. ' | ||
"*args and **kwargs is not supported" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it correctly implements my suggestion from last week 👍 .
I don't know how annoying backend authors would find this restriction to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current implementation a backend developer can add the "signature"
key explicitly to the engine dictionary to skip running the inspection.
This allows also providing a "open_dataset"
as a C-function.
I like the current implementation.
As discussed, let's stick with the current PR @aurghs @alexamici you should have "merge" permissions now! |
I fixed the type ints and the two test failures are unrelated to the changes. I'm going to test my shiny new merge rights :) |
if "signature" not in engine: | ||
parameters = inspect.signature(engine["open_dataset"]).parameters | ||
for name, param in parameters.items(): | ||
if param.kind in ( | ||
inspect.Parameter.VAR_KEYWORD, | ||
inspect.Parameter.VAR_POSITIONAL, | ||
): | ||
raise TypeError( | ||
f'All the parameters in {engine["open_dataset"]!r} signature should be explicit. ' | ||
"*args and **kwargs is not supported" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a lot of indentation! I think it might be a bit easier to read if we get rid of one level of indentation using
if "signature" in engine:
continue
nothing urgent, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are completely right!
There is some style fix that I really need to do. I will do a new pull request.
Proposal for the new API of
open_dataset()
. It is implemented inapiv2.py
and it doesn't modify the current behavior ofapi.open_dataset()
.It is something in between the first and second alternative suggested at #4490 (comment), see the related quoted text:
Instead of a class for the decoders, I have added a function:
resolve_decoders_kwargs
.resolve_decoders_kwargs
performs two tasks:False
, it sets toFalse
all the decoders supported by the backend (usinginspect
).So xarray manages the keyword decode_cf and passes on only the non-default decoders to the backend. If the user sets to a non-None value a decoder not supported by the backend, the backend will raise an error.
With this implementation
drop_variable
should be always supported by the backend. But I think this could be implemented easely by all the backends. I wouldn't group it with the decoders: to me, it seems to be more a filter rather than a decoder.The behavior
decode_cf
is unchanged.PRO:
open_dataset
.open_backend_dataset_${engine}
API the accepted decoders.Missing points:
deccode_cf
should be renameddecode
. Probably, the behavior ofdecode
should be modified for two reason:decode_cf
isFalse
, it sets the decoders toFalse
, but there is no check on the other values. The accepted values should be:None
(it keeps decoders default values), True (it sets all the decoders toTrue
),False
(it sets all the decoders toFalse
).decode_cf
without any warning. , but theI think that we need a different PR for the three of them.
isort . && black . && mypy . && flake8