-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
otk/utils: introduce Annotated* objects #280
base: main
Are you sure you want to change the base?
Conversation
4fae95d
to
9d8acdf
Compare
one of the examples of a "new" error message is in
|
9d8acdf
to
ed674bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still really happy about the possibilities that having this metadata unlocks in the way of error messages and user friendliness but also still very on the fence of the current approach of pretending-to-be-builtin-objects-but-not really.
Did you consider any other approaches and if so; why does the current approach fit best?
You've looked into this a lot more than I have but let me say very naively, would something like this:
class Annotated[T]:
value: T
and then passing Annotated
around and adjusting our code to refer to Annotated.value
fit? If we need to we can still implement all the dunders on Annotated
but I'd prefer to avoid that.
targetless_keys = [key for key in deserialized_data | ||
if not key.startswith(PREFIX)] | ||
targetless_keys = [ | ||
f" * \"{key}\" in {deserialized_data[key].get_annotation('src')}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I drop in a key, goobers:
into the top level I get a traceback from this:
File "/root/otk/src/otk/document.py", line 38, in __init__
Omnifest.ensure(tree)
File "/root/otk/src/otk/document.py", line 53, in ensure
f" * \"{key}\" in {deserialized_data[key].get_annotation('src')}"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get_annotation'
so it probably also needs a None check :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran all possible tests and examples - how did you do that, so I can reproduce?
Not sure if the error message makes sense in this case then :-/
I'm pretty sure that with this approach you would have to re-implement even more, including all handling of the tree, write-to-file operations and json serialization to pass to externals etc. 🤔 In my current implementation it mainly was changing the types and some special cases like list comprehension |
What would change concretely in the handling of the tree? The write-to-file and JSON serialization would be be a 5-line thing (unless I'm missing something it'd need a custom encoder class that typechecks and returns
|
@supakeen I did some small unit-test-like things, your approach could also work. |
@schuellerf I do like explicitness. Let's wait for @mvo5 to chime in as well if he feels that that might be a better approach. |
@supakeen @mvo5 I put up the "other" version as a "wrapper class" as Simon suggested to try. e.g. context.py |
33c167f
to
7e9bf68
Compare
3d368d0
to
898b131
Compare
898b131
to
580e4da
Compare
pre-commit does not seem to support PYTHONPATH which is needed to find things like "otk.command" so we'll workaround by adding `src` with an "init-hook" of pylint
The Objects represent all basic types but with "hidden" annotations to be able to track the source of the data from the originating yaml file.
to be able to test older versions of python all directories with `venv.*` are ignored
as we explicitly support python 3.9 and the implementations of pathlib are different we need to have special support as we want to inherit from pathlib.
580e4da
to
5acf377
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through this a bit quickly so I only have some surface-level comments for now.
def squash_annotations(cls, annotation_list: list) -> dict: | ||
""" just aggregates all annotations in the list. | ||
|
||
new annotation keys get appended to the resulting dict | ||
|
||
duplicate annotations will get converted to string | ||
(if you need an exception for a key please implement here) | ||
|
||
for "src" there is an exception | ||
this is expected to be in the form "filename:line_number" | ||
and will be sanely unified | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rather poor docstring for a method that isn't very obvious or easy to read.
The first line starting with "just" doesn't help. "Aggregates all annotations in the list" would be a good first line for it, the "just" makes it sound like things will be simple, but the function body suggests otherwise. But I would also add "into a dictionary keyed by [something]".
"New annotation keys get appended to the dict". Isn't every key here new? How can there be old annotation keys? This looks like a pure function (it's annotated as a classmethod but it's actually static right? cls
isn't used).
"Duplicate annotations will get converted to a string". Does this mean unique annotations are different types? What does that mean for the result?
"(if you need an exception for a key please implement here)". I would change this to:
"Exceptions for specific keys are implemented here."
and then follow it with:
Key exceptions:
- src: ...
But also, I'm not entirely clear what the standard behaviour is like so I don't really understand what the exception is excepting.
@@ -176,29 +177,25 @@ def squash_annotations(cls, annotation_list: list) -> dict: | |||
|
|||
|
|||
class AnnotatedList(Generic[AnnotatedListT], AnnotatedBase, list): | |||
def __init__(self, seq: list[Any] | None = None) -> None: | |||
def __init__(self, seq: Optional[list[Any]] = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change (and the other type annotation changes in this commit) doesn't seem relevant to the commit and its message.
if sys.version_info >= (3, 10): | ||
class AnnotatedPath(AnnotatedPathBase, pathlib.Path): | ||
def __init__(self, *args: Any, **kwargs: Any) -> None: | ||
pathlib.Path.__init__(self, *args, **kwargs) | ||
AnnotatedPathBase.__init__(self) | ||
else: | ||
class AnnotatedPath(AnnotatedPathBase, pathlib.PosixPath): # pylint: disable=abstract-method | ||
def __init__(self, *args: Any) -> None: # pylint: disable=W0613 | ||
pathlib.PosixPath.__init__(self) | ||
AnnotatedPathBase.__init__(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a common behaviour we can use here instead of having potentially different behaviour when switching Python versions? Does using PosixPath
everywhere cause issues?
I'm looking at this and thinking it'll be a pain to troubleshoot AnnotatedPath
behaviour if it does slightly different things based on interpreter version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of pathlib changed in those python versions and only has to be handled explicitly by code that directly calls the "init" method which you usually don't do but in this case it's needed as we inherit here and the object doesn't seem to be properly instantiated otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR and thank you for improving the errors and the path information.
I spend a lot of time looking at this, I have many comments but I don't have the time to write them up in a nice and proper way (sorry!). I wrote down a bunch though and I hope some are helpful.
Overall I would like that we try to simplify it more, there is a lot going on and I see the risk that the extra complexity will make our life in the future harder when we work with transform.py. The fact that we have to subclass every basic type shows the cost for the feature is high and we run into the issues like bools or None that cannot be subclassed (and OtkBool(True) is True
will be false therefore) and we now need to be conscious about the annotations in many places make me uneasy. Even though I spend a good deal of time I don't think I spend nearly enough. I pushed a bunch of ideas into https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1 which is a variant of your code/ideas with (hopefully but this is of course subjective) reduces some of the complexity by (trying) to hide some details. But that branch does not have enough tests (I think test_annotate(tmp_path) needs the various joins/merges as well for completeness) so it's hard to say if it reasonable or not (or what is missing). I added some inline comments, some/a bunch are "addressed" (or tried to get addressed) in the branch I linked to.
Sorry that I'm so difficult and I hope this is useful nevertheless. I understand that my comments are probably quite terse and maybe hard to understand, we can have a hangout or other sync to talk in person if you prefer that and if there are questions.
for key, value in self.items(): | ||
self[key] = self.deep_convert(value) | ||
|
||
def get_annotation(self, key: str, default: Any = None) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the desire to write a nice generic way to do annotations. At the same time, going more into YAGNI/KISS land might be beneficial here, AFAICT we only have a single annotation that we care about ("src") and the rest is there because we might need them later. We could probably archive the same via a single otk annotation that becomes just a dataclass so we can still expand it.
My strawman would be to have Otk*
classes with just otk_src
as property. So Otk{Dict,List}
all simple subclasses with just a mixin.
I played around with this branch a bit in a version and I think just using a otk_src
property in a base mix is probably all we nee, somethng like https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-7a465d7d1dc04950d6fb0e7f8e2a314b952542824dd3ef0145b7dc00947f67d8R9
|
||
return ret | ||
|
||
def add_source_attributes(self, key_data, prefix=""): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This add a tight coupling of AnnotatedBase to an yaml node, i.e. Annotated node now needs to know details of yaml nodes, having the helper outside would reduce this coupling.
|
||
@classmethod | ||
def deep_convert(cls, data: Any) -> Any: | ||
"""Recursively convert nested dicts and lists into AnnotatedDict and HiddenAttrList.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this is only needed when we we load data from the externals where it comes in via json. might be worth a comment. For the yaml the values are constructed from the ground up, so if we hook into the constructoin we always have fully Annoated* things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think this might be obsolete when also all tests use annotated objects when interacting with internals. I wanted to collect feedback first before changing all tests.
|
||
@classmethod | ||
def deep_dump(cls, data: Any) -> Any: | ||
""" Converting to builtin types is usually not necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this can be done in a slightly simpler way with a custom json encoder, c.f. https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-7a465d7d1dc04950d6fb0e7f8e2a314b952542824dd3ef0145b7dc00947f67d8R65 - it is only needed for custom bools (which is a bit sad)
self.set_annotation(f"{prefix}column", column) | ||
self.set_annotation(f"{prefix}column_end", column_end) | ||
|
||
def __add__(self, o): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaict the various *add is only used in a test, could we YAGNI it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will also be used when implementing more otk operations?
|
||
return [resolve(ctx, state, val) for val in tree] | ||
# list comprehension need a separate copy of all annotations | ||
new_state.set_annotations(tree.get_annotations()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above about that the user of the list now needs to remember to get/set, in https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-7e2cad722099400fcad150a92cd04d88d586940ecd51139b26037b58365ccc34R137 it is:
- return [resolve(ctx, state, val) for val in tree]
+ return tree.otk_dup([resolve(ctx, state, val) for val in tree])
] | ||
|
||
|
||
def save_yaml_constructors(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a custom loader by using it instead of adding to SafeLoader we can avoid the save/restore helpers here and also make this (IMHO) slightly more uniform, see https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-6c473e728657cf3b050b5222403efc9dcbb4a91902e878a476b5ead8d8ea7db2R13
|
||
# callbacks to store information about the source of all data in the yaml files | ||
saved_constructors = save_yaml_constructors() | ||
yaml.SafeLoader.add_constructor(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, annotated_dict_constructor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my suggestion above about how the loader could be slightly more uniform by extending the already existing custom class
validate_var_name(name) | ||
|
||
cur_var_scope = self._variables | ||
parts = name.split(".") | ||
for i, part in enumerate(parts[:-1]): | ||
if not isinstance(cur_var_scope.get(part), dict): | ||
self._maybe_log_var_override(cur_var_scope, parts[:i+1], {".".join(parts[i+1:]): value}) | ||
cur_var_scope[part] = {} | ||
cur_var_scope[part] = AnnotatedDict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this area needs more work (and a test), I played with it in https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-ad6f996bbd08d84913b570b67a441454f57432e4129c7b8601e0454652501e0fR1 and I think we will need to tweak the way we define dicts in process_defines
to make this fully work (and my branch is full of hacks)
@@ -33,3 +33,4 @@ repos: | |||
hooks: | |||
- id: pylint | |||
additional_dependencies: ["PyYAML", "types-PyYAML", "pytest"] | |||
args: ["--init-hook", "import sys; sys.path.insert(0, './src')"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unrelated to the PR(?) and could be it's own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in my setup this is needed to run pytest, it could be split but I need to merge this first because otherwise it fails to commit.
I'll to a separate PR, merge and rebase :-D
The Objects (in
src/otk/annotation.py
) represent all basic typesbut with "hidden" annotations to be able to
track the source of the data from the originating
yaml file.
Some of the annotations are already used in error messages, other error messages can be extended in the future with this information.
This is replacing the initial approach #271