Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] annotation wrapper class #282

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

schuellerf
Copy link
Contributor

Just an alternative implementation of #280
I don't really like the approach to have .value all over the place. Let's see which approach is more suitable for a future usage
This PR is just one squashed commit until we decide on which we continue.

@schuellerf schuellerf requested a review from supakeen October 18, 2024 08:42
Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this. I have some inline comments but my high level feedback would be:

A lot of what I write below is gut feeling, I don't like making suggestions based on this without some further experimentation but in the interest of time I will. I probably miss stuff (or am flat out wrong) but I hope the suggestions/ideas are useful.

I am not a fan of Annotated{Src,Bool,Int,Dict,...} in general, this is true for this approach and for #280. I am worried about the complexity and how we need to overload magic methods and then we have a getitem on a bool etc).

But I do like the idea of AnnotatedNode (or OtkNode) and AnnoatedNote.value because it gives access to the real thing and all operations will work as expected on ".value" plus the inconvenience of having to do "node.value" is actually useful as it hints the user of the API that AnnotatedNode is not the thing to work with but just a wrapper that carries the value and extra data.

I would love to experiment with something slightly smaller:

  1. Just have "AnnotatedNode" with the minimal needed interface, probably hash/eq. And AnnotatedNode takes a value and (optionally an exiting AnnotatedNode from which it will inherit the annotation history)
  2. The API of AnnoatedNode would be just .value and .src with src a dataclass that carries a list of filename,line for now (and the two magic methods so that Nodes can be put into a dict and compared)
  3. when we need to check the type (like in transform.py:resolve() we use isinstance(data.value, dict)
  4. we will most likely need convenience OtkDict, OtkList or navigation the tree will be extremly painful, they just need to provide: def __getitem__(self, item): return self.value.__getitem__ and we should construct them only in the yaml loader for list/mapping (or explicitely in transform.py)

I did a little bit of experimenting in https://github.com/osbuild/otk/compare/main...mvo5:annotations-3?expand=1 - its extremly messy and incomplete but I would love to see this explored a bit more. If that works out I think it's preferable over the subclass approvch in #280 but if we really need classes for all python types here too I think the subclass one is better (but I hope we won't)

Sorry for this slightly messy review and my rambling, hope it's helpful still

class AnnotatedNode(Generic[AnnotatedNodeT]):

# to be able to print relative paths
_basedirs = [os.path.curdir]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unrelated but I can't help noticing) - using a class attribute like this is problematic pattern as this is shared access all instance, I think it's best to avoid this kind of shared state as it may lead to subtle surprises down the line.

_basedirs = [os.path.curdir]

def __init__(self, value: Optional[AnnotatedNodeT] = None, annotations: Optional[dict[str, Any]] = None):
self.value: AnnotatedNodeT = self.deep_convert(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that self.value should be the real value, not anything we convert. If it's a bool, it is a bool, dict etc. This way we avoid the issue that AnnoatedBool(true) is True is actually (and surprisingly) False. With the real value at value this beomces the correct AnnotatedValue(True).value is True == True .

Fwiw, a similar comment as in #280 (comment) - I think the yaml buildup will give us the right types here and we don't need to convert (except for the loading of data fromthe external which we can probably do there instead of here)

AnnotatedNodeT = TypeVar('AnnotatedNodeT')


class AnnotatedNode(Generic[AnnotatedNodeT]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would just call it OtkValue or OtkNode (as we are not writing a generic library here (yet))


def __init__(self, value: Optional[AnnotatedNodeT] = None, annotations: Optional[dict[str, Any]] = None):
self.value: AnnotatedNodeT = self.deep_convert(value)
self.annotations: dict[str, Any] = annotations or {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just use a dataclass like here with the minimal amount of informatin we need

self.squash_annotations([self.value]) # type: ignore[attr-defined]
self.value = self.value.value # type: ignore[attr-defined]

def __iter__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If .value the real value, could we try to get as minimal as we can on the "magic" methods here? It seems an OtkNode would need __eq__ and __hash__ so that values it can be get/set into dicts. Then we would need a Otk{List,Dict} with __getitem__ so that we can navigate the resulting json in a "natural" way but beyond that I am not sure we need more if we always use ".value" for the real manipulations.

if not isinstance(item, AnnotatedNode):
continue
for k in item.annotations.keys():
if k == "src":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging via string manipulation is not ideal and the special handling of "src" even less. I think we should go YAGNI on this too and just have "src" as a dataclass and then we can have a list of dataclasses if there are multiple annotations (and a maybe a str to format them nicley?)

self.annotations[f"{prefix}column"] = column
self.annotations[f"{prefix}column_end"] = column_end

def squash_annotations(self, annotation_list: list) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe merge_annotations as the other ones are merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants