Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass types for repeated containers #119

Merged
merged 4 commits into from
Dec 7, 2023
Merged

Pass types for repeated containers #119

merged 4 commits into from
Dec 7, 2023

Conversation

SimeonStoykovQC
Copy link
Member

The most time-consuming thing in make_attribute is iterating the values. By passing the attribute type, we save one pass, which could run 10-30% faster (depending on the amount of other work done).

string_bytes = ["1".encode() for _ in range(5_000_000)]
%timeit make_attribute("key", string_bytes, attr_type=AttributeProto.STRINGS)
# 480 ms ± 8.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit make_attribute("key", string_bytes)
# 544 ms ± 5.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

ints = [1 for _ in range(5_000_000)]
%timeit make_attribute("key", ints, attr_type=AttributeProto.INTS)
# 172 ms ± 2.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit make_attribute("key", ints)
# 244 ms ± 4.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Copy link
Collaborator

@jbachurski jbachurski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes a lot of sense, this possibility was only introduced in a later PR after my initial refactor upstream. Did you check if this interacts with any attribute type checking we have going on? For instance when something of the wrong type gets passed in. Otherwise I think LGTM.

@SimeonStoykovQC
Copy link
Member Author

SimeonStoykovQC commented Nov 27, 2023

if this interacts with any attribute type checking

@jbachurski Is this test covering that failure mode?

Also, is it acceptable that the errors are now raised from different places and not our custom validation? Do you see any potential issues with that, and if yes, what other tests would you add?

The errors are now raised from protobuf/onnx:

            if attr_type == AttributeProto.INTS:
>               attr.ints.extend(value)
E               TypeError: 'a' has type str, but expected one of: int

../../mambaforge/envs/spox/lib/python3.11/site-packages/onnx/helper.py:899: TypeError

whereas before, they originated from our validation:

    def _validate(self):
        if self._to_onnx("dummy").type != self._attribute_proto_type_int:
>           raise TypeError(
                f"Unable to instantiate `{type(self).__name__}` with value of type `{type(self.value).__name__}`."
            )
E           TypeError: Unable to instantiate `AttrInt64s` with value of type `tuple`.

../src/spox/_attributes.py:54: TypeError

note that for strings, we always raised from the call to .encode:

>   return make_attribute(key, [v.encode() for v in self.value])
E   AttributeError: 'int' object has no attribute 'encode'

../src/spox/_attributes.py:205: AttributeError

@adityagoel4512
Copy link
Member

adityagoel4512 commented Nov 27, 2023

I think it would be preferable if we could display a more descriptive error message than the protobuf error message. Given this is coupled with the ONNX version anyway, can you catch the exception accurately?

@jbachurski
Copy link
Collaborator

The encode AttributeError looks like an oversight. The protobuf error isn’t that bad, but I agree with Aditya it might be useful to be more descriptive why and when the error got raised. I was primarily concerned whether the error is handled gracefully at all - testing for expected error messages isn’t something I did enough but it’s good to start setting these standards.

@SimeonStoykovQC SimeonStoykovQC mentioned this pull request Dec 1, 2023
3 tasks
@cbourjau
Copy link
Collaborator

cbourjau commented Dec 4, 2023

I think it is fine to rely on upstream for this kind of validation. We can always improve error messages, but this is good for now. Thanks a lot!

@SimeonStoykovQC
Copy link
Member Author

Can we still merge #122 here? It has other small improvements:

  • Print better error messages for repeated containers
  • Parametrized & simplified the args test

@jbachurski
Copy link
Collaborator

Can we still merge #122 here? It has other small improvements:

  • Print better error messages for repeated containers
  • Parametrized & simplified the args test

I agree it should be merged here - I don't think it's good that we are testing for the AttributeError in unit tests.

* what is this

* Hacky implementation

* Don't encode and raise a better error

* Separate type checking to a function

* abstracted calls to make_attribute and deduce_type in parent + improved error messages

* Revert to old type validation since we'd like to try with caching first

* Changelog

* Update src/spox/_attributes.py

Co-authored-by: Jakub Bachurski <[email protected]>

* Kuba's comments

* Christian's comments

* Change custom exception name

* Don't use a custom exception

* Update CHANGELOG.rst

Co-authored-by: Christian Bourjau <[email protected]>

* Exception message

---------

Co-authored-by: Jakub Bachurski <[email protected]>
Co-authored-by: Christian Bourjau <[email protected]>
@SimeonStoykovQC SimeonStoykovQC merged commit 015dc46 into main Dec 7, 2023
9 checks passed
@SimeonStoykovQC SimeonStoykovQC deleted the pass-arg-type branch December 7, 2023 09:23
@adityagoel4512
Copy link
Member

FYI, this will also enable empty sequences to be serialised!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants