Deserializing strings in place. #79

sammhicks · 2023-12-31T13:11:00Z

De-escaping JSON strings will always produce a shorter (or equal length) string, so it's safe to de-escape strings in place, thus allowing types to borrow plaintext (not escaped) strings from the buffer after deserialization.

This is a semver-breaking change, as it requires the JSON input to be passed in mutably so that strings can be de-escaped in place.
This is safe because once the deserializer has passed a string, it never reads that part of the buffer again.
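The length argument above can be checked with a minimal sketch that unescapes the body of a JSON string (the bytes between the quotes) inside a mutable buffer. This is not the code from this PR. The function `unescape_in_place` and the `UnescapeError` type are hypothetical, and UTF-16 surrogate pairs are rejected rather than combined. Each escape occupies at least two input bytes and decodes to no more UTF-8 bytes than it occupied (`\n` is 2 bytes in and 1 out; `\uXXXX` is 6 bytes in and at most 3 out), so the write cursor never overtakes the read cursor.

```rust
/// Errors reported by this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum UnescapeError {
    /// The buffer ended in the middle of an escape sequence.
    UnexpectedEnd,
    /// A backslash was followed by a byte that is not a JSON escape.
    InvalidEscape(u8),
    /// A `\uXXXX` escape had bad hex digits or named a surrogate code point.
    InvalidUnicode,
}

/// Unescapes a JSON string body in place and returns the length of the
/// unescaped prefix of `buf`. Bytes after that prefix are leftover input.
pub fn unescape_in_place(buf: &mut [u8]) -> Result<usize, UnescapeError> {
    let (mut read, mut write) = (0, 0);
    while read < buf.len() {
        if buf[read] != b'\\' {
            // Ordinary byte: shift it left over any space freed by earlier escapes.
            buf[write] = buf[read];
            read += 1;
            write += 1;
            continue;
        }
        let escape = *buf.get(read + 1).ok_or(UnescapeError::UnexpectedEnd)?;
        read += 2;
        let ch = match escape {
            b'"' => '"',
            b'\\' => '\\',
            b'/' => '/',
            b'b' => '\u{8}',
            b'f' => '\u{c}',
            b'n' => '\n',
            b'r' => '\r',
            b't' => '\t',
            b'u' => {
                let hex = buf.get(read..read + 4).ok_or(UnescapeError::UnexpectedEnd)?;
                let mut code = 0u32;
                for &digit in hex {
                    let value = (digit as char).to_digit(16).ok_or(UnescapeError::InvalidUnicode)?;
                    code = code * 16 + value;
                }
                read += 4;
                // Surrogates (lone or paired) are rejected in this sketch.
                char::from_u32(code).ok_or(UnescapeError::InvalidUnicode)?
            }
            other => return Err(UnescapeError::InvalidEscape(other)),
        };
        // The encoded char never needs more bytes than the escape it replaces,
        // so `write + len <= read` and no unread input is overwritten.
        let mut utf8 = [0u8; 4];
        let encoded = ch.encode_utf8(&mut utf8).as_bytes();
        buf[write..write + encoded.len()].copy_from_slice(encoded);
        write += encoded.len();
    }
    Ok(write)
}
```

The signature makes the cost visible: the caller has to hand over the input as `&mut [u8]`, which is what makes the change semver-breaking.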

sammhicks · 2024-01-05T09:43:05Z

Not to nag, but I've now fixed the tests and they pass on my fork, so this should now be ready to test and merge.
Sorry for the repeated sync, and absolutely no rush :)

ryan-summers

Sorry, this notification got buried in my GitHub notifications, so I didn't see it until now.

In general, I'm not currently inclined to merge the changes in this PR as implemented, for the following reasons:

- Changing the API can have a very large ripple effect for downstream users, where this dependency may be injected in the middle of their dependency stacks. I recognize this would be a semver-breaking change, so it wouldn't break people's code, but I suspect adoption of the newer version would be low because users may not have control over the mutability of their input data. (I know I have a number of dependencies like this personally.)
- I have concerns about the maintainability and reliability of allowing the deserialization input to be mutated. Mutating the data we're processing opens us up to a whole class of defects that weren't previously possible. For example, we could introduce bugs where the first deserialization works but the second doesn't, or where a second pass produces a different result from the first.
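The second concern can be made concrete with a deliberately reduced in-place unescaper. `unescape_in_place` and `two_passes` are hypothetical helpers written for this illustration: only `\n` is decoded specially, and any other escaped byte is kept without its backslash. Because the first pass overwrites its own input, running it again over the same bytes turns `\\n` (an escaped backslash followed by `n`) into a newline instead of the two characters `\n`.

```rust
/// Deliberately reduced in-place unescaper, for illustration only: `\n`
/// becomes a newline and any other escaped byte (such as `\\` or `\"`) is
/// kept without its backslash. It is not a complete JSON unescaper.
fn unescape_in_place(buf: &mut [u8]) -> usize {
    let (mut read, mut write) = (0, 0);
    while read < buf.len() {
        let mut byte = buf[read];
        read += 1;
        if byte == b'\\' && read < buf.len() {
            byte = if buf[read] == b'n' { b'\n' } else { buf[read] };
            read += 1;
        }
        buf[write] = byte;
        write += 1;
    }
    write
}

/// Unescapes `input` once, then unescapes the already-mutated bytes again,
/// as a second deserialization over the same buffer would.
fn two_passes(input: &[u8]) -> (Vec<u8>, Vec<u8>) {
    let mut buf = input.to_vec();
    let first_len = unescape_in_place(&mut buf);
    let first = buf[..first_len].to_vec();
    // The buffer no longer holds the original JSON, so this pass sees different bytes.
    let second_len = unescape_in_place(&mut buf[..first_len]);
    (first, buf[..second_len].to_vec())
}
```

For the JSON text `\\n`, the first pass yields the two bytes `\` and `n`, while the second pass over the mutated buffer yields a single newline byte.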

I don't believe that the benefit of supporting string escapes outweighs the downsides above.

That being said, I'd be more than happy to review a change where the input was copied into some buffer and then deserialized (i.e. the input is not mutable, but we do induce a copy of the data), but the user should have to explicitly opt in to this mechanism.

sammhicks · 2024-04-13T11:43:20Z

How about the following design?

- The deserializer takes a shared &str, as in the design before this pull request.
- When deserializing JSON strings, the deserializer scans for escape sequences:
  - If there are no escape sequences, it calls visitor.visit_borrowed_str(v), which allows zero-copy deserialization.
  - If there are escape sequences, it decodes them into a provided buffer and calls visitor.visit_str(v).
- serde_json_core would have an EscapedString newtype struct containing an escaped &str, with utility methods to iterate over it, where the iterator returns either a &str of characters with no escape sequences, or an unescaped char.
  - EscapedString is the only structure that is allowed to borrow escaped string data, and it uses a special constant to signal to the deserializer that it's special.
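The scanning step described above can be sketched without serde as a function that either borrows the raw input (the case where the deserializer would call `visit_borrowed_str`) or decodes into the caller's buffer (the case where it would call `visit_str`). `decode_str`, `StrSource` and `DecodeError` are hypothetical names, not the API that was eventually merged, and surrogate pairs are rejected rather than combined.

```rust
/// Where the decoded string lives (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum StrSource<'de, 'b> {
    /// No escapes: borrow straight from the input (zero-copy).
    Borrowed(&'de str),
    /// Escapes were present: the string was decoded into the scratch buffer.
    Scratch(&'b str),
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    BufferTooSmall,
    InvalidEscape,
}

/// Decodes a JSON string body (without quotes), borrowing when possible.
pub fn decode_str<'de, 'b>(
    raw: &'de str,
    scratch: &'b mut [u8],
) -> Result<StrSource<'de, 'b>, DecodeError> {
    if !raw.contains('\\') {
        return Ok(StrSource::Borrowed(raw));
    }
    let mut len = 0;
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        let out = if c != '\\' {
            c
        } else {
            match chars.next().ok_or(DecodeError::InvalidEscape)? {
                '"' => '"',
                '\\' => '\\',
                '/' => '/',
                'b' => '\u{8}',
                'f' => '\u{c}',
                'n' => '\n',
                'r' => '\r',
                't' => '\t',
                'u' => {
                    let mut code = 0u32;
                    for _ in 0..4 {
                        let h = chars.next().ok_or(DecodeError::InvalidEscape)?;
                        code = code * 16 + h.to_digit(16).ok_or(DecodeError::InvalidEscape)?;
                    }
                    char::from_u32(code).ok_or(DecodeError::InvalidEscape)?
                }
                _ => return Err(DecodeError::InvalidEscape),
            }
        };
        // Append the decoded char, failing if the caller's buffer is full.
        let dst = scratch
            .get_mut(len..len + out.len_utf8())
            .ok_or(DecodeError::BufferTooSmall)?;
        out.encode_utf8(dst);
        len += out.len_utf8();
    }
    // Only whole UTF-8 encoded chars were written, so this check always passes.
    let decoded = core::str::from_utf8(&scratch[..len]).map_err(|_| DecodeError::InvalidEscape)?;
    Ok(StrSource::Scratch(decoded))
}
```

The two variants carry different lifetimes on purpose: only `Borrowed` can outlive the scratch buffer, which mirrors serde's distinction between borrowed and transient strings.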

I believe that this design will also solve #74
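The `EscapedString` iterator proposed above might look roughly like the following sketch. `EscapedString`, `fragments`, `Fragments`, `Fragment` and `InvalidEscape` are hypothetical names chosen for illustration, not the API that was merged. Runs without escapes are borrowed straight from the input, so nothing is copied unless the caller chooses to build an owned string; surrogate pairs are rejected rather than combined.

```rust
/// A JSON string body that still contains its escape sequences (hypothetical).
#[derive(Debug, Clone, Copy)]
pub struct EscapedString<'a>(pub &'a str);

/// One piece of an escaped string.
#[derive(Debug, PartialEq)]
pub enum Fragment<'a> {
    /// A run of characters with no escape sequences, borrowed from the input.
    Unescaped(&'a str),
    /// The single character that one escape sequence decodes to.
    Escaped(char),
}

/// Returned when a backslash is not followed by a valid JSON escape.
#[derive(Debug, PartialEq)]
pub struct InvalidEscape;

impl<'a> EscapedString<'a> {
    pub fn fragments(self) -> Fragments<'a> {
        Fragments { rest: self.0 }
    }
}

/// Iterator over the fragments of an `EscapedString`.
pub struct Fragments<'a> {
    rest: &'a str,
}

/// Parses a string of hex digits (four, when called below) into a code point.
fn parse_hex4(h: &str) -> Option<u32> {
    h.chars().try_fold(0u32, |acc, c| Some(acc * 16 + c.to_digit(16)?))
}

impl<'a> Iterator for Fragments<'a> {
    type Item = Result<Fragment<'a>, InvalidEscape>;

    fn next(&mut self) -> Option<Self::Item> {
        let rest = self.rest;
        if rest.is_empty() {
            return None;
        }
        // Borrow the run of text before the next backslash without copying it.
        let run_end = rest.find('\\').unwrap_or(rest.len());
        if run_end > 0 {
            let (plain, tail) = rest.split_at(run_end);
            self.rest = tail;
            return Some(Ok(Fragment::Unescaped(plain)));
        }
        // `rest` starts with a backslash: decode exactly one escape sequence.
        let decoded = match rest.as_bytes().get(1).copied() {
            Some(b'"') => Some(('"', 2)),
            Some(b'\\') => Some(('\\', 2)),
            Some(b'/') => Some(('/', 2)),
            Some(b'b') => Some(('\u{8}', 2)),
            Some(b'f') => Some(('\u{c}', 2)),
            Some(b'n') => Some(('\n', 2)),
            Some(b'r') => Some(('\r', 2)),
            Some(b't') => Some(('\t', 2)),
            Some(b'u') => rest
                .get(2..6)
                .and_then(parse_hex4)
                .and_then(char::from_u32)
                .map(|c| (c, 6)),
            _ => None,
        };
        match decoded {
            Some((ch, len)) => {
                self.rest = &rest[len..];
                Some(Ok(Fragment::Escaped(ch)))
            }
            None => {
                // Report the error once, then end the iteration.
                self.rest = "";
                Some(Err(InvalidEscape))
            }
        }
    }
}
```

A handler that needs an owned string can fold the fragments into one, pushing `Unescaped` runs with `push_str` and `Escaped` characters with `push`.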

ryan-summers · 2024-04-13T20:47:26Z

That definitely sounds like a nice approach to me! It may be useful to try and look at some cases where string escaping is useful to see if this design is onerous for the end user. Do you have a sample use case in mind? That may be helpful in figuring out if this is a good approach.

sammhicks · 2024-04-14T10:20:13Z

I've written a bare-metal HTTP server framework and would like it to be able to deserialize JSON-encoded POST bodies into arbitrary data structures. On soundness grounds, and to keep a separation of concerns, I'd like all decoding and unescaping to happen before the data is passed to the request handler.

ryan-summers · 2024-04-15T08:39:12Z

Is this open-sourced and/or something I could look at to get an idea of how you're interested in using it? What I'm really trying to figure out is what this API would look like in a real-world example with actual code.

sammhicks · 2024-06-06T16:53:04Z

In case the message got lost in the post (hurrah for email), I've closed this pull request and opened a new one at #83 with my new design.

sammhicks force-pushed the master branch 2 times, most recently from 9a06f6b to 1cafd85, January 1, 2024 15:49

sammhicks mentioned this pull request Jan 9, 2024 in Add a JSON extractor sammhicks/picoserve#15 (Closed)

ryan-summers requested changes Apr 11, 2024

sammhicks closed this Apr 15, 2024

sammhicks force-pushed the master branch from 1cafd85 to 9327a14, April 15, 2024 11:00

sammhicks mentioned this pull request Apr 16, 2024 in De-Escaping strings using a provided buffer #83 (Merged)
