Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decoders improvments #35

Merged
merged 16 commits into from
Dec 5, 2024
Merged

Conversation

jrouaix
Copy link
Contributor

@jrouaix jrouaix commented Nov 25, 2024

Hi @puffyCid,

This time I started some work on the decoders module.

It's a draft for now, I left a lot of todos in the dns file.

I think there are some subtleties I don't get.
Something I have the intuition that the same data is parsed multiple times, perhaps with multiple formats, it's very weird.

Also, I have some troubles with the error handling.
Replacing Ok results with some default error String will bite in the end.
We cannot know from the result of a function if the parsing was right.
It's probably not the right moment for a refacto on this topic, but in the end, I think we could tend to a design where Result<_, _>s are first class citizens as return values or even fields in the parsed structures, so the client code can choose what behavior to adopt.

what do you think ?

@puffyCid
Copy link
Collaborator

puffyCid commented Nov 29, 2024

@jrouaix for the dns.rs file get_dns_flags is parsing bits not bytes. We nom 2 bytes first and then pass that to get_dns_flags which then pareses 13 bits

It is a bit weird but there is a small comment mentioning it:

    // Have to work with bits instead of bytes for the DNS flags
    let ((flag_data, offset), query_flag): ((&[u8], usize), u8) = bits::complete::take(size_of::<u8>())((data, 0))?;

https://en.wikipedia.org/wiki/Domain_Name_System#DNS_message_format

I do not think we are parsing the same data multiple times.

let (dns_data, id_data) = take(size_of::<u16>())(data)?;
let (dns_data, flag_data) = take(size_of::<u16>())(dns_data)?;

let (_, id) = be_u16(id_data)?;

let message_result = get_dns_flags(flag_data);
let message = match message_result 
...
 let message_result = parse_counts(dns_data);

For above dns_data is simply the remaining bytes.
Ex:

data = 10 bytes
let (dns_data, id_data) = take(size_of::<u16>())(data)?;

id_data = 2 bytes
dns_data = 8 bytes

let (dns_data, flag_data) = take(size_of::<u16>())(dns_data)?;

flag_data = 2 bytes
dns_data = 6 bytes

etc

For return string vs Result I am open to changes. The main thing i think we should be aware is IMO I think iteration should not stop even we get an error. Ex: If the library is looping/decoding through 200 entries and it fails to decode something, the library should log an error and include something (ex: partial string, original data, etc) vs return error/breaking the loop.

I am open to changing what the decoders should do if it encounters an error on whether to use Result vs String, but as long as overall iteration continues and we do not skip any entries if an error is encountered.

Let me know if this makes sense or helps
thanks

@jrouaix
Copy link
Contributor Author

jrouaix commented Nov 29, 2024

HOOO !

Got it, you use size_of::<u8>() as expression for 1 !
But size_of::<u8>() is meant to be understood as bytes => u8 is 1 byte long. not 1 bit.
That's why I was kinda sure you were consuming bytes !

I'll change that to const values if you don't mind.

@jrouaix
Copy link
Contributor Author

jrouaix commented Nov 29, 2024

@puffyCid you can have a look ?

I spent some more time in the dns.rs file.

Now decoder.rs expect Result<>'s from the sub decoders

I also tried to avoid or delay allocation (strings re-allocations in loops), in the long run it will get us some perf improvements.

I lost some precision about errors location but I find the overall design i little nicer.

Tell me what you think, perhaps it's mergable now. As you wish.

@puffyCid
Copy link
Collaborator

puffyCid commented Dec 3, 2024

@puffyCid you can have a look ?

I spent some more time in the dns.rs file.

Now decoder.rs expect Result<>'s from the sub decoders

I also tried to avoid or delay allocation (strings re-allocations in loops), in the long run it will get us some perf improvements.

I lost some precision about errors location but I find the overall design i little nicer.

Tell me what you think, perhaps it's mergable now. As you wish.

PR looks good. Thanks. Left just one minor comment.

I recently merged #38 to fix an infinite loop bug.
If u could resolve the conflicts, and mark the PR as non-draft. I am fine with merging

@jrouaix jrouaix marked this pull request as ready for review December 3, 2024 17:13
@jrouaix
Copy link
Contributor Author

jrouaix commented Dec 3, 2024

@puffyCid I encourage you to squash merge this PR, commits are a little messy

src/decoders/decoder.rs Outdated Show resolved Hide resolved
src/decoders/dns.rs Outdated Show resolved Hide resolved
src/decoders/dns.rs Outdated Show resolved Hide resolved
src/decoders/network.rs Outdated Show resolved Hide resolved
@puffyCid puffyCid merged commit 945545f into mandiant:main Dec 5, 2024
4 checks passed
@puffyCid
Copy link
Collaborator

puffyCid commented Dec 5, 2024

merged thanks!

@jrouaix jrouaix deleted the decoders_improvments branch December 5, 2024 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants