RPM: not all strings are UTF-8 #672
Comments
Well, if there isn't a single character encoding we could specify in the

     record_type_string_array:
       params:
         - id: num_values
           type: u4
       seq:
         - id: values
    -      type: strz
    +      terminator: 0
           repeat: expr
           repeat-expr: num_values

A byte array is the implicit type in
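If the values come back as raw byte arrays, decoding is left to the caller. A minimal sketch of one way to do that, assuming the only encodings in play are UTF-8 and Latin-1 (the helper name `decode_rpm_string` is hypothetical and not part of `rpm.ksy` or any Kaitai Struct runtime):

```python
def decode_rpm_string(raw: bytes) -> str:
    """Decode a byte string from an RPM header, preferring UTF-8.

    Assumption: anything that is not valid UTF-8 is treated as Latin-1.
    Latin-1 maps every byte value to a code point, so the fallback
    never raises, but it may be wrong for other legacy encodings.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


if __name__ == "__main__":
    # Hypothetical changelog author names, one per encoding.
    print(decode_rpm_string("André".encode("utf-8")))
    print(decode_rpm_string("André".encode("latin-1")))
```

Trying UTF-8 first is deliberate: text that was really written in Latin-1 with accented characters is rarely valid UTF-8, whereas the reverse check would accept everything.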
I actually had been thinking about that and looked at the docs, but that seems to indicate that
Thinking a bit more about this: probably this isn't a good idea, as
I found it easier to just work around it like this:
This is cleaner than trying to fix it here.
In the current `rpm.ksy` the `encoding` for strings is set to UTF-8. There are RPM files that fail to parse, because as it turns out not everyone has been playing nice with encodings.

An example is this file from Fedora Core 3:

https://archives.fedoraproject.org/pub/archive/fedora/linux/core/3/x86_64/os/Fedora/RPMS/bash-3.0-17.x86_64.rpm

One of the tags is a `record_type_string_array` related to ChangeLogs and some people seem to have used Latin-1 characters instead.

Currently `record_type_string_array` is defined as follows:

and the default encoding is UTF-8, so this will obviously not work. I don't know how I could fix this.
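To illustrate the failure mode, here is a small sketch that reports which entries of a string array are not valid UTF-8. The function name `find_non_utf8` and the sample data are made up for illustration; the bytes are not taken from the bash package linked above.

```python
def find_non_utf8(entries: list[bytes]) -> list[int]:
    """Return the indexes of entries that cannot be decoded as UTF-8."""
    bad = []
    for index, raw in enumerate(entries):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            # A lone Latin-1 byte such as 0xE9 is an incomplete
            # multi-byte sequence in UTF-8, so strict decoding fails.
            bad.append(index)
    return bad


if __name__ == "__main__":
    # Hypothetical changelog entries: ASCII, Latin-1, and UTF-8.
    sample = [
        b"- fix build",
        "- patch from José".encode("latin-1"),
        "- patch from José".encode("utf-8"),
    ]
    print(find_non_utf8(sample))
```

A parser that decodes every entry strictly as UTF-8 fails on the first such entry, which matches the behaviour described here.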