write_sets with id1 in Chinese get string lost #87

Open
masonacezllk opened this issue Mar 20, 2024 · 4 comments

Comments

@masonacezllk

When a data block contains a Chinese string, e.g. id1='左前方向:S' ("left-front direction:S"), write_sets loses the Chinese characters. How should I modify this code so that Chinese strings are preserved?

```python
for k, v in dset.items():
    if type(v) == str:
        dset[k] = v.encode("utf-8").decode('ascii','ignore')
```
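For reference, the loss can be reproduced without pyuff. A minimal sketch using only the standard library, where `dset` is a plain stand-in dictionary with hypothetical values rather than a real pyuff data set:

```python
# Stand-in for one data set (hypothetical values); only the string field matters.
dset = {"id1": "左前方向:S", "type": 58}

# The loop quoted above, with isinstance() in place of type(v) == str.
for k, v in dset.items():
    if isinstance(v, str):
        # UTF-8 encodes each Chinese character as 3 bytes, all >= 0x80;
        # decoding as ASCII with errors='ignore' silently drops those bytes.
        dset[k] = v.encode("utf-8").decode("ascii", "ignore")

print(dset["id1"])  # prints ":S"

# With the default errors='strict', the same conversion raises instead.
try:
    "左前方向:S".encode("utf-8").decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)  # prints True
```

Only the ASCII part ':S' survives, so the characters are gone before anything is written to disk.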

@jankoslavic
Contributor

Thank you @masonacezllk. Would you be so kind as to prepare a pull request with the proposed corrections and also a test case?

@masonacezllk
Author

test.zip
Hi @jankoslavic, the zip file contains my UNV data, named test.unv.
I want to replace the id1 of a data block in the original file with a new name that includes the Chinese characters '车速' ("vehicle speed"), then write the data to a new file and reload it.
But when I reload the new UNV file, the Chinese string '车速' is missing.
This is my test code:

```python
import pyuff

# original data file
fname = r'data\test.unv'
uffread = pyuff.UFF(fname)
data = uffread.read_sets()

# replace id1 with a new name
data[3]['id1'] = 'Time for 车速'

# save the data with the new name to a new unv file
newfname = r'data\testnew.unv'
uffwrite = pyuff.UFF(newfname)
uffwrite.write_sets(data, 'overwrite')

# load the new unv file (read from uffread_new, not the original uffread)
uffread_new = pyuff.UFF(newfname)
data_new = uffread_new.read_sets()
print(data_new[3]['id1'])
# You will see that id1 is 'Time for '
# and not 'Time for 车速'
```
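To confirm that the text is lost at write time and not at read time, the same round trip can be isolated from pyuff using plain text files. This is an illustrative sketch, not pyuff's actual code path; the `round_trip` helper and the temporary file are hypothetical:

```python
import os
import tempfile


def round_trip(text, encoding, errors="strict"):
    """Hypothetical helper: write `text` with `encoding`, read it back as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".unv")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding, errors=errors) as fh:
            fh.write(text)
        with open(path, "r", encoding="utf-8") as fh:
            return fh.read()
    finally:
        os.remove(path)


label = "Time for 车速"
print(repr(round_trip(label, "ascii", errors="ignore")))  # 'Time for '
print(repr(round_trip(label, "utf-8")))                   # 'Time for 车速'
```

Writing with an ASCII encoding and errors='ignore' gives the same result as the test above. Writing UTF-8 keeps the characters, but the resulting file is then no longer plain ASCII.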

@jankoslavic
Contributor

I guess this needs work to be prepared as a PR. Any volunteers?

@jankoslavic
Contributor

@masonacezllk I have now spent some time on this issue. The problem is that, according to the UFF/UNV standard, the file should be in ASCII, and therefore here:

```python
dset[k] = v.encode("utf-8").decode('ascii','ignore')
```

we convert the data back to ASCII. The non-ASCII characters are lost at this step. We do support reading non-ASCII characters, but not writing them.
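Until writing non-ASCII text is supported, one option is to make the loss visible instead of silent. A minimal sketch, assuming a hypothetical helper `to_ascii_fields` (not part of pyuff) that strips the same characters as the line above but warns about every field it changes:

```python
import warnings


def to_ascii_fields(dset):
    """Hypothetical helper: return a copy of `dset` with non-ASCII characters
    stripped from string fields, warning for each field that loses characters."""
    out = {}
    for k, v in dset.items():
        if isinstance(v, str):
            # Same result as v.encode("utf-8").decode("ascii", "ignore"):
            # every non-ASCII character is removed.
            ascii_v = v.encode("ascii", "ignore").decode("ascii")
            if ascii_v != v:
                warnings.warn(
                    f"field {k!r}: non-ASCII characters dropped ({v!r} -> {ascii_v!r})",
                    UnicodeWarning,
                )
            out[k] = ascii_v
        else:
            out[k] = v
    return out


clean = to_ascii_fields({"id1": "Time for 车速", "id2": "NONE", "type": 58})
print(clean["id1"])  # 'Time for ' (a UnicodeWarning is emitted for 'id1')
```

Returning a copy, rather than assigning back into `dset` as the quoted loop does, also leaves the caller's data unchanged, so `data[3]['id1']` in the test above would still hold the original string after writing.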

This is a broader issue and I will open a new one.
