write_sets with id1 in Chinese get string lost #87

masonacezllk · 2024-03-20T03:36:06Z

When a data block contains a Chinese string, such as id1='左前方向:S', write_sets loses the Chinese characters. How can this code be modified:
for k, v in dset.items():
    if type(v) == str:
        dset[k] = v.encode("utf-8").decode('ascii','ignore')
so that Chinese strings are written correctly?
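
For illustration, the effect of the quoted line can be reproduced without pyuff. This is a minimal sketch; the helper name `strip_non_ascii` is hypothetical and only wraps the expression from the snippet above. Every Chinese character is encoded as UTF-8 bytes outside the ASCII range, and `decode('ascii', 'ignore')` silently drops those bytes.

```python
def strip_non_ascii(value):
    """Apply the same expression as the quoted snippet (hypothetical wrapper)."""
    # UTF-8 encodes each Chinese character as bytes >= 0x80;
    # decoding as ASCII with errors='ignore' discards all of them.
    return value.encode("utf-8").decode("ascii", "ignore")


original = "Time for 车速"
stripped = strip_non_ascii(original)

print(repr(original))  # 'Time for 车速'
print(repr(stripped))  # 'Time for '
```

Pure ASCII strings pass through this step unchanged, which is why the loss only shows up with non-ASCII labels.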

jankoslavic · 2024-03-20T06:14:42Z

Thank you @masonacezllk. Would you be so kind as to prepare a pull request with the proposed corrections and a test case?

masonacezllk · 2024-03-20T08:04:58Z

test.zip
Hi @jankoslavic, the zip file includes my UNV data, named test.unv.
I want to replace the id1 of a data block from the original file with a new name that includes the Chinese string '车速', then write the result to a new file and reload it.
But when I reload the new UNV file, the Chinese string '车速' is missing.
This is my test code:

import pyuff

# original data file
fname = r'data\test.unv'
uffread = pyuff.UFF(fname)
data = uffread.read_sets()

# replace id1 with the new name
data[3]['id1'] = 'Time for 车速'

# save the new name in a new unv file
newfname = r'data\testnew.unv'
uffwrite = pyuff.UFF(newfname)
uffwrite.write_sets(data, 'overwrite')

# load the new unv file (read from the new file object, not the original one)
uffread_new = pyuff.UFF(newfname)
data_new = uffread_new.read_sets()
print(data_new[3]['id1'])
# You will see that id1 is 'Time for'
# and not 'Time for 车速'

jankoslavic · 2024-03-20T19:11:33Z

I guess this needs work to be prepared as a PR. Any volunteers?

jankoslavic · 2024-04-14T05:35:26Z

@masonacezllk I have now spent some time on this issue. The problem is that, by the UFF/UNV standard, the file should be in ASCII, and therefore here:

pyuff/pyuff/datasets/dataset_58.py

Line 908 in ac669b9

dset[k] = v.encode("utf-8").decode('ascii','ignore')

we encode the data back to ASCII. The non-ASCII characters are lost at this step. We do support reading non-ASCII characters, but not writing them.

This is a broader issue and I will open a new one.
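
One possible direction, sketched here only as an assumption and not as pyuff's actual API, is to make the target encoding of that step configurable. The function `sanitize_strings` and its `encoding` parameter are hypothetical names introduced for illustration. With the default `'ascii'` the behavior matches the current lossy step; with `'utf-8'` the string values are left intact.

```python
def sanitize_strings(dset, encoding="ascii"):
    """Return a copy of dset whose string values only contain characters
    representable in `encoding` (hypothetical helper, not part of pyuff)."""
    cleaned = {}
    for key, value in dset.items():
        if isinstance(value, str):
            # Characters that the target encoding cannot represent are dropped.
            value = value.encode(encoding, "ignore").decode(encoding)
        cleaned[key] = value
    return cleaned


dset = {"type": 58, "id1": "Time for 车速"}

print(sanitize_strings(dset)["id1"])                    # ASCII only
print(sanitize_strings(dset, encoding="utf-8")["id1"])  # text preserved
```

Unlike the original loop, this sketch returns a new dictionary instead of modifying `dset` in place, so the caller's data keeps its original labels. A real fix would also have to open the output file with the matching encoding and account for header fields whose width is defined in characters, which this sketch does not attempt.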

jankoslavic added the help wanted label Mar 20, 2024

jankoslavic mentioned this issue Apr 14, 2024

ENH: writing the header as utf8 encoded #89

Open
