Writing lists of strings with DigitalMetadataWriter #16

Open
jswoboda opened this issue Apr 29, 2020 · 1 comment
@jswoboda

I attempted to write a list of strings to digital metadata to keep track of the names of sub-channels. This led to the following error in h5py:

h5py error TypeError: No conversion path for dtype: dtype('<U2')

Searching led me to this issue with h5py, which says the list has to be converted using the following numpy command:

np.string_()

I don't know if there's a need to address this directly. I'm just putting this up to note it for now.

@ryanvolz

ryanvolz commented Apr 29, 2020

I don't think np.string_() is what you want, since that will turn your whole list into one string. Converting to an array explicitly using h5py's special string dtype might work instead:

np.asarray(['ch1', 'ch2', 'ch3'], dtype=h5py.string_dtype(encoding='utf-8'))
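For reference, here is a minimal sketch of how that conversion could be checked with plain h5py, outside of Digital Metadata. It assumes h5py 2.10 or later (where `h5py.string_dtype` exists); the file name, the dataset name `subchannel_names`, and the helper `roundtrip_names` are made up for illustration, and the file lives only in memory.

```python
import h5py
import numpy as np


def roundtrip_names(names):
    """Write a list of str to an in-memory HDF5 file and read it back.

    Hypothetical helper for illustration; not part of Digital Metadata.
    """
    # Explicit conversion to h5py's variable-length UTF-8 string dtype
    arr = np.asarray(names, dtype=h5py.string_dtype(encoding="utf-8"))

    # driver="core" with backing_store=False keeps the file in memory only
    with h5py.File("names_demo.h5", "w", driver="core", backing_store=False) as f:
        f.create_dataset("subchannel_names", data=arr)
        stored = f["subchannel_names"][()]

    # Depending on the h5py version, elements come back as bytes or str,
    # so decode only when needed
    return [s.decode("utf-8") if isinstance(s, bytes) else s for s in stored]


names = roundtrip_names(["ch1", "ch2", "ch3"])
print(names)
```

The decode step at the end is there because the type of the returned elements differs between h5py versions.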

I'll note this part of the h5py docs that says it does not support numpy's U dtype: http://docs.h5py.org/en/latest/strings.html#what-about-numpy-s-u-type

I found the h5py.string_dtype docstring illuminating:

Make a numpy dtype for HDF5 strings
encoding may be 'utf-8' or 'ascii'.
length may be an integer for a fixed length string dtype, or None for
variable length strings. String lengths for HDF5 are counted in bytes,
not unicode code points.
For variable length strings, the data should be passed as Python str objects
(unicode in Python 2) if the encoding is 'utf-8', and bytes if it is 'ascii'.
For fixed length strings, the data should be numpy fixed length bytes
arrays, regardless of the encoding. Fixed length unicode data is not
supported.

So basically, you have 3 options:

  1. Array of variable-length strings using the h5py.string_dtype(encoding='utf-8') dtype
  2. Array of variable-length bytes using the h5py.string_dtype(encoding='ascii') dtype
  3. Array of fixed length bytes using the np.string_ or h5py.string_dtype(length=N) dtype
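The three options above could be built roughly as follows. This is only a sketch, assuming h5py 2.10 or later for `h5py.string_dtype` and `h5py.check_string_dtype`; the variable names are illustrative. Note that `np.string_` was an alias of `np.bytes_` and was removed in NumPy 2.0, so the fixed-length case here uses the `"S"` dtype instead.

```python
import h5py
import numpy as np

names = ["ch1", "ch2", "ch3"]

# 1. Variable-length UTF-8 strings: pass Python str objects
vlen_utf8 = np.asarray(names, dtype=h5py.string_dtype(encoding="utf-8"))

# 2. Variable-length ASCII strings: pass bytes objects
vlen_ascii = np.asarray(
    [n.encode("ascii") for n in names],
    dtype=h5py.string_dtype(encoding="ascii"),
)

# 3. Fixed-length bytes: numpy picks the item size from the longest entry
fixed = np.asarray(names, dtype="S")

# check_string_dtype reports the encoding and length h5py will use
# (length is None for variable-length strings)
for label, arr in [("vlen utf-8", vlen_utf8), ("vlen ascii", vlen_ascii), ("fixed", fixed)]:
    info = h5py.check_string_dtype(arr.dtype)
    print(label, arr.dtype, info.encoding, info.length)
```

With fixed-length bytes, the length is counted in bytes, so non-ASCII names would need to be encoded first and sized by their encoded length.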

I'm leaning toward this not being something that Digital Metadata handles explicitly, since it's kinda intended to be a thin format wrapper around h5py. It's a likely trouble spot that definitely deserves a note in the docs though, whenever we have time to write some better documentation.

@ryanvolz ryanvolz self-assigned this Jul 20, 2021