Releases: tensorflow/datasets
v4.9.5
Added
- Support to download and prepare datasets using the Parquet data format.

  ```python
  import tensorflow_datasets as tfds

  builder = tfds.builder('fashion_mnist', file_format='parquet')
  builder.download_and_prepare()
  ds = builder.as_dataset(split='train')
  print(next(iter(ds)))
  ```
- `tfds.data_source` is picklable, thus working smoothly with PyGrain. Learn more by following the tutorial.
- TFDS plays nicely with Croissant. Learn more by following the recipe.
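To illustrate why picklability matters for multi-process loading, here is a minimal sketch using only the standard library. `ToyDataSource` is a hypothetical stand-in for a random-access data source, not the TFDS class; the sketch assumes that a loader hands the source to its workers by pickling it.

```python
import pickle


class ToyDataSource:
    """Hypothetical stand-in for a random-access data source (not the TFDS class)."""

    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        return self._records[index]


source = ToyDataSource([{"label": 0}, {"label": 1}, {"label": 2}])

# Assumption: worker processes receive the source as pickled bytes.
payload = pickle.dumps(source)
restored = pickle.loads(payload)

# The restored copy supports the same length and random-access lookups.
print(len(restored), restored[1])
```

An object holding an open file handle or a lock would fail at `pickle.dumps`, which is the kind of failure a picklable data source avoids.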
Changed
Deprecated
Removed
Fixed
Security
v4.9.4
Added
- A new `CroissantBuilder` which initializes a `DatasetBuilder` based on a Croissant metadata file.
- New conversion options between different bounding boxes formats.
- Better support for `HuggingfaceDatasetBuilder`.
- A script to convert a dataset from one format to another.
Changed
Deprecated
- Python 3.9 support. TFDS now uses Python 3.10.
Removed
Fixed
Security
v4.9.3
Added
- Segment Anything (SA-1B) dataset.
Changed
- Hugging Face datasets accept `None` values for any feature. TFDS has no `tfds.features.Optional`, so `None` values are converted to default values. Those default values used to be `0` and `0.0` for int and float. Now, they are `-inf` as defined by NumPy (e.g., `np.iinfo(np.int32).min` or `np.finfo(np.float32).min`). This avoids ambiguous values when `0` and `0.0` exist in the values of the dataset. The roadmap is to implement `tfds.features.Optional`.
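The NumPy expressions named above can be evaluated directly. The sketch below is an illustration only: `default_for` and `fill_none` are hypothetical helpers, not TFDS functions. Note that these values are the smallest representable numbers for each dtype, so the float default is finite rather than IEEE negative infinity.

```python
import numpy as np


def default_for(dtype):
    """Hypothetical helper: smallest representable value for a numeric dtype."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return np.iinfo(dtype).min
    if np.issubdtype(dtype, np.floating):
        return np.finfo(dtype).min
    raise TypeError(f"No numeric default for dtype {dtype}")


def fill_none(values, dtype):
    """Replace None entries with the dtype's sentinel, then build an array."""
    sentinel = default_for(dtype)
    return np.array([sentinel if v is None else v for v in values], dtype=dtype)


# A genuine 0 stays distinguishable from a missing value.
ints = fill_none([0, None, 7], np.int32)
floats = fill_none([0.0, None, 1.5], np.float32)
print(ints)
print(floats)
```

With the previous defaults of `0` and `0.0`, the first and second entries of each array would have been indistinguishable.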
Deprecated
- Python 3.8 support. As per NEP 29, TFDS now uses Python>=3.9.
Removed
Fixed
Security
v4.9.2
Added
- [Experimental] A list of freeform text tags can now be attached to a `BuilderConfig`. For example:

  ```python
  BUILDER_CONFIGS = [
      tfds.core.BuilderConfig(name="foo", tags=["foo", "live"]),
      tfds.core.BuilderConfig(name="bar", tags=["bar", "old"]),
  ]
  ```

  The tags are recorded with the dataset metadata and can later be retrieved using the info object:

  ```python
  builder.info.config_tags  # ["foo", "live"]
  ```

  This feature is experimental and there are no guidelines on tags format.
Changed
Deprecated
Removed
Fixed
- Fixed generated proto files (see issue 4858).
Security
v4.9.1
Added
Changed
Deprecated
Removed
Fixed
- The installation on macOS now works (see issues 4805 and 4852). The ArrayRecord dependency is lazily loaded, so the TensorFlow-less path is not possible at the moment on macOS. A fix for this will follow soon.
Security
v4.9.0
Added
- Native support for JAX and PyTorch. TensorFlow is no longer a dependency for reading datasets. See the documentation.
- Added minival split to LVIS dataset.
- Mixed-human and machine-generated robomimic datasets.
- WebVid dataset.
- ImagenetPI dataset.
- Wikipedia for 20230201.
Changed
- Support for `tensorflow=2.12`.
Deprecated
Removed
Fixed
Security
v4.8.3
Added
Changed
Deprecated
- Python 3.7 support: this version and future versions use Python 3.8.
Removed
Fixed
- Flag `ignore_verifications` from Hugging Face's `datasets.load_dataset` is deprecated, and used to cause errors in `tfds.load(huggingface:foo)`.
Security
v4.8.2
Deprecated
- Python 3.7 support: this is the last version of TFDS supporting Python 3.7.
Future versions will use Python 3.8.
Fixed
- `tfds new` and `tfds build` better support the new recommended datasets organization, where individual datasets have their own package under `datasets/`, builder class is called `Builder` and is defined within module `${dsname}_dataset_builder.py`.
Security
v4.8.1
Changed
- Added file `valid_tags.txt` to not break builds.
- TFDS no longer relies on TensorFlow DTypes. We chose NumPy DTypes to keep the typing expressiveness, while dropping the heavy dependency on TensorFlow. We migrated all our internal datasets. Please, migrate accordingly:
  - `tf.bool`: `np.bool_`
  - `tf.string`: `np.str_`
  - `tf.int64`, `tf.int32`, etc: `np.int64`, `np.int32`, etc
  - `tf.float64`, `tf.float32`, etc: `np.float64`, `np.float32`, etc
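The migration list can be expressed as a lookup table. This is a minimal sketch: `TF_TO_NUMPY` and `migrate_dtype` are hypothetical names used for illustration, not part of TFDS, and the TensorFlow dtypes appear only as string labels so the sketch runs without TensorFlow installed.

```python
import numpy as np

# Mapping taken from the migration list; the string keys are only labels.
TF_TO_NUMPY = {
    "tf.bool": np.bool_,
    "tf.string": np.str_,
    "tf.int32": np.int32,
    "tf.int64": np.int64,
    "tf.float32": np.float32,
    "tf.float64": np.float64,
}


def migrate_dtype(tf_name):
    """Hypothetical helper: look up the NumPy replacement for a TF dtype name."""
    try:
        return TF_TO_NUMPY[tf_name]
    except KeyError:
        raise KeyError(f"No NumPy equivalent listed for {tf_name!r}") from None


# Example: translate the dtype labels of a toy feature specification.
old_spec = {"label": "tf.int64", "score": "tf.float32", "is_valid": "tf.bool"}
new_spec = {name: migrate_dtype(dtype) for name, dtype in old_spec.items()}
print(new_spec)
```

Dtypes covered only by the "etc" entries in the list are left out of the table, so the helper raises rather than guessing.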
v4.8.0
Added
- [API] `DatasetBuilder`'s description and citations can be specified in dedicated `README.md` and `CITATIONS.bib` files, within the dataset package (see https://www.tensorflow.org/datasets/add_dataset).
- Tags can be associated to Datasets, in the `TAGS.txt` file. For now, they are only used in the generated documentation.
- [API][Experimental] New `ViewBuilder` to define datasets as transformations of existing datasets. Also adds `tfds.transform` with functionality to apply transformations.
- Loggers are also called on `tfds.as_numpy(...)`, base `Logger` class has a new corresponding method.
- `tfds.core.DatasetBuilder` can have a default limit for the number of simultaneous downloads. `tfds.download.DownloadConfig` can override it.
- `tfds.features.Audio` supports storing raw audio data for lazy decoding.
- The number of shards can be overridden when preparing a dataset: `builder.download_and_prepare(download_config=tfds.download.DownloadConfig(num_shards=42))`. Alternatively, you can configure the min and max shard size if you want TFDS to compute the number of shards for you, but want to have control over the shard sizes.