Releases: tensorflow/datasets

v4.9.5

30 May 08:37

Added

  • Support for downloading and preparing datasets in the
    Parquet data format:

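    import tensorflow_datasets as tfds

    # Prepare the dataset on disk as Parquet files, then read one example.
    builder = tfds.builder('fashion_mnist', file_format='parquet')
    builder.download_and_prepare()
    ds = builder.as_dataset(split='train')
    print(next(iter(ds)))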
  • tfds.data_source is picklable, so it works smoothly with PyGrain.
    Learn more by following the tutorial.

  • TFDS plays nicely with Croissant. Learn more by following the recipe.

Changed

Deprecated

Removed

Fixed

Security

v4.9.4

18 Dec 13:28

Added

  • A new CroissantBuilder, which initializes a DatasetBuilder from a
    Croissant metadata file.
  • New conversion options between different bounding box formats.
  • Better support for HuggingfaceDatasetBuilder.
  • A script to convert a dataset from one format to another.

Changed

Deprecated

  • Python 3.9 support. TFDS now uses Python 3.10.

Removed

Fixed

Security

v4.9.3

08 Sep 09:07

Added

Changed

  • Hugging Face datasets accept None values for any feature. TFDS has no
    tfds.features.Optional, so None values are converted to default values.
    Those defaults used to be 0 and 0.0 for int and float features. They are
    now the lowest value representable by the dtype, as defined by NumPy
    (e.g., np.iinfo(np.int32).min or np.finfo(np.float32).min). This avoids
    ambiguity when 0 and 0.0 are legitimate values in the dataset. The roadmap
    is to implement tfds.features.Optional.
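
The new defaults can be inspected directly with NumPy. The sketch below is only an illustration: default_for_none is a hypothetical helper, not a TFDS function, and it assumes the placeholder is simply the dtype's minimum as described above.

```python
import numpy as np


def default_for_none(dtype):
    """Hypothetical helper (not part of TFDS): placeholder for a None value."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        # Smallest representable integer for this dtype.
        return np.iinfo(dtype).min
    if np.issubdtype(dtype, np.floating):
        # Most negative finite float for this dtype (not actually -inf).
        return np.finfo(dtype).min
    raise TypeError(f"No numeric default for dtype {dtype}")


int_default = default_for_none(np.int32)
float_default = default_for_none(np.float32)
print(int_default)                 # -2147483648
print(np.isfinite(float_default))  # True
```

Because the float placeholder is a finite value, a check such as value == np.finfo(np.float32).min can be used to spot entries that were originally None.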

Deprecated

  • Python 3.8 support. As per NEP 29, TFDS now uses Python>=3.9.

Removed

Fixed

Security

v4.9.2

13 Apr 11:21

Added

  • [Experimental] A list of freeform text tags can now be attached to a
    BuilderConfig. For example:
    BUILDER_CONFIGS = [
        tfds.core.BuilderConfig(name="foo", tags=["foo", "live"]),
        tfds.core.BuilderConfig(name="bar", tags=["bar", "old"]),
    ]
    The tags are recorded with the dataset metadata and can later be retrieved
    using the info object:
    builder.info.config_tags  # ["foo", "live"]
    This feature is experimental and there are no guidelines on tags format.

Changed

Deprecated

Removed

Fixed

  • Fixed generated proto files (see issue 4858).

Security

v4.9.1

11 Apr 13:16

Added

Changed

Deprecated

Removed

Fixed

  • The installation on macOS now works (see issues 4805 and 4852). The
    ArrayRecord dependency is lazily loaded, so the TensorFlow-less path is
    not possible at the moment on macOS. A fix for this will follow soon.

Security

v4.9.0

05 Apr 07:30

Added

Changed

  • Support for tensorflow=2.12.

Deprecated

Removed

Fixed

Security

v4.8.3

27 Feb 11:46
Compare
Choose a tag to compare

Added

Changed

Deprecated

  • Python 3.7 support: this version and future versions use Python 3.8.

Removed

Fixed

  • The ignore_verifications flag of Hugging Face's datasets.load_dataset is
    deprecated and used to cause errors in tfds.load('huggingface:foo').

Security

v4.8.2

17 Jan 20:41

Deprecated

  • Python 3.7 support: this is the last version of TFDS supporting Python 3.7.
    Future versions will use Python 3.8.

Fixed

  • tfds new and tfds build better support the new recommended dataset
    organization, in which each dataset has its own package under datasets/
    and the builder class is called Builder and is defined in the module
    ${dsname}_dataset_builder.py.
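
As a rough illustration of that naming convention, the sketch below computes where the builder module would be expected to live. builder_module_path is a hypothetical helper, and it assumes the per-dataset package is named after the dataset itself, which the note above does not state explicitly.

```python
from pathlib import PurePosixPath


def builder_module_path(dataset_name: str) -> PurePosixPath:
    """Hypothetical helper: expected location of the module defining Builder.

    Assumes the package under datasets/ is named after the dataset.
    """
    module_name = f"{dataset_name}_dataset_builder.py"
    return PurePosixPath("datasets") / dataset_name / module_name


path = builder_module_path("my_dataset")
print(path)  # datasets/my_dataset/my_dataset_dataset_builder.py
```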

Security

v4.8.1

02 Jan 18:30

Changed

  • Added file valid_tags.txt to not break builds.
  • TFDS no longer relies on TensorFlow DTypes. We chose NumPy DTypes to keep the
    typing expressiveness, while dropping the heavy dependency on TensorFlow. We
    migrated all our internal datasets. Please migrate accordingly:
    • tf.bool: np.bool_
    • tf.string: np.str_
    • tf.int64, tf.int32, etc.: np.int64, np.int32, etc.
    • tf.float64, tf.float32, etc.: np.float64, np.float32, etc.
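
The mapping above can be written down as a small lookup table, which may help when migrating feature definitions by hand. This is a minimal sketch: TF_TO_NUMPY and to_numpy_dtype are hypothetical names, not part of TFDS, and the table covers only the dtypes listed above.

```python
import numpy as np

# Hypothetical lookup table mirroring the migration list above.
TF_TO_NUMPY = {
    "tf.bool": np.bool_,
    "tf.string": np.str_,
    "tf.int32": np.int32,
    "tf.int64": np.int64,
    "tf.float32": np.float32,
    "tf.float64": np.float64,
}


def to_numpy_dtype(tf_name: str):
    """Return the NumPy dtype that replaces the named TensorFlow dtype."""
    try:
        return TF_TO_NUMPY[tf_name]
    except KeyError:
        raise ValueError(f"No NumPy replacement listed for {tf_name!r}") from None


print(to_numpy_dtype("tf.int64").__name__)  # int64
```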

v4.8.0

21 Dec 11:09

Added

  • [API] DatasetBuilder's description and citations can be specified in
    dedicated README.md and CITATIONS.bib files, within the dataset package
    (see https://www.tensorflow.org/datasets/add_dataset).
  • Tags can be associated with datasets in the TAGS.txt file. For
    now, they are only used in the generated documentation.
  • [API][Experimental] New ViewBuilder to define datasets as transformations
    of existing datasets. Also adds tfds.transform with functionality to apply
    transformations.
  • Loggers are also called on tfds.as_numpy(...); the base Logger class has
    a new corresponding method.
  • tfds.core.DatasetBuilder can have a default limit for the number of
    simultaneous downloads. tfds.download.DownloadConfig can override it.
  • tfds.features.Audio supports storing raw audio data for lazy decoding.
  • The number of shards can be overridden when preparing a dataset:
    builder.download_and_prepare(download_config=tfds.download.DownloadConfig(num_shards=42)).
    Alternatively, you can configure the min and max shard size if you want TFDS
    to compute the number of shards for you, but want to have control over the
    shard sizes.

Changed

Deprecated

Removed

Fixed

Security