02 May 18:03

8e8a649

v1.4.2 - 2023-05-02

This release fixes a bug that caused datetime and numerical transformers to crash if a column was all NaNs. Additionally, it adds support for Pandas 2.0!

Bugs

Numerical & datetime transformers crash if the entire column is null - Issue #637 by @frances-h

Maintenance

Remove upper bound for pandas - Issue #633 by @pvk-developer

Contributors

pvk-developer

26 Apr 21:07

amontanez24

v1.4.1

7449a82

v1.4.1 - 2023-04-25

This release patches an issue that prevented the RegexGenerator from working with regexes that had a very large number of possible combinations.

Bugs

RegexGenerator continues to have problems if there are too many possibilities - Issue #635 by @pvk-developer

Contributors

pvk-developer

13 Apr 17:44

amontanez24

v1.4.0

853b775

v1.4.0 - 2023-04-13

This release adds a couple of new features, including the OrderedLabelEncoder, which replaces the now-deprecated CustomLabelEncoder. It also makes every generator-type transformer in the HyperTransformer use a different random seed.

Additionally, bugs were patched in the RegexGenerator that caused it to crash or take too long in certain cases. Finally, this release improved the detection of Faker functions in the AnonymizedFaker.

Bugs

Find nested Faker provider submodules - PR #630 by @frances-h
RegexGenerator fails to generate values if there are too many possibilities - Issue #623 by @R-Palazzo
RegexGenerator takes too much time and runs out of memory if there are too many possibilities - Issue #624 by @R-Palazzo

New Features

Choose a different seed for each transformer - Issue #619 by @fealho
Rename CustomLabelEncoder to OrderedLabelEncoder - Issue #621 by @R-Palazzo
Add functionality to find version add-on - Issue #620 by @frances-h

Contributors

frances-h, fealho, and R-Palazzo

18 Jan 20:55

amontanez24

v1.3.0

e821d0c

v1.3.0 - 2023-1-18

This release makes changes to the way that individual transformers are stored in the HyperTransformer. When accessing the config via HyperTransformer.get_config(), the transformers listed in the config are now the actual transformer instances used during fitting and transforming. These instances can now be accessed and used to examine their properties post fitting. For example, you can now view the mapping for a PseudoAnonymizedFaker instance using PseudoAnonymizedFaker.get_mapping() on the instance retrieved from the config.

Additionally, the output of reverse_transform no longer appends the .value suffix to every unnamed output column. Only output columns that are created from context extracted from the input columns will have suffixes (e.g. .normalized in the ClusterBasedNormalizer).

The AnonymizedFaker and RegexGenerator now have an enforce_uniqueness parameter, which controls whether the data returned by reverse_transform should be unique. The HyperTransformer now has a method called create_anonymized_columns that can be used to generate columns that are matched with anonymizing transformers like AnonymizedFaker and RegexGenerator. The method can be used as follows:
HyperTransformer.create_anonymized_columns(num_rows=5, column_names=['email_optin', 'credit_card'])

Another major change in this release is the ability to control randomization. Every time a HyperTransformer is initialized, its randomness will be reset to the same seed, and it will yield the same results for reverse_transform if given the same input. Every subsequent call to reverse_transform yields a different result. If a user desires to reset the seed, they can call HyperTransformer.reset_randomization.
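
The seeding behavior described above can be illustrated with a minimal, library-independent sketch. It uses Python's standard random module rather than RDT itself. The SeededGenerator class, its generate method, and the fixed seed of 42 are hypothetical stand-ins; only the method name reset_randomization comes from the release notes.

```python
import random


class SeededGenerator:
    """Hypothetical stand-in for an object with controlled randomness."""

    SEED = 42  # assumed fixed seed; RDT's actual seed handling may differ

    def __init__(self):
        # Every new instance starts from the same seed.
        self._rng = random.Random(self.SEED)

    def reset_randomization(self):
        # Return to the initial random state.
        self._rng = random.Random(self.SEED)

    def generate(self, n):
        # Each call advances the random state, so repeated calls differ.
        return [self._rng.random() for _ in range(n)]


first = SeededGenerator().generate(3)
second = SeededGenerator().generate(3)  # fresh instance: same values as `first`

gen = SeededGenerator()
a = gen.generate(3)
b = gen.generate(3)        # same instance, later call: different values
gen.reset_randomization()
c = gen.generate(3)        # after the reset: same values as `a`
```

In this sketch, two freshly created instances agree with each other, consecutive calls on one instance do not, and a reset brings the instance back to its initial sequence.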

Finally, this release adds support for Python 3.10 and drops support for 3.6.

Bugs

The reset_randomization should also apply to fit and transform - Issue #608 by @amontanez24
Cannot print CustomLabelEncoder: ValueError - Issue #607 by @amontanez24
Float formatter learn_rounding_scheme doesn't work on all digits - Issue #556 by @fealho
Warnings not showing on update_transformers_by_sdtype - Issue #582 by @amontanez24
OneHotEncoder doesn't work with boolean sdtype - Issue #583 by @pvk-developer
Setting config on HyperTransformer does not read supported_sdtypes - Issue #560 by @pvk-developer
#545 - Issue #545 by @pvk-developer
Add error to NullTransformer when data only contains nans - PR #567 by @fealho
Update update_transformers validation - PR #563 by @fealho

Maintenance

Support Python 3.10 - Issue #593 by @pvk-developer
RDT 1.3 Package Maintenance Updates - Issue #594 by @pvk-developer

New Features

Update errors - Issue #599 by @amontanez24
Add ability to control randomness - Issue #584 by @amontanez24
Printing and error improvements - Issue #581 by @amontanez24
Make RegexGenerator not to reset itself - Issue #558 by @pvk-developer
Add a reset_anonymization method - Issue #559 by @pvk-developer
Don't copy instances of transformer - Issue #541 by @fealho
Remove '.value' suffix - Issue #533 by @fealho
Change the NEXT_TRANSFORMERS logic - Issue #557 by @fealho
Add utility functions to AnonymizedFaker - Issue #561 by @pvk-developer
Update API for update_transformers_by_sdtype to be more explicit about instances vs. copies - Issue #540 by @fealho
Add create_anonymized_columns method to anonymize data from scratch - Issue #546 by @pvk-developer
Add parameter to AnonymizedFaker() and RegexGenerator() to generate only unique values - Issue #542 by @pvk-developer

Contributors

amontanez24, fealho, and pvk-developer

12 Sep 17:42

amontanez24

v1.2.1

1ebfda8

v1.2.1 - 2022-9-12

This release fixes a bug that caused the UnixTimestampEncoder to return data with the incorrect datetime format. It also fixes a bug in the UnixTimestampEncoder that caused the null column not to be reverse transformed when missing_value_replacement was not set.

Bugs

Inconsistency in date format after reverse transform - Issue #515 by @pvk-developer
Fix calling null_transformer with model_missing_values. - PR #550 by @pvk-developer

Contributors

pvk-developer

18 Aug 00:04

amontanez24

v1.2.0

403b9d5

v1.2.0 - 2022-8-17

This release adds a new transformer called the PseudoAnonymizedFaker. This transformer enables the pseudo-anonymization of your data by mapping all of a column's original values to fake values that get returned during the reverse transformation process. Each original value is always mapped to the same fake value.
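
The idea of a stable original-to-fake mapping can be sketched without RDT or Faker. The class below is a hypothetical illustration, not the PseudoAnonymizedFaker implementation: it numbers fake values with a counter instead of generating realistic ones, and every name in it except get_mapping is an assumption made for this example.

```python
from itertools import count


class PseudoAnonymizer:
    """Hypothetical sketch: map each original value to one stable fake value."""

    def __init__(self, prefix="fake_"):
        self._prefix = prefix
        self._counter = count(1)
        self._mapping = {}

    def fit(self, values):
        # Assign a fake value to each distinct original, in order of appearance.
        for value in values:
            if value not in self._mapping:
                self._mapping[value] = f"{self._prefix}{next(self._counter):04d}"

    def get_mapping(self):
        # Return a copy so callers cannot alter the learned mapping.
        return dict(self._mapping)

    def anonymize(self, values):
        # Look up the fake value that was assigned to each original value.
        return [self._mapping[value] for value in values]


emails = ["ana@example.com", "bo@example.com", "ana@example.com"]
anonymizer = PseudoAnonymizer()
anonymizer.fit(emails)
fake_emails = anonymizer.anonymize(emails)
```

Because the repeated address receives the same fake value both times, relationships between rows survive even though the original values do not.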

Additionally, this release enables the HyperTransformer to use categorical transformers on boolean columns. It also introduces a new parameter called computer_representation to the FloatFormatter that will allow for values to be clipped to certain bounds based on the computer type used for a numerical column.
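
Clipping to the bounds of a computer type can be sketched in plain Python. This is only an illustration of the concept: the INTEGER_BOUNDS table, the representation names in it, and the clip_to_representation helper are assumptions for this example and may not match the options the FloatFormatter accepts.

```python
# Assumed bounds for a few fixed-width integer types. The names are
# illustrative and are not taken from RDT's list of accepted values.
INTEGER_BOUNDS = {
    "Int8": (-2**7, 2**7 - 1),
    "Int16": (-2**15, 2**15 - 1),
    "UInt8": (0, 2**8 - 1),
}


def clip_to_representation(values, representation):
    """Clip each value into the range the given integer type can hold."""
    lower, upper = INTEGER_BOUNDS[representation]
    return [min(max(value, lower), upper) for value in values]


clipped = clip_to_representation([-200, 5, 300], "Int8")
# -200 and 300 fall outside the Int8 range, so they become -128 and 127.
```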

Finally, this release patches a bug that caused unpredictable results from the reverse_transform method of the FrequencyEncoder when add_noise is enabled.

New Features

Add PseudoAnonymizedFaker transformer - Issue #517 by @pvk-developer
Boolean columns should be able to use any of the categorical transformers - Issue #527 by @pvk-developer
Update FloatFormatter with parameters for the computer representation - Issue #521 by @fealho

Bugs

Unpredictable results for FrequencyEncoder(add_noise=True) - Issue #528 by @fealho

Internal

Performance Tests update - Issue #524 by @pvk-developer

Contributors

fealho and pvk-developer

09 Jun 20:39

amontanez24

v1.1.0

386ea30

v1.1.0 - 2022-6-9

This release adds multiple new transformers: the CustomLabelEncoder and the RegexGenerator. The CustomLabelEncoder works similarly to the LabelEncoder, except it allows users to provide the order of the categories. The RegexGenerator allows users to specify a regex pattern and will generate values that match that pattern.
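
Label encoding with a user-supplied category order can be sketched as follows. The helper functions are hypothetical and do not reproduce the CustomLabelEncoder API; they only show how a given order determines the integer assigned to each category.

```python
def fit_ordered_labels(order):
    """Build a category -> integer mapping that follows the given order."""
    return {category: index for index, category in enumerate(order)}


def transform(values, mapping):
    # Replace each category with its position in the user-supplied order.
    return [mapping[value] for value in values]


def reverse_transform(codes, mapping):
    # Invert the mapping to recover the original categories.
    inverse = {index: category for category, index in mapping.items()}
    return [inverse[code] for code in codes]


order = ["low", "medium", "high"]  # user-supplied category order
mapping = fit_ordered_labels(order)
codes = transform(["high", "low", "medium", "low"], mapping)
restored = reverse_transform(codes, mapping)
```

Here the encoded integers preserve the intended ranking (low < medium < high) rather than the order in which categories happen to appear in the data.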

This release also improves existing transformers. The LabelEncoder now has a parameter called order_by that allows users to specify the ordering scheme for their data (e.g. order numerically or alphabetically). The LabelEncoder also has a new parameter called add_noise that lets users specify whether uniform noise should be added to the transformed data. The GaussianNormalizer is now faster because an unnecessary distribution search was removed, and by default the FloatFormatter no longer rounds values to any place higher than the ones place.

New Features

Add noise parameter to LabelEncoder - Issue #500 by @fealho
Remove parameters related to distribution search and change default for GaussianNormalizer - Issue #499 by @amontanez24
Add order_by parameter to LabelEncoder - Issue #510 by @amontanez24
Only round to decimal places in FloatFormatter - Issue #508 by @fealho
Add CustomLabelEncoder transformer - Issue #507 by @amontanez24
Add RegexGenerator Transformer - Issue #505 by @pvk-developer

Contributors

amontanez24, fealho, and pvk-developer

05 May 21:52

amontanez24

v1.0.0

0bd04ed

v1.0.0 - 2022-5-5

The main update of this release is the introduction of a config, which describes the sdtypes and transformers that will be used by the HyperTransformer for each column of the data, where sdtype stands for the semantic or statistical meaning of a datatype. The user can interact with this config through the newly created methods update_sdtypes, get_config, set_config, update_transformers, update_transformers_by_sdtype and remove_transformer_by_sdtype.

This release also included various new features and updates, including:

Users can now transform subsets of the data using dedicated methods, transform_subset and reverse_transform_subset.
User validation was added for the following methods: transform, reverse_transform, update_sdtypes, update_transformers, set_config.
Unnecessary warnings were removed from GaussianNormalizer.fit and FrequencyEncoder.transform.
The user can now set a transformer to None.
Transformers that cannot work with missing values will automatically fill them in.
Added support for additional datetime formats.
Setting model_missing_values = False in a transformer now keeps track of the percentage of missing values, instead of producing data containing NaNs.
All parameters were removed from the HyperTransformer.
The demo dataset get_demo was improved to be more intuitive.

Finally, a number of transformers were redesigned to be more user friendly. Among them, the following transformers have also been renamed:

BayesGMMTransformer -> ClusterBasedNormalizer
GaussianCopulaTransformer -> GaussianNormalizer
DateTimeRoundedTransformer -> OptimizedTimestampEncoder
DateTimeTransformer -> UnixTimestampEncoder
NumericalTransformer -> FloatFormatter
LabelEncodingTransformer -> LabelEncoder
OneHotEncodingTransformer -> OneHotEncoder
CategoricalTransformer -> FrequencyEncoder
BooleanTransformer -> BinaryEncoder
PIIAnonymizer -> AnonymizedFaker

New Features

Fix using None as transformer when update_transformers_by_sdtype - Issue #496 by @pvk-developer
Rename PIIAnonymizer --> AnonymizedFaker - Issue #483 by @pvk-developer
User validation for reverse_transform - Issue #480 by @amontanez24
User validation for transform - Issue #479 by @fealho
User validation for set_config - Issue #478 by @fealho
User validation for update_transformers_by_sdtype - Issue #477 by @amontanez24
User validation for update_transformers - Issue #475 by @fealho
User validation for update_sdtypes - Issue #474 by @fealho
Allow columns to not have a transformer - Issue #473 by @pvk-developer
Create methods to transform a subset of the data (& reverse transform it) - Issue #472 by @amontanez24
Throw a warning if you use set_config on a HyperTransformer that's already fit - Issue #466 by @amontanez24
Update README for RDT 1.0 - Issue #454 by @amontanez24
Issue with printing PIIAnonymizer in HyperTransformer - Issue #452 by @pvk-developer
Pretty print get_config - Issue #450 by @pvk-developer
Silence warning for GaussianNormalizer.fit - Issue #443 by @pvk-developer
Transformers that cannot work with missing values should automatically fill them in - Issue #442 by @amontanez24
More descriptive error message in PIIAnonymizer when provider_name and function_name don't align - Issue #440 by @pvk-developer
Can we support additional datetime formats? - Issue #439 by @pvk-developer
Update FrequencyEncoder.transform so that pandas won't throw a warning - Issue #436 by @pvk-developer
Update functionality when model_missing_values=False - Issue #435 by @amontanez24
Create methods for getting and setting a config - Issue #418 by @amontanez24
Input validation & error handling in HyperTransformer - Issue #408 by @fealho and @amontanez24
Remove unneeded params from HyperTransformer - Issue #407 by @pvk-developer
Rename property: _valid_output_sdtypes - Issue #406 by @amontanez24
Add pii as a new sdtype in HyperTransformer - Issue #404 by @pvk-developer
Update transformers by data type (in HyperTransformer) - Issue #403 by @pvk-developer
Update transformers by column name in HyperTransformer - Issue #402 by @pvk-developer
Improve updating field_data_types in HyperTransformer - Issue #400 by @amontanez24
Create method to auto detect HyperTransformer config from data - Issue #399 by @fealho
Update HyperTransformer default transformers - Issue #398 by @fealho
Add PIIAnonymizer - Issue #397 by @pvk-developer
Improve the way we print an individual transformer - Issue #395 by @amontanez24
Rename columns parameter in fit for each individual transformer - Issue #376 by @fealho and @pvk-developer
Create a more descriptive demo dataset - Issue #374 by @fealho
Delete unnecessary transformers - Issue #373 by @fealho
Update NullTransformer to make it user friendly - Issue #372 by @pvk-developer
Update BayesGMMTransformer to make it user friendly - Issue #371 by @amontanez24
Update GaussianCopulaTransformer to make it user friendly - Issue #370 by @amontanez24
Update DateTimeRoundedTransformer to make it user friendly - Issue #369 by @amontanez24
Update DateTimeTransformer to make it user friendly - Issue #368 by @amontanez24
Update NumericalTransformer to make it user friendly - Issue #367 by @amontanez24
Update LabelEncodingTransformer to make it user friendly - Issue #366 by @fealho
Update OneHotEncodingTransformer to make it user friendly - Issue #365 by @fealho
Update CategoricalTransformer to make it user friendly - Issue #364 by @fealho
Update BooleanTransformer to make it user friendly - Issue #363 by @fealho
Update names & functionality for handling missing values - Issue #362 by @pvk-developer

Bugs

Checking keys of config as set - Issue #497 by @amontanez24
Only update transformer used when necessary for update_sdtypes - Issue #469 by @amontanez24
Fix how get_config prints transformers - Issue #468 by @pvk-developer
NullTransformer reverse_transform alters input data due to not copying - Issue #455 by @amontanez24
Attempting to transform a subset of the data should lead to an Error - Issue #451 by @amontanez24
Detect_initial_config isn't detecting sdtype "numerical" - Issue #449 by @pvk-developer
PIIAnonymizer not generating multiple locales - Issue #447 by @pvk-developer
Error when printing ClusterBasedNormalizer and GaussianNormalizer - Issue #441 by @pvk-developer
Datetime reverse transform crashes if datetime_format is specified - Issue #438 by @amontanez24
Correct datetime format is not recovered on reverse_transform - Issue #437 by @pvk-developer
Use numpy NaN values in BinaryEncoder - Issue #434 by @pvk-developer
Duplicate _output_columns during fitting - Issue #423 by @fealho

Internal Improvements

Making methods that aren't part of API private - Issue #489 by @amontanez24
Fix columns missing in config and update transformers to None - Issue #495 by @pvk-developer

Contributors

amontanez24, fealho, and pvk-developer

07 Mar 20:46

amontanez24

v0.6.4

7a17d57

v0.6.4 - 2022-3-7

This release fixes multiple bugs concerning the HyperTransformer. First, the get_transformer_tree_yaml method no longer crashes on every call. Second, calling update_field_data_types and update_default_data_type_transformers after fitting no longer breaks the transform method.

The HyperTransformer now sorts its outputs for both transform and reverse_transform based on the order of the input's columns. It is also now possible to create transformers that simply drop columns during transform and don't return any new columns.

New Features

Support dropping a column through a transformer - Issue #393 by @pvk-developer
HyperTransformer should sort columns after transform and reverse_transform - Issue #405 by @fealho

Bugs

get_transformer_tree_yaml fails - Issue #389 by @amontanez24
HyperTransformer _unfit method not working correctly - Issue #390 by @amontanez24
Blank dataframe after updating the data types - Issue #401 by @amontanez24

Contributors

amontanez24, fealho, and pvk-developer

04 Feb 18:37

amontanez24

v0.6.3

a31eaf5

v0.6.3 - 2022-2-4

This release adds a new module to the RDT library called performance. This module can be used to evaluate the speed and peak memory usage of any transformer in RDT. This release also increases the maximum acceptable version of scikit-learn to make it more compatible with other libraries in the SDV ecosystem. On top of that, it fixes a bug related to a new version of pandas.

New Features

Move profiling functions into RDT library - Issue #353 by @amontanez24

Housekeeping

Increase scikit-learn dependency range - Issue #351 by @amontanez24
pandas 1.4.0 release causes a small error - Issue #358 by @fealho

Bugs

Performance tests get stuck on Unix if multiprocessing is involved - Issue #337 by @amontanez24

Contributors

amontanez24 and fealho

Releases: sdv-dev/RDT
