Releases: tensorflow/data-validation
TensorFlow Data Validation 1.8.0
Major Features and Improvements
- From this version we will be releasing python 3.9 wheels.
Bug Fixes and Other Changes
- Adds get_statistics_html to the public API.
- Fixes several incorrect type annotations.
- Schema inference handles derived features.
- StatsOptions.to_json now raises an error if it encounters unsupported
options.
- Depends on apache-beam[gcp]>=2.38,<3.
- Depends on tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.
- Depends on tensorflow-metadata>=1.8.0,<1.9.0.
- Depends on tfx-bsl>=1.8.0,<1.9.0.
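The tensorflow requirement in this release combines a lower bound, several excluded release series, and an upper bound. As a rough illustration of how such a specifier reads, the sketch below evaluates it for plain numeric versions. The names parse, satisfies, and TF_SPEC are hypothetical, and the matcher is a deliberate simplification that handles only the three operator forms used here; real installers follow the full PEP 440 rules.

```python
# Simplified reading of the tensorflow pin from the 1.8.0 notes.
# Assumption: versions are plain dotted integers (no rc/dev suffixes).
TF_SPEC = (
    ">=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,"
    "!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3"
)


def parse(version: str) -> tuple:
    """Turn '2.8.0' into (2, 8, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every clause of `spec`."""
    v = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if v < parse(clause[2:]):
                return False
        elif clause.startswith("!=") and clause.endswith(".*"):
            # '!=2.7.*' excludes every version whose prefix is 2.7.
            prefix = parse(clause[2:-2])
            if v[: len(prefix)] == prefix:
                return False
        elif clause.startswith("<"):
            if v >= parse(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause}")
    return True


accepts_2_8 = satisfies("2.8.0", TF_SPEC)   # newer 2.x series is allowed
accepts_2_7 = satisfies("2.7.1", TF_SPEC)   # excluded by !=2.7.*
accepts_1_15 = satisfies("1.15.5", TF_SPEC)  # exactly the lower bound
```

Under this reading, the pin admits 1.15.5 and later 1.x releases plus 2.8 and later 2.x releases, and rejects 2.0 through 2.7 and anything from 3 upward.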
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 1.7.0
Major Features and Improvements
- Adds the DetectFeatureSkew PTransform to the public API, which can be used
to detect feature skew between training and serving examples.
- Uses sketch-based top-k/uniques in TFDV in-memory mode.
Bug Fixes and Other Changes
- Fixes a bug in load_statistics that would cause failure when reading binary
protos.
- Depends on pyfarmhash>=0.2,<0.4.
- Depends on tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.
- Depends on tensorflow-metadata>=1.7.0,<1.8.0.
- Depends on tfx-bsl>=1.7.0,<1.8.0.
- Depends on apache-beam[gcp]>=2.36,<3.
- Updated the documentation for CombinerStatsGenerator to clarify that the
first accumulator passed to merge_accumulators may be modified.
- Added compression type detection when reading a CSV header.
- Detection of invalid UTF-8 strings now works regardless of relative frequency.
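The note about merge_accumulators matters to anyone who keeps a reference to an accumulator after merging. The sketch below shows the pattern in plain Python. CountCombiner is a hypothetical stand-in that borrows the combiner method names; it does not subclass TFDV's CombinerStatsGenerator, and its accumulator (a one-element list holding a count) is an assumption made for illustration.

```python
from typing import Iterable, List


class CountCombiner:
    """Hypothetical combiner that counts values; not the TFDV class."""

    def create_accumulator(self) -> List[int]:
        # A one-element list so the count can be updated in place.
        return [0]

    def add_input(self, accumulator: List[int], values: list) -> List[int]:
        accumulator[0] += len(values)
        return accumulator

    def merge_accumulators(self, accumulators: Iterable[List[int]]) -> List[int]:
        iterator = iter(accumulators)
        # The first accumulator is reused as the result and mutated in
        # place, which avoids allocating a fresh accumulator per merge.
        result = next(iterator)
        for other in iterator:
            result[0] += other[0]
        return result


combiner = CountCombiner()
first = combiner.add_input(combiner.create_accumulator(), ["a", "b"])
second = combiner.add_input(combiner.create_accumulator(), ["c", "d", "e"])
merged = combiner.merge_accumulators([first, second])
# `merged` is the same object as `first`, so `first` now holds the total,
# while `second` is left unchanged.
```

Callers that need the pre-merge value of the first accumulator should copy it before merging.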
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 1.6.0
Major Features and Improvements
- Introduces a convenience wrapper for handling indexed access to statistics
protos.
- String features are checked for UTF-8 validity, and the number of invalid
strings is reported as invalid_utf8_count.
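To make the invalid_utf8_count statistic concrete, the sketch below counts byte strings that fail to decode as UTF-8. It is an illustration of the idea in plain Python, not TFDV's implementation, and the function name count_invalid_utf8 is hypothetical.

```python
from typing import Iterable


def count_invalid_utf8(values: Iterable[bytes]) -> int:
    """Count byte strings that cannot be decoded as UTF-8."""
    invalid = 0
    for value in values:
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            invalid += 1
    return invalid


samples = [
    b"hello",                    # valid ASCII, therefore valid UTF-8
    "caf\u00e9".encode("utf-8"),  # valid two-byte sequence for the accented e
    b"\xff\xfe",                 # 0xff never appears in valid UTF-8
    b"\xc3",                     # truncated two-byte sequence
]
invalid_count = count_invalid_utf8(samples)  # 2
```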
Bug Fixes and Other Changes
- Depends on numpy>=1.16,<2.
- Depends on absl-py>=0.9,<2.0.0.
- Depends on tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<3.
- Depends on tensorflow-metadata>=1.6.0,<1.7.0.
- Depends on tfx-bsl>=1.6.0,<1.7.0.
- Depends on apache-beam[gcp]>=2.35,<3.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 1.5.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- BasicStatsGenerator is now responsible for setting the global num_examples.
This field will no longer be populated at the DatasetFeatureStatistics level
if default generators are disabled.
- Depends on apache-beam[gcp]>=2.34,<3.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<3.
- Depends on tensorflow-metadata>=1.5.0,<1.6.0.
- Depends on tfx-bsl>=1.5.0,<1.6.0.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 1.4.0
Major Features and Improvements
- Float features can now be analyzed as categorical for the purposes of top-k
and unique count using experimental sketch-based generators.
- Supports SQL-based slicing in TFDV. This enables slicing (using SQL) in
TFX OSS and Dataflow environments. SQL-based slicing is currently not
supported on Windows.
Bug Fixes and Other Changes
- Variance calculations have been updated to be more numerically stable for
large datasets or large magnitude numeric data.
- When running per-example validation against a schema,
validate_examples_in_tfrecord and validate_examples_in_csv can now optionally
return samples of anomalous examples.
- Source code changes ensure that it now works with pyarrow>=3.
- Adds the load_anomalies_binary utility function.
- Merges two accumulators at a time instead of batching.
- BasicStatsGenerator is now responsible for setting
FeatureNameStatistics.Type. Previously it was possible for a top-k generator
and BasicStatsGenerator to set different types for categorical numeric
features with physical type STRING.
- Depends on pyarrow>=1,<6.
- Depends on tensorflow-metadata>=1.4,<1.5.
- Depends on tfx-bsl>=1.4,<1.5.
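The variance change above addresses a classic floating-point problem: the textbook formula E[x^2] - E[x]^2 subtracts two huge, nearly equal numbers when values are large. The release note does not say which algorithm TFDV adopted, so the sketch below uses Welford's online algorithm purely as one well-known stable alternative. Both function names are hypothetical.

```python
from typing import Sequence


def naive_variance(xs: Sequence[float]) -> float:
    """Population variance via E[x^2] - E[x]^2 (prone to cancellation)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_of_squares = sum(x * x for x in xs) / n
    return mean_of_squares - mean * mean


def welford_variance(xs: Sequence[float]) -> float:
    """Population variance via Welford's single-pass update."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / count if count else 0.0


# Large-magnitude data with a small spread; the true variance is 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
stable = welford_variance(data)
unstable = naive_variance(data)
```

On this input the Welford result matches the true value of 22.5, while the naive formula cannot, because its intermediate values near 1e18 are only representable in steps far larger than the variance itself.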
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- Deprecated python 3.6 support.
TensorFlow Data Validation 1.3.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Fixed bug in JensenShannonDivergence calculation affecting comparisons of
histograms that each contain a single value.
- Fixed bug in dataset constraints validation that caused failures with very
large numbers of examples.
- Fixed a bug wherein slicing on a feature missing from some batches could
produce slice keys derived from a different feature.
- Depends on apache-beam[gcp]>=2.32,<3.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<3.
- Depends on tfx-bsl>=1.3,<1.4.
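For readers unfamiliar with the metric in the first fix, the sketch below computes the Jensen-Shannon divergence between two histograms and exercises the single-value case mentioned in the note. It is a generic textbook formulation in plain Python, not TFDV's code, and it assumes both histograms are given as counts over the same ordered buckets.

```python
import math
from typing import Sequence


def jensen_shannon_divergence(p_counts: Sequence[float],
                              q_counts: Sequence[float]) -> float:
    """JSD in bits between two histograms over identical buckets."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    p = [c / p_total for c in p_counts]
    q = [c / q_total for c in q_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Both histograms hold a single, identical value: divergence is 0.
same_value = jensen_shannon_divergence([5], [9])
# Each histogram holds a single value, but in different buckets:
# divergence reaches its maximum of 1 bit.
different_values = jensen_shannon_divergence([5, 0], [0, 9])
```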
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 1.2.0
Major Features and Improvements
- Added statistics/generators/mutual_information.py. It estimates AMI using
kNN estimation. It differs from sklearn_mutual_information.py in that it
supports multivalent features/labels (by encoding) and multivariate
features/labels. The plan is to deprecate sklearn_mutual_information.py in
the future.
- Fixed NonStreamingCustomStatsGenerator to respect max_batches_per_partition.
Bug Fixes and Other Changes
- Depends on scikit-learn>=0.23,<0.24 ("mutual-information" extra only).
- Depends on scipy>=1.5,<2 ("mutual-information" extra only).
- Depends on apache-beam[gcp]>=2.31,<3.
- Depends on tensorflow-metadata>=1.2,<1.3.
- Depends on tfx-bsl>=1.2,<1.3.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 1.1.1
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on google-cloud-bigquery>=1.28.0,<2.21.
- Depends on tfx-bsl>=1.1.1,<1.2.
- Fixes error when using tfdv.experimental_get_feature_value_slicer with
pandas==1.3.0.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 1.1.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Optimized certain stats generators that need to materialize the input
RecordBatches.
- Depends on protobuf>=3.13,<4.
- Depends on tensorflow-metadata>=1.1,<1.2.
- Depends on tfx-bsl>=1.1,<1.2.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 1.0.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Increased the threshold beyond which a string feature value is considered
"large" by the experimental sketch-based top-k/unique generator to 1024.
- Added normalized AMI to sklearn mutual information generator.
- Depends on apache-beam[gcp]>=2.29,<3.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3.
- Depends on tensorflow-metadata>=1.0,<1.1.
- Depends on tfx-bsl>=1.0,<1.1.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- Removed the following deprecated symbols. Their deprecation was announced
in 0.30.0.
  - tfdv.validate_instance
  - tfdv.lift_stats_generator
  - tfdv.partitioned_stats_generator
  - tfdv.get_feature_value_slicer
- Removed parameter compression_type in tfdv.generate_statistics_from_tfrecord.