Releases: tensorflow/data-validation

TensorFlow Data Validation 1.8.0

16 May 07:40
deec746

Major Features and Improvements

  • Starting with this version, Python 3.9 wheels are released.

Bug Fixes and Other Changes

  • Adds get_statistics_html to the public API.
  • Fixes several incorrect type annotations.
  • Schema inference handles derived features.
  • StatsOptions.to_json now raises an error if it encounters unsupported
    options.
  • Depends on apache-beam[gcp]>=2.38,<3.
  • Depends on
    tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.
  • Depends on tensorflow-metadata>=1.8.0,<1.9.0.
  • Depends on tfx-bsl>=1.8.0,<1.9.0.
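The tensorflow constraint above combines a lower bound, an upper bound, and several wildcard exclusions. As a rough illustration of how such a specifier reads, the sketch below evaluates a simplified subset of the PEP 440 rules (only `>=`, `<`, and `!=X.Y.*` on plain numeric versions). The helper names `parse` and `satisfies` are hypothetical and exist only for this example; real installers such as pip handle many more cases.

```python
# Simplified reading of a requirement specifier; numeric versions only.
TF_SPEC = ">=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3"


def parse(version):
    """Turn '2.8.0' into (2, 8, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    v = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if v < parse(clause[2:]):
                return False
        elif clause.startswith("!=") and clause.endswith(".*"):
            # Wildcard exclusion: reject any version sharing this prefix.
            prefix = parse(clause[2:-2])
            if v[:len(prefix)] == prefix:
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if v >= parse(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause}")
    return True


for candidate in ["1.15.5", "2.7.1", "2.8.0", "3.0.0"]:
    print(candidate, satisfies(candidate, TF_SPEC))
```

Under this reading, the 1.15.x line from 1.15.5 onward and the 2.x line from 2.8 onward are accepted, while 2.0 through 2.7 and anything at or above 3 are rejected.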

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • N/A

TensorFlow Data Validation 1.7.0

02 Mar 20:48
f5e6975

Major Features and Improvements

  • Adds the DetectFeatureSkew PTransform to the public API, which can be used
    to detect feature skew between training and serving examples.
  • Uses sketch-based top-k/uniques in TFDV in-memory mode.
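A sketch-based top-k trades exact counts for bounded memory. The release note does not say which sketch TFDV uses, so the example below uses the classic Misra-Gries summary purely as an illustration of the idea. The function name `misra_gries` is hypothetical and is not part of TFDV.

```python
def misra_gries(stream, k):
    """Approximate heavy hitters using at most k - 1 counters.

    Each reported count undercounts the true frequency by at most n / k,
    where n is the number of items seen.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


values = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
summary = misra_gries(values, k=3)
print(summary)  # {'a': 3}
```

Here "a" occurs 6 times out of 12 and survives with an undercounted value of 3, which is within the n / k = 4 error bound. Less frequent values are evicted.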

Bug Fixes and Other Changes

  • Fixes a bug in load_statistics that would cause failure when reading binary
    protos.
  • Depends on pyfarmhash>=0.2,<0.4.
  • Depends on
    tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.
  • Depends on tensorflow-metadata>=1.7.0,<1.8.0.
  • Depends on tfx-bsl>=1.7.0,<1.8.0.
  • Depends on apache-beam[gcp]>=2.36,<3.
  • Updated the documentation for CombinerStatsGenerator to clarify that the
    first accumulator passed to merge_accumulators may be modified.
  • Added compression type detection when reading csv header.
  • Detection of invalid utf8 strings now works regardless of relative frequency.
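To illustrate what compression detection for a CSV header can look like, the sketch below checks the gzip magic number before reading the first row. This is an assumption for illustration only: the release note does not describe how TFDV detects the compression type, and the helper name `read_csv_header` is hypothetical.

```python
import csv
import gzip
import os
import tempfile


def read_csv_header(path):
    """Return the first CSV row, transparently handling gzip-compressed files."""
    with open(path, "rb") as f:
        is_gzip = f.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8", newline="") as f:
        return next(csv.reader(f))


# Write the same content plain and gzip-compressed, then read both headers.
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "data.csv")
    zipped = os.path.join(tmp, "data.csv.gz")
    with open(plain, "w", encoding="utf-8", newline="") as f:
        f.write("age,name\n1,x\n")
    with gzip.open(zipped, "wt", encoding="utf-8", newline="") as f:
        f.write("age,name\n1,x\n")
    plain_header = read_csv_header(plain)
    gzip_header = read_csv_header(zipped)

print(plain_header, gzip_header)
```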

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • N/A

TensorFlow Data Validation 1.6.0

21 Jan 02:09
5cea766

Major Features and Improvements

  • Introduces a convenience wrapper for handling indexed access to statistics
    protos.
  • String features are checked for UTF-8 validity, and the number of invalid
    strings is reported as invalid_utf8_count.
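As a minimal sketch of what a UTF-8 validity count means, the example below counts byte strings that fail to decode. It only mirrors the idea behind invalid_utf8_count and is not TFDV's implementation. The function name `count_invalid_utf8` is hypothetical.

```python
def count_invalid_utf8(values):
    """Count byte strings that cannot be decoded as UTF-8."""
    invalid = 0
    for value in values:
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            invalid += 1
    return invalid


samples = [
    b"hello",
    "caf\u00e9".encode("utf-8"),  # valid two-byte sequence
    b"\xff\xfe",                  # 0xff never appears in valid UTF-8
    b"\xc3",                      # truncated two-byte sequence
]
invalid_count = count_invalid_utf8(samples)
print(invalid_count)  # 2
```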

Bug Fixes and Other Changes

  • Depends on numpy>=1.16,<2.
  • Depends on absl-py>=0.9,<2.0.0.
  • Depends on
    tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<3.
  • Depends on tensorflow-metadata>=1.6.0,<1.7.0.
  • Depends on tfx-bsl>=1.6.0,<1.7.0.
  • Depends on apache-beam[gcp]>=2.35,<3.

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • N/A

TensorFlow Data Validation 1.5.0

01 Dec 23:06
c477451

Major Features and Improvements

  • N/A

Bug Fixes and Other Changes

  • BasicStatsGenerator is now responsible for setting the global num_examples.
    This field will no longer be populated at the DatasetFeatureStatistics level
    if default generators are disabled.
  • Depends on apache-beam[gcp]>=2.34,<3.
  • Depends on
    tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<3.
  • Depends on tensorflow-metadata>=1.5.0,<1.6.0.
  • Depends on tfx-bsl>=1.5.0,<1.6.0.

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • N/A

TensorFlow Data Validation 1.4.0

27 Oct 23:36
a59b1b5

Major Features and Improvements

  • Float features can now be analyzed as categorical for the purposes of top-k
    and unique count using experimental sketch-based generators.
  • Supports SQL-based slicing in TFDV. This enables slicing (using SQL) in
    TFX OSS and Dataflow environments. SQL-based slicing is currently not
    supported on Windows.

Bug Fixes and Other Changes

  • Variance calculations have been updated to be more numerically stable for
    large datasets or large magnitude numeric data.
  • When running per-example validation against a schema,
    validate_examples_in_tfrecord and validate_examples_in_csv can now
    optionally return samples of anomalous examples.
  • Source code changes ensure compatibility with pyarrow>=3.
  • Add load_anomalies_binary utility function.
  • Merge two accumulators at a time instead of batching.
  • BasicStatsGenerator is now responsible for setting
    FeatureNameStatistics.Type. Previously it was possible for a top-k generator
    and BasicStatsGenerator to set different types for categorical numeric
    features with physical type STRING.
  • Depends on pyarrow>=1,<6.
  • Depends on tensorflow-metadata>=1.4,<1.5.
  • Depends on tfx-bsl>=1.4,<1.5.
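Two of the items above concern numerically stable variance and merging accumulators two at a time. The release note does not name the algorithm TFDV uses, so the sketch below substitutes well-known techniques for illustration: Welford's online update and the pairwise merge of Chan et al. The class `MeanVarAccumulator` and the function `naive_variance` are hypothetical names, not TFDV symbols.

```python
from dataclasses import dataclass


@dataclass
class MeanVarAccumulator:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        """Welford's online update for a single value."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine two accumulators (pairwise merge, Chan et al.)."""
        total = self.count + other.count
        if total == 0:
            return MeanVarAccumulator()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return MeanVarAccumulator(total, mean, m2)

    def variance(self):
        """Population variance."""
        return self.m2 / self.count if self.count else 0.0


def naive_variance(values):
    """E[x^2] - E[x]^2, which loses precision for large-magnitude data."""
    n = len(values)
    return sum(v * v for v in values) / n - (sum(values) / n) ** 2


# Large-magnitude data whose true population variance is 22.5.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
left, right = MeanVarAccumulator(), MeanVarAccumulator()
for v in data[:2]:
    left.add(v)
for v in data[2:]:
    right.add(v)
merged = left.merge(right)

print(merged.variance())     # 22.5
print(naive_variance(data))  # far from 22.5 because of cancellation
```

The accumulator keeps deviations relative to the mean, so squares of values near 1e9 are never formed. The naive formula subtracts two numbers near 1e18, where adjacent doubles are more than 100 apart.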

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • Deprecated Python 3.6 support.

TensorFlow Data Validation 1.3.0

20 Sep 17:17
fcc81c1

Major Features and Improvements

  • N/A

Bug Fixes and Other Changes

  • Fixed bug in JensenShannonDivergence calculation affecting comparisons of
    histograms that each contain a single value.
  • Fixed bug in dataset constraints validation that caused failures with very
    large numbers of examples.
  • Fixed a bug wherein slicing on a feature missing from some batches could
    produce slice keys derived from a different feature.
  • Depends on apache-beam[gcp]>=2.32,<3.
  • Depends on
    tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<3.
  • Depends on tfx-bsl>=1.3,<1.4.
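For reference, the single-value histogram case mentioned in the first fix can be worked through with the textbook definition of Jensen-Shannon divergence. The sketch below operates on normalized distributions given as dictionaries, which is an assumption made for brevity. It does not reproduce TFDV's histogram handling or the bug that was fixed, and the function name is hypothetical.

```python
import math


def jensen_shannon_divergence(p, q):
    """JSD in bits between two discrete distributions given as {bucket: prob}."""

    def kl(a, m):
        # Kullback-Leibler divergence of a from the mixture m, skipping zero mass.
        return sum(pa * math.log2(pa / m[k]) for k, pa in a.items() if pa > 0)

    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Each histogram holds a single value.
same = jensen_shannon_divergence({5: 1.0}, {5: 1.0})
disjoint = jensen_shannon_divergence({5: 1.0}, {7: 1.0})
print(same, disjoint)  # 0.0 1.0
```

Identical single-value distributions give a divergence of 0, and single-value distributions on different values give the maximum of 1 bit.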

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • N/A

TensorFlow Data Validation 1.2.0

28 Jul 23:14
43e4e2b

Major Features and Improvements

  • Added statistics/generators/mutual_information.py, which estimates AMI
    using a k-nearest-neighbor (kNN) estimator. Unlike
    sklearn_mutual_information.py, it supports multivalent features/labels
    (by encoding) and multivariate features/labels. The plan is to deprecate
    sklearn_mutual_information.py in the future.
  • Fixed NonStreamingCustomStatsGenerator to respect max_batches_per_partition.

Bug Fixes and Other Changes

  • Depends on scikit-learn>=0.23,<0.24 ("mutual-information" extra only).
  • Depends on scipy>=1.5,<2 ("mutual-information" extra only).
  • Depends on apache-beam[gcp]>=2.31,<3.
  • Depends on tensorflow-metadata>=1.2,<1.3.
  • Depends on tfx-bsl>=1.2,<1.3.

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • N/A

TensorFlow Data Validation 1.1.1

26 Jul 17:21
90c3970

Major Features and Improvements

  • N/A

Bug Fixes and Other Changes

  • Depends on google-cloud-bigquery>=1.28.0,<2.21.
  • Depends on tfx-bsl>=1.1.1,<1.2.
  • Fixes error when using tfdv.experimental_get_feature_value_slicer with
    pandas==1.3.0.

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • N/A

TensorFlow Data Validation 1.1.0

22 Jun 20:17
269bab2

Major Features and Improvements

  • N/A

Bug Fixes and Other Changes

  • Optimized certain stats generators that need to materialize the input
    RecordBatches.
  • Depends on protobuf>=3.13,<4.
  • Depends on tensorflow-metadata>=1.1,<1.2.
  • Depends on tfx-bsl>=1.1,<1.2.

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • N/A

TensorFlow Data Validation 1.0.0

24 May 18:30
be61605

Major Features and Improvements

  • N/A

Bug Fixes and Other Changes

  • Increased the threshold beyond which a string feature value is considered
    "large" by the experimental sketch-based top-k/unique generator to 1024.
  • Added normalized AMI to sklearn mutual information generator.
  • Depends on apache-beam[gcp]>=2.29,<3.
  • Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3.
  • Depends on tensorflow-metadata>=1.0,<1.1.
  • Depends on tfx-bsl>=1.0,<1.1.

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • Removed the following deprecated symbols. Their deprecation was announced
    in 0.30.0.
      • tfdv.validate_instance
      • tfdv.lift_stats_generator
      • tfdv.partitioned_stats_generator
      • tfdv.get_feature_value_slicer
  • Removed parameter compression_type in
    tfdv.generate_statistics_from_tfrecord.