Make column names optional for parquet tfxio #54

martinbomio · 2022-03-23T23:31:13Z

This makes the column names optional: when no columns are specified, all fields of the dataset are read. This is the same behaviour as the underlying BeamSource.
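The intended semantics can be illustrated with a minimal, standard-library-only sketch. It does not use tfx_bsl or Parquet at all: `read_columns`, the in-memory `dataset`, and the `float_feature` column are hypothetical stand-ins chosen for illustration, and only the `column_names` parameter name and the `int_feature` column come from this PR.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical in-memory stand-in for a Parquet dataset: column name -> values.
Table = Dict[str, List[object]]


def read_columns(table: Table,
                 column_names: Optional[Sequence[str]] = None) -> Table:
  """Returns the requested columns, or every column when none are requested."""
  if column_names is None:
    # No projection requested: read all fields of the dataset.
    return dict(table)
  missing = [name for name in column_names if name not in table]
  if missing:
    raise KeyError(f"Columns not found in dataset: {missing}")
  # Keep only the requested columns, in the order they were requested.
  return {name: table[name] for name in column_names}


# Assumed example data; 'float_feature' is not taken from the PR.
dataset = {"int_feature": [1, 2], "float_feature": [0.5, 1.5]}
all_columns = read_columns(dataset)
subset = read_columns(dataset, column_names=["int_feature"])
```

Using `None` as the default rather than an empty list keeps "no projection requested" distinct from "project to zero columns", which is one plausible reason to model the optional argument this way.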

martinbomio · 2022-03-23T23:33:41Z

@iindyk while implementing the changes in tfx to add this to the factory method, I noticed that column names are not passed, and realized that the tfxio shouldn't require them.

iindyk · 2022-03-24T18:08:58Z

tfx_bsl/tfxio/parquet_tfxio_test.py

@@ -301,6 +301,34 @@ def _AssertFn(record_batch_list):
      record_batch_pcoll = (p | tfxio.BeamSource(batch_size=_NUM_ROWS))
      beam_testing_util.assert_that(record_batch_pcoll, _AssertFn)

+  def testOptionalColumnNames(self):


thanks, could you please also add a test when only a subset of columns is read

@iindyk happy to add one, but isn't the projected test case covering this case?

it does, but through the project path, which shouldn't be the default when one just wants to read a subset of the data (e.g. it should not be necessary to have the whole schema). Users often take examples from tests, so seeing how it can be done here would be useful imo

makes sense, I'll add it

iindyk · 2022-03-24T18:46:25Z

tfx_bsl/tfxio/parquet_tfxio_test.py

+    tfxio = ParquetTFXIO(
+      file_pattern=self._example_file,
+      column_names=['int_feature'],
+      schema=_SCHEMA)


can this just contain the int_feature?

sure, I was trying to test that when the schema has all fields but the requested columns are a subset, the resulting schema and records contain just that subset

I added another test case for this
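The expectation discussed in this thread (a full schema plus a subset of requested columns yields a schema containing only that subset) can be sketched in plain Python. This is only an illustration under stated assumptions: `project_schema`, the dict-based schema representation, and every field other than `int_feature` are hypothetical and are not the tfx_bsl implementation.

```python
from typing import Dict, Optional, Sequence


def project_schema(schema: Dict[str, str],
                   column_names: Optional[Sequence[str]] = None
                   ) -> Dict[str, str]:
  """Restricts a name -> type schema to the requested columns, if any."""
  if column_names is None:
    # No columns requested: the result schema is the full schema.
    return dict(schema)
  requested = set(column_names)
  # Drop every field that was not requested, preserving schema order.
  return {name: dtype for name, dtype in schema.items() if name in requested}


# Assumed schema; only 'int_feature' appears in the PR.
full_schema = {
    "int_feature": "INT",
    "float_feature": "FLOAT",
    "string_feature": "BYTES",
}
projected = project_schema(full_schema, ["int_feature"])
```

A test in this spirit would assert on both the projected schema and the fields present in the records that are read.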

iindyk · 2022-03-25T17:46:50Z

bazel-bin

@@ -0,0 +1 @@
+/private/var/tmp/_bazel_martinbomio/17e5cf616981f27d03c506e3e9f0879d/execroot/tfx_bsl/bazel-out/darwin-opt/bin


could you please revert files outside of the project

oops, removed them in a65d12b

imported internally, under review

martinbomio · 2022-03-28T17:45:54Z

@iindyk I saw that these changes were committed in a3ce0b8. Should I close this one?

iindyk · 2022-03-28T17:55:19Z

yup, this was merged in a3ce0b8

Make column names optional for parquet tfxio

01dd2ad

Add telemetry_descriptors property

09241b9

martinbomio mentioned this pull request Mar 24, 2022

Extend tfxio factory to use parquet-tfxio tensorflow/tfx#4761

Merged

iindyk reviewed Mar 24, 2022

Add test case for subset of columns

8d937c7

iindyk reviewed Mar 24, 2022

Add test for subset of columns with a projected schema

1edd9eb

iindyk approved these changes Mar 25, 2022

iindyk reviewed Mar 25, 2022

Remove bazel files

a65d12b

iindyk approved these changes Mar 25, 2022

iindyk closed this Mar 28, 2022

copybara-service bot mentioned this pull request Jun 8, 2022

PR #4761: Extend tfxio factory to use parquet-tfxio tensorflow/tfx#4927

Closed
