feat: adds StorageDescriptor and tests #2109

Merged
merged 2 commits into from
Jan 14, 2025

Conversation

chalmerlowe
Collaborator

This PR adds the StorageDescriptor class and the associated tests, plus minor tweaks to support both of those changes.

@chalmerlowe chalmerlowe requested review from a team as code owners January 13, 2025 20:09
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Jan 13, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jan 13, 2025
@chalmerlowe chalmerlowe assigned tswast and Linchin and unassigned PhongChuong Jan 13, 2025
Comment on lines 653 to 665
inputFormat (Optional[str]): Specifies the fully qualified class name of
the InputFormat (e.g.
"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"). The maximum
length is 128 characters.
locationUri (Optional[str]): The physical location of the table (e.g.
'gs://spark-dataproc-data/pangea-data/case_sensitive/' or
'gs://spark-dataproc-data/pangea-data/'). The maximum length is
2056 bytes.
outputFormat (Optional[str]): Specifies the fully qualified class name
of the OutputFormat (e.g.
"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"). The maximum
length is 128 characters.
serdeInfo (Union[SerDeInfo, dict, None]): Serializer and deserializer information.
Contributor

Suggested change
- inputFormat (Optional[str]): Specifies the fully qualified class name of
- the InputFormat (e.g.
- "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"). The maximum
- length is 128 characters.
- locationUri (Optional[str]): The physical location of the table (e.g.
- 'gs://spark-dataproc-data/pangea-data/case_sensitive/' or
- 'gs://spark-dataproc-data/pangea-data/'). The maximum length is
- 2056 bytes.
- outputFormat (Optional[str]): Specifies the fully qualified class name
- of the OutputFormat (e.g.
- "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"). The maximum
- length is 128 characters.
- serdeInfo (Union[SerDeInfo, dict, None]): Serializer and deserializer information.
+ input_format (Optional[str]): Specifies the fully qualified class name of
+ the InputFormat (e.g.
+ "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"). The maximum
+ length is 128 characters.
+ location_uri (Optional[str]): The physical location of the table (e.g.
+ 'gs://spark-dataproc-data/pangea-data/case_sensitive/' or
+ 'gs://spark-dataproc-data/pangea-data/'). The maximum length is
+ 2056 bytes.
+ output_format (Optional[str]): Specifies the fully qualified class name
+ of the OutputFormat (e.g.
+ "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"). The maximum
+ length is 128 characters.
+ serde_info (Union[SerDeInfo, dict, None]): Serializer and deserializer information.
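
The docstring states two kinds of limits: 128 characters for the format class names and 2056 bytes for the location URI. A minimal sketch of how a caller might pre-check values against those limits follows. The helper name `find_limit_violations` is hypothetical and not part of the library, and measuring the URI in UTF-8 bytes is an assumption, since the docstring does not name an encoding.

```python
from typing import List, Optional

# Limits copied from the docstring under review.
MAX_FORMAT_CHARS = 128
MAX_LOCATION_URI_BYTES = 2056


def find_limit_violations(
    input_format: Optional[str] = None,
    location_uri: Optional[str] = None,
    output_format: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return a message for each documented limit exceeded."""
    problems: List[str] = []
    # The format fields are limited by character count.
    for name, value in (("input_format", input_format), ("output_format", output_format)):
        if value is not None and len(value) > MAX_FORMAT_CHARS:
            problems.append(f"{name} exceeds {MAX_FORMAT_CHARS} characters")
    # The location URI is limited by byte count (UTF-8 assumed here),
    # so non-ASCII characters count for more than one unit each.
    if location_uri is not None:
        if len(location_uri.encode("utf-8")) > MAX_LOCATION_URI_BYTES:
            problems.append(f"location_uri exceeds {MAX_LOCATION_URI_BYTES} bytes")
    return problems
```

Keeping the character check and the byte check separate matters for non-ASCII URIs, which can exceed the byte limit while staying well under 2056 characters.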

Collaborator Author

Resolved.

self._properties["outputFormat"] = value

@property
def serde_info(self) -> Union[SerDeInfo, dict, None]:
Contributor

We'd never return a dict though, just SerDeInfo or None, right?

Collaborator Author

Resolved.

mypy sometimes gets confused when a setter accepts types A, B, and C but the paired getter can only return A or C.

I added a typing.cast() call on ~line 680 to help mypy out, along with a comment at that point explaining why typing.cast is used.
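
A rough sketch of the getter/setter pairing discussed here, for readers following along. Both classes are simplified, hypothetical stand-ins rather than the code merged in this PR. In a sketch this small the `typing.cast()` call is not strictly required by the type checker; it is included only to show where such a call would sit.

```python
from typing import Any, Dict, Optional, Union, cast


class SerDeInfo:
    """Simplified stand-in for illustration only."""

    def __init__(self, serialization_library: str) -> None:
        self._properties: Dict[str, Any] = {
            "serializationLibrary": serialization_library
        }

    def to_api_repr(self) -> Dict[str, Any]:
        return dict(self._properties)

    @classmethod
    def from_api_repr(cls, api_repr: Dict[str, Any]) -> "SerDeInfo":
        return cls(api_repr["serializationLibrary"])


class StorageDescriptor:
    """Simplified stand-in: setter accepts three types, getter returns two."""

    def __init__(self) -> None:
        self._properties: Dict[str, Any] = {}

    @property
    def serde_info(self) -> Optional[SerDeInfo]:
        prop = self._properties.get("serDeInfo")
        if prop is None:
            return None
        # typing.cast() does nothing at runtime; it only tells the type
        # checker to treat the value as a SerDeInfo.
        return cast(SerDeInfo, SerDeInfo.from_api_repr(prop))

    @serde_info.setter
    def serde_info(self, value: Union[SerDeInfo, dict, None]) -> None:
        # Normalize to the dict form so the getter has one case to handle.
        if isinstance(value, SerDeInfo):
            value = value.to_api_repr()
        self._properties["serDeInfo"] = value
```

Whatever form the caller passes to the setter, the getter hands back either a `SerDeInfo` or `None`, which matches the narrowed return annotation requested in the review.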


prop = _helpers._get_sub_prop(self._properties, ["serDeInfo"])
if prop is not None:
    prop = SerDeInfo("PLACEHOLDER").from_api_repr(prop)
Contributor

from_api_repr should be a class method, so instance shouldn't be required.

Suggested change
- prop = SerDeInfo("PLACEHOLDER").from_api_repr(prop)
+ prop = SerDeInfo.from_api_repr(prop)
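
To illustrate the point about class methods, here is a minimal sketch. The `SerDeInfo` below is a hypothetical stand-in with an assumed constructor signature, not the library's implementation. It shows that a `@classmethod` can be called directly on the class, and that calling it through a throwaway instance gives the same result while the placeholder instance is simply discarded.

```python
from typing import Any, Dict, Optional


class SerDeInfo:
    """Hypothetical stand-in; not the library's actual implementation."""

    def __init__(self, serialization_library: str, name: Optional[str] = None) -> None:
        self._properties: Dict[str, Any] = {
            "serializationLibrary": serialization_library
        }
        if name is not None:
            self._properties["name"] = name

    @property
    def serialization_library(self) -> str:
        return self._properties["serializationLibrary"]

    @classmethod
    def from_api_repr(cls, api_repr: Dict[str, Any]) -> "SerDeInfo":
        # `cls` is the class itself, so no pre-existing instance is needed.
        return cls(api_repr["serializationLibrary"], api_repr.get("name"))


resource = {"serializationLibrary": "org.apache.hadoop.hive.ql.io.orc.OrcSerde"}

# Preferred: call the class method on the class.
via_class = SerDeInfo.from_api_repr(resource)

# Also works, but the "PLACEHOLDER" instance is created only to be thrown away.
via_instance = SerDeInfo("PLACEHOLDER").from_api_repr(resource)
```

Both calls build an equivalent object from `resource`, so the placeholder instance adds nothing.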

Collaborator Author

Resolved.

Contributor

@Linchin Linchin left a comment

Thank you, LGTM.

@chalmerlowe chalmerlowe merged commit 6be0272 into main Jan 14, 2025
19 checks passed
@chalmerlowe chalmerlowe deleted the feat-b358215039-adds-storagedescriptor-class branch January 14, 2025 20:48
chalmerlowe added a commit that referenced this pull request Jan 22, 2025
* feat: adds StorageDescriptor and tests

* updates attr names, corrects type hinting
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: m Pull request size is medium.
4 participants