BigQuery: Support user-overridable dtypes in `to_dataframe` method. #7049

tswast · 2019-01-04T20:58:24Z

With pandas 0.24.0 (unreleased), a new dtype is available to support nullable integer columns: http://pandas-docs.github.io/pandas-docs-travis/integer_na.html#integer-na The default behavior is to convert such columns to float, but this can result in data loss (#6177). The new extension dtype avoids that.
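To make the data loss concrete, here is a minimal sketch, assuming pandas 0.24.0 or later is installed. The values are made up for illustration and are not taken from #6177.

```python
import pandas

# 2**53 + 1 is the smallest positive integer that float64 cannot represent exactly.
big = 2**53 + 1

# With a missing value present, pandas infers float64 and the integer is rounded.
as_float = pandas.Series([big, None])

# The nullable integer extension dtype keeps the integer and marks the gap as missing.
as_int = pandas.Series([big, None], dtype="Int64")

# Compare the round-tripped values against the original integer.
print(as_float.dtype, int(as_float[0]) == big)
print(as_int.dtype, int(as_int[0]) == big)
```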

I propose we allow the user to provide a map from column names to dtypes for any columns whose default behavior they'd like to override. This argument could be called `dtype_overrides`. This would also be useful for other extension dtypes in the future, such as for GEOGRAPHY columns.

See googleapis/python-bigquery-pandas#242 for additional discussion.

Alternatives

Make the new dtype for nullable integer the default for integer columns.
- Con: Not compatible with older versions of pandas.
- Con: Inconsistent with pandas's default behavior.


tswast · 2019-01-12T01:02:26Z

I experimented with this in master...tswast:b122674716-bqstorage-types for the BigQuery Storage API. Similar work is needed for the BigQuery API.

I think the dictionary of column names to dtypes works well. I don't see any problems with using the pandas Series constructor for type casting.
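A rough sketch of that approach, assuming pandas 0.24.0 or later. `apply_dtype_overrides` and its `columns` argument are hypothetical names used only for illustration; this is not the code in the branch or the library's API.

```python
import pandas


def apply_dtype_overrides(columns, dtype_overrides=None):
    """Build a DataFrame, casting each column via the Series constructor.

    `columns` maps column name -> list of values (hypothetical input shape).
    `dtype_overrides` maps column name -> dtype; columns without an entry
    keep the dtype that pandas infers by default.
    """
    dtype_overrides = dtype_overrides or {}
    series = {
        name: pandas.Series(values, dtype=dtype_overrides.get(name))
        for name, values in columns.items()
    }
    return pandas.DataFrame(series)


rows = {"id": [1, None, 3], "name": ["a", "b", None]}

# Without an override, the integer column with a missing value becomes float64.
default_df = apply_dtype_overrides(rows)

# With an override, the same column uses the nullable integer dtype.
override_df = apply_dtype_overrides(rows, dtype_overrides={"id": "Int64"})

print(default_df.dtypes)
print(override_df.dtypes)
```

Passing `dtype=None` to the Series constructor leaves inference to pandas, so only the columns named in the map change behavior.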

Should it be an error if a dtype was supplied but the column isn't actually in the DataFrame? I think we might be able to parse the avro_schema to see what columns are available ahead of time.
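One way the check could look, as a sketch only. It assumes the Avro schema is available as a JSON string describing a record with a `fields` list; `check_dtype_columns` is a hypothetical helper, and raising `ValueError` is just one possible answer to the question above.

```python
import json


def check_dtype_columns(avro_schema_json, dtypes):
    """Return the column names in the schema, rejecting unknown dtype keys.

    Hypothetical helper: assumes `avro_schema_json` is an Avro record schema
    serialized as JSON, with one entry per column under "fields".
    """
    schema = json.loads(avro_schema_json)
    available = [field["name"] for field in schema["fields"]]
    unknown = sorted(set(dtypes) - set(available))
    if unknown:
        raise ValueError(
            "dtypes supplied for columns not in the schema: {}".format(unknown)
        )
    return available


# Made-up schema for illustration.
example_schema = json.dumps(
    {
        "type": "record",
        "name": "example_row",
        "fields": [
            {"name": "id", "type": ["null", "long"]},
            {"name": "name", "type": ["null", "string"]},
        ],
    }
)

columns = check_dtype_columns(example_schema, {"id": "Int64"})
print(columns)
```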

tswast added the type: feature request, api: bigquery, and priority: p2 labels Jan 4, 2019

tswast assigned tswast and alixhami Jan 4, 2019

tseaver removed the priority: p2 label Jan 5, 2019

tswast mentioned this issue Jan 15, 2019

Add option to choose dtypes by column in to_dataframe. #7126

Merged

tswast closed this as completed in #7126 Jan 17, 2019
