-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for time, datetime, json, real, double types #202
Conversation
The goal of this code is similar as what is implemented for Could you review the existing mapping code and correct any shortcomings that you may have fixed. trino-python-client/trino/client.py Lines 583 to 651 in 951ad82
AFAIK we didn't do unit testing for the types. So I think that's probably something we should keep. |
tests/unit/test_types.py
Outdated
None | ||
] | ||
|
||
def test_types(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please split the tests by type, that will be easier to troubleshoot when things go wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also test using the flag experimental_python_types
set to True
with the TrinoResult
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_map_to_python_type()
works slightly differently: for each single row it analyzes the type and returns the desired value. __col_func()
is instead called once per column and returns a lambda which is then called for each value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually a great optimisation and definitely welcome. Can you merge this optimization with the existing logic of experimental_python_types
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
trino/client.py
Outdated
@@ -226,6 +227,66 @@ def __repr__(self): | |||
) | |||
) | |||
|
|||
def __process_rows(self, rows, columns): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably just be a private function instead of the double underscore of a Python hook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
@property | ||
def response_headers(self): | ||
return self._query.response_headers | ||
|
||
@classmethod | ||
def _map_row(cls, experimental_python_types, row, columns): | ||
def _map_row(cls, experimental_python_types, row, col_mapping_func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All logic from _map_row
and _map_to_python_types
can probably be moved to the col_mapping_func
itself. I think this way, this class doesn't need to know anymore of experimental_python_types
, and just call the mapping function instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could, but then the whole body of _map_to_python_type()
would be repeated inside each mapping function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I meant _map_to_python_type
. You could use the experimental_python_types
in the col_mapping_funcs itself, that way the TrinoResult
class just needs to know about the col_mapping_funcs.
@@ -229,6 +231,84 @@ def __repr__(self): | |||
) | |||
) | |||
|
|||
def _col_func(self, column): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i look at the comments above I read: the status of a query is represented
by TrinoStatus
.
I wonder if it would be better to put the the logic in a RowMapperFactory
class that returns an array of mapper functions. The question is then where this RowMapperFactory should be called and where the rows mapping should happen. Maybe TrinoQuery
is a better candidate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Processing the column
section (to get a list of mapping lambdas) and processing the rows can be done in multiple locations. I performed the former in the TrinoStatus
class because this is where the data is first received. Processing the rows is done downstream, so left it there.
As far as the best candidate to perform both operations, I don't have any string opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By extracting it to its own class it would allow for separation of concern of each class and put it in the most appropriate place (composition).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. But there is still the question about whether the location where mapping function generation and execution should take place.
As a general remark, please rebase instead of merge changes from master, otherwise you will introduce already implemented changes in your PR's commits. |
@@ -294,7 +294,7 @@ def test_datetime_with_utc_time_zone_query_param(trino_connection): | |||
rows = cur.fetchall() | |||
|
|||
assert rows[0][0] == params | |||
assert cur.description[0][1] == "timestamp with time zone" | |||
assert cur.description[0][1] == "timestamp(6) with time zone" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that only milliseconds are currently supported. If you would change the test case to more precise values over ms, eg. params = datetime(2020, 1, 1, 16, 43, 22, 320258)
, you would see that 258 part is lost, because of #42.
IMHO providing a value with a precision that is not actually supported right now seems not so useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the whole 6 digits are preserved now.
return lambda val: Decimal(val) | ||
elif col_type.startswith('double') or col_type.startswith('real'): | ||
return lambda val: float('inf') if val == 'Infinity' \ | ||
else -float('inf') if val == '-Infinity' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the existing constants for performance.
INF = float("inf")
NEGATIVE_INF = float("-inf")
NAN = float("nan")
@@ -339,6 +421,7 @@ def __init__( | |||
self._http_session = self.http.Session() | |||
self._http_session.verify = verify | |||
self._http_session.headers.update(self.http_headers) | |||
self._http_session.headers['x-trino-client-capabilities'] = 'PARAMETRIC_DATETIME' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should probably go into another PR. See #114
Closing to split the PR is smaller chunks |
This PR adds support for time, datetime, json, read and double Trino types when querying for some rows.
The
time
anddatetime
required some data massaging:datetime(9)
) are dropped