Add clear indication of non-GPU accelerated parameters in read_json docstring #11825
Maybe these arguments should be marked
What functionality exists here for the
Codecov Report
Base: 87.40% // Head: 88.11% // Increases project coverage by
Additional details and impacted files
@@ Coverage Diff @@
## branch-22.12 #11825 +/- ##
================================================
+ Coverage 87.40% 88.11% +0.70%
================================================
Files 133 133
Lines 21833 21881 +48
================================================
+ Hits 19084 19280 +196
+ Misses 2749 2601 -148
Some suggestions for clarification, no need to take as is though.
python/cudf/cudf/utils/ioutils.py
Outdated
The first number is the offset in bytes, the second number is the range
size in bytes. Set the size to zero to read all data after the offset
location. Reads the row that starts before or at the end of the range,
even if it ends after the end of the range.
Pair of `(offset, length)` specifying a subrange of the file to be read, in bytes.
To read from `offset` to the end of the file, set `length=0`. Reads the row that starts
before or at the end of the range, even if it ends past the end of the range.
What does "at the end of the range" mean? I guess the byte range specifies a semi-open interval `[offset, offset+length)`. Does that mean that if a row starts at `offset + length - 1`, we read the entire row?
Aside: this is not a very ergonomic way of specifying "read from this offset to the end of the file". Could we accept either an `offset` int or an `(offset, length)` pair?
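To make the question concrete, here is a minimal sketch of one possible reading of the semantics, in plain Python rather than cudf. It assumes the range is the semi-open interval `[offset, offset+length)` and that a row is returned whole whenever its first byte falls inside that interval; neither assumption is confirmed in this thread. The helper names `normalize_byte_range` and `rows_in_byte_range` are hypothetical and do not exist in cudf.

```python
from typing import List, Tuple, Union

ByteRange = Union[int, Tuple[int, int]]


def normalize_byte_range(byte_range: ByteRange) -> Tuple[int, int]:
    """Accept a bare offset or an (offset, length) pair; return (offset, length).

    A length of 0 keeps the existing convention of "read to the end of the file".
    """
    if isinstance(byte_range, int):
        offset, length = byte_range, 0
    else:
        offset, length = byte_range
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    return offset, length


def rows_in_byte_range(data: bytes, byte_range: ByteRange) -> List[bytes]:
    """Return each newline-delimited row whose first byte lies in the range.

    Assumed semantics (hypothetical): the range is [offset, offset + length),
    and a row starting inside it is returned in full even if it ends later.
    """
    offset, length = normalize_byte_range(byte_range)
    end = len(data) if length == 0 else offset + length
    rows = []
    start = 0  # byte position at which the current row begins
    for line in data.splitlines(keepends=True):
        if offset <= start < end:
            rows.append(line.rstrip(b"\r\n"))
        start += len(line)
    return rows


# Three 8-byte rows, starting at bytes 0, 8 and 16.
data = b'{"a":1}\n{"a":2}\n{"a":3}\n'
print(rows_in_byte_range(data, (0, 9)))  # second row starts at offset + length - 1
print(rows_in_byte_range(data, 8))       # bare offset: read to the end
```

Under these assumptions, a row that starts at `offset + length - 1` is read in its entirety, and a bare integer behaves like `(offset, 0)`.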
This too needs a bit of re-work like you said. I'll try to address this in #11780
python/cudf/cudf/utils/ioutils.py
Outdated
For on-the-fly decompression of on-disk data. If 'infer', then use
gzip, bz2, zip or xz if path_or_buf is a string ending in
'.gz', '.bz2', '.zip', or 'xz', respectively, and no decompression
Could we not use https://pypi.org/project/python-magic/ and just detect the appropriate decompression scheme?
Do you mean we dynamically populate?
I meant that if `infer` is provided, one could detect the actual file type not from the extension (brittle) but using magic (reasonably robust).
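As a rough illustration of the difference, the sketch below compares extension-based inference with content-based detection. It does not use python-magic; a small hand-written table of leading file signatures for the four formats named in the docstring stands in for it, using only the standard library. The function names `infer_from_extension` and `sniff_compression` are hypothetical and are not part of cudf or pandas.

```python
import gzip
from typing import Optional

# File-name suffixes, as in the extension-based 'infer' behaviour.
_EXTENSIONS = (
    (".gz", "gzip"),
    (".bz2", "bz2"),
    (".zip", "zip"),
    (".xz", "xz"),
)

# Leading signature bytes of each format.
_SIGNATURES = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bz2"),
    (b"PK\x03\x04", "zip"),  # non-empty zip archive (local file header)
    (b"\xfd7zXZ\x00", "xz"),
)


def infer_from_extension(path: str) -> Optional[str]:
    """Guess the compression from the file name alone."""
    for ext, name in _EXTENSIONS:
        if path.endswith(ext):
            return name
    return None


def sniff_compression(head: bytes) -> Optional[str]:
    """Guess the compression from the first few bytes of the content."""
    for signature, name in _SIGNATURES:
        if head.startswith(signature):
            return name
    return None


# A gzip payload stored under a misleading name: the extension check
# misses it, while the content check still recognises it.
payload = gzip.compress(b'{"a": 1}\n')
print(infer_from_extension("records.json"))  # None
print(sniff_compression(payload[:8]))        # gzip
```

The content check only needs the first handful of bytes, so it can run before deciding which decompressor to open.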
I just read the email with this GH notification without the context of the thread. It sounded super viable.
@gpucibot merge
Description
This PR moves the "pandas engine only" arguments to the end of the optional argument list of the docstring.
This is how an admonition will look:
Checklist