-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-17449: [Python] Better repr for Buffer, MemoryPool, NativeFile and Codec #13921
ARROW-17449: [Python] Better repr for Buffer, MemoryPool, NativeFile and Codec #13921
Conversation
Just passing by, but I think it'd be good to have the address in Buffer's repr (perhaps in hex as well) |
Something like this
|
I would probably vote for the former (I'd guess arrays only do that because their repr() can get long) |
@pitrou what do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. I agree the CI failure looked unrelated (I've restarted the job).
python/pyarrow/table.pxi
Outdated
@@ -2013,7 +2013,7 @@ cdef class RecordBatch(_PandasConvertible): | |||
>>> batch = pa.RecordBatch.from_arrays([n_legs, animals], | |||
... names=["n_legs", "animals"]) | |||
>>> batch.serialize() | |||
<pyarrow.lib.Buffer object at ...> | |||
pyarrow.lib.Buffer(address=..., size=..., is_cpu=True, is_mutable=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps:
pyarrow.lib.Buffer(address=..., size=..., is_cpu=True, is_mutable=True) | |
pyarrow.lib.Buffer(address=0x..., size=..., is_cpu=True, is_mutable=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python/pyarrow/memory.pxi
Outdated
return (f"{name}(" | ||
f"backend_name={self.backend_name}, " | ||
f"bytes_allocated={self.bytes_allocated()}, " | ||
f"max_memory={self.max_memory()})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test this repr somewhere? (either a doctest or a pytest unit test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python/pyarrow/io.pxi
Outdated
name = f"{self.__class__.__module__}.{self.__class__.__name__}" | ||
return (f"{name}(" | ||
f"name={self.name}, " | ||
f"compression_level={self.compression_level})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python/pyarrow/io.pxi
Outdated
f"address={hex(self.address)}, " | ||
f"size={self.size}, " | ||
f"is_cpu={self.is_cpu}, " | ||
f"is_mutable={self.is_mutable})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, just for the record, this makes it look like the Buffer
constructor is callable with these arguments (which it is not).
We could instead go for: <pyarrow.Buffer address=0x...>
@jorisvandenbossche What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on keeping the < .. >
(instead of ()
) to not confuse it with an eval-able repr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return frombytes(self.unwrap().compression_level()) | ||
if self.name == 'snappy': | ||
return None | ||
return self.unwrap().compression_level() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add a test for this? (I assume this was raising for Snappy?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, it was failing as-is, compression_level() -> int
and frombytes
would fail trying to decode an int. Also modified snappy variant as that has no compression level and would give invalid integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python/pyarrow/io.pxi
Outdated
@@ -121,6 +121,14 @@ cdef class NativeFile(_Weakrefable): | |||
def __exit__(self, exc_type, exc_value, tb): | |||
self.close() | |||
|
|||
def __repr__(self): | |||
name = f"{self.__class__.__module__}.{self.__class__.__name__}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general those objects are exposed in the top-level pyarrow
namespace, so I would maybe hardcode that here instead of using __module__
which gives pyarrow.lib
(the lib submodule is also considered private)
(and same for Buffer and others)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python/pyarrow/io.pxi
Outdated
f"own_file={self.own_file}, " | ||
f"is_seekable={self.is_seekable}, " | ||
f"is_writable={self.is_writable}, " | ||
f"is_readable={self.is_readable})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to add whether it is closed
or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. 889881e
@milesgranger Please don't hesitate to ping where you're finished. |
Apologies, missed the lint failing. Then this should do it. 🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @milesgranger
@jorisvandenbossche Do you want to take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Benchmark runs are scheduled for baseline = bd76850 and contender = 6f302a3. 6f302a3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
['Python', 'R'] benchmarks have high level of regressions. |
…and Codec (apache#13921) Example: ```python In [1]: import io In [2]: import pyarrow as pa In [3]: pa.PythonFile(io.BytesIO()) Out[3]: <pyarrow.PythonFile closed=False own_file=False is_seekable=False is_writable=True is_readable=False> In [4]: pa.Codec('gzip') Out[4]: <pyarrow.Codec name=gzip compression_level=9> In [5]: pool = pa.default_memory_pool() In [6]: pool Out[6]: <pyarrow.MemoryPool backend_name=jemalloc bytes_allocated=0 max_memory=0> In [7]: pa.allocate_buffer(1024, memory_pool=pool) Out[7]: <pyarrow.Buffer address=0x7fd660a08000 size=1024 is_cpu=True is_mutable=True ``` Authored-by: Miles Granger <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
…and Codec (apache#13921) Example: ```python In [1]: import io In [2]: import pyarrow as pa In [3]: pa.PythonFile(io.BytesIO()) Out[3]: <pyarrow.PythonFile closed=False own_file=False is_seekable=False is_writable=True is_readable=False> In [4]: pa.Codec('gzip') Out[4]: <pyarrow.Codec name=gzip compression_level=9> In [5]: pool = pa.default_memory_pool() In [6]: pool Out[6]: <pyarrow.MemoryPool backend_name=jemalloc bytes_allocated=0 max_memory=0> In [7]: pa.allocate_buffer(1024, memory_pool=pool) Out[7]: <pyarrow.Buffer address=0x7fd660a08000 size=1024 is_cpu=True is_mutable=True ``` Authored-by: Miles Granger <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
…and Codec (apache#13921) Example: ```python In [1]: import io In [2]: import pyarrow as pa In [3]: pa.PythonFile(io.BytesIO()) Out[3]: <pyarrow.PythonFile closed=False own_file=False is_seekable=False is_writable=True is_readable=False> In [4]: pa.Codec('gzip') Out[4]: <pyarrow.Codec name=gzip compression_level=9> In [5]: pool = pa.default_memory_pool() In [6]: pool Out[6]: <pyarrow.MemoryPool backend_name=jemalloc bytes_allocated=0 max_memory=0> In [7]: pa.allocate_buffer(1024, memory_pool=pool) Out[7]: <pyarrow.Buffer address=0x7fd660a08000 size=1024 is_cpu=True is_mutable=True ``` Authored-by: Miles Granger <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
Example: