ARROW-17449: [Python] Better repr for Buffer, MemoryPool, NativeFile and Codec #13921

milesgranger · 2022-08-19T11:11:12Z

Example:

In [1]: import io
In [2]: import pyarrow as pa

In [3]: pa.PythonFile(io.BytesIO())
Out[3]: <pyarrow.PythonFile closed=False own_file=False is_seekable=False is_writable=True is_readable=False>

In [4]: pa.Codec('gzip')
Out[4]: <pyarrow.Codec name=gzip compression_level=9>

In [5]: pool = pa.default_memory_pool()
In [6]: pool
Out[6]: <pyarrow.MemoryPool backend_name=jemalloc bytes_allocated=0 max_memory=0>

In [7]: pa.allocate_buffer(1024, memory_pool=pool)
Out[7]: <pyarrow.Buffer address=0x7fd660a08000 size=1024 is_cpu=True is_mutable=True>

github-actions · 2022-08-19T11:11:32Z

https://issues.apache.org/jira/browse/ARROW-17449

lidavidm · 2022-08-19T12:05:31Z

Just passing by, but I think it'd be good to have the address in Buffer's repr (perhaps in hex as well)

milesgranger · 2022-08-19T12:14:08Z

Something like this: pyarrow.lib.Buffer(0x7fc0b46a5e30, size=1024, is_cpu=True, is_mutable=True),
or like how arrays are presented, with the default repr followed by the pretty repr?

<pyarrow.lib.Buffer at 0x7fea5e51e7b0>
pyarrow.lib.Buffer(size=1024, is_cpu=True, is_mutable=True)

lidavidm · 2022-08-19T12:29:23Z

I would probably vote for the former (I'd guess arrays only do that because their repr() can get long)
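
For illustration only, the single-line form with a hex address could be sketched in plain Python as follows. `FakeBuffer` is a hypothetical stand-in, not pyarrow's Buffer, and the address is taken from a `bytearray` through `ctypes` purely to have a real pointer value to format.

```python
import ctypes


class FakeBuffer:
    """Hypothetical stand-in for a buffer type; not pyarrow's implementation."""

    def __init__(self, size):
        self._data = bytearray(size)
        # View the bytearray as a C array to learn where its storage lives.
        c_array = (ctypes.c_char * size).from_buffer(self._data)
        self.address = ctypes.addressof(c_array)
        self.size = size

    def __repr__(self):
        # hex() renders the integer address with a 0x prefix.
        return f"FakeBuffer({hex(self.address)}, size={self.size})"


buf = FakeBuffer(1024)
print(repr(buf))
```

The printed address differs from run to run, which is why any test of such a repr has to match its structure rather than an exact string.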

python/pyarrow/io.pxi

milesgranger · 2022-08-23T07:08:38Z

@pitrou what do you think?
I believe the error in docs testing is unrelated. (seems to run fine locally anyway)

pitrou

Thanks for doing this. I agree the CI failure looked unrelated (I've restarted the job).

pitrou · 2022-08-23T07:15:01Z

python/pyarrow/table.pxi

@@ -2013,7 +2013,7 @@ cdef class RecordBatch(_PandasConvertible):
        >>> batch = pa.RecordBatch.from_arrays([n_legs, animals],
        ...                                     names=["n_legs", "animals"])
        >>> batch.serialize()
-        <pyarrow.lib.Buffer object at ...>
+        pyarrow.lib.Buffer(address=..., size=..., is_cpu=True, is_mutable=True)


Perhaps:

Suggested change

-        pyarrow.lib.Buffer(address=..., size=..., is_cpu=True, is_mutable=True)
+        pyarrow.lib.Buffer(address=0x..., size=..., is_cpu=True, is_mutable=True)

pitrou · 2022-08-23T07:15:37Z

python/pyarrow/memory.pxi

+        return (f"{name}("
+                f"backend_name={self.backend_name}, "
+                f"bytes_allocated={self.bytes_allocated()}, "
+                f"max_memory={self.max_memory()})")


Can we test this repr somewhere? (either a doctest or a pytest unit test)
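
One possible shape for such a unit test, sketched against a hypothetical `FakePool` stand-in rather than the real MemoryPool: since allocation counters vary between runs, the test matches the structure of the repr with a regular expression instead of comparing exact strings. The backend name and counter values here are made up.

```python
import re


class FakePool:
    """Hypothetical stand-in exposing the same three fields as the diff above."""

    backend_name = "system"

    def bytes_allocated(self):
        return 0

    def max_memory(self):
        return 0

    def __repr__(self):
        name = f"{self.__class__.__module__}.{self.__class__.__name__}"
        return (f"{name}("
                f"backend_name={self.backend_name}, "
                f"bytes_allocated={self.bytes_allocated()}, "
                f"max_memory={self.max_memory()})")


def test_pool_repr():
    # Match the structure, not exact values: counters differ per run.
    pattern = (r"FakePool\(backend_name=\w+, "
               r"bytes_allocated=\d+, max_memory=\d+\)$")
    assert re.search(pattern, repr(FakePool()))


test_pool_repr()
```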

pitrou · 2022-08-23T07:15:52Z

python/pyarrow/io.pxi

+        name = f"{self.__class__.__module__}.{self.__class__.__name__}"
+        return (f"{name}("
+                f"name={self.name}, "
+                f"compression_level={self.compression_level})")


Add a test for this?

pitrou · 2022-08-23T07:17:56Z

python/pyarrow/io.pxi

+                f"address={hex(self.address)}, "
+                f"size={self.size}, "
+                f"is_cpu={self.is_cpu}, "
+                f"is_mutable={self.is_mutable})")


So, just for the record, this makes it look like the Buffer constructor is callable with these arguments (which it is not).
We could instead go for: <pyarrow.Buffer address=0x...>

@jorisvandenbossche What do you think?

+1 on keeping the < .. > (instead of ()) to not confuse it with an eval-able repr
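
The convention being referred to can be shown with two toy classes (both hypothetical, unrelated to pyarrow): a repr written as a constructor call suggests that `eval(repr(obj))` rebuilds the object, while a repr in angle brackets is not valid Python and so makes no such promise.

```python
class Point:
    """Repr mirrors the constructor, so eval(repr(p)) rebuilds the object."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        return (isinstance(other, Point)
                and (self.x, self.y) == (other.x, other.y))


class Handle:
    """Cannot be rebuilt from its displayed fields, so the repr uses < >."""

    def __init__(self):
        self.address = id(self)

    def __repr__(self):
        return f"<Handle address={hex(self.address)}>"


p = Point(1, 2)
assert eval(repr(p)) == p  # the constructor-style repr round-trips

try:
    eval(repr(Handle()))
except SyntaxError:
    print("angle-bracket repr is not valid Python, as intended")
```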

pitrou · 2022-08-23T07:18:16Z

python/pyarrow/io.pxi

-        return frombytes(self.unwrap().compression_level())
+        if self.name == 'snappy':
+            return None
+        return self.unwrap().compression_level()


Should add a test for this? (I assume this was raising for Snappy?)

Good point, it was failing as-is: compression_level() returns an int, and frombytes would fail trying to decode an int. I also modified the Snappy variant, as Snappy has no compression level and would give invalid integers.
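
A rough pure-Python sketch of the two problems described here. The `frombytes` below is a simplified stand-in assumed to decode bytes to str, and `FakeCodec` with its level values is hypothetical; neither is the pyarrow code.

```python
def frombytes(value):
    # Simplified stand-in for a bytes-to-str helper; assumed behaviour only.
    return value.decode("utf8")


class FakeCodec:
    """Hypothetical codec wrapper; the level values are illustrative."""

    def __init__(self, name, level):
        self.name = name
        self._level = level

    @property
    def compression_level(self):
        # Snappy has no notion of a level, so report None, not a bogus int.
        if self.name == "snappy":
            return None
        return self._level


try:
    frombytes(9)  # the old code path: trying to decode an int
except AttributeError as exc:
    print(f"decoding an int fails: {exc}")

print(FakeCodec("gzip", 9).compression_level)
print(FakeCodec("snappy", -1).compression_level)
```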

jorisvandenbossche · 2022-08-23T07:31:55Z

python/pyarrow/io.pxi

@@ -121,6 +121,14 @@ cdef class NativeFile(_Weakrefable):
    def __exit__(self, exc_type, exc_value, tb):
        self.close()

+    def __repr__(self):
+        name = f"{self.__class__.__module__}.{self.__class__.__name__}"


In general those objects are exposed in the top-level pyarrow namespace, so I would maybe hardcode that here instead of using __module__ which gives pyarrow.lib (the lib submodule is also considered private)

(and same for Buffer and others)
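
The difference between the two approaches can be sketched with hypothetical classes that pretend to live in a private `mypkg.lib` submodule while being exported from `mypkg` (both names are invented for this example).

```python
class NativeFileLike:
    """Hypothetical class pretending to live in a private submodule."""

    # Simulate a type defined in a private module but exported at top level.
    __module__ = "mypkg.lib"

    def module_based_name(self):
        cls = self.__class__
        return f"{cls.__module__}.{cls.__name__}"

    def hardcoded_name(self):
        # Hardcode the public namespace; subclasses keep their own class name.
        return f"mypkg.{self.__class__.__name__}"


class PythonFileLike(NativeFileLike):
    __module__ = "mypkg.lib"


f = PythonFileLike()
print(f.module_based_name())  # leaks the private submodule
print(f.hardcoded_name())     # shows the public path
```

Using `self.__class__.__name__` rather than a fixed string keeps the subclass name in the output, so only the package prefix is hardcoded.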

jorisvandenbossche · 2022-08-23T07:32:56Z

python/pyarrow/io.pxi

+                f"own_file={self.own_file}, "
+                f"is_seekable={self.is_seekable}, "
+                f"is_writable={self.is_writable}, "
+                f"is_readable={self.is_readable})")


Would it be useful to add whether it is closed or not?

Good idea. 889881e
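
As a small sketch of why the flag is useful, here is a hypothetical `FileWrapper` around a standard `io.BytesIO` (not pyarrow's NativeFile): the same object prints differently before and after `close()`, which makes a stale handle obvious in an interactive session.

```python
import io


class FileWrapper:
    """Hypothetical wrapper around a Python file object."""

    def __init__(self, handle):
        self._handle = handle

    @property
    def closed(self):
        return self._handle.closed

    def close(self):
        self._handle.close()

    def __repr__(self):
        return f"<FileWrapper closed={self.closed}>"


f = FileWrapper(io.BytesIO(b"data"))
print(f)
f.close()
print(f)
```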

pitrou · 2022-08-24T07:38:05Z

@milesgranger Please don't hesitate to ping when you're finished.

milesgranger · 2022-08-24T08:04:27Z

Apologies, missed the lint failing. Then this should do it. 🤞

pitrou

LGTM, thanks @milesgranger

pitrou · 2022-08-24T08:11:38Z

@jorisvandenbossche Do you want to take another look?

jorisvandenbossche

Looks good!

ursabot · 2022-08-29T13:01:34Z

Benchmark runs are scheduled for baseline = bd76850 and contender = 6f302a3. 6f302a3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Failed ⬇️1.1% ⬆️0.27%] ursa-i9-9960x
[Finished ⬇️0.14% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 6f302a30 ec2-t3-xlarge-us-east-2
[Failed] 6f302a30 test-mac-arm
[Failed] 6f302a30 ursa-i9-9960x
[Finished] 6f302a30 ursa-thinkcentre-m75q
[Finished] bd768506 ec2-t3-xlarge-us-east-2
[Failed] bd768506 test-mac-arm
[Failed] bd768506 ursa-i9-9960x
[Finished] bd768506 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2022-08-29T13:01:48Z

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

…and Codec (apache#13921)

Example:
```python
In [1]: import io
In [2]: import pyarrow as pa

In [3]: pa.PythonFile(io.BytesIO())
Out[3]: <pyarrow.PythonFile closed=False own_file=False is_seekable=False is_writable=True is_readable=False>

In [4]: pa.Codec('gzip')
Out[4]: <pyarrow.Codec name=gzip compression_level=9>

In [5]: pool = pa.default_memory_pool()
In [6]: pool
Out[6]: <pyarrow.MemoryPool backend_name=jemalloc bytes_allocated=0 max_memory=0>

In [7]: pa.allocate_buffer(1024, memory_pool=pool)
Out[7]: <pyarrow.Buffer address=0x7fd660a08000 size=1024 is_cpu=True is_mutable=True>
```

Authored-by: Miles Granger <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>

Better repr for Buffer, MemoryPool, NativeFile and Codec

11d57a9

github-actions bot added the Component: Python label Aug 19, 2022

Add hex address to reprs

f5e6f23

lidavidm reviewed Aug 22, 2022

python/pyarrow/io.pxi (outdated, resolved)

milesgranger added 2 commits August 22, 2022 14:22

Use hex(self.address) for Buffer

99257d3

Fix cython doc tests

76f924e

pitrou reviewed Aug 23, 2022

jorisvandenbossche reviewed Aug 23, 2022

milesgranger added 5 commits August 23, 2022 11:28

Update reprs appearance

febd2ef

Test Codec compression_level attr

6b49d22

Doctest for Codec

eed7303

Doctest for MemoryPool repr

ae55f75

Add closed to NativeFile and Doctest

889881e

Fix whitepacing in lints

440d647

pitrou approved these changes Aug 24, 2022

jorisvandenbossche approved these changes Aug 29, 2022

jorisvandenbossche merged commit 6f302a3 into apache:master Aug 29, 2022

milesgranger deleted the ARROW-17449_better-reprs branch August 29, 2022 10:29

jorisvandenbossche mentioned this pull request Jan 19, 2023

GH-15195: [C++][FlightRPC][Python] Add ToString/Equals for Flight types #15196

Merged
