perf: read_ply replace pandas.read_csv engine=python with c; improve read_off header-parsing robustness #352

YodaEmbedding · 2023-08-08T11:05:51Z

UPDATE: I have rebased this PR on top of the latest commit. The revised changes are:

perf: Speed up reading of ASCII PLY files.
feat: improve robustness for OFF headers on e.g. ModelNet40
perf: reuse already open file for reading instead of opening it twice
style: renamed variables for clarity (e.g. color -> has_color; and count -> n_header)

In particular, ModelNet40 has faulty headers:

$ head -n 1 ModelNet40/chair/train/chair_0856.off
OFF6586 5534 0

For reference, the correct format is:

OFF
6586 5534 0

Nonetheless, it is still valuable to parse the faulty header.
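As a rough illustration of the idea (not the code in this PR), a header parser that tolerates both layouts could look like the sketch below. The helper name `parse_off_header`, its return tuple, and the handling of `#` comment lines are assumptions made for this example.

```python
import io


def parse_off_header(f):
    """Hypothetical helper: parse an OFF header from an open text file.

    Accepts both the standard layout ("OFF" on its own line) and the
    ModelNet40 variant where the counts are fused onto the first line.
    Returns (has_color, n_points, n_faces, n_edges).
    """
    first = f.readline().strip()
    if first.startswith("COFF"):
        has_color, rest = True, first[4:]
    elif first.startswith("OFF"):
        has_color, rest = False, first[3:]
    else:
        raise ValueError(f"Not an OFF header: {first!r}")

    rest = rest.strip()
    # If nothing follows the magic word, the counts are on a later line.
    # Blank lines and '#' comments are skipped (an assumption of this sketch).
    while not rest or rest.startswith("#"):
        line = f.readline()
        if not line:
            raise ValueError("Unexpected end of file while reading OFF header")
        rest = line.strip()

    n_points, n_faces, n_edges = (int(tok) for tok in rest.split()[:3])
    return has_color, n_points, n_faces, n_edges


# Both the faulty and the correct header give the same counts.
faulty = parse_off_header(io.StringIO("OFF6586 5534 0\n"))
correct = parse_off_header(io.StringIO("OFF\n6586 5534 0\n"))
```

Because the helper reads from an already open file object, the caller can continue reading the point data from the same handle afterwards.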

(Original text before #353 was merged)

Big performance improvement: the sliced file is now read from an in-memory StringIO buffer, which removes the need for the slow engine="python".
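A minimal sketch of that technique, assuming whitespace-separated numeric rows; the helper name `read_rows` and the column names are made up for this example and are not taken from the PR. Slicing the needed lines up front means no Python-engine-only option such as `skipfooter` is required.

```python
import io
from itertools import islice

import pandas as pd


def read_rows(f, n_rows, names):
    """Hypothetical helper: read exactly n_rows lines from an open text file.

    The lines are copied into an in-memory buffer so that pandas can parse
    them with the fast C engine, leaving the rest of the file untouched.
    """
    buf = io.StringIO("".join(islice(f, n_rows)))
    return pd.read_csv(buf, sep=r"\s+", header=None, names=names, engine="c")


# Three point rows followed by one face row that must not be consumed.
f = io.StringIO("0 0 0\n1 0 0\n0 1 0\n3 0 1 2\n")
points = read_rows(f, 3, ["x", "y", "z"])
next_line = f.readline()  # the face row is still available here
```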

Also fixes a bug where reading an OFF file that contains more lines than num_points + num_faces tried to parse potential edges as faces!

As Wikipedia says, the OFF file may contain:

points
faces (optional)
edges (optional)

Of course, this still does not encompass all possible OFF file variants described by Wikipedia, but it's an improvement.
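To illustrate the fix, here is a hedged sketch that reads exactly the declared number of point and face rows and ignores whatever follows. The function name `read_off_body`, the triangle-only face layout, and the column names are assumptions for this example, not the PR's implementation.

```python
import io
from itertools import islice

import pandas as pd


def read_off_body(f, n_points, n_faces):
    """Hypothetical helper: read points and (optional) faces from an OFF body.

    Only n_points + n_faces lines are consumed, so trailing edge lines are
    never mistaken for faces. Assumes triangular faces for simplicity.
    """

    def block(n_rows, names):
        # Copy exactly n_rows lines into memory and parse with the C engine.
        buf = io.StringIO("".join(islice(f, n_rows)))
        return pd.read_csv(buf, sep=r"\s+", header=None, names=names, engine="c")

    points = block(n_points, ["x", "y", "z"])
    # Faces are optional; skip parsing when the header declares none.
    faces = block(n_faces, ["n", "v1", "v2", "v3"]) if n_faces > 0 else None
    return points, faces


# Body with 3 points, 1 face, and 2 trailing lines standing in for edges.
body = "0 0 0\n1 0 0\n0 1 0\n3 0 1 2\n0 1\n1 2\n"
points, faces = read_off_body(io.StringIO(body), n_points=3, n_faces=1)
```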

pyntcloud/io/off.py

Improve robustness of header parsing a bit. In particular, ModelNet40 has faulty headers:

```bash
$ head -n 1 ModelNet40/chair/train/chair_0856.off
OFF6586 5534 0
```

For reference, the correct format is:

```
OFF
6586 5534 0
```

Nonetheless, it is still valuable to parse the faulty header.

Also, reuse already open file for reading instead of opening it twice.

YodaEmbedding · 2023-12-24T12:55:15Z

Both this PR and #353 improved pandas performance for *.OFF files with engine=c. Therefore, I rebased this PR on top of #353. This PR still contains some other useful changes, listed above.

Future work:

Once this is reviewed/accepted, I can look into improving compatibility with Wikipedia's description of the *.OFF file format. Of course, perfect compatibility would be too slow, but there are still some missing features:

"C" in the header should not be needed to detect the presence of color (see Wikipedia's example).
Edges, and edge colors.

github-advanced-security bot found potential problems Aug 8, 2023

pyntcloud/io/off.py: Fixed

YodaEmbedding force-pushed the perf/pandas_c_engine branch from 001ce2c to bd7fabb Compare August 8, 2023 12:12

YodaEmbedding force-pushed the perf/pandas_c_engine branch from bd7fabb to 9d2a3fb Compare December 24, 2023 11:52

perf: read_ply replace pandas.read_csv engine=python with c

248fd0f

YodaEmbedding force-pushed the perf/pandas_c_engine branch 2 times, most recently from fa353ac to 4c5211b Compare December 24, 2023 12:31

YodaEmbedding force-pushed the perf/pandas_c_engine branch from 4c5211b to 12ee9f2 Compare December 24, 2023 12:45

YodaEmbedding changed the title ~~perf: read_off replace pandas.read_csv engine=python with c~~ perf: read_ply replace pandas.read_csv engine=python with c; improve read_off header-parsing robustness Dec 24, 2023

YodaEmbedding mentioned this pull request Feb 2, 2024

Point cloud compression InterDigitalInc/CompressAI#270

Closed

8 tasks
