perf: read_ply replace pandas.read_csv engine=python with c; improve read_off header-parsing robustness #352
UPDATE: I have rebased this PR on top of the latest commit. The revised changes are: `color` -> `has_color`; and `count` -> `n_header`.

In particular, ModelNet40 has faulty headers:
For reference, the correct format is:
Nonetheless, it is still valuable to parse the faulty header.
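A minimal sketch of header parsing that tolerates this kind of fault. It assumes the faulty headers glue the counts onto the keyword line (e.g. `OFF8 6 0` instead of `OFF` followed by `8 6 0` on the next line) and that a `COFF` keyword signals per-vertex color. `parse_off_header` is a hypothetical helper, not the code in this PR; it only reuses the `has_color` and `n_header` names.

```python
def parse_off_header(lines):
    """Hypothetical helper: return (n_points, n_faces, n_edges, has_color, n_header).

    `lines` are the file's lines without trailing newlines; `n_header` is the
    number of lines consumed by the header, i.e. the index of the first vertex line.
    """
    first = lines[0].strip()
    # Assumption: "COFF" marks colored vertices; check it before plain "OFF".
    if first.startswith("COFF"):
        has_color, rest = True, first[4:]
    elif first.startswith("OFF"):
        has_color, rest = False, first[3:]
    else:
        raise ValueError(f"not an OFF header: {first!r}")

    rest = rest.strip()
    n_header = 1
    if rest:
        # Faulty header: counts glued onto the keyword line, e.g. "OFF8 6 0".
        counts = rest
    else:
        # Well-formed header: counts on the next non-blank, non-comment line.
        while True:
            line = lines[n_header].strip()
            n_header += 1
            if line and not line.startswith("#"):
                counts = line
                break

    nums = [int(x) for x in counts.split()]
    n_points, n_faces = nums[0], nums[1]
    n_edges = nums[2] if len(nums) > 2 else 0  # edge count is optional in practice
    return n_points, n_faces, n_edges, has_color, n_header
```

Both header shapes then yield the same counts, differing only in `n_header`, which tells the caller where the vertex block starts.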
(Original text before #353 was merged)
Big performance improvement: reading the sliced file from an in-memory `StringIO` buffer removes the need for the slow `engine="python"`.

Also fixes a bug where, for OFF files containing more lines than `num_points + num_faces`, the reader tries to parse potential edges as faces! As Wikipedia says, the OFF file may contain:
Of course, this still does not encompass all possible OFF file variants described by Wikipedia, but it's an improvement.
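The slicing approach can be sketched for the OFF body as below. `read_off_body` is a hypothetical helper, not the PR's code; it assumes triangle faces, `lines` without trailing newlines (e.g. from `str.splitlines()`), and header counts parsed beforehand. Only the declared `n_points + n_faces` rows are handed to `pandas.read_csv`, so trailing edge lines are never read as faces. Because each slice is passed as a `StringIO` buffer, the C engine can be used; dropping trailing lines with `skipfooter` instead would force the Python engine, since the C engine does not support that option.

```python
from io import StringIO

import pandas as pd


def read_off_body(lines, n_header, n_points, n_faces):
    """Hypothetical sketch: parse vertices and triangle faces of an OFF file.

    Rows beyond n_points + n_faces (e.g. optional edges) are ignored.
    """
    vertex_lines = lines[n_header:n_header + n_points]
    face_lines = lines[n_header + n_points:n_header + n_points + n_faces]

    # sep=r"\s+" is handled natively by the C engine (whitespace-delimited).
    points = pd.read_csv(StringIO("\n".join(vertex_lines)), sep=r"\s+",
                         header=None, engine="c")
    points = points.iloc[:, :3]  # drop any extra per-vertex columns (e.g. color)
    points.columns = ["x", "y", "z"]

    faces = pd.read_csv(StringIO("\n".join(face_lines)), sep=r"\s+",
                        header=None, engine="c")
    # Column 0 is the vertex count per face (3 for triangles); keep the indices.
    faces = faces.iloc[:, 1:4]
    faces.columns = ["v1", "v2", "v3"]
    return points, faces
```

For example, a file with 4 vertices, 2 faces and two trailing edge lines yields a 4-row point table and a 2-row face table; the edge lines are simply never sliced into either buffer.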