ARROW-236: Bridging IO interfaces under the hood in pyarrow #104

Closed
wants to merge 7 commits into from

Conversation

@wesm wesm commented Jul 12, 2016

No description provided.

wesm added 2 commits July 11, 2016 23:54
…ow::io::RandomAccessFile

Change-Id: I7982556541a60ca03b3064a333b207fd45e323c3
Change-Id: I86e43b42582276302332eb3c61afffd6f7187c40
@@ -99,7 +106,7 @@ class ARROW_EXPORT FileReader {
   virtual ~FileReader();

  private:
-  class Impl;
+  class PARQUET_NO_EXPORT Impl;

This should be ARROW_NO_EXPORT?

wesm added 2 commits July 12, 2016 13:52
…apping C++ file interfaces

Change-Id: I4a3d0c4d2a763abb02ca546df35b9556f1060c0e
Change-Id: I3f6329e159431df781959d6266b4e016e4f6fa2c
@wesm wesm changed the title from "WIP ARROW-236: Bridging IO interfaces under the hood in pyarrow" to "ARROW-236: Bridging IO interfaces under the hood in pyarrow" on Jul 12, 2016

wesm commented Jul 12, 2016

Miraculously, I was able to get this working mere minutes before my talk at Data Science Summit. Let me know any comments on the general approach.

Change-Id: I8e3a1f90907357d138d875b2761a7833b069b86f

xhochy commented Jul 13, 2016

The general approach looks good, +1.


wesm commented Jul 15, 2016

Thanks -- I'll get the build passing and merge. I can't yet read many flat Parquet files (for example, Impala's Parquet files lack the UTF8 annotation for strings, and timestamps are stored as Int96), so I'll create a bunch of JIRAs to track these issues.

In the absence of a separate metadata store (like the Hive metastore), we'll have to make some default guesses about the actual schema and eventually provide options to set the column logical types explicitly when there is ambiguity.
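
To make the fallback idea concrete, here is a minimal sketch in plain Python. It is not pyarrow or parquet-cpp code: the function `resolve_logical_type`, the `overrides` parameter, and both lookup tables are hypothetical names, and the default guesses shown (unannotated BYTE_ARRAY read as binary, INT96 read as a timestamp) are only one plausible choice, assumed for illustration.

```python
from typing import Mapping, Optional

# Hypothetical mapping from Parquet annotations to logical type names.
ANNOTATION_TYPES = {"UTF8": "string"}

# Hypothetical default guesses, used only when a column has no annotation.
# These are assumptions for illustration, not behavior of any library.
DEFAULT_GUESSES = {
    "BYTE_ARRAY": "binary",  # "string" may suit Impala-written files better
    "INT96": "timestamp",    # Impala stores timestamps as INT96
}


def resolve_logical_type(
    column: str,
    physical_type: str,
    annotation: Optional[str] = None,
    overrides: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve a column's logical type.

    Precedence: explicit user override, then the file's annotation,
    then a default guess based on the physical type.
    """
    if overrides is not None and column in overrides:
        return overrides[column]
    if annotation in ANNOTATION_TYPES:
        return ANNOTATION_TYPES[annotation]
    # No usable annotation: guess, or fall back to the physical type name.
    return DEFAULT_GUESSES.get(physical_type, physical_type.lower())
```

With this shape, a caller reading an Impala file could pass something like `overrides={"name": "string"}` to resolve the ambiguity for a single column while every other column keeps the defaults.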

wesm added 2 commits July 16, 2016 20:39
Change-Id: I2df54d0dc25457055011cd8a2b798fc28b1640d1
Change-Id: Icf2093b4a379bf159b3b1ecce119c7fde77c96ef

wesm commented Jul 18, 2016

+1

@asfgit asfgit closed this in 59e5f98 Jul 18, 2016
@wesm wesm deleted the ARROW-236 branch July 18, 2016 23:25
zhouyuan added a commit to zhouyuan/arrow that referenced this pull request May 23, 2022
…Bytes of ArrowBuf (apache#104)

We have BOUNDS_CHECKING_SKIP in ArrowBuf.setByte and ArrowBuf.getByte, which helps remove unexpected bounds checks. However, it doesn't exist in ArrowBuf.setBytes or ArrowBuf.getBytes, so bounds checking costs 10% of CPU time in our environment.

Closes apache#13161 from jackylee-ch/skip_bounds_check_for_set_or_get_bytes

Authored-by: stczwd <[email protected]>
Signed-off-by: David Li <[email protected]>

Co-authored-by: stczwd <[email protected]>
2 participants