Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-1489: Assign a writer id to CUDF #1594

Closed
wants to merge 4 commits into from

Conversation

guiyanakuang
Copy link
Member

What changes were proposed in this pull request?

This pr is aimed at assigning a writer id to the CUDF.

Why are the changes needed?

This helps to locate the writer of a specific orc file, and it also helps the reader to do some special reads for files created by different writers.

How was this patch tested?

Added UT

@wgtmac
Copy link
Member

wgtmac commented Aug 23, 2023

I assume there is a follow-up action to adopt this in the CUDF community?

@guiyanakuang
Copy link
Member Author

I'm going to make a pr for CUDF later to set the id. You've reminded me not to merge this pr until it's accepted by the CUDF community.

@guiyanakuang guiyanakuang marked this pull request as draft August 23, 2023 06:17
@wgtmac
Copy link
Member

wgtmac commented Aug 23, 2023

I'm going to make a pr for CUDF later to set the id. You've reminded me not to merge this pr until it's accepted by the CUDF community.

BTW, they cannot set this value until it is released.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little confused about this comment. Do you mean CUDF release or Apache ORC release with ORC-1489, @wgtmac ?

BTW, they cannot set this value until it is released.

@dongjoon-hyun
Copy link
Member

One more comment to @guiyanakuang . I'd like to recommend to have an umbrella JIRA for all CUDA works, @guiyanakuang . :)

@omalley
Copy link
Contributor

omalley commented Aug 23, 2023

You also need to update the specification at

// 4 = Trino
.

This PR should be merged first since it effectively reserves the number for them.

@wgtmac
Copy link
Member

wgtmac commented Aug 24, 2023

I was a little confused about this comment. Do you mean CUDF release or Apache ORC release with ORC-1489, @wgtmac ?

BTW, they cannot set this value until it is released.

Sorry for the confusion. I mean we need to make an official release to let cudf set the value.

@dongjoon-hyun
Copy link
Member

Thanks, @wgtmac . Then, let's make it happen at 1.9.2 to help them because 2.0.0 is scheduled in 2024.

@dongjoon-hyun dongjoon-hyun modified the milestones: 2.0.0, 1.9.2 Aug 24, 2023
@guiyanakuang guiyanakuang marked this pull request as ready for review August 28, 2023 02:25
@guiyanakuang
Copy link
Member Author

One more comment to @guiyanakuang . I'd like to recommend to have an umbrella JIRA for all CUDA works, @guiyanakuang . :)

I have created an umbrella JIRA for all CUDF work. Please let me know if any tasks are missing.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -136,6 +136,7 @@ message Footer {
// 2 = Presto
// 3 = Scritchley Go from https://github.com/scritchley/orc
// 4 = Trino
// 5 = CUDF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update ORCv2.md as well?

Copy link
Member Author

@guiyanakuang guiyanakuang Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I've already updated ORCv2.md.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree with @wgtmac 's comment about ORCv2.md.

+1, LGTM except that.

dongjoon-hyun pushed a commit that referenced this pull request Aug 30, 2023
### What changes were proposed in this pull request?

This pr is aimed at assigning a writer id to the CUDF.

### Why are the changes needed?

This helps to locate the writer of a specific orc file, and it also helps the reader to do some special reads for files created by different writers.

### How was this patch tested?

Added UT

Closes #1594 from guiyanakuang/ORC-1489.

Lead-authored-by: zhangyiqun <[email protected]>
Co-authored-by: Yiqun Zhang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 5d163d2)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Aug 30, 2023
### What changes were proposed in this pull request?

This pr is aimed at assigning a writer id to the CUDF.

### Why are the changes needed?

This helps to locate the writer of a specific orc file, and it also helps the reader to do some special reads for files created by different writers.

### How was this patch tested?

Added UT

Closes #1594 from guiyanakuang/ORC-1489.

Lead-authored-by: zhangyiqun <[email protected]>
Co-authored-by: Yiqun Zhang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 5d163d2)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member

Merged to main/1.9/1.8. Thank you, @guiyanakuang and @wgtmac .

BTW, to @guiyanakuang , I used dev/merge_orc_pr.py to merge to main/1.9/1.8 branches and update JIRA together.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Aug 30, 2023

Additionally, GitHub Milestone feature doesn't allow multiple versions on a single PR. So, I created a clone of this for milestone 1.9.2 separately. It's #1603.

It's a kind of temporary GitHub Milestone hack to have at the same time in two milestones.

@wgtmac
Copy link
Member

wgtmac commented Aug 30, 2023

Thanks @dongjoon-hyun! I will start the work of v1.8.5 release preparation this weekend.

@dongjoon-hyun
Copy link
Member

@wgtmac Thank you so much always! Yes, it will be a meaningful release.

@dongjoon-hyun
Copy link
Member

Apache ORC website is updated first.

Screenshot 2023-08-30 at 9 35 00 AM

dongjoon-hyun pushed a commit that referenced this pull request Nov 5, 2023
### What changes were proposed in this pull request?

This pr is aimed at assigning a writer id to the CUDF.

### Why are the changes needed?

This helps to locate the writer of a specific orc file, and it also helps the reader to do some special reads for files created by different writers.

### How was this patch tested?

Added UT

Closes #1594 from guiyanakuang/ORC-1489.

Lead-authored-by: zhangyiqun <[email protected]>
Co-authored-by: Yiqun Zhang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 5d163d2)
Signed-off-by: Dongjoon Hyun <[email protected]>
cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
### What changes were proposed in this pull request?

This pr is aimed at assigning a writer id to the CUDF.

### Why are the changes needed?

This helps to locate the writer of a specific orc file, and it also helps the reader to do some special reads for files created by different writers.

### How was this patch tested?

Added UT

Closes apache#1594 from guiyanakuang/ORC-1489.

Lead-authored-by: zhangyiqun <[email protected]>
Co-authored-by: Yiqun Zhang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants