[SPARK-2650][SQL] Try to partially fix SPARK-2650 by adjusting initial buffer size and reducing memory allocation #1769

Closed
wants to merge 2 commits into from


liancheng (Contributor)

JIRA issue: SPARK-2650

See the comments on SPARK-2650 for further details.

This PR sets the initial in-memory columnar buffer size to 1MB, matching the default value of Shark's `shark.column.partitionSize.mb` property in local mode. Shark-style partition size estimation will be added in a separate PR.
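The initial capacity matters because a heap buffer that fills up has to be copied into a larger one. Below is a minimal Java sketch of a growable column buffer that starts at 1MB. The class name, the `ensureFreeSpace` helper, and the "grow by at least one eighth" policy are hypothetical assumptions for illustration, not Spark's actual `ColumnBuilder` code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not Spark's actual ColumnBuilder: a heap-backed
// column buffer that starts at 1MB and grows by copying when it fills up.
final class GrowableColumnBuffer {
    // Initial capacity of 1MB, the value this PR describes.
    static final int INITIAL_BUFFER_SIZE = 1024 * 1024;

    private ByteBuffer buffer =
        ByteBuffer.allocate(INITIAL_BUFFER_SIZE).order(ByteOrder.nativeOrder());
    private int reallocations = 0;

    // Makes room for `size` more bytes. Assumed growth policy: grow by the
    // larger of `size` and roughly 1/8 of the current capacity.
    private void ensureFreeSpace(int size) {
        if (buffer.remaining() >= size) {
            return;
        }
        int capacity = buffer.capacity();
        int newCapacity = capacity + Math.max(size, capacity / 8 + 1);
        ByteBuffer grown =
            ByteBuffer.allocate(newCapacity).order(ByteOrder.nativeOrder());
        // Every growth step copies all bytes written so far.
        grown.put(buffer.array(), 0, buffer.position());
        buffer = grown;
        reallocations++;
    }

    void appendInt(int value) {
        ensureFreeSpace(4);
        buffer.putInt(value);
    }

    int size() { return buffer.position(); }
    int capacity() { return buffer.capacity(); }
    int reallocations() { return reallocations; }
}
```

In this model, a column that fits within the first 1MB is never copied; the first grow-and-copy step happens only when that space is exhausted. A different starting size trades up-front memory per column against the number of such copies.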

In addition, before this PR, `NullableColumnBuilder` copied the whole buffer to add the null positions section, and `CompressibleColumnBuilder` then copied and compressed the buffer again, even when compression was disabled (compression is disabled by using the `PassThrough` compression scheme). This PR eliminates the first copy to reduce memory consumption.
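The effect of removing the first copy can be modeled with a small Java sketch. The byte layout (`[schemeId][nullCount][null positions][data]`), the `PASS_THROUGH_ID` value, and the method names are hypothetical assumptions; this is a simplified model of the two build paths described above, not Spark's actual builders or on-disk format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of the two build paths; layout and names are
// illustrative assumptions, not Spark's actual classes or format.
// Assumed layout: [schemeId][nullCount][null positions...][column data]
final class ColumnBuildSketch {
    static final int PASS_THROUGH_ID = 0; // assumed id for "no compression"

    // Total bytes allocated for intermediate and final buffers.
    static long bytesAllocated = 0;

    private static ByteBuffer allocate(int size) {
        bytesAllocated += size;
        return ByteBuffer.allocate(size).order(ByteOrder.nativeOrder());
    }

    // Before: the nullable step copies everything once to add the null
    // section, then the compression step copies it all again.
    static ByteBuffer buildWithTwoCopies(int[] nullPositions, ByteBuffer data) {
        ByteBuffer withNulls =
            allocate(4 + 4 * nullPositions.length + data.remaining());
        withNulls.putInt(nullPositions.length);
        for (int p : nullPositions) withNulls.putInt(p);
        withNulls.put(data.duplicate()); // duplicate() leaves `data` untouched
        withNulls.flip();

        // Pass-through "compression": copy the bytes unchanged after a header.
        ByteBuffer out = allocate(4 + withNulls.remaining());
        out.putInt(PASS_THROUGH_ID).put(withNulls);
        out.flip();
        return out;
    }

    // After: the final buffer is sized once and the null section is written
    // straight into it, so the intermediate copy disappears.
    static ByteBuffer buildWithOneCopy(int[] nullPositions, ByteBuffer data) {
        ByteBuffer out =
            allocate(4 + 4 + 4 * nullPositions.length + data.remaining());
        out.putInt(PASS_THROUGH_ID).putInt(nullPositions.length);
        for (int p : nullPositions) out.putInt(p);
        out.put(data.duplicate());
        out.flip();
        return out;
    }
}
```

Both paths yield byte-for-byte identical output in this model, but the one-copy path allocates only the final buffer, whereas the two-copy path also allocates an intermediate buffer of nearly the same size.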


SparkQA commented Aug 4, 2014

QA tests have started for PR 1769. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17866/consoleFull


SparkQA commented Aug 4, 2014

QA tests have started for PR 1769. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17868/consoleFull


SparkQA commented Aug 4, 2014

QA results for PR 1769:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17866/consoleFull


SparkQA commented Aug 4, 2014

QA results for PR 1769:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17868/consoleFull

marmbrus (Contributor) commented Aug 6, 2014

Thanks, merged into master and 1.1.

asfgit pushed a commit that referenced this pull request Aug 6, 2014
…l buffer size and reducing memory allocation

JIRA issue: [SPARK-2650](https://issues.apache.org/jira/browse/SPARK-2650)

Please refer to [comments](https://issues.apache.org/jira/browse/SPARK-2650?focusedCommentId=14084397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14084397) of SPARK-2650 for some other details.

This PR adjusts the initial in-memory columnar buffer size to 1MB, same as the default value of Shark's `shark.column.partitionSize.mb` property when running in local mode. Will add Shark style partition size estimation in another PR.

Also, before this PR, `NullableColumnBuilder` copies the whole buffer to add the null positions section, and then `CompressibleColumnBuilder` copies and compresses the buffer again, even if compression is disabled (`PassThrough` compression scheme is used to disable compression). In this PR the first buffer copy is eliminated to reduce memory consumption.

Author: Cheng Lian

Closes #1769 from liancheng/spark-2650 and squashes the following commits:

88a042e [Cheng Lian] Fixed method visibility and removed dead code
001f2e5 [Cheng Lian] Try fixing SPARK-2650 by adjusting initial buffer size and reducing memory allocation

(cherry picked from commit d0ae3f3)
Signed-off-by: Michael Armbrust
asfgit closed this in d0ae3f3 Aug 6, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
liancheng deleted the spark-2650 branch September 24, 2014 00:09