Improve configuration and resource use of MemoryManager and DiskManager #1668

Merged: 3 commits, Jan 25, 2022


@alamb alamb commented Jan 24, 2022

Which issue does this PR close?

Resolves #1636

Rationale for this change

Creating a RuntimeEnv creates a temporary directory, even when it is not used. This is unnecessary per-query overhead.

Also, since there is no way to share MemoryManagers and DiskManagers across RuntimeEnvs, it is not possible to get a "global" view of memory / disk use across multiple queries running concurrently.

Also, certain invalid memory configuration values cause a panic rather than returning an error.

What changes are included in this PR?

Changes:

  1. Add DiskManagerConfig and MemoryManagerConfig for configuring how disk and memory are managed
  2. Add proper error checking (rather than asserts / panics) to ensure reasonable values for memory config
  3. Allow re-using an existing MemoryManager and DiskManager rather than always creating new ones
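
The three changes above can be sketched in a few lines of standalone Rust. This is only an illustration of the pattern, not the code in this PR: the `MemoryManager` struct, its `pool_size` field, and the `try_new_limit` and `build` helpers are hypothetical stand-ins, and the second error message is made up for the example.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a memory manager (not the DataFusion type).
#[derive(Debug)]
pub struct MemoryManager {
    /// Bytes available to queries after applying the fraction.
    pub pool_size: usize,
}

/// Hypothetical config enum: either reuse a manager or describe a new one.
#[derive(Debug, Clone)]
pub enum MemoryManagerConfig {
    /// Share an already-constructed manager.
    Existing(Arc<MemoryManager>),
    /// Build a new manager with these limits.
    New { max_memory: usize, memory_fraction: f64 },
}

impl MemoryManagerConfig {
    /// Validate the limits up front, returning an error instead of panicking.
    pub fn try_new_limit(max_memory: usize, memory_fraction: f64) -> Result<Self, String> {
        if max_memory == 0 {
            return Err(format!(
                "invalid max_memory. Expected greater than 0, got {}",
                max_memory
            ));
        }
        // Written as a negated range check so that NaN is rejected too.
        if !(memory_fraction > 0.0 && memory_fraction <= 1.0) {
            return Err(format!(
                "invalid fraction. Expected greater than 0 and at most 1.0, got {}",
                memory_fraction
            ));
        }
        Ok(Self::New { max_memory, memory_fraction })
    }

    /// Resolve the config into a manager, reusing the existing one if given.
    pub fn build(self) -> Arc<MemoryManager> {
        match self {
            Self::Existing(manager) => manager,
            Self::New { max_memory, memory_fraction } => Arc::new(MemoryManager {
                pool_size: (max_memory as f64 * memory_fraction) as usize,
            }),
        }
    }
}
```

With this shape, a caller handles a bad value such as `try_new_limit(0, 0.5)` as an ordinary `Err`, and passing `Existing(Arc::clone(&manager))` hands back the same manager rather than allocating a new one.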

Are there any user-facing changes?

This will be a change for anyone who was using RuntimeConfig, but since it was introduced recently I don't think that is very many users.

@alamb alamb added the api change Changes the API exposed to users of the crate label Jan 24, 2022
@github-actions github-actions bot added ballista datafusion Changes in the datafusion crate labels Jan 24, 2022
@@ -1057,6 +1062,42 @@ impl ExecutionConfig {
self.runtime = config;
self
}

/// Use an existing [MemoryManager]

Here is the new public API

Ok(())

#[tokio::test]
#[should_panic(expected = "invalid max_memory. Expected greater than 0, got 0")]

Previously, such config errors would panic.


let ctx2 = ExecutionContext::with_config(config);

assert!(std::ptr::eq(Arc::as_ptr(&memory_manager), Arc::as_ptr(&ctx1.runtime_env().memory_manager)));

Here are the tests for the primary use case we have in IOx: sharing DiskManagers and MemoryManagers across plans
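
To illustrate why sharing matters, here is a rough standalone sketch of two runtimes pointing at one manager so that usage can be observed globally. The `MemoryManager`, `RuntimeEnv`, `reserve`, `used`, `with_memory_manager`, and `demo_shared_usage` names below are hypothetical simplifications written for this example, not the DataFusion types or methods.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical manager that tracks bytes reserved by all queries using it.
#[derive(Debug, Default)]
pub struct MemoryManager {
    used: AtomicUsize,
}

impl MemoryManager {
    /// Record that a query reserved `bytes` of memory.
    pub fn reserve(&self, bytes: usize) {
        self.used.fetch_add(bytes, Ordering::SeqCst);
    }

    /// Total bytes reserved across every runtime sharing this manager.
    pub fn used(&self) -> usize {
        self.used.load(Ordering::SeqCst)
    }
}

/// Hypothetical per-query runtime that holds a (possibly shared) manager.
#[derive(Debug, Clone)]
pub struct RuntimeEnv {
    pub memory_manager: Arc<MemoryManager>,
}

impl RuntimeEnv {
    /// Build a runtime around an existing manager instead of creating a new one.
    pub fn with_memory_manager(memory_manager: Arc<MemoryManager>) -> Self {
        Self { memory_manager }
    }
}

/// Run two "queries" against one manager and return the combined usage.
pub fn demo_shared_usage() -> usize {
    let manager = Arc::new(MemoryManager::default());
    let env1 = RuntimeEnv::with_memory_manager(Arc::clone(&manager));
    let env2 = RuntimeEnv::with_memory_manager(Arc::clone(&manager));
    env1.memory_manager.reserve(100);
    env2.memory_manager.reserve(50);
    manager.used()
}
```

In this sketch, `Arc::ptr_eq(&env1.memory_manager, &env2.memory_manager)` would confirm that both runtimes hold the same allocation; it is the standard-library shorthand for comparing the results of `Arc::as_ptr` with `std::ptr::eq`.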

@alamb alamb force-pushed the alamb/memory_and_disk_config branch from dfbd077 to 87c541a Compare January 24, 2022 18:19

alamb commented Jan 24, 2022

FYI @tustvold and @yjshen


@yjshen yjshen left a comment

Great config design! I really like the introduced enum configs.


alamb commented Jan 25, 2022

Thanks @yjshen -- I think since this basically only affects you and me for the time being, I will update this PR and merge it in

@alamb alamb merged commit 7153fac into apache:master Jan 25, 2022
@alamb alamb removed the api change Changes the API exposed to users of the crate label Feb 10, 2022
@alamb alamb deleted the alamb/memory_and_disk_config branch August 8, 2023 20:12
Linked issue: Provide RuntimeEnv to ExecutionContext