Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the impact of block methods on tokio runtime #1335

Closed
fengys1996 opened this issue Apr 6, 2023 · 10 comments · Fixed by #1454
Closed

Reduce the impact of block methods on tokio runtime #1335

fengys1996 opened this issue Apr 6, 2023 · 10 comments · Fixed by #1454
Labels
C-enhancement Category Enhancements
Milestone

Comments

@fengys1996
Copy link
Contributor

What type of enhancement is this?

Performance

What does the enhancement do?

We have some scenarios where we need to call an asynchronous method in a synchronous method in tokio runtime.

example: https://github.com/GreptimeTeam/greptimedb/blob/develop/src/catalog/src/remote/manager.rs#L532

At present, our solution is to create a new thread every time a synchronous method is called, and then call block_on on this thread to execute the asynchronous method.

There are two problems:

  1. Every time a synchronous method is called, a thread will be created. It is too expensive.
  2. The execution time of the synchronous method is too long, which may cause the tokio runtime blocking.

There are two ideas:

  1. use tokio::task::block_in_place, But it doesn't work with the current_thread runtime.
  2. use a thread pool, but only problem one is solved, problem 2 is not resolved.

Implementation challenges

No response

@fengys1996 fengys1996 added the C-enhancement Category Enhancements label Apr 6, 2023
Copy link
Collaborator

I think we can completely resolve this issue by not using DataFusion's catalog list

@killme2008
Copy link
Contributor

Looks like there is already an async version of catalog in datafusion

apache/datafusion#3777

apache/datafusion#4607

Maybe we need to upgrade datafusion.

@waynexia
Copy link
Member

waynexia commented Apr 6, 2023

Maybe we need to upgrade datafusion.

They are already available in our deps

@MichaelScofield
Copy link
Collaborator

@killme2008 now datafusion only has async version of getting table. we need other methods to be async too, like iterating tables names.

@killme2008
Copy link
Contributor

@killme2008 now datafusion only has async version of getting table. we need other methods to be async too, like iterating tables names.

Got it, i think we can submit an issue to datafusion for it.

@MichaelScofield
Copy link
Collaborator

MichaelScofield commented Apr 6, 2023 via email

@evenyag
Copy link
Contributor

evenyag commented Apr 6, 2023

We could cache the catalog data and refresh them in the async context. I remember that @v0y4g3r has some ideas on it.

@MichaelScofield
Copy link
Collaborator

MichaelScofield commented Apr 6, 2023 via email

@evenyag
Copy link
Contributor

evenyag commented Apr 11, 2023

Anyway, a cache is necessary if we don't want to issue a remote call to the meta each time we access the remote catalog. But I think we don't need to do an all-to-all comparison, we only need to replace the local cache with the remote data.

We can use different cache policies for catalogs, schemas, tables, and tables' metadata. e.g.

  • For catalogs, schemas, and tables, we update them periodically
  • For metadata of tables, we can use a fixed capacity cache (with TTL?) and update the individual table entry on a cache miss

@fengjiachun
Copy link
Collaborator

metasrv has push capability, which can achieve incremental updates of cache based on push.

@fengjiachun fengjiachun added this to the v0.3 milestone Apr 12, 2023
@v0y4g3r v0y4g3r mentioned this issue Apr 25, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category Enhancements
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants