Reduce the impact of block methods on tokio runtime #1335
Comments
I think we can completely resolve this issue by not using DataFusion's catalog list.
Looks like there is already an async version of the catalog in DataFusion. Maybe we need to upgrade DataFusion.
They are already available in our deps.
@killme2008 DataFusion currently only has an async version of getting a table. We need the other methods to be async too, such as iterating over table names.
Got it, I think we can submit an issue to DataFusion for it.
Well, seeing the discussion in apache/datafusion#3777, I don't think an async version of the catalog list can be done easily: it would need too many breaking changes to APIs, including the major one, SessionContext.
I think the best strategy is to get rid of DataFusion's table management. We can just use DataFusion as a pure query engine.
Btw, DataFusion itself is also aware of this issue, see apache/datafusion#5291.
… On 6 April 2023 at 11:34, dennis zhuang wrote:
@killme2008 now datafusion only has async version of getting table. we need other methods to be async too, like iterating tables names.
Got it, i think we can submit an issue to datafusion for it.
We could cache the catalog data and refresh it in the async context. I remember that @v0y4g3r has some ideas on it.
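A minimal sketch of that idea, using only the standard library: synchronous callers read a local snapshot, and an async task swaps in fresh data after it has awaited the remote fetch. `TableNameCache`, `table_names` and `replace` are hypothetical names for illustration, not part of GreptimeDB or DataFusion.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, RwLock};

/// Hypothetical local cache of the table names of one schema.
#[derive(Default)]
pub struct TableNameCache {
    // The snapshot is swapped as a whole, so a reader never observes
    // a half-applied refresh.
    snapshot: RwLock<Arc<BTreeSet<String>>>,
}

impl TableNameCache {
    /// Synchronous read path: no remote call, only a short lock
    /// and an `Arc` clone.
    pub fn table_names(&self) -> Arc<BTreeSet<String>> {
        let guard = self.snapshot.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Meant to be called from an async task once it has awaited the
    /// remote fetch; the method itself never blocks on I/O.
    pub fn replace(&self, fresh: impl IntoIterator<Item = String>) {
        let next: Arc<BTreeSet<String>> = Arc::new(fresh.into_iter().collect());
        *self.snapshot.write().unwrap() = next;
    }
}
```

A reader that obtained a snapshot before a refresh keeps a consistent (if stale) view, which is the trade-off this kind of cache accepts.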
Ballista takes the same approach: it has a "refresh" method that updates all locally cached catalogs. However, the refresh is an all-to-all comparison, which is not efficient. If we were to implement the same refresh, I think we would have to do the same thing: fetch all catalogs from the remote and then compare them with (or replace) the local cache. Not good.
… On 6 April 2023 at 12:13, Yingwen wrote:
We could cache the catalog data and refresh them in the async context. I remember that @v0y4g3r has some ideas on it.
Anyway, a cache is necessary if we don't want to issue a remote call to the meta each time we access the remote catalog. But I think we don't need to do an all-to-all comparison. We can use different cache policies for catalogs, schemas, tables, and tables' metadata, e.g.
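As a rough illustration of what per-level policies could look like (this is not the commenter's actual proposal), the sketch below gives each level its own time-to-live, so a stale entry triggers a refetch of that one entry rather than a full comparison. The `Level` enum, the `PolicyCache` type and the TTL values are all assumptions made up for the example.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache levels, mirroring the four kinds of data named above.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Level {
    Catalog,
    Schema,
    Table,
    TableMeta,
}

/// Assumed policy: levels differ only in how long an entry stays fresh.
/// The durations are arbitrary placeholders.
pub fn ttl(level: Level) -> Duration {
    match level {
        Level::Catalog | Level::Schema => Duration::from_secs(300),
        Level::Table => Duration::from_secs(30),
        Level::TableMeta => Duration::from_secs(5),
    }
}

pub struct PolicyCache {
    entries: HashMap<(Level, String), (String, Instant)>,
}

impl PolicyCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Stores a value together with the time it was fetched.
    pub fn put(&mut self, level: Level, key: &str, value: String, now: Instant) {
        self.entries.insert((level, key.to_string()), (value, now));
    }

    /// Returns the value only while it is fresh under its level's TTL.
    /// `None` tells the caller to refetch this single entry.
    pub fn get(&self, level: Level, key: &str, now: Instant) -> Option<&str> {
        let (value, stored_at) = self.entries.get(&(level, key.to_string()))?;
        (now.duration_since(*stored_at) <= ttl(level)).then(|| value.as_str())
    }
}
```

Passing `now` explicitly keeps the sketch deterministic; real code would more likely read the clock inside `get`.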
What type of enhancement is this?
Performance
What does the enhancement do?
We have some scenarios where we need to call an asynchronous method from inside a synchronous method while running on the tokio runtime.
example: https://github.com/GreptimeTeam/greptimedb/blob/develop/src/catalog/src/remote/manager.rs#L532
At present, our solution is to create a new thread every time a synchronous method is called, and then call block_on on this thread to execute the asynchronous method.
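The pattern described above can be sketched as follows. To keep the example dependency-free, a tiny std-only `block_on` stands in for the runtime's own `block_on` (that substitution is an assumption; the linked code uses tokio), and `fetch_table_names` and `table_names_blocking` are hypothetical names.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal stand-in for a runtime's `block_on`: polls the future on the
/// current thread and parks the thread while the future is pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async catalog call.
async fn fetch_table_names() -> Vec<String> {
    vec!["cpu".to_string(), "mem".to_string()]
}

/// Synchronous facade: spawns a fresh OS thread for every call, runs the
/// async method to completion there, and joins it.
pub fn table_names_blocking() -> Vec<String> {
    thread::spawn(|| block_on(fetch_table_names()))
        .join()
        .expect("worker thread panicked")
}
```

Note that every call pays for a thread spawn and a join, and the calling thread stays blocked until the worker finishes.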
There are two problems:
There are two ideas:
Implementation challenges
No response