Inefficiencies when listing dataset flows via GraphQL query #856

Open
zaychenko-sergei opened this issue Sep 26, 2024 · 0 comments
Labels
performance rust Pull requests that update Rust code

A casual dataset flows view that lists about 10 flows takes ~1.36s and performs highly inefficient repository access operations.
(see Grafana trace)

There are over 7000 spans, including numerous calls to get_active_polling_source for the very same dataset (the only one). Internally, each call triggers a lot of metadata chain iteration: multiple S3 files are read first, and only later calls re-use the cached version.

Possible solutions:

  • generally improving SetPollingSource access (via database materialization or summary extensions)
  • reorganizing the flow GraphQL objects so that the dataset query is issued only once for N flows
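
To illustrate the second option, here is a minimal sketch of a request-scoped cache that resolves the polling source once per dataset and shares the result across all flows in the same response. Everything except the name get_active_polling_source is hypothetical: SlowRepo, PollingSource, RequestCache and lookups_for_flows are stand-ins invented for this example, and the code is synchronous and single-threaded, whereas the real resolvers are presumably async and would need thread-safe equivalents.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for the polling source read from the metadata chain.
#[derive(Debug, Clone, PartialEq)]
pub struct PollingSource {
    pub url: String,
}

/// Hypothetical stand-in for the expensive lookup
/// (metadata chain iteration, S3 reads). It counts its own calls.
pub struct SlowRepo {
    pub calls: Cell<usize>,
}

impl SlowRepo {
    pub fn get_active_polling_source(&self, dataset_id: &str) -> Option<PollingSource> {
        self.calls.set(self.calls.get() + 1);
        Some(PollingSource {
            url: format!("https://example.com/{}", dataset_id),
        })
    }
}

/// Request-scoped cache: each dataset is resolved at most once per request,
/// including the "no polling source" answer.
pub struct RequestCache<'a> {
    repo: &'a SlowRepo,
    resolved: RefCell<HashMap<String, Option<Rc<PollingSource>>>>,
}

impl<'a> RequestCache<'a> {
    pub fn new(repo: &'a SlowRepo) -> Self {
        Self {
            repo,
            resolved: RefCell::new(HashMap::new()),
        }
    }

    pub fn polling_source(&self, dataset_id: &str) -> Option<Rc<PollingSource>> {
        // Copy the cached entry out so the shared borrow ends before any insert.
        let cached = self.resolved.borrow().get(dataset_id).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        let fetched = self.repo.get_active_polling_source(dataset_id).map(Rc::new);
        self.resolved
            .borrow_mut()
            .insert(dataset_id.to_string(), fetched.clone());
        fetched
    }
}

/// Simulates rendering `flow_count` flows of one dataset and returns how many
/// times the slow repository was actually hit.
pub fn lookups_for_flows(flow_count: usize) -> usize {
    let repo = SlowRepo { calls: Cell::new(0) };
    let cache = RequestCache::new(&repo);
    for _ in 0..flow_count {
        let _ = cache.polling_source("dataset-1");
    }
    repo.calls.get()
}
```

With this shape, every flow object asks the cache rather than the repository, so the number of slow lookups depends on the number of distinct datasets in the response, not on the number of flows.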

In addition, the same Grafana trace uncovered the need for #850.
