Persist Performance #197

Open
1 task
ManApart opened this issue Mar 27, 2020 · 4 comments
Labels
performance (This is a performance improvement), service (Involves a service application)
Milestone

Comments

@ManApart
Contributor

ManApart commented Mar 27, 2020

Given 10 million small messages, Persist processed them at around 2,000 messages a second, one-fifth the speed of Receive. While Persist does not currently bottleneck the system, it will get backed up over time if it can't keep up with the other parts of the system. This could also reflect the number of Presto workers, etc.
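To make the "backed up over time" point concrete, here is a back-of-the-envelope sketch. Only the 2,000 msg/s Persist figure and the 10 million message count come from this issue; the Receive rate of ~10,000 msg/s is an assumption inferred from "one-fifth the speed", and it further assumes Receive feeds Persist at full speed. The helper names are hypothetical.

```python
# Back-of-the-envelope sketch of the throughput gap. PERSIST_RATE is the figure
# from this issue; RECEIVE_RATE is an ASSUMPTION inferred from "one-fifth the speed".
PERSIST_RATE = 2_000              # messages/second, measured for Persist
RECEIVE_RATE = 5 * PERSIST_RATE   # ~10,000 messages/second (inferred, not measured)
TOTAL_MESSAGES = 10_000_000


def drain_seconds(total: int, rate: float) -> float:
    """Seconds needed to process `total` messages at a steady `rate` msg/s."""
    return total / rate


def backlog_after(seconds: float, inflow: float, outflow: float) -> float:
    """Messages left waiting after `seconds` when inflow exceeds outflow (0 otherwise)."""
    return max(0.0, (inflow - outflow) * seconds)


if __name__ == "__main__":
    receive_s = drain_seconds(TOTAL_MESSAGES, RECEIVE_RATE)
    persist_s = drain_seconds(TOTAL_MESSAGES, PERSIST_RATE)
    print(f"Receive: {receive_s:,.0f} s, Persist: {persist_s:,.0f} s")
    # Queue depth in front of Persist at the moment Receive finishes
    backlog = backlog_after(receive_s, RECEIVE_RATE, PERSIST_RATE)
    print(f"Backlog when Receive finishes: {backlog:,.0f} messages")
```

Under these assumptions Receive finishes in about 1,000 s (~17 minutes) while Persist needs about 5,000 s (~83 minutes), leaving roughly 8 million messages queued when Receive is done.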

AC

  • Persist should be at least as fast as the next slowest part of the upstream system so that it does not back up over time
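The acceptance criterion can be phrased as a simple check: the slowest upstream stage caps how fast messages can reach Persist, so Persist avoids a growing backlog if its throughput is at least that rate. This is a minimal sketch; the stage names and rates are hypothetical placeholders, not measurements from the system.

```python
# Sketch of the AC: Persist must be at least as fast as the slowest upstream stage,
# since that stage caps the rate at which messages arrive. Stage names/rates are
# HYPOTHETICAL placeholders.


def upstream_bottleneck(upstream_rates: dict[str, float]) -> tuple[str, float]:
    """Return (stage, rate) of the slowest upstream stage, in messages/second."""
    stage = min(upstream_rates, key=lambda s: upstream_rates[s])
    return stage, upstream_rates[stage]


def persist_keeps_up(persist_rate: float, upstream_rates: dict[str, float]) -> bool:
    """True if Persist's throughput meets or exceeds the upstream bottleneck."""
    _, bottleneck_rate = upstream_bottleneck(upstream_rates)
    return persist_rate >= bottleneck_rate


if __name__ == "__main__":
    # Hypothetical pipeline: Receive at ~10,000 msg/s plus one slower stage.
    rates = {"receive": 10_000, "hypothetical_stage": 1_500}
    print(upstream_bottleneck(rates))                    # ('hypothetical_stage', 1500)
    print(persist_keeps_up(2_000, rates))                # True
    print(persist_keeps_up(2_000, {"receive": 10_000}))  # False
```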

Tech Notes

  • This is probably more a matter of Presto tuning / adding Presto workers than a Persist issue
jdenen added the performance label Mar 27, 2020
@jeffgrunewald
Member

Persist shouldn’t be writing via Presto at all anymore. It’s a direct write to S3 as JSON, followed by a select * from json_stage into orc_permanent.
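The two-step write path described above can be sketched as follows. This is not the service's actual code: the function names are hypothetical, it assumes the stage holds newline-delimited JSON, and the S3 upload and query submission are left out. Only the table names json_stage and orc_permanent come from the comment.

```python
import json

# Sketch of the two-step path: (1) stage a batch as JSON in S3, (2) one Presto query
# copies staged rows into the ORC table. Function names are HYPOTHETICAL; the
# newline-delimited JSON layout is an ASSUMPTION. Upload/submission are omitted.


def to_json_lines(records: list[dict]) -> bytes:
    """Step 1: serialize a batch as newline-delimited JSON for the S3 stage."""
    return "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")


def promote_statement(stage: str = "json_stage", permanent: str = "orc_permanent") -> str:
    """Step 2: the Presto query that copies staged rows into the permanent table."""
    return f"INSERT INTO {permanent} SELECT * FROM {stage}"


if __name__ == "__main__":
    batch = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
    print(to_json_lines(batch))
    print(promote_statement())  # INSERT INTO orc_permanent SELECT * FROM json_stage
```

Note that step 2 is still a Presto query, which is the point raised in the next comment.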

@ManApart
Contributor Author

Isn't it presto that's doing that table copy?

@jdenen
Member

jdenen commented Mar 27, 2020

It is a presto query that moves staged data to the permanent table.

jdenen added this to the v1.0.0 milestone Mar 27, 2020
@jessie-morris

And per Brian, this happens for every batch. So while we don't write an insert statement that inserts 1MB of rows, we write 1MB of rows and then run an INSERT INTO ... SELECT FROM the staging table, which, as I understand it, means Presto is hit almost as many times.
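A rough count of how often Presto is hit under the per-batch scheme: one INSERT INTO ... SELECT per staged batch means the query count equals the batch count. The 1MB batch size comes from the comment above; the 100-byte average message size is a hypothetical placeholder for "small messages", and decimal megabytes are assumed.

```python
# Rough count of Presto queries when every staged batch is followed by its own
# INSERT INTO ... SELECT. BATCH_BYTES comes from the comment; AVG_MESSAGE_BYTES
# is a HYPOTHETICAL placeholder, not a measured value.
BATCH_BYTES = 1_000_000     # "1MB of rows" per batch (decimal MB assumed)
AVG_MESSAGE_BYTES = 100     # hypothetical size of a "small message"


def messages_per_batch(batch_bytes: int, avg_message_bytes: int) -> int:
    """How many messages fit into one staged batch (at least one)."""
    return max(1, batch_bytes // avg_message_bytes)


def presto_queries(total_messages: int, batch_bytes: int, avg_message_bytes: int) -> int:
    """One INSERT INTO ... SELECT per batch, so queries == number of batches (ceiling)."""
    per_batch = messages_per_batch(batch_bytes, avg_message_bytes)
    return (total_messages + per_batch - 1) // per_batch


if __name__ == "__main__":
    n = presto_queries(10_000_000, BATCH_BYTES, AVG_MESSAGE_BYTES)
    print(f"{n} Presto queries for 10 million messages")
```

With these placeholder numbers, 10 million messages become 1,000 batches of 10,000 messages and therefore 1,000 queries; each query is smaller, but the number of Presto calls still scales with the number of batches.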

jdenen added the service label Apr 27, 2020

4 participants