
importing planet.pbf postgres optimisations n specs #38

Closed
peldhose opened this issue Feb 7, 2018 · 16 comments

@peldhose commented Feb 7, 2018

Hi,
Can somebody explain the performance of tegola-osm when importing planet.pbf into a Postgres database? How much time might it take to push the planet data into Postgres?

@ARolek (Member) commented Feb 7, 2018

In my experience, with a pretty beefy AWS compute-optimized instance (I don't recall how big, we should document this) and a high-IOPS drive (the process is disk-write intensive), we were able to get the data from the planet.pbf file into PostGIS, with the generalized tables and the indexes, in about 12-14 hours. It does depend on your config file, but that's using the one we have documented in this repo.
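For reference, the import is driven by imposm; a minimal sketch of a planet import along those lines looks roughly like the following (the mapping file, connection string, and cache directory are placeholders, not the exact command from this repo):

```sh
# Hypothetical imposm (imposm3) planet import. The flags are standard imposm
# import flags, but the mapping file, connection string, and cache directory
# are placeholders; adjust them to the config documented in this repo.
imposm import \
  -mapping imposm3.json \
  -read planet-latest.osm.pbf \
  -write \
  -connection "postgis://osm:password@localhost/osm" \
  -cachedir /ssd/imposm_cache \
  -optimize \
  -deployproduction
```

Putting the cache directory on a fast local SSD matters here, since the process is heavy on disk writes.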

@peldhose (Author) commented Feb 8, 2018

Thank you @ARolek for the info.

@peldhose (Author) commented Feb 8, 2018

Hey @ARolek,
We imported the planet OSM data in around 11.5 hours with a Google Cloud server.
Spec: 8 vCPU, 30 GB RAM, and a 1 TB disk (only 400 GB used).
Awesome performance.
Thank you for a simple, awesome open-source tile server.

@peldhose (Author) commented Feb 8, 2018

Is there any Redis support for better caching of tiles? Or any workaround to implement it?

@ARolek (Member) commented Feb 8, 2018

@peldhose nice! Thanks for the performance numbers. I'm going to add those to the README.

tegola does not currently have Redis support. For caching, tegola currently supports S3 and writing to a filesystem. Implementing a Redis cache would not be too difficult though. You can open an issue on the tegola repo for the request. If you're interested in tackling the implementation, I can give you a tour of the way the Cacher interface works. It's fairly straightforward.
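For context, the existing backends are configured through the cache section of the tegola config file; a minimal sketch (the basepath, bucket, and region values are placeholders, not defaults) looks like:

```toml
# Hypothetical cache configuration for a tegola config file.
# The basepath and bucket values are placeholders; adjust to your deployment.
[cache]
type = "file"                    # cache rendered tiles on the local filesystem
basepath = "/tmp/tegola-cache"

# Or, to cache tiles in S3 instead:
# [cache]
# type = "s3"
# bucket = "my-tile-cache"       # hypothetical bucket name
# region = "us-east-1"
```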

@peldhose (Author) commented Feb 13, 2018

Wow, thanks! Awesome.
Yes, of course, I'll track the Redis caching implementation (#40). I'd also like to let you know that tegola performs far better than Mapzen (Tilezen) in every way, and I like the way layers can be customized using the TOML config file.
Mapzen took around 2.5 days with 8 vCPU and 64 GB RAM to push the entire dataset into the database, while tegola-osm took only about 11.5 hours with 8 vCPU and 30 GB RAM to push the same data.
Also, Mapzen is too slow at delivering tiles even with the above server spec. They tried to solve this with two levels of caching (tilequeue S3 and Redis), but I don't think that is the right solution.

@ARolek (Member) commented Feb 14, 2018

@peldhose thanks for the positive feedback! We have plans to make tegola even faster. v0.6.0 is close to release and comes with several rendering improvements. You can watch the tegola repo for new releases. We're going to need some help testing the pre-release if you're interested. ;-)

Thanks for chiming in with your import results.

ARolek closed this as completed Feb 14, 2018
@adamakhtar (Contributor) commented

Hi @ARolek

As part of my PR to enhance the docs (#60), I wanted to expand the "How long does it take to import?" section. I found this issue and noticed your comment above:

In my experience, with a pretty beefy AWS compute-optimized instance (I don't recall how big, we should document this) and a high-IOPS drive (the process is disk-write intensive), we were able to get the data from the planet.pbf file into PostGIS, with the generalized tables and the indexes, in about 12-14 hours. It does depend on your config file, but that's using the one we have documented in this repo.

What do you consider to be high IOPS on AWS? Can you give a figure?
Would high IOPS be required for both the RDS instance and the server's main volume?
Typically, how much space does the DB need?
And how much space does the main volume need (for Imposm3 to prepare the data)?

If you can remember, I'll update the PR.

@ARolek (Member) commented Jul 21, 2020

@adamakhtar I don't recall how high the IOPS were, I just remember provisioning them. Ideally, you would have a high-IOPS volume on your database as well.

Typically, how much space does the DB need?
The database will need about 160 GB if you don't use the import schema and just deploy production. If you want to use the import schema, then you will need around 320 GB, so I would suggest around 400 GB to provide some padding.

And how much space does the main volume need (for Imposm3 to prepare the data)?
I don't recall this either; it's less than the database requirements though, as it does not have the generalized tables or the indexes.
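If you want to verify actual usage on your own import, a quick check (assuming your database is named osm; adjust the name and connection options to your setup) is:

```sh
# Hypothetical size check for an import database named "osm".
# pg_database_size reports the on-disk size; pg_size_pretty formats it.
psql -d osm -c "SELECT pg_size_pretty(pg_database_size('osm'));"
```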

@adamakhtar (Contributor) commented Apr 4, 2022

I just tried an import with the suggestions made in the comments above and it went really slowly. I'm troubleshooting now and will share what I have found so far in case anybody else is looking for ideal specs.

Imposm3's README states the following:

It's recommended that the memory size of the server is roughly twice the size of the PBF extract you are importing. For example: You should have 64GB RAM or more for a current (2017) 36GB planet file, 8GB for a 4GB regional extract, etc. Imports without SSDs will take longer.

So it seems that if you choose a server with too little RAM relative to your PBF's file size, you are going to be bottlenecked by I/O. A full planet PBF is now around 53 GB, about 60-70% bigger than it was when the comments above were made, so roughly 100 GB of RAM now seems to be the recommendation.

I'll try again with more memory in a few days.

@ARolek (Member) commented Apr 4, 2022

@adamakhtar how slow was the planet import for you?

@adamakhtar (Contributor) commented

@ARolek 12 hours later I was only 2% into the read phase. I assumed that at that rate it would take at least 48 hours to complete, so I aborted.

You can see my full server spec, htop output, and imposm output here: https://gis.stackexchange.com/questions/427821/steps-to-troubleshoot-slow-imposm-performance

Unfortunately, I didn't consider I/O to be the bottleneck at the time, so I never checked it, but I'm assuming that's the problem.
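For anyone hitting the same thing: a quick way to confirm whether the disk is the bottleneck during the read phase (assuming the sysstat package is installed) is to watch the extended device stats while imposm runs:

```sh
# Hypothetical I/O check while the import is running; requires sysstat.
# Sustained high %util and long await times on the cache/DB volume suggest
# an I/O bottleneck rather than CPU or RAM.
iostat -xm 5
```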

I'll try again closer to the weekend, but this time with a 16 vCPU, 124 GB server.

@ARolek (Member) commented Apr 5, 2022

12 hours later I was only 2% into the read phase. I assumed that at that rate it would take at least 48 hours to complete, so I aborted.

Wow, that's insanely slow. Is this on the M1 Mac? I wonder if Rosetta is being used. In my experience, x86 virtualization on the M1 is very slow.

@adamakhtar (Contributor) commented

@ARolek no, this was on an EC2 Intel Xeon instance (c6i.4xlarge) with 16 vCPU, 32 GB of memory, and 1000 GB of SSD storage.

I can only assume the 32 GB of RAM was not enough for the 53 GB planet file, so I/O became the bottleneck. I'll try again in a couple of days and will let you know how the rerun goes.

@ARolek (Member) commented Apr 6, 2022

@adamakhtar ok wow, that seems really odd. I will give this a run soon too. What version of imposm are you using?

@adamakhtar (Contributor) commented

@ARolek I'm using version 0.11.1.
