Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make calculating spatial/datetime extents optional #143

Merged
merged 3 commits into from
Nov 28, 2023

Conversation

hrodmn
Copy link
Contributor

@hrodmn hrodmn commented Nov 22, 2023

As discussed in #139, Calculating the extent of spatial/datetime columns can be slow enough that serverless deployments of tipg are not possible.

Skipping the extent calculations is one way to make application startup almost instantaneous! With this change, the user can set TIPG_DB_SPATIAL_EXTENT=False and TIPG_DB_DATETIME_EXTENT=False to skip these calculations.

I am not sure if passing these arguments all the way through to pg_temp.tipg_properties is the best way to accomplish this, but I got it working. If there is a cleaner way that you can think of please let me know!

@vincentsarago vincentsarago requested review from bitner and vincentsarago and removed request for bitner November 22, 2023 10:40
@vincentsarago
Copy link
Member

Approach sounds 👌

Let's add some tests to validate 🙏

@hrodmn
Copy link
Contributor Author

hrodmn commented Nov 22, 2023

Approach sounds 👌

The one thing I was wondering is if we should skip spatial/temporal columns altogether instead of passing those variables through to that function, but it seems like we need other attributes from the spatial columns besides bounds so I did it this way.

Let's add some tests to validate 🙏

I added a few tests!

@vincentsarago
Copy link
Member

The one thing I was wondering is if we should skip spatial/temporal columns altogether instead of passing those variables through to that function, but it seems like we need other attributes from the spatial columns besides bounds so I did it this way.

I think it's fine to have them separate. Most people will have spatial index (hopefully) so them might want to avoid only the temporal calculation

@vincentsarago
Copy link
Member

Let's wait for @bitner 👍 👎

Copy link
Contributor

@bitner bitner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give my approval here, but I do have some concerns with going down this road.

  • Having the spatial / temporal extents will be required if (when) we implement tooling to create PgSTAC items from the catalog of TiPG layers.
  • This is basically kicking the can down the road. If appropriate indexes are not available on spatial/temporal columns, while the creation of the catalog may be faster, any usage of tables without those indexes is going to bog down and be incredibly slow anyway.

Currently, TiPG is stateless. It makes no changes (outside of the temporary schema) to the database at all and other than the CatalogUpdateMiddleware does not provide any caching or other state management within the application level.

In my opinion TiPG is meant to expose a Postgres/PostGIS database as-is and I don't think that TiPG should ever modify the structure of tables/functions it is exposing (ie renaming columns or adding indexes etc). I do think that in order to actually be scalable, we do need to enable options to manage/store state beyond the scope of a single instance. I do want to continue to have a version of TiPG that (similar to pg_featureserv) that does not touch the database at all - but, of course, has the caveat that there will be limitations to how well it can scale. I do think that it would be good to enable broadened ability to store cached state beyond the single instance by adding additional backends that can store state. Currently, the catalog creation on startup and the CatalogControlMiddleware just use the App State. In order to scale for something like a Serverless environment, we should allow for different ways to store this metadata - this could be storing it in a tipg schema within the database (or even possibly a different database), storing the catalog in an S3 bucket as json, and/or storing using something like Redis/Memcache.

I will note that when deploying TiPG in a serverless environment, you will most certainly want to make sure that you are using some type of connection pooling (ie pgbouncer) as without, it can be very very easy to hit the max_connections of the database, which if you do not have your database configured with appropriate statement_timeout or idle_in_transaction_timeout could fully lock your database up from any further use.

@vincentsarago vincentsarago merged commit 76956ab into developmentseed:main Nov 28, 2023
7 checks passed
@vincentsarago
Copy link
Member

I do think that in order to actually be scalable, we do need to enable options to manage/store state beyond the scope of a single instance.

@bitner I think tipg is almost in a good shape in order to support both case. I've started https://github.com/vincentsarago/tipg-aurora and https://github.com/vincentsarago/tipgstac to showcase how to use tipg with static functions.

tipg won't be able to support all the use case but I think right now we have a nice python package which people can use or extent to fit their use case

@hrodmn hrodmn deleted the extents-optional branch November 28, 2023 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants