make calculating spatial/datetime extents optional #143

hrodmn · 2023-11-22T02:54:20Z

As discussed in #139, Calculating the extent of spatial/datetime columns can be slow enough that serverless deployments of tipg are not possible.

Skipping the extent calculations is one way to make application startup almost instantaneous! With this change, the user can set TIPG_DB_SPATIAL_EXTENT=False and TIPG_DB_DATETIME_EXTENT=False to skip these calculations.

I am not sure if passing these arguments all the way through to pg_temp.tipg_properties is the best way to accomplish this, but I got it working. If there is a cleaner way that you can think of please let me know!

vincentsarago · 2023-11-22T10:42:13Z

Approach sounds 👌

Let's add some tests to validate 🙏

hrodmn · 2023-11-22T12:00:48Z

Approach sounds 👌

The one thing I was wondering is if we should skip spatial/temporal columns altogether instead of passing those variables through to that function, but it seems like we need other attributes from the spatial columns besides bounds so I did it this way.

Let's add some tests to validate 🙏

I added a few tests!

vincentsarago · 2023-11-27T16:26:49Z

The one thing I was wondering is if we should skip spatial/temporal columns altogether instead of passing those variables through to that function, but it seems like we need other attributes from the spatial columns besides bounds so I did it this way.

I think it's fine to have them separate. Most people will have spatial index (hopefully) so them might want to avoid only the temporal calculation

vincentsarago · 2023-11-27T17:22:04Z

Let's wait for @bitner 👍 👎

bitner

I'll give my approval here, but I do have some concerns with going down this road.

Having the spatial / temporal extents will be required if (when) we implement tooling to create PgSTAC items from the catalog of TiPG layers.
This is basically kicking the can down the road. If appropriate indexes are not available on spatial/temporal columns, while the creation of the catalog may be faster, any usage of tables without those indexes is going to bog down and be incredibly slow anyway.

Currently, TiPG is stateless. It makes no changes (outside of the temporary schema) to the database at all and other than the CatalogUpdateMiddleware does not provide any caching or other state management within the application level.

In my opinion TiPG is meant to expose a Postgres/PostGIS database as-is and I don't think that TiPG should ever modify the structure of tables/functions it is exposing (ie renaming columns or adding indexes etc). I do think that in order to actually be scalable, we do need to enable options to manage/store state beyond the scope of a single instance. I do want to continue to have a version of TiPG that (similar to pg_featureserv) that does not touch the database at all - but, of course, has the caveat that there will be limitations to how well it can scale. I do think that it would be good to enable broadened ability to store cached state beyond the single instance by adding additional backends that can store state. Currently, the catalog creation on startup and the CatalogControlMiddleware just use the App State. In order to scale for something like a Serverless environment, we should allow for different ways to store this metadata - this could be storing it in a tipg schema within the database (or even possibly a different database), storing the catalog in an S3 bucket as json, and/or storing using something like Redis/Memcache.

I will note that when deploying TiPG in a serverless environment, you will most certainly want to make sure that you are using some type of connection pooling (ie pgbouncer) as without, it can be very very easy to hit the max_connections of the database, which if you do not have your database configured with appropriate statement_timeout or idle_in_transaction_timeout could fully lock your database up from any further use.

vincentsarago · 2023-11-28T11:31:12Z

I do think that in order to actually be scalable, we do need to enable options to manage/store state beyond the scope of a single instance.

@bitner I think tipg is almost in a good shape in order to support both case. I've started https://github.com/vincentsarago/tipg-aurora and https://github.com/vincentsarago/tipgstac to showcase how to use tipg with static functions.

tipg won't be able to support all the use case but I think right now we have a nice python package which people can use or extent to fit their use case

make calculating spatial/datetime extents optional

188e54d

vincentsarago requested review from bitner and vincentsarago and removed request for bitner November 22, 2023 10:40

add test fixtures and tests

d2993da

properly test datetime extents

c870158

vincentsarago approved these changes Nov 27, 2023

View reviewed changes

bitner approved these changes Nov 27, 2023

View reviewed changes

vincentsarago merged commit 76956ab into developmentseed:main Nov 28, 2023
7 checks passed

hrodmn deleted the extents-optional branch November 28, 2023 12:11

hrodmn mentioned this pull request Jan 11, 2024

use spatial_extent and datetime_extent everywhere #164

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make calculating spatial/datetime extents optional #143

make calculating spatial/datetime extents optional #143

hrodmn commented Nov 22, 2023

vincentsarago commented Nov 22, 2023

hrodmn commented Nov 22, 2023

vincentsarago commented Nov 27, 2023

vincentsarago commented Nov 27, 2023

bitner left a comment

vincentsarago commented Nov 28, 2023

make calculating spatial/datetime extents optional #143

make calculating spatial/datetime extents optional #143

Conversation

hrodmn commented Nov 22, 2023

vincentsarago commented Nov 22, 2023

hrodmn commented Nov 22, 2023

vincentsarago commented Nov 27, 2023

vincentsarago commented Nov 27, 2023

bitner left a comment

Choose a reason for hiding this comment

vincentsarago commented Nov 28, 2023