Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geoprojbase: do not allocate projections with a static map #63969

Closed
otan opened this issue Apr 21, 2021 · 11 comments · Fixed by #74126
Closed

geoprojbase: do not allocate projections with a static map #63969

otan opened this issue Apr 21, 2021 · 11 comments · Fixed by #74126
Assignees
Labels
A-spatial Spatial work that is *not* related to builtins. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-spatial Spatial Team

Comments

@otan
Copy link
Contributor

otan commented Apr 21, 2021

Currently https://github.com/cockroachdb/cockroach/blob/master/pkg/geo/geoprojbase/projections.go is a ~99k line file that generates ~10MB to the release binary.

We can reduce this by storing this metadata as a compressed string, which can be initialized on the first instantiation of Projection. Alternatively, we should probahly use go:embed now that that's out.

Potentially a good first issue, but a meaty one.

@otan otan added the A-spatial Spatial work that is *not* related to builtins. label Apr 21, 2021
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Apr 21, 2021
@otan otan added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 21, 2021
craig bot pushed a commit that referenced this issue Apr 26, 2021
63970: geoprojbase: refactor everything to use the Projection abstraction r=sumeerbhola a=otan

Previously, we use the projections map directly, which violates
abstraction bounds as we probably want to keep that map private. We
remedy this by moving everything behind the a Projection function
abstraction which prevents this leakage.

Furthermore, make Projection return an error directly. This is marked,
so we are able to reword error messages in other places.

Makes progress on #63969

Release note: None

Co-authored-by: Oliver Tan <[email protected]>
@jlinder jlinder added the T-spatial Spatial Team label Jun 16, 2021
@RaduBerinde
Copy link
Member

This shouldn't be too hard since we can just modify the template to generate whatever we want. Where does one find the srids.csv file that was used? (feels like it should be checked in with generate-spatial-ref-sys)

@otan
Copy link
Contributor Author

otan commented Dec 20, 2021

Where does one find the srids.csv file that was used?

would've been smart but may have been lost to time. may have to generate anything off the existing map :|

@RaduBerinde
Copy link
Member

Ah, too bad. Should I nuke cmd/generate-spatial-ref-sys/ then?

@otan
Copy link
Contributor Author

otan commented Dec 20, 2021

actually, you should be able to do copy spatial_ref_sys to '/tmp/srids.csv' DELIMITER ';' CSV; (don't think you need HEADER) in psql after CREATE EXTENSION POSTGIS;

@RaduBerinde
Copy link
Member

I was able to generate a json from the hardcoded constants, but it would be nice if we could generate it directly from the csv. I don't have PostGIS installed and having some trouble making it work, I asked for some help with that on the spatial slack.

@RaduBerinde RaduBerinde self-assigned this Dec 21, 2021
@RaduBerinde
Copy link
Member

Unfortunately, the cockroach binary links in randgen which has some global constants that use geo.MustParseGeography. I was hoping we wouldn't even decode this stuff until and unless geospatial stuff is being used.

@otan
Copy link
Contributor Author

otan commented Dec 21, 2021

csv: https://gist.github.com/otan/73f201cf89aeec458db6e19957567ba6

which has some global constants that use geo.MustParseGeography

:( yeah ...

i mean if you really wanted to, we could make Projection lazily load the embedded file and make a variation of MustParseGeography that doesn't check the SRID exists in the projection map.

@RaduBerinde
Copy link
Member

Thank you! I don't think randgen should be linked at all, hopefully we'll fix as part of #74120. If we really need it, we should move all those datum creations under an initialization function.

@RaduBerinde
Copy link
Member

RaduBerinde commented Dec 21, 2021

csv: https://gist.github.com/otan/73f201cf89aeec458db6e19957567ba6

This results in about 20% more stuff (I'm guessing it's from a newer version). I will move forward with the data that we have already, but I will include the updated generate-spatial-ref-sys which can be used to generate an updated file.

@otan
Copy link
Contributor Author

otan commented Dec 21, 2021

more stuff is ok imo, more in line with postgis. but your call!

@RaduBerinde
Copy link
Member

Sure, but that should be separate, I don't want the data and the mechanism to change together.

craig bot pushed a commit that referenced this issue Dec 21, 2021
73941: roachtest: make crdb crash on span-use-after-Finish r=andreimatei a=andreimatei

This patch makes roachtest pass an env var to crdb asking it to panic on
mis-use of tracing spans. I've been battling such bugs, which become
more problematic as I'm trying to introduce span reuse. In production
we'll probably continue tolerating such bugs for the time being, but I
want tests to yell. Unit tests are already running with this
use-after-Finish detection, and so far so good. I've done a manual run
of all the roachtests in this configuration and nothing crashed, so I
don't expect a tragedy.

Release note: None

74109: kv: rename gcQueue to mvccGCQueue r=tbg a=nvanbenschoten

This commit renames the "GC queue" to the "MVCC GC queue" (which GC's old MVCC
versions) to avoid confusion with the "replica GC queue" (which GC's abandoned
replicas). We've already been using this terminology in various other contexts
to avoid confusion, so this refactor updates the code to reflect this naming.

This comes in response to #73838, which found a bug that had survived for three
years and was a direct consequence of this ambiguous naming.

The commit doesn't go quite as far as renaming the `pkg/kv/kvserver/gc` package,
but that could be a follow-up to this commit.

74126: geo: move projection data to embedded compressed file r=RaduBerinde a=RaduBerinde

The geoprojbase package embeds projection info as constants, leading
to a 6MB code file. Large code files are undesirable especially from
the perspective of static analysis tools, IDEs, etc.

This change moves the projections data to an embedded json.gz file. We
define the schema of this file in a new `embeddedproj` subpackage. The
data is loaded lazily.

The data file was obtained by modifying the existing constants to fill
out an `embeddedproj.Data`:
https://github.com/RaduBerinde/cockroach/blob/geospatial-proj-data/pkg/geo/geoprojbase/embeddedproj/data_test.go

The `generate-spatial-ref-sys` command is also updated to generate
this file from the `.csv`.

The `make buildshort` binary size is decreased by ~7MB.

Fixes #63969.

Release note: None

74128: cockroach: don't import randgen in binary r=RaduBerinde a=RaduBerinde

The `sql/randgen` package creates a lot of global datums, some of
which use geospatial and require the loading of geospatial data. This
package is meant for testing and should not be part of the cockroach
binary.

This change removes the non-testing uses of randgen.

Tested via `go list -deps ./pkg/cmd/cockroach`. Note that the updated
test is ineffectual for now (tracked by #74119).

Informs #74120.

Release note: None

74159: sql: default index recommendations to be off for logic tests  r=nehageorge a=nehageorge

**sql: refactor GlobalDefault for session variables**

This commit refactors pkg/sql/vars.go to use globalFalse and
globalTrue as the setting GlobalDefault where possible.

Release note: None

**sql: default index recommendations to be off for logic tests**

This commit configures index recommendations to be off for logic tests.
This is to avoid flaky tests, as the index recommendation output can
vary depending on the best plan chosen by the optimizer.

Fixes: #74069.

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Neha George <[email protected]>
@craig craig bot closed this as completed in 5ea5ef4 Dec 21, 2021
gustasva pushed a commit to gustasva/cockroach that referenced this issue Jan 4, 2022
The geoprojbase package embeds projection info as constants, leading
to a 6MB code file. Large code files are undesirable especially from
the perspective of static analysis tools, IDEs, etc.

This change moves the projections data to an embedded json.gz file. We
define the schema of this file in a new `embeddedproj` subpackage. The
data is loaded lazily.

The data file was obtained by modifying the existing constants to fill
out an `embeddedproj.Data`:
https://github.com/RaduBerinde/cockroach/blob/geospatial-proj-data/pkg/geo/geoprojbase/embeddedproj/data_test.go

The `generate-spatial-ref-sys` command is also updated to generate
this file from the `.csv`.

The `make buildshort` binary size is decreased by ~7MB.

Fixes cockroachdb#63969.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-spatial Spatial work that is *not* related to builtins. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-spatial Spatial Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants