Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a bundled flag for gdal-sys #517

Merged
merged 20 commits into from
Sep 25, 2024

Conversation

weiznich
Copy link
Contributor

This commit introduces a new gdal-src crate which bundles gdal and builds a minimal version from source via build.rs

Fixes #465

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated


[patch.crates-io]
proj-sys = { git = "https://github.com/GiGainfosystems/proj", rev = "54716dd8955d4f0561ce9bf8a83610b605e3c007" }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently relays on a non-accepted patch to proj-sys: georust/proj#190

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 27c2a5e which changes this to use the master branch of proj-sys instead as the relevant patch is now merged there. There is still no release that contains this change yet.

Copy link
Contributor Author

@weiznich weiznich May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we still need georust/proj#192

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's finally released with proj-sys 0.24 🎉

.define("BUILD_GMOCK", "OFF")
.define("PROJ_INCLUDE_DIR", format!("{proj_root}/include"))
.define("PROJ_LIBRARY", format!("{proj_root}/lib/{proj_lib}"))
// enable the gpkg driver
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I require the geopackage driver for my usecase that's why I added these configurations. In the end we already depend on libsqlite3 via proj, so that should hopefully be fine.

That written: Anyone that tries to actually use that driver to create a geopackage dataset will likely run into: OSGeo/gdal#9135 (which hopefully will be fixed soon upstream, so it might be worth to pull in the fix into the bundled version afterwards.)

@weiznich weiznich marked this pull request as draft January 26, 2024 07:59
@weiznich
Copy link
Contributor Author

Marked as draft as I need to perform a final set of tests on MacOS and Windows.

gdal-sys/Cargo.toml Outdated Show resolved Hide resolved
.gitmodules Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
@@ -1,6 +1,7 @@
# Changes

## Unreleased
- Add a `bundled` feature for `gdal-sys` that allows to build and statically link a minimal bundled version of gdal during `cargo build`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably mention that it only supports SQLite, GPKG, GeoTIFF, some other less popular formats, and probably no compression. I'm afraid that users will see this and expect it to work for COG, JPEG2000, NetCDF and so on.

Also see https://www.mail-archive.com/[email protected]/msg40172.html.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following 19 drivers are supported by this configuration according to DriverManager::all():

Virtual Raster
Derived datasets using VRT pixel functions
GeoTIFF
Cloud optimized GeoTIFF generator
Erdas Imagine Images (.img)
In Memory Raster
Geographic Network generic file based model
Geographic Network generic DB based model
ESRI Shapefile
MapInfo File
VRT - Virtual Datasource
Memory
Keyhole Markup Language (KML)
GeoJSON
GeoJSON Sequence
ESRIJSON
TopoJSON
GeoPackage
SQLite / Spatialite

"libproj.a"
};

let res = cmake::Config::new("source")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone wondering, cmake automatically sets CMAKE_BUILD_TYPE.

@lnicola
Copy link
Member

lnicola commented Jan 26, 2024

This looks all right, but as I mentioned in the related issue, I'm worried it's only going to be useful to you.

GDAL is pretty large, and its users might have a lot of expectations, but a version that only supports GPKG is going to be very surprising. Even your use case might be better served by https://docs.rs/geozero.

CC @rouault long-term, how do you feel about ExternalProject (SuperBuild)?

@weiznich weiznich force-pushed the feature/bundled_build branch from a30248e to 1803cfd Compare January 26, 2024 09:16
@weiznich
Copy link
Contributor Author

This looks all right, but as I mentioned in the related issue, I'm worried it's only going to be useful to you.

GDAL is pretty large, and its users might have a lot of expectations, but a version that only supports GPKG is going to be very surprising. Even your use case might be better served by https://docs.rs/geozero.

I agree with the point that users might have expectations that are not covered (yet) by the proposed bundling support, but I disagree with with the point that it only supports GPKG and therefore is not useful. See that comment for the list of supported drives. That list contains a few more useful drivers like that one for shapefiles, geojson, geotiff or kml. In addition gdal is more than just a tool that provides support for certain file formats. It also exposes other functionality that is helpful in certain situations, like support for spatial projections.

As for other drivers: It is certainly possible to enable other drivers as well. That mostly requires going through the list of supported drivers and classify them if they need external dependencies or not. The later ones can easily be enabled, while for the former ones more work is required. Depending on the required external dependency that might require adding another sys crate as dependency and enabling the bundling support there (like for example for netcdf, which already has bundling support in rust) or it might require writing that bundling support for that other sys crate first. I figured out to start with a minimal subset to have something working first.

Even your use case might be better served by https://docs.rs/geozero.

Trust me if I say that no geozero won't solve my use-case as it has to much assumptions around which geometry types exist and how they should be handled for my use-case. (It's specifically not about translating from one geometry format to another).

@lnicola
Copy link
Member

lnicola commented Jan 26, 2024

Yes, I think it's all right. We should probably document our features anyway, and that's a good place where we can explain the limitations of the bundled GDAL.

@weiznich
Copy link
Contributor Author

I just checked the documentation. GDAL already provides a list of dependencies for each driver:

https://gdal.org/drivers/vector/index.html
https://gdal.org/drivers/raster/index.html

Anything that lists "Built-in by default" as dependency can probably just be enabled without problems.

I also know that bundling support exists for the following other dependencies:

So yes, it's probably possible to just enable most of the expected drives, although I personally would prefer having all the native dependencies behind separate feature flags.

@weiznich weiznich force-pushed the feature/bundled_build branch from 1803cfd to 1315e16 Compare January 26, 2024 09:45
@rouault
Copy link
Contributor

rouault commented Jan 26, 2024

CC @rouault long-term, how do you feel about ExternalProject (SuperBuild)?

I miss some contextual elements to understand your question

@lnicola
Copy link
Member

lnicola commented Jan 26, 2024

If you're not familiar with it, ExternalProject is a CMake component that can download libraries, apply patches, then link your project with them. So it can be useful for people who want a custom build but don't want to build every dependency manually.

If GDAL adds that in the future, we can integrate with it without creating Rust libraries for that bundle the source code of the GDAL dependencies.

If it doesn't, that's fine, people can still use GeoTIFF, GPKG, SHP and a a few other formats with the approach here.

@rouault
Copy link
Contributor

rouault commented Jan 26, 2024

ok, I see. That's something I've considered, but I'd be hesitant to go into that business, as it equates to creating yet another packaging system, and for GDAL, with all its transitive dependencies, that can mean ~ 100 libraries for a full build... That said I do recognize that might be a recurring need for people in "unstandard" environments (Android, WASM, etc.) to have to rebuild the whole world. I'm not sure if that would belong to the OSGeo/GDAL repo itself, or if it might be a side repository where people from different teams can collaborate, and be independent of GDAL development itself. Might be worth raising the idea on gdal-dev.

@lnicola
Copy link
Member

lnicola commented Jan 26, 2024

Yeah, that makes sense. You've already got a full plate with GDAL and the other libraries you're maintaining, you don't need to get into a whole new packaging system unless you really want to.

It would only make a difference for this PR if you were already planning to add it.

@weiznich weiznich force-pushed the feature/bundled_build branch from 1315e16 to 8db29ac Compare January 30, 2024 08:27
Cargo.toml Outdated Show resolved Hide resolved
@weiznich weiznich force-pushed the feature/bundled_build branch from 8db29ac to e1d7faf Compare January 30, 2024 08:30
cmake = "0.1.50"

[features]
default = []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's up to discussion which drivers should be build by default as cargo feature flags. We cannot easily say build everything and allow users to opt out as cargo feature flags are additive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess NetCDF and Zarr could be interesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zarr driver so currently not not supported by the bundling. It would require figuring out how to bundle all the dependencies for that driver as well.

Netcdf is supported, as there is already a rust crate which provides bundling support and exposes the relevant information. Therefore it should be possible to enable it by default.

@weiznich
Copy link
Contributor Author

I've pushed a new version of the build script that adds feature flags for a lot of the built in drivers. If I enable all of them DriverManager::all reports support for the following 162 drivers:

List with 162 driver names
Virtual Raster
Derived datasets using VRT pixel functions
GeoTIFF
Cloud optimized GeoTIFF generator
Erdas Imagine Images (.img)
CEOS SAR Image
CEOS Image
JAXA PALSAR Product Reader (Level 1.1/1.5)
Ground-based SAR Applications Testbed File Format (.gff)
ELAS
Arc/Info Binary Grid
Arc/Info ASCII Grid
GRASS ASCII Grid
International Service for the Geoid
SDTS Raster
DTED Elevation Raster
Portable Network Graphics
JPEG JFIF
In Memory Raster
Japanese DEM (.mem)
Graphics Interchange Format (.gif)
Graphics Interchange Format (.gif)
Envisat Image Format
Maptech BSB Nautical Charts
X11 PixMap Format
MS Windows Device Independent Bitmap
SPOT DIMAP
AirSAR Polarimetric Image
RadarSat 2 XML Product
Sentinel-1 SAR SAFE Product
PCIDSK Database File
PCRaster Raster File
ILWIS Raster Map
SGI Image File Format 1.0
SRTMHGT File Format
Leveller heightfield
Terragen heightfield
Network Common Data Format
EarthWatch .TIL
ERMapper .ers Labelled
NOAA Polar Orbiter Level 1b Data Set
FIT Image
GRIdded Binary (.grb, .grb2)
Raster Matrix Format
OGC Web Coverage Service
OGC Web Map Service
EUMETSAT Archive native (.nat)
Idrisi Raster A.1
Golden Software ASCII Grid (.grd)
Golden Software Binary Grid (.grd)
Golden Software 7 Binary Grid (.grd)
COSAR Annotated Binary Matrix (TerraSAR-X)
TerraSAR-X Product
DRDC COASP SAR Processor Raster
R Object Data Store
OziExplorer .MAP
Kml Super Overlay
Planet Labs Mosaics API
CALS (Type 1)
OGC Web Map Tile Service
Sentinel 2
Meta Raster Format
Portable Pixmap Format (netpbm)
USGS DOQ (Old Style)
USGS DOQ (New Style)
PCI .aux Labelled
Vexcel MFF Raster
Vexcel MFF2 (HKV) Raster
GSC Geogrid
EOSAT FAST Format
VTP .bt (Binary Terrain) 1.3 Format
Erdas .LAN/.GIS
Convair PolGASP
NLAPS Data Format
Erdas Imagine Raw
DIPEx
FARSITE v.4 Landscape File (.lcp)
NOAA Vertical Datum .GTX
NADCON .los/.las Datum Grid Shift
NTv2 Datum Grid Shift
CTable2 Datum Grid Shift
ACE2
Snow Data Assimilation System
KOLOR Raw
ROI_PAC raster
R Raster
Natural Resources Canada's Geoid
NOAA GEOCON/NADCON5 .b format
NSIDC Sea Ice Concentrations binary (.bin)
Swedish Grid RIK (.rik)
USGS Optional ASCII DEM (and CDED)
GeoSoft Grid Exchange Format
Bathymetry Attributed Grid
S-102 Bathymetric Surface Product
Hierarchical Data Format Release 5
HDF5 Dataset
Northwood Numeric Grid Format .grd/.tab
Northwood Classified Grid Format .grc/.tab
ARC Digitized Raster Graphics
Standard Raster Product (ASRP/USRP)
Magellan topo (.blx)
SAGA GIS Binary Grid (.sdat, .sg-grd-z)
ASCII Gridded XYZ
HF2/HFZ heightfield raster
OziExplorer Image File
USGS LULC Composite Theme Grid
ZMap Plus Grid
NOAA NGS Geoid Height Grids
IRIS data (.PPI, .CAPPi etc)
Racurs PHOTOMOD PRF
Scaled Integer Gridded DEM .sigdem
TGA/TARGA Image File Format
OGCAPI
Spatio-Temporal Asset Catalog Tiled Assets
Spatio-Temporal Asset Catalog Items
Geographic Network generic file based model
Geographic Network generic DB based model
ESRI Shapefile
MapInfo File
UK .NTF
SDTS
IHO S-57 (ENC)
Microstation DGN
VRT - Virtual Datasource
Memory
Comma Separated Value (.csv)
Keyhole Markup Language (KML)
GeoJSON
GeoJSON Sequence
ESRIJSON
TopoJSON
GMT ASCII Vectors (.gmt)
GeoPackage
SQLite / Spatialite
WAsP .map format
PostgreSQL/PostGIS
ESRI FileGDB
AutoCAD DXF
AutoCAD Driver
FlatGeobuf
Geoconcept
Czech Cadastral Exchange Data Format
PostgreSQL SQL dump
French EDIGEO exchange format
Idrisi Vector (.vct)
Elastic Search
Carto
AmigoCloud
Storage and eXchange Format
Selafin
VDV-451/VDV-452/INTREST Data Format
NextGIS Web
MapML
General Transit Feed Specification
OGC Features and Geometries JSON
U.S. Census TIGER/Line
Arc/Info Binary Coverage
Arc/Info E00 (ASCII) Coverage
Generic Binary (.hdr Labelled)
ENVI .hdr Labelled
ESRI .hdr Labelled
ISCE raster

Notably this is still missing support for anything that depends on libexpat or libmysqlclient and likely a few other more specific dependencies. For libexpat there does not seem to be a crate with up to date rust bindings at all, the libmysqlclient-sys crate still misses bundling support. (With my diesel hat on, the last thing is something I want to address at some point).

It's up to discussion which of these drivers should be enabled by default.

@weiznich weiznich force-pushed the feature/bundled_build branch from e1d7faf to 54ef44f Compare January 30, 2024 13:27
@weiznich weiznich marked this pull request as ready for review January 30, 2024 13:40
@weiznich
Copy link
Contributor Author

I confirmed that this works on linux (x86_86), windows (msvc, x86_64) and macos (aarch64).

@weiznich weiznich force-pushed the feature/bundled_build branch from 7db4695 to 4b494d2 Compare September 18, 2024 13:36
@weiznich
Copy link
Contributor Author

@jdroenner Thanks for the response ❤️

However, I am really worried about maintaining it. If the submodule is the only viable way to got then let's try it.

I still would like to hear some details on this.
Independently from your answer: I can offer to do a submodule update from time to time after this is merged. I personally wouldn't expect much more trouble there than bumping the commit from time to time, at least as long as nobody finds bugs in gdal itself.

@lnicola
Copy link
Member

lnicola commented Sep 18, 2024

IMO, the submodule is the most boring part of this. I don't like them, but the alternatives are worse (as you've clearly shown above).

I'm more worried about the CMake config and proliferation of features, but we'll see how problematic they are.

@Isaac-Leonard
Copy link

I'm attempting to use this PR in a project that also uses the geo package with the "use-proj" feature and getting version confilicts for proj_sys with the key part of the error being:
the package proj-sys links to the native library proj, but it conflicts with a previous package which links to proj as well:

Is there a way to specify which proj version to link to?

@weiznich
Copy link
Contributor Author

@Isaac-Leonard cargo only allows one version of a *-sys tree in your dependency tree as these crates always link to native dependencies and those cannot appear multiple times. This change needs a very recent proj-sys version as we needed to adjust the build script there as well to work correctly. It seems like geo just depends on an older version, so they would need to update to the new version to be compatible. Maybe open a PR there to bump the version?

@Isaac-Leonard
Copy link

Isaac-Leonard commented Sep 20, 2024

I was able to fix it by patching dependencies to the proj repo and it works grate.
Another question I had though I'm not sure if this is the best place to ask but don't know anywhere better, is if it is possible to build the apps as well, currently they're disabled in the build.rs file but if enabled is there a reasonably easy way to access them?
I've got an app I'm trying to package up and would like to make it so people don't need to install gdal themselves but I'm using the commandline tools where the bindings are lacking.
I'm not familiar enough with the build process to know if I can extract files from dependancies build outputs though I suspect I can't. If I can't then I may have to end up pulling the build.rs from this into my own build process and do it that way.

@weiznich
Copy link
Contributor Author

@Isaac-Leonard It might be possible to do something like that, but at least for me that's clearly out of scope for this PR as it is additional functionality. I suggest that you check with the rust-gdal maintainers if they are comfortable with adding a new feature flag for building the binaries as well. After that you would need to change the BUILD_APPS flag here to ON depending on whether the feature flag is set or not. Afterwards you would need to reexport a environment variable containing the build dir root so that other cares can consume this that and find the binaries.

@lnicola
Copy link
Member

lnicola commented Sep 25, 2024

Regarding the apps, I think we should instead offer bindings for them. It's not too complicated, just that there's a ton of API surface to bikeshed. And, as weiznich said, it's a bit unusual to build apps together with a library crate.

That said, can we merge this now?

@jdroenner
Copy link
Member

yes lets merge it

@lnicola lnicola merged commit cd8efcd into georust:master Sep 25, 2024
14 checks passed
@weiznich
Copy link
Contributor Author

I'm naturally interested in seeing a release that contains this new feature. Is there anything that I could do to make this happen faster?

@lnicola
Copy link
Member

lnicola commented Sep 27, 2024

Let's pick a reasonable cut-off date (say 2-4 weeks from now), in case we can get in any other PRs until then. Would that time frame work for you?

@weiznich
Copy link
Contributor Author

That would be really nice. It's also fine for me if it takes a few weeks longer, but I would prefer not to have wait months for this, as cloning the submodule everytime in CI is a possible source of network failures.

@lnicola
Copy link
Member

lnicola commented Sep 27, 2024

Or let's just publish it as -alpha.0 or something.

@weiznich weiznich mentioned this pull request Oct 3, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide a bundled feature
7 participants