Document TorchGeo alternatives #1742
Hello maintainers of the following libraries:
I'm doing a related works survey and wanted to document basic stats on features supported by other geospatial ML libraries. I included all of your libraries in my list due to their popularity. I'm hoping this will make it easier for us to advertise other libraries and help users decide which library is right for them. If you're willing, could you check the above preview and make sure I didn't get anything wrong? I'm especially not sure about which libraries support exactly which features. I'll merge this PR in about a week, but if anything ever changes, feel free to submit a PR to update this table, and feel free to share this table in your own documentation! P.S. I couldn't get all of the PaddleRS or GeoTorchAI tests to pass no matter what I tried, so if you can tell me the test coverage when all tests pass, I can update that. |
Hey, thank you for doing this! Regarding Raster Vision, the only change I would suggest is to add |
Worth also considering:
I'm sure you are also aware of the longer list I maintain at https://github.com/satellite-image-deep-learning/software#deep-learning-projects--frameworks but as a word of caution, keeping these lists fresh is quite time consuming! |
Yes, I definitely agree it can be time consuming. I started from:
then removed all projects that:
I'm hoping that this shorter, more focused list will be easier to maintain. I'm planning to keep TorchGeo up-to-date as frequently as I can, but will update the rest of the list less frequently (a couple times per year). The GitHub/Download stuff is easy to scrape, the rest was much harder to determine. |
@adamjstewart I will provide feedback regarding As for the datasets, please consider that Please also note that the examples of the online book are on Kaggle (https://www.kaggle.com/esensing/code). As for the models, it depends on how you count. The online book provides many examples. If you count each chapter of the book as providing a model, we would have 11 models. As for CLI, Here are some suggested changes regarding (a) spatial backend: Again, many thanks for reaching out and congratulations on your effort and your respect and concern for the work of the community. Best |
Thanks @gilbertocamara! Here are my thoughts: (a): spatial backend: this is intended to be the thing that is used as a database for spatial joins and intersection checking. In the Python ecosystem, R-tree is an example of this. GDAL handles file I/O and warping, but isn't really a spatial database. Not sure if that distinction makes sense or is just arbitrary. Do any of those R dependencies fit this description? I'm happy to add some or all of these. So basically, I think I agree with your proposed changes for a-c, but not d-e. |
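To make the "spatial backend" distinction above concrete, here is a minimal sketch of the question such a backend answers: which files overlap a given spatiotemporal window? Everything in it is an assumption for illustration. The `Extent` class, the `query` function, and the file names are hypothetical and are not taken from TorchGeo, R-tree, or any library in the table. The sketch scans every entry by brute force, whereas an R-tree answers the same query without visiting every entry.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Axis-aligned bounding box plus a time range (hypothetical helper)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float

    def intersects(self, other: Extent) -> bool:
        # Two extents overlap only if they overlap along every axis.
        return (
            self.minx <= other.maxx and other.minx <= self.maxx
            and self.miny <= other.maxy and other.miny <= self.maxy
            and self.mint <= other.maxt and other.mint <= self.maxt
        )


def query(index: dict[str, Extent], window: Extent) -> list[str]:
    """Return the names of all files whose extent overlaps the window.

    Brute-force O(n) scan, used here only to show the semantics of the
    intersection check that a spatial index accelerates.
    """
    return sorted(name for name, ext in index.items() if ext.intersects(window))


# Hypothetical file index: name -> spatiotemporal extent.
index = {
    "scene_a.tif": Extent(0, 10, 0, 10, 0, 5),
    "scene_b.tif": Extent(20, 30, 0, 10, 0, 5),
    "labels.tif": Extent(5, 25, 5, 15, 3, 8),
}

hits = query(index, Extent(8, 12, 8, 12, 4, 4))
print(hits)
```

GDAL-style file I/O and warping would only come into play after this step, once the overlapping files are known.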
Points taken. I understand your reasoning for points d-e. One also needs to consider the target user community of each package. Most users of geospatial data take Google Earth Engine as their reference. GEE's API provides a benchmark for all implementations of ML methods on Earth observation data. Users love GEE's API. That is why we designed Many good packages require their users to have programming experience. This leads to examples that use many more commands than GEE to achieve the same tasks. Xarray-based software such as Open Data Cube or Pangeo is a case in point. In terms of usability for real users (i.e. non-programmers), a good measure would be to compare the number of API calls required to perform a task in GEE with the number required by other packages. In this respect, consider including a further item in your comparison chart. Such an item could be called "GEE-style", which would measure on a 1..10 scale how close the user experience of the software is to GEE's. Best |
I think GEE-style would end up being very subjective. TorchGeo is modeled after torchvision, so we would also need a torchvision-style scale for users familiar with that API. I'd rather not get into API design debates, so I stuck with what features they do or do not support. All APIs are valid, as long as they are intuitive and useful for large groups of users. We could possibly have an "API style" field with values like torchvision, GEE, etc., but I don't know whether or not all packages on the list are modeled after another API. |
Hi @adamjstewart We have different mileages and constituencies, and the field is big enough to have space for everyone. I have one question regarding torchgeo. The package provides many relevant models (i.e., FCN, ResNet). However, I did not find DL models explicitly designed for image time series (e.g., Temporal CNN, Temporal Attention Encoder). Could you please provide me with more information on how torchgeo deals with image time series? |
Time-series modeling in TorchGeo is still admittedly immature. We almost have three categories for time-series support: ✅ full support: Raster Vision, SITS By partial support I mean that these libraries explicitly model the time dimension in their data cubes and offer ways to sample based on time or perform change detection but otherwise have no builtin models for time-series analysis. Do you think this is a fair evaluation? I can update the table with this split. |
P.S. Which of terra, sf, stars, gdalcubes, rstac would you describe as the spatial database you use? |
In |
Hi @adamjstewart As for the time series support, I welcome your suggestion of updating the table. |
How is the metadata loaded? If you want to find files that have spatial or temporal overlap, how do you do that? Excuse my ignorance of the entire R ecosystem, I've never used it before. |
Under that rubric, Raster Vision should qualify as "partial support" as well. |
Considering your question:
We don't. We assume users know which collections are available in each major cloud provider. The package is designed to extract image collections from cloud repositories (e.g., AWS, Microsoft Planetary Computer, Digital Earth Africa) accessing the content of those collections with STAC. The chosen image collection can be regularized for ML analysis. In simple terms, The We are focused on our mission and on helping other users with similar aims, especially users in developing countries, which do not have the same programming expertise as Python experts in the North. Thus, |
Raster Vision at least has a tutorial specifically for working with time-series data and a SeriesResNet model: https://docs.rastervision.io/en/latest/usage/tutorials/temporal.html. I think that's a step above the rest. |
@gilbertocamara in that case maybe SITS should be listed as having no spatial backend. I'm not sure about the other libraries either. I'm confident about TorchGeo and GeoTorchAI but less confident about the rest. I just picked dependencies that looked relevant for spatial intersection/union-related computations. I don't know too many other libraries that behave like TorchGeo and let you combine arbitrary datasets based on spatiotemporal metadata. |
Ok, now that I understand your points better, I agree. We do not rely on Thanks for your patience. |
But GEE has the ability to combine multiple datasets based on spatial metadata. Does SITS offer that or does it require a user to do that themselves? Or does it query cloud repos and request data from a certain region/CRS, so the cloud repos themselves are the spatial backend? |
Not exactly. The combination provided by GEE is based on some sort of temporal averaging. The final image which is used for processing in GEE is a multispectral band image. GEE's combination usually results in an information loss. |
Image time series classification is done on a time-first, space-later basis. Each pixel of a data cube is associated with a time series. Methods such as LTAE classify each time series. After the classification, we apply a spatial smoothing filter based on Bayesian statistics. That's not how GEE works. |
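The time-first, space-later order described above can be sketched as follows. This is an illustration under loud assumptions, not the SITS implementation. A mean-threshold rule stands in for a temporal model such as LTAE, a plain 3x3 majority filter stands in for the Bayesian smoothing, and the function names and the class labels `"crop"` and `"other"` are hypothetical.

```python
from collections import Counter


def classify_series(series, threshold=0.5):
    """Stand-in for a temporal classifier: label one pixel's time series."""
    return "crop" if sum(series) / len(series) > threshold else "other"


def classify_cube(cube):
    """Time first: cube[row][col] is a time series, classified independently."""
    return [[classify_series(ts) for ts in row] for row in cube]


def smooth(labels):
    """Space later: replace each label by the majority of its 3x3 neighbourhood.

    A simple majority filter, used here in place of Bayesian smoothing.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            votes = Counter(
                labels[i][j]
                for i in range(max(0, r - 1), min(rows, r + 2))
                for j in range(max(0, c - 1), min(cols, c + 2))
            )
            out[r][c] = votes.most_common(1)[0][0]
    return out


hi = [0.8, 0.9, 0.7]  # series that the stand-in rule labels "crop"
lo = [0.1, 0.2, 0.1]  # series that the stand-in rule labels "other"
cube = [[hi, hi, hi], [hi, lo, hi], [hi, hi, hi]]

labels = classify_cube(cube)   # per-pixel labels, no spatial context yet
smoothed = smooth(labels)      # isolated centre pixel is outvoted
print(labels[1][1], smoothed[1][1])
```

The point of the ordering is that the classifier sees the full temporal signal of each pixel before any spatial information is mixed in.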
But how is your data cube constructed? How do you, for example, combine Landsat and CDL in such a way that each pixel of Landsat lines up with each pixel of CDL geospatially? |
The data cube starts from an image collection. Take an MGRS tile used in Sentinel-2 ARD data. Consider, for example, a one-year period. In this period, there are typically anywhere from 50 to 100 S2 images (Sentinel-2A and Sentinel-2B). Then, select a reference period (say, 16 days). Within each 16-day period, S2 images are combined to get the best cloud-free composite. This produces a regular data cube. Areas with persistent cloud cover are flagged as missing data. When the time series is to be classified, such missing data is estimated by temporal interpolation. |
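As a rough illustration of the regularization steps just described, the sketch below processes the observations of a single pixel. It rests on assumptions that are not stated in the thread: "best" is taken to mean the least cloudy observation in each period, observations above a hypothetical `MAX_CLOUD` threshold are discarded, and gaps are filled by linear interpolation. The function names and sample values are invented for the example.

```python
from __future__ import annotations

MAX_CLOUD = 0.5  # hypothetical threshold: cloudier observations are unusable


def composite(obs, start, end, step=16):
    """Build a regular series from irregular observations of one pixel.

    obs is a list of (day, value, cloud_fraction) tuples. For each
    `step`-day period, keep the value of the least cloudy usable
    observation; periods without one are flagged as None (missing).
    """
    series = []
    for t0 in range(start, end, step):
        window = [o for o in obs if t0 <= o[0] < t0 + step and o[2] < MAX_CLOUD]
        best = min(window, key=lambda o: o[2], default=None)
        series.append(best[1] if best is not None else None)
    return series


def interpolate(series):
    """Fill None gaps linearly; gaps at the ends take the nearest known value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return list(series)
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


# Invented observations: (day of year, band value, cloud fraction).
obs = [(3, 0.2, 0.1), (10, 0.3, 0.0), (20, 0.9, 0.95), (40, 0.6, 0.05)]

regular = composite(obs, start=0, end=48)  # middle period has only a cloudy image
filled = interpolate(regular)
print(regular, filled)
```

Applying the same two steps to every pixel of a tile yields a cube with one value per pixel per 16-day period.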
We do not combine Landsat and Sentinel ourselves. We use NASA's HLS product. |
Hi all, Speaking for DeepForest, this table seems fair. I want to underline that I'd like DeepForest to play nicely with TorchGeo models in the future (following https://torchgeo.readthedocs.io/en/stable/api/trainers.html#torchgeo.trainers.ObjectDetectionTask). I am retraining the tree backbone now. Both packages use PyTorch Lightning as a wrapper. DeepForest's aims and user base seem to be much more focused on drone (UAV) imagery and less on satellite imagery, but I'd like to compare the utility of pre-trained weights for transfer learning across resolutions. Stay tuned for that. Thanks everyone for their great open source work; I'm all ears on how to better integrate efforts. Best, Ben |
Thanks, @adamjstewart ! Regarding PaddleRS, I would suggest the following changes: Features:
GitHub:
|
Thanks @Bobholamovic! My thoughts:
|
Thanks for the updates! Actually, there are 12 pre-processing scripts, but 5 of them are empty (not implemented yet), so I guess the count should be 7. |
Nicely done @adamjstewart! I unfortunately do not know how to make the tables look nice under our current theme (if anything it would be a nasty CSS hack).
This PR adds a list of alternatives to TorchGeo that users can explore. You can preview the docs here.
@calebrob6 any idea how to make tables wider or scrollable? It might be a limitation of our current Sphinx theme or it might be something we can override. Either way, I want to look into replacing our current sphinx theme with something that is properly maintained.