The ODC VA Cube Notebooks are Jupyter Notebooks for the Virginia Open Data Cube project.
For an overview of the available notebooks, see the Notebooks Overview document.
For an overview of the available datasets, see the Datasets Overview document.
First, follow the instructions in the Docker Installation Guide if you do not already have Docker installed.
Follow the instructions in the Open Data Cube Database Installation Guide to set up the Open Data Cube (ODC) database.
Follow the instructions in the Operation Manual to set up and operate the Jupyter Notebook environment.
Some notebooks - generally prefixed with `_GEE` - make use of Google Earth Engine (GEE) data. You must be registered as an Earth Engine developer; if you are not, you may submit an application to Google. These notebooks make use of the CEOS ODC-GEE project, which can be found here: https://github.com/ceos-seo/odc-gee.
You will also need GEE service account credentials - specifically the private key JSON file.
To use a GEE dataset from the [Earth Engine Data Catalog](https://developers.google.com/earth-engine/datasets/), a new product must be created using the `new_product` command. Format: `new_product --asset <asset_id> <product_name.yaml>`, where `asset_id` is provided in the "Earth Engine Snippet" string on the dataset's page in the catalog and `product_name.yaml` is the path to the output YAML file containing the ODC product definition. For example, to create a product for Landsat 8 Level 2 Collection 2 Tier 1 data: `new_product --asset LANDSAT/LC08/C02/T1_L2 ls8_l2_c2_gee.yaml`. The full process is as follows:
- Run the `new_product` command to create the product definition.
- Reformat the product definition to match a standard format, such as this one.
- Change the `aliases` field for the measurements as desired. Do NOT change the `name` field of any measurement - creating the product will fail if the `name` fields are changed.
- Run `datacube product add <path-to-product-definition-file>` to add the product.
Notably, you will need to add a `storage` section with `crs` and `resolution` entries to avoid having to specify the `output_crs` and `resolution` each time data is loaded from the product.
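As a sketch, such a `storage` section might look like the following. The CRS and resolution values here are illustrative assumptions, not values from this repository - choose values appropriate for the dataset being indexed:

```yaml
# Illustrative storage section for an ODC product definition.
# The CRS and resolution values below are assumptions; use values
# that match the native grid of the dataset.
storage:
  crs: EPSG:4326
  resolution:
    latitude: -0.00027
    longitude: 0.00027
```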
After adding the product, it is a non-indexed GEE product. It must be loaded using a modified version of the `datacube.Datacube` class, as will be shown later.
Alternatively, the data can be indexed using the `index_gee` command, making it an indexed GEE product, but this is deprecated. Format: `index_gee --asset <asset_id> --product <product_name> [--latitude (lat1, lat2) --longitude (lon1, lon2) --time (YYYY-MM-DD, YYYY-MM-DD) --region <region_name>]` (for information on the optional arguments and others not listed here, run `index_gee --help`). For example, to index Landsat 8 Level 2 Collection 2 Tier 1 data for the United States: `index_gee --asset LANDSAT/LC08/C02/T1_L2 --product <product_name> --latitude (25.3168, 49.4885) --longitude (-125.2052, -66.6657)`. Data for these products can be loaded by the normal `datacube.Datacube` class.
To load data from non-GEE products, use the `datacube.Datacube` class as always:

```python
from datacube import Datacube

dc = Datacube()
```
To load data from GEE products, use the `odc_gee.earthengine.Datacube` class:

```python
from odc_gee.earthengine import Datacube as GEE_Datacube

dc = GEE_Datacube()
```
To load data from indexed GEE products (remember that this is deprecated), use the `datacube.Datacube` class as shown above.
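As a sketch of what a load from a GEE product might look like - the product name, measurements, and spatial/temporal extents below are illustrative assumptions, not values from this repository - the query keywords are the same as for a standard ODC load:

```python
# Hedged sketch: loading from a non-indexed GEE product.
# The product name, measurements, and extents are illustrative
# assumptions, not values from this repository.
query = dict(
    product="ls8_l2_c2_gee",                # hypothetical product name
    latitude=(36.5, 37.5),                  # illustrative extents
    longitude=(-77.5, -76.0),
    time=("2020-01-01", "2020-12-31"),
    measurements=["red", "green", "blue"],  # depends on the product's aliases
)

try:
    # GEE products must be loaded with the modified Datacube class.
    from odc_gee.earthengine import Datacube as GEE_Datacube

    dc = GEE_Datacube()
    ds = dc.load(**query)
except ImportError:
    # odc_gee is not installed; the query dict above still shows
    # the intended shape of the call.
    ds = None
```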
These are the benefits and penalties of loading data from Google Earth Engine through the ODC-GEE module instead of from other data sources, such as S3.
Benefits:
- Data does not need to be indexed before loading, so new datasets can be added and queried quickly, enabling faster prototyping. This also results in a much smaller ODC index database.
- There is no cost to loading data from GEE.
Penalties:
- Data has a very low throughput - just a few MiB per second.
- (Advanced) Using `dask_chunks` in the `load()` call does not work as expected. Normally, loads specifying the `dask_chunks` parameter will not immediately load data - instead creating the plan to load data with Dask. Here, the data immediately begins loading as if `dask_chunks` was not specified. This can be problematic for datasets that are larger than the amount of available memory. This problem does not occur when loading data using the normal `datacube.Datacube` class.
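To illustrate the normal Dask behavior described above - the product name here is a hypothetical placeholder - a lazy load with the standard `datacube.Datacube` class looks like this:

```python
# Hedged sketch: with the standard Datacube class, passing dask_chunks
# defers loading; data is only read when it is computed. With the
# ODC-GEE Datacube, the same call loads the data eagerly instead.
chunks = {"time": 1, "latitude": 1000, "longitude": 1000}

try:
    from datacube import Datacube

    dc = Datacube()
    # "some_indexed_product" is a hypothetical placeholder name.
    lazy = dc.load(product="some_indexed_product", dask_chunks=chunks)
    # Here lazy's data variables are Dask arrays, not in-memory data.
except ImportError:
    lazy = None  # datacube is not installed in this environment
```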