
4.1 init.R

Mateusz Żółtak edited this page Oct 16, 2019 · 3 revisions

The init step creates an index of the source Sentinel 2 L2A data matching a given spatial and temporal range, as well as a given set of bands and a maximum cloud coverage.

All other scripts refer to this index when computing their lists of input/output files.

You can think of running the init step as defining a dataset for a workflow.

The dataset is identified by a combination of five parameters: regionName, startDate and endDate (command line parameters) as well as bands and cloudCov (configuration file parameters).

All other scripts take the same parameters, which allows them to refer to the index of the proper dataset.

To refresh the dataset, simply run the init.R script again with the same combination of parameters.
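Because the five parameters identify a dataset, the index location can be derived from them deterministically, so every script (and every rerun of init.R) ends up at the same index. The package itself is written in R; the following is only a Python sketch of the idea. The template string and the `index_path` helper are hypothetical and do not reproduce the actual default cacheTmpl value.

```python
from datetime import date

# Hypothetical index location template; the real one is set by the
# cacheTmpl configuration property and may look different.
CACHE_TMPL = "cache/{region}_{start}_{end}_{bands}_{cloud}"


def index_path(region_name, start_date, end_date, bands, cloud_cov, tmpl=CACHE_TMPL):
    """Derive a deterministic index location from the five dataset parameters."""
    # This sketch sorts the bands so that the same set listed in a different
    # order maps to the same dataset (an assumption, not documented behavior).
    band_part = "-".join(sorted(bands))
    return tmpl.format(
        region=region_name,
        start=start_date.isoformat(),
        end=end_date.isoformat(),
        bands=band_part,
        cloud=cloud_cov,
    )


# Two scripts called with the same parameters resolve the same index.
print(index_path("AT", date(2019, 6, 1), date(2019, 6, 30), ["B08", "B04"], 50))
```

Rerunning the init step with identical parameters resolves to the same location, which is why a rerun refreshes the existing dataset instead of creating a new one.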

Rationale

Having a dedicated source data index is important for two reasons:

  • Allowing near-real-time processing and running many parallel workflows that share parts of the data.
  • Ensuring that all source data have corresponding end products.

In both cases it comes down to having a single, quickly accessible source listing the data to be processed by a given workflow.

This makes it easy to tell whether some data are missing because something went wrong in a previous processing step, or whether they come from a different workflow and can be safely ignored (e.g. a more recent data download, or a workflow producing a different target indicator but sharing some intermediate data, such as NDVI computed along the way).
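The distinction above boils down to two set differences between what the index expects and what is on disk. The following Python sketch assumes the expected file names have already been derived from the index; `compare_with_index` and the example names are hypothetical and not part of the package.

```python
def compare_with_index(indexed, present):
    """Split file names into missing (expected but absent) and foreign (present but not indexed)."""
    indexed, present = set(indexed), set(present)
    # Listed in the index but absent on disk: most likely a failed earlier step.
    missing = sorted(indexed - present)
    # On disk but not in the index: produced by another workflow, safe to ignore.
    foreign = sorted(present - indexed)
    return missing, foreign


missing, foreign = compare_with_index(
    ["T33UWP_20190601_B04", "T33UWP_20190601_B08"],
    ["T33UWP_20190601_B04", "T33UWP_20190715_B04"],
)
print(missing, foreign)
```

Without a dedicated index, both kinds of discrepancy would look the same, which is exactly what the index is meant to avoid.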

Connection with the BOKU Sentinel 2 processing service

To prepare L2A data for the Landsupport project, BOKU uses its own Sentinel 2 processing platform, https://s2.boku.eodc.eu. The platform allows users to query available Sentinel 2 data, schedule their processing to the L2A level and download L2A data at single-band granularity.

Scheduling L2A data processing on the platform is done using so-called regions of interest, each having a spatial geometry and identified by a name. To prepare data for the Landsupport project, regions corresponding to all European countries have been created on the platform, as well as a global region spanning the whole of Europe.

Since the platform was already being used to prepare L2A data for the project, coupling this package with it was a natural choice. The easiest way to search for data spatially on the platform is to filter by region of interest coverage, and all the necessary regions of interest already existed there. It was therefore decided to denote the desired area in the init.R script (and other scripts) by the name of the corresponding region of interest.

To define your own area of interest, go to the https://s2.boku.eodc.eu service, log in using the Landsupport project account, define a new region of interest and then pass its name as the regionName command line parameter to this package's scripts.

If another source of L2A data is considered in the future, the init step (and also the download step) should be rewritten to use that source's data discovery API and to denote the area of interest in a way compatible with that API.

Command line arguments

  • The standard set of configFilePath, regionName, startDate and endDate.
    • regionName is the name of the region of interest in the https://s2.boku.eodc.eu service and defines the spatial coverage of the data. See the discussion of the BOKU Sentinel 2 processing service above.
  • user and pswd: credentials for the https://s2.boku.eodc.eu service (see the discussion above).

Data input/output

The data index is written to the location specified by the cacheTmpl configuration property.

It consists of two files:

  • The csv file listing all matching Sentinel 2 L2A images.
  • The geojson file storing the spatial coverage of the data.
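To make the two-file layout concrete, here is a minimal Python sketch that writes a CSV with one row per image and a GeoJSON FeatureCollection holding the coverage geometry. The column names, the `write_index` helper and the idea of sharing one base path between both files are assumptions for illustration; the actual index written by init.R may use different columns and naming.

```python
import csv
import json


def write_index(base_path, images, coverage_geometry):
    """Write a dataset index as <base>.csv (images) and <base>.geojson (coverage).

    images: list of dicts sharing the same (hypothetical) keys.
    coverage_geometry: a GeoJSON geometry dict, e.g. a Polygon.
    """
    fieldnames = list(images[0].keys()) if images else []
    # One CSV row per matching image.
    with open(base_path + ".csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(images)
    # Wrap the coverage geometry in a single-feature FeatureCollection.
    feature_collection = {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "properties": {}, "geometry": coverage_geometry}
        ],
    }
    with open(base_path + ".geojson", "w") as f:
        json.dump(feature_collection, f)
```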

Performance

The performance of this step depends entirely on the speed of the remote API.

Configuration

  • cacheTmpl: location and file name template of the index. Feel free to adjust the directory, but it is better to keep the default file name template.
  • bands: a list of Sentinel 2 bands included in the index. Keep it as short as possible to avoid spending a lot of time downloading files in the download step.
  • cloudCov: maximum accepted cloud coverage (it makes no sense to download and process very cloudy scenes).
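The following Python sketch shows how the temporal range, bands and cloudCov could narrow a list of candidate scenes down to index rows. It assumes the spatial filtering has already happened on the platform side via the region of interest, and it uses hypothetical scene fields (`id`, `date`, `cloudCov`, `bands`); it is not the query init.R actually sends to the BOKU service.

```python
from datetime import date


def build_index(scenes, start_date, end_date, bands, cloud_cov):
    """Select one row per (scene, band) matching the dataset definition.

    scenes: dicts with hypothetical keys 'id', 'date' (datetime.date),
    'cloudCov' (percent) and 'bands' (band names available for the scene).
    """
    wanted = set(bands)
    rows = []
    for scene in scenes:
        if not (start_date <= scene["date"] <= end_date):
            continue  # outside the temporal range
        if scene["cloudCov"] > cloud_cov:
            continue  # too cloudy to be worth downloading
        # Only the configured bands end up in the index (and later get downloaded).
        for band in sorted(wanted & set(scene["bands"])):
            rows.append({"id": scene["id"], "date": scene["date"], "band": band})
    return rows
```

Each extra band multiplies the number of index rows, and therefore files to download, which is why the bands list should be kept short.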