Reading and writing datasets with the CLI
This page is no longer in use. Its content has been input to the following more up-to-date pages
We still have to work out a way to let users conveniently pass in dataset references that are opened by the CLI, and also specify a file output for any workflow or operation output.
For example:
- If we have an input argument `NAME=VALUE` and the type of the input NAME is `xarray.Dataset`, then we could accept a command argument VALUE of the form `DATASOURCE-ID,DATE1,DATE2`, which would evaluate something like `workflow.set_input(NAME, read_dataset(DATASOURCE-ID,DATE1,DATE2))`.
- If we have an input argument `NAME=VALUE` and the type of the input NAME is `xarray.DataArray` or `xarray.Variable`, then we could accept a command argument VALUE of the form `VARIABLE-NAME,DATASOURCE-ID,DATE1,DATE2`, which would evaluate something like `workflow.set_input(NAME, read_dataset(DATASOURCE-ID,DATE1,DATE2)[VARIABLE-NAME])`.
- We could also accept `URL[?QUERY]` for web service hosted datasets.
- If we have an output option argument `--output NAME=VALUE` and the type of the output NAME is `Dataset`, then we could accept a command argument VALUE of the form `FILE-PATH[,FORMAT-NAME]`, which would evaluate something like `write_dataset(workflow.get_output(NAME))`.
-- Norman
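As an illustration of the input forms proposed above, here is a minimal sketch of how a `NAME=VALUE` argument could be split and decoded. The names `DatasetRef` and `parse_input_arg` are hypothetical and not part of the CLI; the `URL[?QUERY]` form is deliberately not handled, and the actual call to `read_dataset` is left to the caller.

```python
from typing import NamedTuple, Optional, Tuple


class DatasetRef(NamedTuple):
    """Hypothetical parsed form of an input VALUE (not an existing CLI type)."""
    variable_name: Optional[str]
    datasource_id: str
    date1: str
    date2: str


def parse_input_arg(arg: str, wants_variable: bool = False) -> Tuple[str, DatasetRef]:
    """Split NAME=VALUE and decode VALUE.

    wants_variable=False expects DATASOURCE-ID,DATE1,DATE2 (Dataset input).
    wants_variable=True expects VARIABLE-NAME,DATASOURCE-ID,DATE1,DATE2
    (DataArray or Variable input).
    """
    name, sep, value = arg.partition("=")
    if not sep or not name:
        raise ValueError(f"expected NAME=VALUE, got {arg!r}")
    parts = [p.strip() for p in value.split(",")]
    expected = 4 if wants_variable else 3
    if len(parts) != expected:
        raise ValueError(
            f"{name}: expected {expected} comma-separated fields, got {len(parts)}"
        )
    if wants_variable:
        return name, DatasetRef(*parts)
    return name, DatasetRef(None, *parts)
```

The caller would then decide, based on the declared type of the input NAME, whether to pass the whole dataset or to index it with `variable_name` before calling `workflow.set_input`.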
This makes sense. However, it would not really address what prompted the creative workaround in the first place. The visualization op now takes a file path in which to save the plot, so having --output of type Dataset does not solve it. I'm not exactly sure how to get around this, but we will have multiple types of workflow outputs, such as *.nc files for datasets, *.txt for comma-separated values or tables, and *.png, *.jpeg, *.pdf, or whatever for plots.
This could of course be solved by having a giant command line invocation, such as "wflow.json --input1 SOME_DATASET --input2 ANOTHER_DATASET --startTime XXXX-XX-XX --endTime YYYY-YY-YY --output1 /home/user/Desktop/fig1.png --output2 /home/user/Desktop/fig2.png --output3 /home/user/Desktop/fig3.png --output4 /home/user/Desktop/fig4.png --output5 /home/user/Desktop/fig5.png --output6 /home/user/Desktop/correlation_parameters.txt".
Having a single '--outputFolder /home/user/Desktop' parameter for the command line invocation, and then somehow using it to construct the actual output names, would be preferable. This can be done by feeding this one folder parameter into 'expression' nodes that construct the actual names. This is necessary because the only way to provide an input to a node is to connect it to another node. So either the name comes in from the command line invocation, or it is created in another node.
There could be default values in operations. But this again does not really solve the problem, as invoking an operation that writes an output twice would result in the output being overwritten. I'm not sure; maybe having a giant CLI invocation is actually the way to go.
-- Jānis
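The single output folder idea above could be sketched roughly as follows. This is only an illustration: `resolve_outputs` and its arguments are hypothetical names, and it works outside the workflow graph rather than through 'expression' nodes. It also rejects two outputs that resolve to the same path, which is one way to catch the overwrite problem mentioned above.

```python
from pathlib import Path
from typing import Dict, Mapping


def resolve_outputs(output_dir: str, file_names: Mapping[str, str]) -> Dict[str, Path]:
    """Map each output NAME to a path inside output_dir.

    Raises ValueError if two outputs would be written to the same file,
    since the second write would silently overwrite the first.
    """
    base = Path(output_dir)
    resolved: Dict[str, Path] = {}
    owner_of_path: Dict[Path, str] = {}
    for name, file_name in file_names.items():
        path = base / file_name
        if path in owner_of_path:
            raise ValueError(
                f"outputs {owner_of_path[path]!r} and {name!r} both resolve to {path}"
            )
        owner_of_path[path] = name
        resolved[name] = path
    return resolved
```

With this, a single folder parameter plus a table of per-output file names replaces the long list of absolute paths on the command line.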
Note it is not `--input1 SOME_DATASET` but `input1=SOME_DATASET` instead. I proposed `--output NAME=FILE` or `-o NAME=FILE` to explicitly provide a FILE sink for the output named `NAME`. Of course, `NAME` need not be a `Dataset`.
What about `out_dir=... -o output1=$out_dir/fig1.png -o output2=$out_dir/fig2.png` to shorten things? We can also register data_writers so we have a mapping from format name, file extension, or data type to a function `write_data(data, file)` to make it more convenient for users.
Note that you could also combine the analysis in a Python function `correlation_analysis` that would take two datasets and pairs of variables plus a few options as inputs and just the output directory as output. Use case 9 is generic enough to live in its own function!
-- Norman