Import .CSV files with custom cycler definition metadata file #97

BradyPlanden · 2023-06-12T08:41:35Z

Is your feature request related to a problem? Please describe.
Currently, .CSV files from unsupported cyclers cannot be imported.

Describe the solution you'd like
A method to import .CSV files with a corresponding JSON file that defines a custom cycler data standard. For example, a user with an Arbin-exported .CSV file could provide a JSON file that defines the header names for import into Galv. This could also be used to import virtual data generated by predictive models, as long as the metadata file contains the corresponding information.

Additional context
This could integrate with #95: if the JSON exported (as per #95) contained the information required to reimport into Galv, users could share data between Galv instances. There might be a better method for this, though.

BradyPlanden · 2023-06-12T08:42:13Z

If needed, I have Arbin .CSV files that can be used for testing.

martinjrobins · 2023-06-12T10:03:54Z

On the parser side, this would involve implementing a new parser that uses the JSON file to map columns in the CSV to our standard columns.

The (perhaps bigger) piece of work would be to allow users to supply this JSON file. Perhaps this would be a field in the harvester, @mjaquiery? We'd also need to determine the format of this JSON, but it seems like it would just be a dictionary that maps the CSV's column names to our standard column names. In this case, we'd have to tell users that they need to provide a CSV whose first row contains the header names.
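A minimal sketch of the mapping idea described above, using only the standard library. The JSON layout, the source header names, and the standard column names (`time_s`, `voltage_v`, `current_a`) are all assumptions for illustration; they are not Galv's actual schema or parser API.

```python
import csv
import io
import json

# Hypothetical mapping file content: source CSV header -> standard column name.
# Both sets of names are illustrative, not Galv's real standard columns.
MAPPING_JSON = (
    '{"Test_Time(s)": "time_s", "Voltage(V)": "voltage_v", "Current(A)": "current_a"}'
)


def read_mapped_csv(csv_text, mapping):
    """Return rows keyed by standard column names; unmapped columns are dropped."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row is the header
    header = reader.fieldnames or []
    missing = [name for name in mapping if name not in header]
    if missing:
        raise ValueError(f"CSV is missing mapped columns: {missing}")
    return [
        {standard: row[source] for source, standard in mapping.items()}
        for row in reader
    ]


mapping = json.loads(MAPPING_JSON)
sample = (
    "Test_Time(s),Voltage(V),Current(A),Notes\n"
    "0.0,3.70,0.50,start\n"
    "1.0,3.71,0.50,\n"
)
rows = read_mapped_csv(sample, mapping)
```

Values stay as strings here; converting them to numbers is a separate question, which is where the data-type discussion in this thread comes in.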

mjaquiery · 2023-06-12T12:04:21Z

I'd suggest we have two header rows, one with column names and one with data type.
Is it easier for end users to provide csv files with a particular structure, or write mapping files? I guess the latter is more shareable between users. Another alternative is that we hack file extensions, so e.g. the Arbin files get converted from .csv to e.g. .arb, and we teach a harvester how to interpret those files itself...?
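For comparison, the two-header-row layout could be read roughly as follows. This is only a sketch: the type names (`float`, `int`, `str`) and the column names are assumed for illustration, and no such format has been agreed in this thread.

```python
import csv
import io

# Hypothetical set of type names allowed in the second header row.
CASTS = {"float": float, "int": int, "str": str}


def read_typed_csv(text):
    """Parse a CSV whose first row holds column names and second row holds types."""
    reader = csv.reader(io.StringIO(text))
    names = next(reader)
    types = next(reader)
    if len(names) != len(types):
        raise ValueError("name row and type row have different lengths")
    unknown = [t for t in types if t not in CASTS]
    if unknown:
        raise ValueError(f"unknown data types in second header row: {unknown}")
    return [
        {name: CASTS[kind](value) for name, kind, value in zip(names, types, row)}
        for row in reader
    ]


typed_sample = "time_s,voltage_v,step\nfloat,float,int\n0.0,3.7,1\n1.0,3.71,1\n"
typed_rows = read_typed_csv(typed_sample)
```

The trade-off is that the file is self-describing, but users must edit every exported file, whereas a separate mapping file is written once per cycler and can be shared.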

BradyPlanden · 2023-06-22T10:39:36Z

Hmm, can we infer the data type from the data itself? We could then compare the inferred type to an approved list (or a single value) for that column and throw an error on a mismatch. I think asking an average user to define data types is probably too much.
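One way the infer-then-check idea might look, as a rough sketch. The `APPROVED` table and its column names are hypothetical, and the inference only distinguishes integers, floats, and strings.

```python
def infer_type(values):
    """Return 'int', 'float', or 'str' for a column of raw CSV strings."""

    def all_parse(cast):
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "str"


# Hypothetical approved types per standard column.
APPROVED = {"voltage_v": {"int", "float"}, "step_name": {"str"}}


def check_column(name, values):
    """Infer the column's type and raise if it is not approved for that column."""
    inferred = infer_type(values)
    if inferred not in APPROVED[name]:
        raise TypeError(
            f"column {name!r} looks like {inferred}, expected one of {sorted(APPROVED[name])}"
        )
    return inferred
```

With this sketch, a single non-numeric entry such as `"NA"` makes the whole column infer as `str`, so `check_column("voltage_v", ["3.7", "NA"])` raises a `TypeError`.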

I think it's easiest for end users to provide CSV files with a corresponding cycler definition structure. This could be in JSON format (we could provide some examples for users to copy).

mjaquiery · 2023-06-27T10:03:30Z

Perhaps a better alternative is to maintain internal structures ourselves, and let users upload a .csv and select its source from a list? We can allow advanced users to create new mappings (in JSON or something) where they provide an example .csv file and add metadata for the columns.

My concern is that data types can't always be parsed simply from the data: datetime strings are difficult to recognise automatically, for example, and sometimes datasets use string values (e.g. "NA") to represent missing numerical data. Maybe I'm not quite seeing where these data are coming from and what they look like in their original form.
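To make the concern concrete, here is a sketch in which per-column metadata declares the missing-value tokens and the datetime format explicitly instead of guessing them. The `COLUMN_META` structure and its keys are assumptions for illustration only.

```python
from datetime import datetime

# Hypothetical per-column metadata, e.g. loaded from a mapping file.
COLUMN_META = {
    "voltage_v": {"type": "float", "missing": ["NA", ""]},
    "timestamp": {"type": "datetime", "format": "%d/%m/%Y %H:%M:%S"},
}


def parse_value(column, raw):
    """Convert one raw CSV string using the declared metadata for its column."""
    meta = COLUMN_META[column]
    if raw in meta.get("missing", []):
        return None  # declared missing-value token
    if meta["type"] == "float":
        return float(raw)
    if meta["type"] == "datetime":
        return datetime.strptime(raw, meta["format"])
    return raw


# "03/04/2023" is 3 April under the declared day-first format; a month-first
# guess would read it as 4 March, which is why the format is stated explicitly.
stamp = parse_value("timestamp", "03/04/2023 10:00:00")
gap = parse_value("voltage_v", "NA")
```

The string `"03/04/2023 10:00:00"` is valid under both day-first and month-first conventions, so nothing in the data itself reveals which one the cycler used.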

We might benefit from a real-time discussion about this.

BradyPlanden added the "enhancement" (New feature or request) label Jun 12, 2023

mjaquiery mentioned this issue Feb 6, 2024

Import arbitrary .CSV files galv-team/galv-harvester#1

Closed
