This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Import .CSV files with custom cycler definition metadata file #97

Open
BradyPlanden opened this issue Jun 12, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@BradyPlanden
Collaborator

Is your feature request related to a problem? Please describe.
Currently, .CSV files from unsupported cyclers cannot be imported.

Describe the solution you'd like
A method to import .CSV files with a corresponding JSON file that defines a custom cycler data standard. For example, a user with an Arbin-exported .CSV file could provide a JSON file that defines the header names for import into Galv. This could also be used to import virtual data generated by predictive models, as long as the metadata file contains the corresponding information.

Additional context
To integrate with #95: if the exported JSON (as per #95) contained the information required to reimport into Galv, users could share data between Galv instances. There might be a better method for this, though.

@BradyPlanden BradyPlanden added the enhancement New feature or request label Jun 12, 2023
@BradyPlanden
Collaborator Author

If needed, I have Arbin .CSV files that can be used for testing.

@martinjrobins
Collaborator

On the parser side, this would involve implementing a new parser that uses the JSON file to map columns in the CSV to our standard columns.

The (perhaps bigger) piece of work would be to allow users to supply this JSON file. Perhaps this would be a field in the harvester @mjaquiery? We'd also need to determine the format of this JSON, but it seems like it would just be a dictionary that maps the CSV's column names to our standard column names. In this case, we'd have to tell users that they need to provide a CSV whose first row contains the header names.
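A minimal sketch of the mapping idea described above, using only the Python standard library. The function name `load_with_mapping`, the source headers, and the target names (`time_s`, `volts`, `amps`) are all hypothetical placeholders, not Galv's actual standard columns or parser API.

```python
import csv
import io
import json


def load_with_mapping(csv_text, mapping_json):
    """Rename CSV columns according to a JSON dictionary.

    The JSON maps source header names to target column names.
    Columns that are not listed in the mapping are dropped.
    """
    mapping = json.loads(mapping_json)
    # DictReader treats the first row as the header row.
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    missing = [name for name in mapping if name not in headers]
    if missing:
        raise ValueError(f"CSV is missing mapped columns: {missing}")
    return [{std: row[src] for src, std in mapping.items()} for row in reader]


# Illustrative inputs only: these header names are assumptions.
EXAMPLE_MAPPING = (
    '{"Test_Time(s)": "time_s", "Voltage(V)": "volts", "Current(A)": "amps"}'
)
EXAMPLE_CSV = (
    "Test_Time(s),Voltage(V),Current(A),Step_Index\n"
    "0.0,3.70,0.50,1\n"
    "1.0,3.71,0.50,1\n"
)

rows = load_with_mapping(EXAMPLE_CSV, EXAMPLE_MAPPING)
```

Values are left as strings here; converting them to numbers is a separate step, which keeps the mapping file itself as simple as a flat dictionary.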

@mjaquiery
Collaborator

I'd suggest we have two header rows, one with column names and one with data type.
Is it easier for end users to provide CSV files with a particular structure, or to write mapping files? I guess the latter is more shareable between users. Another alternative is that we hack file extensions, so the Arbin files get converted from .csv to, e.g., .arb, and we teach a harvester how to interpret those files itself...?

@BradyPlanden
Collaborator Author

BradyPlanden commented Jun 22, 2023

Hmm, can we infer the data type from the data itself? We could then compare the inferred type to an approved list or single value and throw an error on a mismatch. I think asking an average user to define data types is probably too much.
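As a rough sketch of the inference idea above: try progressively looser casts on a column's raw strings and compare the result with an approved type. The helpers `infer_type` and `check_column` and the type labels are hypothetical, introduced only for illustration.

```python
def infer_type(values):
    """Return "int", "float", or "str" for a column of raw strings."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            # At least one value did not parse; try the next, looser type.
            continue
    return "str"


def check_column(name, values, expected):
    """Raise TypeError if the inferred type does not match the expected one."""
    found = infer_type(values)
    # Integers are accepted where floats are expected.
    compatible = found == expected or (found == "int" and expected == "float")
    if not compatible:
        raise TypeError(f"column {name!r}: expected {expected}, found {found}")
    return found
```

For example, `check_column("volts", ["3.70", "3.71"], "float")` passes, while a column containing a non-numeric token such as `"NA"` is inferred as `"str"` and would raise.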

I think it's easiest for end users to provide CSV files with a corresponding cycler definition structure. This could be in JSON format (we could provide some examples for users to copy).

@mjaquiery
Collaborator

Perhaps a better alternative is to maintain internal structures ourselves, and let users upload a .csv and select its source from a list? We can allow advanced users to create new mappings (in JSON or something) where they provide an example .csv file and add metadata for the columns.

My concern is that data types can't always be parsed simply from the data: datetime strings are difficult to recognise automatically, for example, and sometimes datasets use string values (e.g. "NA") to represent missing numerical data. Maybe I'm not quite seeing where these data are coming from and what they look like in their original form.
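One way the two concerns above could be handled is to have the column metadata state them explicitly rather than guessing. The sketch below assumes a hypothetical set of missing-value tokens and a user-supplied datetime format string; neither is part of any agreed Galv format.

```python
from datetime import datetime

# Assumed tokens that mark a missing numeric value; a real mapping
# file would need to declare these per cycler.
MISSING_TOKENS = frozenset({"", "NA", "N/A", "NaN"})


def parse_number(text, missing=MISSING_TOKENS):
    """Convert a raw cell to float, or None if it is a missing-value token."""
    text = text.strip()
    if text in missing:
        return None
    return float(text)


def parse_timestamp(text, fmt):
    """Parse a datetime cell using an explicitly declared strptime format."""
    return datetime.strptime(text.strip(), fmt)
```

With the format declared up front, an ambiguous string such as `06/12/2023 14:30:00` is read as whatever the mapping says (for instance `"%m/%d/%Y %H:%M:%S"`), and a cell that does not match raises `ValueError` instead of being silently misread.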

We might benefit from a real-time discussion about this.


3 participants