
Decide on database type: is SQLite sufficient? #1

Open
vincentvanhees opened this issue Dec 17, 2018 · 3 comments

Comments

@vincentvanhees
Member

No description provided.

@vincentvanhees
Member Author

Possible requirements:

  • Ability to host the data set on a server and allow multi-user interaction.
  • The data size may grow, so scalability is a concern.
  • Not all data can be fully open access, so some means of controlling data access per user is required.
  • Anticipated data characteristics:
    • ~5.4 million rows: 100 participants x 5400 seconds per person x 10 values per second (aggregated from the raw data)
    • 70 columns, most likely split across multiple tables (30 from video, 30 from audio, 10 text-related)
    • Data types: mostly double-precision time series, although some channels may be binary / boolean.
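
A quick back-of-the-envelope check of these numbers, as a small Python sketch. It assumes every value is stored as an 8-byte double, ignores indexes and per-row storage overhead, and reads the "10" factor as values per second of recording. Under those assumptions the raw values come to roughly 3 GB, which a single SQLite file can hold comfortably.

```python
# Rough size estimate for the anticipated data set.
# Assumptions (not stated in the issue): 8-byte doubles for every value,
# no index/row overhead, and "10" interpreted as values per second.

PARTICIPANTS = 100
SECONDS_PER_PERSON = 5400               # 90 minutes per participant
VALUES_PER_SECOND = 10                  # aggregated sampling rate (assumed)
COLUMNS = {"video": 30, "audio": 30, "text": 10}
BYTES_PER_VALUE = 8                     # IEEE 754 double


def estimate(participants=PARTICIPANTS):
    """Return (row count, column count, raw bytes) for the given participant count."""
    rows = participants * SECONDS_PER_PERSON * VALUES_PER_SECOND
    n_cols = sum(COLUMNS.values())
    raw_bytes = rows * n_cols * BYTES_PER_VALUE
    return rows, n_cols, raw_bytes


rows, n_cols, raw_bytes = estimate()
print(f"{rows:,} rows x {n_cols} columns ~ {raw_bytes / 1e9:.1f} GB of raw values")
```

Changing `PARTICIPANTS` or the per-modality column counts shows how the estimate scales; for example, `estimate(50)` halves the row count.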

@vincentvanhees
Member Author

Conclusion for now, after talking to Carlos and Jisk: start with SQLite for prototyping and switch to something else (e.g. PostgreSQL) later on.
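
A minimal sketch of what a prototype SQLite schema could look like, using Python's built-in `sqlite3` module so it runs directly in a Jupyter notebook. All table and column names (`participant`, `video_features`, `audio_features`, `sample`, `smile`, `face_visible`, `loudness`) are hypothetical placeholders, not the project's actual features. Each modality gets its own table keyed on `(participant_id, sample)`, where `sample` is an integer index at the aggregated rate, so tables can be joined without comparing floating-point timestamps.

```python
import sqlite3

# Hypothetical prototype schema: one table per modality, all sharing the
# (participant_id, sample) key. Names are illustrative placeholders only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS participant (
    participant_id INTEGER PRIMARY KEY,
    label          TEXT
);
CREATE TABLE IF NOT EXISTS video_features (
    participant_id INTEGER NOT NULL REFERENCES participant(participant_id),
    sample         INTEGER NOT NULL,  -- sample index at the aggregated rate
    smile          REAL,              -- example double-valued channel
    face_visible   INTEGER,           -- example boolean channel (0/1)
    PRIMARY KEY (participant_id, sample)
);
CREATE TABLE IF NOT EXISTS audio_features (
    participant_id INTEGER NOT NULL REFERENCES participant(participant_id),
    sample         INTEGER NOT NULL,
    loudness       REAL,
    PRIMARY KEY (participant_id, sample)
);
"""


def open_db(path=":memory:"):
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    con.executescript(SCHEMA)
    return con


def demo():
    con = open_db()
    con.execute("INSERT INTO participant VALUES (1, 'pilot')")
    con.executemany(
        "INSERT INTO video_features VALUES (?, ?, ?, ?)",
        [(1, 0, 0.2, 1), (1, 1, 0.8, 1), (1, 2, 0.9, 0)],
    )
    con.executemany(
        "INSERT INTO audio_features VALUES (?, ?, ?)",
        [(1, 0, -30.0), (1, 1, -12.5), (1, 2, -11.0)],
    )
    # Query across modalities on the shared key.
    rows = con.execute(
        """
        SELECT v.sample, v.smile, a.loudness
        FROM video_features AS v
        JOIN audio_features AS a
          ON a.participant_id = v.participant_id AND a.sample = v.sample
        WHERE v.smile > 0.5
        ORDER BY v.sample
        """
    ).fetchall()
    con.close()
    return rows


print(demo())  # [(1, 0.8, -12.5), (2, 0.9, -11.0)]
```

Apart from the `PRAGMA` line, which is SQLite-specific, the statements are plain SQL (composite primary keys, foreign keys, joins), so the schema should carry over to PostgreSQL with little change. Column types need a review when switching: `REAL` is an 8-byte float in SQLite but a 4-byte float in PostgreSQL, where `DOUBLE PRECISION` is the equivalent.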

@vincentvanhees
Member Author

  • Data characteristics:
    o Probably fewer than 100 participants
    o 50 GB of raw data per person:
      - Video: .mp4
      - Audio: .wav
      - Labels and descriptions: .xml / .txt
      - Text derived from audio (?)
    o A student estimates 100 GB of derived data?
    o We have software to extract informative data features:
      - 1 GB of derived data per person, in CSV files
      - Possibly aggregated to a lower resolution
  • Aims:
    o Search the data based on labels or data characteristics
    o Visualize query results
    o Train/apply machine learning models on query results
    o Export aggregated values for use outside the DB
    o Share ‘data’ with the community to facilitate re-use
  • Plan:
    o SQLite for now -> need to visualize the schema
    o Pilot in a plain Jupyter notebook, no Django framework
    o Data sharing via SURFdrive for now, but this is not ideal => other options?
    o How do we link back to the video files? Can the DB hold links to specific fragments of video files?
      - How should this be hosted?
      - Are there similar projects?
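
One possible pipeline for the derived CSV files, sketched in Python under several assumptions: the CSV layout (a `time` column in seconds plus feature columns named `smile` and `loudness`), the table name `features_agg`, the threshold query, and the example URL are all hypothetical. The sketch averages features over fixed windows (aggregating to a lower resolution), stores the windows in SQLite, and turns matching windows into links to video fragments. For the links it uses the temporal syntax from the W3C Media Fragments URI 1.0 specification (`video.mp4#t=start,end`, in seconds); whether a given player or hosting setup honors such fragments would still need checking.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical derived-feature CSV for one participant.
EXAMPLE_CSV = """time,smile,loudness
0.00,0.10,-30.0
0.50,0.30,-28.0
1.00,0.80,-12.0
1.50,0.90,-10.0
2.00,0.20,-25.0
"""


def aggregate_csv(fileobj, window_s=1.0):
    """Average every feature column over fixed windows of window_s seconds."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in csv.DictReader(fileobj):
        window = int(float(row["time"]) // window_s)
        counts[window] += 1
        for name, value in row.items():
            if name != "time":
                sums[window][name] += float(value)
    # One (window start in seconds, {feature: mean}) pair per window.
    return [
        (w * window_s, {k: v / counts[w] for k, v in sums[w].items()})
        for w in sorted(counts)
    ]


def store(con, participant_id, windows, window_s):
    """Write aggregated windows to a (hypothetical) features_agg table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS features_agg ("
        " participant_id INTEGER, t_start REAL, t_end REAL,"
        " smile REAL, loudness REAL)"
    )
    con.executemany(
        "INSERT INTO features_agg VALUES (?, ?, ?, ?, ?)",
        [(participant_id, t, t + window_s, feats["smile"], feats["loudness"])
         for t, feats in windows],
    )


def fragment_links(con, video_url, min_smile):
    """Return Media Fragments URIs (#t=start,end) for windows matching a query."""
    rows = con.execute(
        "SELECT t_start, t_end FROM features_agg WHERE smile > ? ORDER BY t_start",
        (min_smile,),
    ).fetchall()
    return [f"{video_url}#t={start:g},{end:g}" for start, end in rows]


def demo():
    windows = aggregate_csv(io.StringIO(EXAMPLE_CSV), window_s=1.0)
    con = sqlite3.connect(":memory:")
    store(con, participant_id=1, windows=windows, window_s=1.0)
    links = fragment_links(con, "https://example.org/p001.mp4", min_smile=0.5)
    con.close()
    return links


print(demo())  # ['https://example.org/p001.mp4#t=1,2']
```

Storing only window boundaries and a per-participant video URL keeps the large video files outside the database; the DB then acts as an index into them rather than a copy.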
