2. basic ops #13

Open
ashlinrichardson opened this issue Mar 30, 2021 · 0 comments

Basic operations to support minimal data quality assessment, make life more livable, and make data-science swat-team deployments easier and more effective, all in the large-tabular-data context:

  • path normalization for interop between environments (classify path format by OS and translate to native format)
  • data type detection: nominal, numeric, date, geo
  • date detect and format validation
  • data dictionary vs file matching
  • data dictionary normalization, plus recovery from multiline cells
  • metadata: field search, description search, with support for fuzzy matching
  • semantic matching
  • autodetection and application of human-readable lookups present in other tables
  • flatfile parsing -- all sets
  • dataset identification and integration
  • redundant records detection -- large data
  • lossless data compression
  • windowing for multitemporal analysis
  • low-memory (large data) sorting, including but not limited to sorting by date
  • no requirement for a specific install location
  • allow people to select versions for data
  • parse and filter the largest files, bypassing RAM limits
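
As a starting point for the first item (path normalization), here is a minimal sketch of how "classify path format by OS and translate to native format" might look. The function names `classify_path` and `translate` are hypothetical, not part of this project, and the classification rule (drive letter, UNC prefix, or any backslash means Windows) is an assumption that will misclassify POSIX file names containing a literal backslash.

```python
import os
import re
from pathlib import PurePosixPath, PureWindowsPath


def classify_path(raw: str) -> str:
    """Guess the path flavour from syntax alone: 'windows' or 'posix'.

    Assumption: a drive letter (C:\\ or C:/), a UNC prefix (\\\\server),
    or any backslash marks the path as Windows-style.
    """
    if re.match(r"^[A-Za-z]:[\\/]", raw) or raw.startswith("\\\\") or "\\" in raw:
        return "windows"
    return "posix"


def translate(raw: str, target: str = "") -> str:
    """Rewrite `raw` using the separators of `target` ('windows' or 'posix').

    With no target given, the flavour of the running OS is used.
    Drive letters are kept as-is; mapping them to mount points is
    environment-specific and left out of this sketch.
    """
    if not target:
        target = "windows" if os.name == "nt" else "posix"
    # Parse with the class matching the detected source flavour.
    source_cls = PureWindowsPath if classify_path(raw) == "windows" else PurePosixPath
    forward_slashed = source_cls(raw).as_posix()
    if target == "windows":
        return str(PureWindowsPath(forward_slashed))
    return forward_slashed
```

Calling `translate(path)` with no target converts to the format of the machine the code runs on, which is the "native format" case; passing the target explicitly is useful when preparing paths for a different environment.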