[export] reduce numeric precision to reduce dataset size by ~30% #1512

Merged (5 commits) on Jul 15, 2024

Commits on Jul 15, 2024

  1. [export] restrict sig figs for confidence/entropy

    Using fewer sig figs (or decimal places, as appropriate) is just fine
    for Auspice's usage and helps reduce the output JSON size. Testing on
    the zika dataset this reduces the (minified, gzipped) JSON by 20% from
    175kB to 139kB.
    
    This refactor also uses more thorough error checking and enforcement
    that entropy values are found together with confidence values where
    appropriate. (Note that the previous usage of `is_valid` was bogus as it
    only works for string values.)
    jameshadfield committed Jul 15, 2024
    d82d5bf
  2. [export] reduce sig figs on numeric attrs

    This extends the work in the previous commit to reduce the sig. figs /
    decimal places of all numeric node attrs. The file size of the zika
    dataset is reduced to 120kB which, when combined with the previous
    commit, is a 31% reduction cf. Augur 24.4.0
    jameshadfield committed Jul 15, 2024
    f2a9f9f
  3. [traits] include all confidences over 0.1%

    The previous restriction to the highest 4 values was motivated by
    keeping the eventual Auspice dataset small. That restriction is now part
    of `augur export v2` so we can now report them all here. This results in
    slightly larger node-data files (a 1.5% increase for the zika analysis)
    but will produce more thorough data for any scripts / non-Auspice usage.
    jameshadfield committed Jul 15, 2024
    f4decc0
  4. [export] refactor types

    Based on PR feedback in <#1512 (comment)>
    
    This better conveys that confidences for numeric traits must have 2 and
    only 2 elements.
    jameshadfield committed Jul 15, 2024
    0d866c2
  5. changelog

    jameshadfield committed Jul 15, 2024
    5f1a538
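
For readers unfamiliar with the approach, the ideas in commits 1-4 can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function names (`round_sig_figs`, `trim_confidence`, `numeric_confidence`) and the default of 3 sig figs / 3 decimal places are assumptions made for the example. Only the 0.1% threshold and the "2 and only 2 elements" rule come from the commit messages above.

```python
import math
from typing import Dict, Sequence, Tuple


def round_sig_figs(value: float, sig_figs: int = 3) -> float:
    """Round `value` to `sig_figs` significant figures (hypothetical helper)."""
    if value == 0 or not math.isfinite(value):
        return value
    # Position of the leading digit, e.g. 123456 -> 5, 0.00123 -> -3
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig_figs - 1 - exponent)


def trim_confidence(
    confidence: Dict[str, float],
    min_prob: float = 0.001,  # 0.1%, the threshold named in commit 3
    decimals: int = 3,        # assumed precision, not taken from the PR
) -> Dict[str, float]:
    """Keep states with probability over `min_prob`, rounded to fixed decimals."""
    return {
        state: round(prob, decimals)
        for state, prob in confidence.items()
        if prob > min_prob
    }


def numeric_confidence(bounds: Sequence[float], sig_figs: int = 3) -> Tuple[float, float]:
    """Validate that a numeric trait's confidence is exactly (lower, upper)."""
    if len(bounds) != 2:
        raise ValueError(f"expected 2 confidence bounds, got {len(bounds)}")
    lower, upper = bounds
    return (round_sig_figs(lower, sig_figs), round_sig_figs(upper, sig_figs))
```

Probabilities are rounded to decimal places rather than sig figs in this sketch because they are bounded in [0, 1]; unbounded numeric attrs use sig figs so that large and small magnitudes lose the same relative precision.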