Main user-facing change: we now match the source column order in the `transformed_{table}.csv` and `synth_{table}.csv` output files.

Internally, we now use `list` more consistently when working with columns. There are still some places where we use `set`, but only where we genuinely want set semantics, i.e. where we don't want duplicates: checking which columns are safe for ancestral seeding, and determining which columns to drop for independent pre-processing.

We only specify the `columns` parameter on `df.to_csv(...)` calls when writing the final output files from transforms or synthetics. For interim files (mainly synthetics artifacts: pre-processed data sources for training, and seeds for ancestral generation) we don't care about the order. We also couldn't easily specify it if we wanted to, because the columns in those files are not identical to the source: independent omits some, while ancestral omits some, adds others, renames some, etc.
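As a rough illustration of the ordering idea, here is a minimal pandas sketch. It is not the PR's actual code: the `write_final_output` helper and the example table are hypothetical. It keeps the source column order as a `list`, uses a `set` only for a membership check, and passes that list to `to_csv(columns=...)` so the written file follows the source order even when the in-memory frame does not.

```python
import io

import pandas as pd


def write_final_output(df: pd.DataFrame, source_columns: list) -> str:
    """Hypothetical helper: write df as CSV using the source column order."""
    # A set is fine for membership/dedup checks, but its iteration order is
    # arbitrary, so it is never used to decide the output column order.
    missing = set(source_columns) - set(df.columns)
    if missing:
        raise KeyError(f"columns missing from output: {sorted(missing)}")

    buf = io.StringIO()
    # columns= writes exactly these columns, in exactly this order.
    df.to_csv(buf, columns=source_columns, index=False)
    return buf.getvalue()


# Hypothetical source table; its column order is what the output should match.
source = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [0.5, 0.7]})
source_columns = list(source.columns)  # a list preserves the source order

# Simulate a pipeline step that produced the same columns in a different order.
synthetic = source[["score", "id", "name"]].copy()

csv_text = write_final_output(synthetic, source_columns)
```

Interim files would simply call `to_csv` without `columns=`, since their column sets differ from the source anyway.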