Column order #123

Merged
merged 5 commits into from
Jun 14, 2023

mikeknep
Contributor

Main user-facing change: we now match the source column order in the `transformed_{table}.csv` and `synth_{table}.csv` output files.

Internally, we now use `list` more consistently when working with columns. There are still some places where we use `set`, but only where we genuinely want set semantics, i.e. where we don't want duplicates: when checking which columns are safe for ancestral seeding, or when determining which columns to drop for independent pre-processing.
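As a rough illustration of the list-versus-set distinction described above, the sketch below uses a set only for membership checks and a list to carry the order. The function name `ordered_columns` and the column names are hypothetical and are not taken from this PR's code.

```python
def ordered_columns(source_columns, keep):
    """Return the columns named in `keep`, in the order they appear in the source.

    Hypothetical helper for illustration only; not the project's actual code.
    """
    keep_set = set(keep)  # set semantics: membership only, no duplicates
    return [col for col in source_columns if col in keep_set]


# Assumed example data: a source table's column order and an unordered selection.
source = ["id", "name", "email", "created_at"]
safe_for_seeding = {"email", "id"}

result = ordered_columns(source, safe_for_seeding)
print(result)
```

The set answers "is this column included?", while the list comprehension over `source` decides the order, so the result does not depend on set iteration order.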

We only specify the `columns` parameter on `df.to_csv(...)` calls when writing the final output files from transforms or synthetics. For interim files (mainly on the synthetics side: pre-processed data sources for training, and seeds for ancestral generation) we do not care about the order. We also could not easily specify it even if we wanted to, because the columns in those files are not identical to the source: independent omits some, while ancestral omits some, adds others, and renames some.
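A minimal sketch of the final-output case, assuming pandas: passing the source column list as `columns` to `DataFrame.to_csv` writes the columns in that order regardless of the frame's own column order. The helper name `write_in_source_order` and the sample data are hypothetical; the PR's actual call sites are not shown here.

```python
import io

import pandas as pd


def write_in_source_order(df, source_columns, out):
    """Write `df` as CSV with columns in source order (hypothetical helper)."""
    # `columns` both selects and orders the columns that get written.
    df.to_csv(out, columns=source_columns, index=False)


# Assumed example: the processed frame's columns ended up in a different order.
source_columns = ["id", "name", "email"]
transformed = pd.DataFrame({"email": ["a@x.io"], "id": [1], "name": ["Ann"]})

buf = io.StringIO()  # stands in for an output file path
write_in_source_order(transformed, source_columns, buf)
header = buf.getvalue().splitlines()[0]
print(header)
```

This only works when every name in `source_columns` is present in the frame, which fits the point above: interim files whose columns differ from the source cannot simply reuse the source list.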

@mikeknep mikeknep merged commit 6b450d6 into main Jun 14, 2023
@mikeknep mikeknep deleted the column-order branch June 14, 2023 18:35
@mikeknep mikeknep mentioned this pull request Jun 20, 2023