Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support extracting multiple subsets from one table #142

Closed
wants to merge 2 commits into from

Conversation

reidab
Copy link
Contributor

@reidab reidab commented Aug 20, 2022

This is a more in-depth approach to #139 than the --data-only flag I proposed in #140. That flag may still be helpful in other cases, so I'd consider that PR separately.


This introduces the concept of table Subsets, which allow multiple Filter, Anonymise, and Relationships configurations to be attached to a single table. All functionality related to reading table data is refactored from operating on an entire table to operating on a single subset (ReadTable becomes ReadSubset).

For compatibility, Filter, Anonymise, and Relationships blocks defined at the root Table level are copied into a _deafult Subset when the config file is parsed.

So, a config like this:

[[Tables]]
  Name = "grains"

  [Tables.Filter]
    Match = "grains.size = 'large'"
    Limit = 10

  [Tables.Anonymise]
    weight = "Digits"

Will be parsed into something like this behind the scenes:

{
  Name: "grains",
  Subsets: [
    {
      Name: "_default",
      Filter: {
        Match: "grains.size = 'large'",
        Limit: 10
      },
      Anonymise: {
        weight: "Digits"
      }
    },
  ]
}

One area that I'm somewhat questioningis how things should be handled when Filter config exists both at the root level and in Subsets

For example:

[[Tables]]
  Name = "grains"

  [Tables.Filter]
    Match = "grains.size = 'large'"
    Limit = 10

  [Tables.Anonymise]
    weight = "Digits"

  [[Tables.Subsets]]
    Name = "starchy"

    [Tables.Subsets.Filter]
      Match = "grains.starchy = TRUE"
    [Tables.Subsets.Anonymise]
      name = "FirstName"

This config will currently be parsed into a tree that looks like this:

{
  Name: "grains",
  Subsets: [
    {
      Name: "_default",
      Filter: {
        Match: "grains.size = 'large'",
        Limit: 10
      },
      Anonymise: {
        weight: "Digits"
      }
    },
    {
      Name: "starchy",
      Filter: {
        Match: "grains.starchy = TRUE"
      },
      Anonymise: {
        name: "FirstName"
      }
    }
  ]
}

I think this is a reasonable approach overall — it treats the root level as its own separate subset — but I can also see how it could be confusing if people thought that the Filter, Anonymise, and Relationships blocks defined at the root level would be inherited by all subsets.

The alternative is to make this scenario throw an error, so a table can either use a root-level configuration, or it can define Subsets, but it cannot do both.

Thoughts?

@reidab reidab marked this pull request as draft August 23, 2022 01:27
@reidab
Copy link
Contributor Author

reidab commented Aug 23, 2022

Converted to a draft as there are issues with tables that aren't present in the config. Working on a fix.

This introduces the concept of table Subsets, which allow multiple Filter, Anonymise, and Relationships configurations to be attached to a single table. All functionality related to reading table data is refactored from operating on an entire table to operating on a single subset (ReadTable becomes ReadSubset).

For compatibility, Filter, Anonymise, and Relationships blocks defined at the root Table level are copied into a `_deafult` Subset when the config file is parsed.
@reidab reidab marked this pull request as ready for review August 23, 2022 21:43
@reidab
Copy link
Contributor Author

reidab commented Aug 23, 2022

Cleaned up handling for undefined tables — should be ready for review again.

@lucasmdrs
Copy link
Contributor

Hi @reidab 👋

I'll need to spend a bit more time with this one, the configuration became a bit more complex as you mentioned
and I want to make sure that there are no better alternatives before

@lucasmdrs
Copy link
Contributor

Hi @reidab 👋

I've wrote an alternative to this PR on #145 which don't require changes to the toml structure.
Since you brought it up the issue in #139 and wrote this solution, I would like to get your feedback on that, if it would solve the problem for you described.

@stale
Copy link

stale bot commented Apr 7, 2023

This PR has been automatically marked as stale because it has not had any activity in the last 14 days. It will be closed if no further activity occurs, thank you for your contributions.

@stale stale bot added the stale label Apr 7, 2023
@stale
Copy link

stale bot commented Apr 14, 2023

This PR has been automatically closed because it has not had any activity in the last 21 days. Feel free to re-open in case you would like to follow up.

@stale stale bot closed this Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants