Feature: Accept ignore glob patterns as standard tap config
#1240
Comments
@aaronsteers this is great 🙌 From a Meltano perspective, I think it would be preferable to maintain syntax parity between 'filter' (ignore/include) features in the SDK and those produced/expected by E.g. accepting the patterns as produced from the
Does that make sense?
@kgpayne - Agreed: if we can use the same syntax as The passthrough is also a more performant implementation, exactly because it short-circuits the discovery process on those streams when supported, also reducing the size of the generated catalog.
@aaronsteers great 👏 If we follow the Meltano select convention, I think your examples in the issue description reverse 🤔 I.e. "*" would become "select all" and On naming, would it make sense to call these more generically
Sorry. I did not mean to suggest to use My point was just that we can use the syntax of the rules, but applied to
@aaronsteers ah, ok. Thanks for clarifying. How do you see the common ask of "limit discovery to selected streams" working with I agree that So by 'push down' I imagined that the select patterns, supported in the same format by both Meltano and the SDK, could be injected verbatim from Does that make sense? Maybe "limit discovery to selected streams" isn't a perfect fit for
I don't think it's as duplicative as that - specifying the rule either in select or in exclude should be sufficient. I don't think you need anything in the select rules except '*' - and that is only needed if the tap does not select its streams by default. So, the ignore rule would just be A better-matched use case would be if we have a tap with a three-part stream name containing That's very similar to the default behavior for excluding
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the
Still relevant, in so far as it relates to per-stream config (#1350).
Feature scope
Taps (catalog, state, stream maps, etc.)
Description
This proposal would introduce a standard tap config option like ignored_patterns or ignored_streams, or just ignore, which would accept glob-like input similar to .gitignore. This would operate similarly to --exclude in Meltano, as the first-order, highest-priority (de)selection logic.
While this technically affects "selection" and "deselection", it would actually operate differently from both, so we should avoid conflating them in discussion.
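To make the gitignore-like behavior concrete, here is a minimal sketch of how such rules might be evaluated against stream names. It assumes gitignore-style semantics: a leading "!" re-includes a name, and the last matching rule wins. The names is_ignored and filter_streams are hypothetical and are not part of the SDK or Meltano; property-level rules and escaping are left out.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def is_ignored(name: str, patterns: list[str]) -> bool:
    """Return True if `name` is ignored (hypothetical helper, not an SDK API).

    Assumed semantics: rules are applied in order, a leading "!" negates
    (re-includes), and the last rule that matches decides the outcome.
    """
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(name, glob):
            ignored = not negated
    return ignored


def filter_streams(stream_names: list[str], patterns: list[str]) -> list[str]:
    """Keep only the stream names that are not ignored, preserving order."""
    return [name for name in stream_names if not is_ignored(name, patterns)]


if __name__ == "__main__":
    streams = ["users", "customers", "orders"]
    print(filter_streams(streams, ["*", "!users", "!customers"]))
```

Under these assumptions, a tap could run this kind of filter before discovery does any per-stream work, so ignored streams never reach the catalog.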
Like (de)selection logic:
- If I ignore the user table or deselect it, either way it will not be synced to the target.
- If I specify "ignore": [ "*", "!users", "!customers" ], that is logically equivalent to deselecting all tables except 'users' and 'customers'. (Same as the .gitignore convention.)
Unlike (de)selection logic:
- If I specify "ignore": [ "addresses.*", "*.*email*" ], then I can be 100% sure that no selection logic will later be introduced that pulls in any tables starting with "addresses*" or any columns containing the text "email". (Those would physically not be in the catalog to be selected.)
- If I specify "ignore": [ "information_schema-*" ], then my tap doesn't need to waste time analyzing any tables within information_schema.
A few nice things about accepting patterns and phrasing in the negative:
- We don't interfere with selection/deselection logic. That logic still functions exactly according to the Singer Spec.
- Taps can optionally push down the ignore logic and save time during discovery, while also reducing the size of the generated catalog artifact.
When to use ignore instead of selection.
Challenges or reasons not to build
The biggest challenge is that there is no obvious parser or glob syntax for stream and property ignore rules to follow. The easiest path would be to mimic the glob expressions that Meltano uses today for --select and --exclude. But escaping is always something to consider, and there may be other alternatives based on jsonpath or similar, which are more standards-based, even if less inherently readable.
Another challenge is that by removing streams and properties from the catalog entirely, we miss an opportunity to document what exclusions have taken place. We could mitigate this by adding some annotations to the catalog, such as a top-level "ignored_streams": ["stream-a", "stream-b"] and a stream-level "ignored_properties": ["property-a", "property-b"].
Related to: