Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object. #2291
 @JsonCreator
 public MapLookupExtractor(
-    @JsonProperty("map") Map<String, String> map
+    @JsonProperty("map") Map<String, String> map,
+    @JsonProperty("isOneToOne") boolean isOneToOne
Are there any tests included that use MapLookupExtractor with isOneToOne == true?
Minor comment, would it be better to call this property "isInjective", to be more consistent with LookupExtractionFn and such?
Just `injective` is appropriate. `isInjective()` is the proper bean convention for the getter of a boolean field `injective`.
@jon-wei and @drcrallen I think it is clearer to use `OneToOne`:
First, the actual property of the extraction type is already called `ExtractionType.ONE_TO_ONE`.
Second, I think `OneToOne` is clearer than the mathematical term injective.
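Whichever name wins, the flag asserts the same thing: no two keys share a value. A minimal sketch of checking that property for a plain map follows. `OneToOneCheck` and its method are illustrative names only and are not part of Druid.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not Druid code: tests whether a map is one-to-one (injective).
final class OneToOneCheck
{
  static boolean isOneToOne(Map<String, String> map)
  {
    Set<String> seen = new HashSet<>();
    for (String value : map.values()) {
      // Set.add returns false when the value is already present,
      // which means two different keys map to the same value.
      if (!seen.add(value)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args)
  {
    System.out.println(isOneToOne(Map.of("ABC123", "SomeCompany")));
    System.out.println(isOneToOne(Map.of("ABC123", "SomeCompany", "DEF456", "SomeCompany")));
  }
}
```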
Hi b-slim, can you add more flavor to the master comment on how this differs from the dimension extraction approach, and what the performance ramifications are?
"type":"lookup"
"dimension":"dimensionName"
"outputName":"dimensionOutputName"
"lookup":{"type": "map", "map":{"key":"value"}, "isOneToOne":false}
why not "injective" like the extraction function?
Ah, it's `injective` in `LookupExtractionFn`.
@drcrallen we decided to move away from `LookupExtractionFn` in favor of implementing `DimensionSpec` and using the lookup delegator to do the apply/unapply.
This has a couple of advantages: lookups become less verbose, and optimization becomes easier to check for.
Because `LookupExtractor` exposes methods that are not part of the `ExtractionFn` API, it doesn't make sense to use lookups via the `ExtractionFn` API.
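The role of `unapply` in optimization can be illustrated with a small sketch. It assumes a map-backed lookup where `apply` maps a key to its value and `unapply` returns every key that maps to a given value, so a filter on the looked-up value can be rewritten as a filter on raw keys. `MapLookupSketch` is a hypothetical class for illustration and does not reproduce the Druid API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical map-backed lookup, not the Druid implementation.
final class MapLookupSketch
{
  private final Map<String, String> map;

  MapLookupSketch(Map<String, String> map)
  {
    this.map = map;
  }

  // Forward direction: raw dimension value -> looked-up value (null if missing).
  String apply(String key)
  {
    return map.get(key);
  }

  // Reverse direction: looked-up value -> all raw values that produce it.
  // Sorted so the result is deterministic.
  List<String> unapply(String value)
  {
    List<String> keys = new ArrayList<>();
    for (Map.Entry<String, String> entry : map.entrySet()) {
      if (entry.getValue().equals(value)) {
        keys.add(entry.getKey());
      }
    }
    Collections.sort(keys);
    return keys;
  }

  public static void main(String[] args)
  {
    MapLookupSketch lookup = new MapLookupSketch(
        Map.of("ABC123", "SomeCompany", "DEF456", "SomeCompany", "XYZ789", "Other")
    );
    // A filter "lookup(dim) = SomeCompany" can be rewritten as "dim IN (these keys)".
    System.out.println(lookup.unapply("SomeCompany"));
  }
}
```

A plain extraction function only offers the forward direction, which is why this rewrite is hard to express through it.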
Sure
{
  lookupDimSpec = new LookupDimensionSpec("dimName", "outputName", new MapLookupExtractor(
      ImmutableMap.<String, String>of("key", "value"),
      true
@jon-wei the test is here
@drcrallen this needs to be merged ASAP; it is blocking the development of QTL. Even though I think QTL won't be done for 0.9, having this merged is very important.
b0f6908 to 55d5186
@@ -105,6 +107,8 @@ public void configure(Binder binder)
 Jerseys.addResource(binder, ClientInfoResource.class);
 LifecycleModule.register(binder, QueryResource.class);
 LifecycleModule.register(binder, DruidBroker.class);
+binder.bind(LookupReferencesManager.class).in(ManageLifecycle.class);
+LifecycleModule.register(binder, LookupReferencesManager.class);
is binder.bind(..) necessary given that you are doing LifecycleModule.register(..) ?
yes, that's what docs said.
i'm wondering why it is not needed for DruidBroker.class then?
@himanshug as you can see here registering is not enough to create the object, you need to either bind it to a scope or explicitly ask for it. maybe @cheddar can give a better explanation.
@himanshug after testing via my IDE you are actually right, i hope @cheddar can provide a clarification ...
Is there a reason this needs to be in druid core and not an extension?
private final ExtractionFn extractionFn;

@JsonCreator
public LookupDimensionSpec(
What do you think about making this
public LookupDimensionSpec(
    @JsonProperty("dimension") String dimension,
    @JsonProperty("outputName") String outputName,
    @JsonProperty("lookup") LookupExtractor lookupInput,
    @JsonProperty("name") String name,
    @JsonProperty("retainMissingValues") final boolean retainMissingValues,
    @JsonProperty("replaceMissingWith") final String replaceMissingWith,
    @JacksonInject LookupReferencesManager lookupReferencesManager
)
Where it is expected that you either specify "name" or "lookup"; the one you do not specify is null.
If you specify lookup, then said lookup is used when `getExtractionFn()` is called. If you specify name, then said name is looked up in the manager when `getExtractionFn()` is called.
Structuring it this way should avoid issues with the lookup not existing on the router and completely eliminate the need for the LookupDelegator.
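The suggestion above can be sketched in isolation, without Jackson or Druid types. In this sketch `Registry` stands in for `LookupReferencesManager`, `Spec` for `LookupDimensionSpec`, and a plain `Function<String, String>` for the lookup; all three are hypothetical simplifications. The check that exactly one of the two is supplied is also an assumption; the comment only says that the unspecified one is null.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-ins for LookupReferencesManager and LookupDimensionSpec.
final class NameOrLookupSketch
{
  static final class Registry
  {
    private final Map<String, Function<String, String>> lookups = new ConcurrentHashMap<>();

    void put(String name, Function<String, String> lookup)
    {
      lookups.put(name, lookup);
    }

    Function<String, String> get(String name)
    {
      return lookups.get(name);
    }
  }

  static final class Spec
  {
    private final Function<String, String> lookup; // inline lookup, or null
    private final String name;                     // registered lookup name, or null
    private final Registry registry;

    Spec(Function<String, String> lookup, String name, Registry registry)
    {
      // Assumption: exactly one of lookup or name must be given.
      if ((lookup == null) == (name == null)) {
        throw new IllegalArgumentException("Specify exactly one of lookup or name");
      }
      this.lookup = lookup;
      this.name = name;
      this.registry = registry;
    }

    // Resolution by name is deferred until the function is actually needed,
    // so constructing the spec does not require the lookup to exist yet.
    Function<String, String> getExtractionFn()
    {
      if (lookup != null) {
        return lookup;
      }
      Function<String, String> resolved = registry.get(name);
      if (resolved == null) {
        throw new IllegalStateException("Lookup [" + name + "] not found");
      }
      return resolved;
    }
  }
}
```

Because the registry is consulted only inside `getExtractionFn()`, a node that merely deserializes and forwards the query never has to resolve the name.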
@drcrallen can you please check this one more time? I have changed the way lookups are created, as @cheddar suggested.
{
  this.retainMissingValues = retainMissingValues;
  this.replaceMissingWith = Strings.emptyToNull(replaceMissingWith);
  Preconditions.checkArgument(
Can this just use `FunctionalExtraction` to do these checks?
I left a few tidying up comments, once addressed I'm 👍
@Override
public byte[] getCacheKey()
{
  return lookupExtractor.getCacheKey();
The cache key should also include the query-time parameters.
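One way to address this is to append the query-time parameters to the extractor's own key. The sketch below assumes those parameters are `retainMissingValues` and `replaceMissingWith` from the constructor shown earlier; the byte layout and the `CacheKeySketch` name are assumptions for illustration and do not show the layout the PR ended up using.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical cache-key layout: lookupKey | 0xFF | replaceMissingWith | 0xFF | retain flag
final class CacheKeySketch
{
  // 0xFF never occurs in valid UTF-8, so it cannot collide with string bytes.
  private static final byte SEPARATOR = (byte) 0xFF;

  static byte[] cacheKey(byte[] lookupKey, boolean retainMissingValues, String replaceMissingWith)
  {
    // null and empty are treated alike, mirroring Strings.emptyToNull in the constructor.
    byte[] replaceBytes = replaceMissingWith == null
                          ? new byte[0]
                          : replaceMissingWith.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(lookupKey.length + 1 + replaceBytes.length + 1 + 1)
                     .put(lookupKey)
                     .put(SEPARATOR)
                     .put(replaceBytes)
                     .put(SEPARATOR)
                     .put(retainMissingValues ? (byte) 1 : (byte) 0)
                     .array();
  }
}
```

With this layout, two specs that share a lookup but differ in how missing values are handled produce different keys, so cached results are not mixed up.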
oh that's a weird failure:
@@ -252,7 +252,7 @@ It is illegal to set `retainMissingValue = true` and also specify a `replaceMiss

 A property of `injective` specifies if optimizations can be used which assume there is no combining of multiple names into one. For example: If ABC123 is the only key that maps to SomeCompany, that can be optimized since it is a unique lookup. But if both ABC123 and DEF456 BOTH map to SomeCompany, then that is NOT a unique lookup. Setting this value to true and setting `retainMissingValue` to FALSE (the default) may cause undesired behavior.

-A property `optimize` can be supplied to allow optimization of lookup based extraction filter (by default `optimize = false`).
+A property `optimize` can be supplied to allow optimization of lookup based extraction filter (by default `optimize = true`).
this is an incompatible change for configs that had relied on defaulting to false. Can you explain more on why this won't impact configs that were relying on that behavior?
@drcrallen lookups are an experimental feature, so changes like that are expected to happen.
I set this to true after it had been tested.
LookupExtractor is not listed as experimental, and neither is the "optimize" flag (as far as I can tell).
Looks like it was added in 032d3bf, which is in 0.9.0. As such it can change just fine, but the default for an experimental feature should be legacy behavior.
well... it can be changed in 0.9.0 (which is not yet released). Does the default behavior need to change before we release 0.9.0?
@drcrallen please check the new changes!
c9d4d96 to ee904ff
@drcrallen more comments?
@b-slim I'm still not sure about #2291 (diff) and if changing the default from false to true is going to be problematic. I'd like a second opinion from one of the other committers (maybe @cheddar since he's already in this PR) and will stand by whichever way they decide. 👍 after second opinion.
@drcrallen if [https://github.com//pull/2291#discussion-diff-52407458R74] is a blocker, I will revert it and send a separate PR to make it true by default. What do you think?
Fwiw, I think that given that the lookups are still pretty experimental, the change from the default to optimize doesn't seem so bad to me. The worst risk is that the optimization is broken and it breaks people who move up to this version. Hopefully their tests will show that it is broken and we will have a chance to fix before it's too horrible. So, given the risks and the fact that this is experimental, I'm fine with the change.
cool, 👍 then
ee904ff to 4e119b7
@drcrallen and @cheddar thanks for the review, I will merge after the build passes!
Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object.
Currently Druid doesn't have any reference manager to register or delete `LookupExtractor` objects. Also, the only way to use the `Lookup` extraction type today is to wrap it in an `ExtractionFn`, which is very verbose and makes optimization painful (Lookup exposes unapply and the extraction function does not).
This PR:
1 - Introduces a `LookupExtractorFactory` instance manager called `LookupReferencesManager`, allowing basic operations to register/un-register/listAll or remove `LookupExtractorFactory` instances.
2 - Provides an implementation of `LookupExtractor` that delegates the lookup functionality to a registered lookup. This implementation is the default, so any query that comes with an actual namespace will try to use the `LookupReferencesManager`.
3 - Defines a new way to use Lookup directly via an implementation of `DimensionSpec` called `LookupDimensionSpec`.
4 - `LookupExtractorFactory` will manage the lifecycle and the state of `LookupExtractor`.
5 - Adds to `LookupExtractor` the property `isOneToOne` to enable optimization at the broker level.
6 - Does not introduce any performance changes.
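The register/remove/listAll operations from item 1, together with the lifecycle responsibility from item 4, can be sketched roughly as follows. This is a simplified illustration under assumptions: the `Factory` interface with `start()`/`close()` and the semantics of each method are hypothetical and are not taken from the PR's `LookupReferencesManager`.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical reference manager; names and semantics are assumptions for illustration.
final class ReferencesManagerSketch
{
  // Stand-in for LookupExtractorFactory: owns the lifecycle of its extractor.
  interface Factory
  {
    boolean start();

    boolean close();
  }

  private final ConcurrentHashMap<String, Factory> factories = new ConcurrentHashMap<>();

  // Registers and starts a factory; returns false if the name is taken or start fails.
  boolean register(String name, Factory factory)
  {
    if (factories.putIfAbsent(name, factory) != null) {
      return false;
    }
    if (!factory.start()) {
      factories.remove(name, factory);
      return false;
    }
    return true;
  }

  // Removes and closes a factory; returns false if the name was unknown or close fails.
  boolean remove(String name)
  {
    Factory removed = factories.remove(name);
    return removed != null && removed.close();
  }

  Factory get(String name)
  {
    return factories.get(name);
  }

  // Snapshot of registered names, sorted for stable output.
  Set<String> listAll()
  {
    return new TreeSet<>(factories.keySet());
  }
}
```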
FYI: We decided to move away from `LookupExtractionFn` in favor of implementing `DimensionSpec` and using the lookup delegator to do the apply/unapply.
This has a couple of advantages: lookups become less verbose, and optimization becomes easier to check for.
Because `LookupExtractor` exposes methods that are not part of the `ExtractionFn` API, it doesn't make sense to use lookups via the `ExtractionFn` API.
Here is an overview of the overall roadmap of QTL development