Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object. #2291

b-slim · 2016-01-19T21:34:33Z

Currently druid doesn't have any reference manager to register or delete LookupExtractor objects.
Also Currently the only way to use Lookup extraction type user has to wrap it around an ExtractionFn, this is very verbose and make optimization very painful (Lookup exposes unapply and extraction function does not).
This PR:
1 - Introduces a LookupExtractorFactory instance manager called LookupReferencesManager allowing basic operations to register/un-register/listAll or remove LookupExtractorFactory instances.
2 - Provides an implementation of LookupExtractor that delegates the lookup functionality to a registered lookup. This implementation is set to be by default, so any query that comes with actual namespace it will try to use the LookupReferencesManager
3 - Defines a new way to use Lookup directly via an implementation of DimensionSpec called LookupDimensionSpec
4 - LookupExtractorFactory will manage the lifecycle and the state of LookupExtractor.
5 - Adds to LookupExtractor the property isOneToOne to enable optimization at the broker level.
6 - Does not introduce any performance changes.
FYI: We decided to move away from LookupExtractionFn in favor of implementing DimensionSpec and use the lookup delegator to do the apply/unapply.
This has couple of advantages, lookups become less verbose, and optimaztion more easy to check for.
The fact that LookupExtractor exposes methods that are not included at the ExtractionFn API it doesn't make sense to use lookups via ExtractionFN API.
Here is a overview of the overall roadmap of QTL development

jon-wei · 2016-01-19T22:06:49Z

processing/src/main/java/io/druid/query/extraction/MapLookupExtractor.java

  @JsonCreator
  public MapLookupExtractor(
-      @JsonProperty("map") Map<String, String> map
+      @JsonProperty("map") Map<String, String> map,
+      @JsonProperty("isOneToOne") boolean isOneToOne


Are there any tests included that use MapLookupExtractor with isOneToOne == true?

Minor comment, would it be better to call this property "isInjective", to be more consistent with LookupExtractionFn and such?

just injective is appropriate. isInjective() is the proper bean convention for extracting value of boolean field injective

@jon-wei and @drcrallen i think it is more clear to use OneToOne:
First the actual property of extraction type is already called ExtractionType.ONE_TO_ONE
Second i think OneToOne is more clear than the mathematical term injective.

drcrallen · 2016-01-19T22:57:54Z

Hi b-slim can you add more flavor to the master comment on how this differs from the dimension extraction approach, and what the performance ramifications are?

drcrallen · 2016-01-19T22:58:13Z

docs/content/querying/dimensionspecs.md

+  "type":"lookup"
+  "dimension":"dimensionName"
+  "outputName":"dimensionOutputName"
+  "lookup":{"type": "map", "map":{"key":"value"}, "isOneToOne":false}


why not "injective" like the extraction function?

@drcrallen it is called ONE_TO_ONE https://github.com/druid-io/druid/blob/3459a202ce751cb60884519ef902e35280550895/processing/src/main/java/io/druid/query/extraction/ExtractionFn.java#L98-L98

Ah its injective in LookupExtractionFn

@drcrallen we decided to move away from LookupExtractionFn in favor of implementing DimensionSpec and use the lookup delegator to do the apply/unapply.
This has couple of advantages, lookups become less verbose, and optimaztion more easy to check for.
The fact that LookupExtractor exposes methods that are not included at the ExtractionFn API it doesn't make sense to use lookups via ExtractionFN API

b-slim · 2016-01-20T18:58:41Z

processing/src/test/java/io/druid/query/dimension/LookupDimensionSpecTest.java

+  {
+    lookupDimSpec = new LookupDimensionSpec("dimName", "outputName", new MapLookupExtractor(
+        ImmutableMap.<String, String>of("key", "value"),
+        true


@jon-wei the test is here

drcrallen · 2016-01-21T02:32:10Z

@b-slim / @cheddar This is not tagged for 0.9.0 yet. As such it'll probably be lower priority in review compared to getting the 0.9.0 stuff merged.

b-slim · 2016-01-21T15:34:34Z

@drcrallen this need to merged ASAP, it is blocking the development of QTL. Even tho i think QTL won't be done for 0.9 but having this merged is very important.

himanshug · 2016-01-21T21:22:13Z

services/src/main/java/io/druid/cli/CliBroker.java

@@ -105,6 +107,8 @@ public void configure(Binder binder)
            Jerseys.addResource(binder, ClientInfoResource.class);
            LifecycleModule.register(binder, QueryResource.class);
            LifecycleModule.register(binder, DruidBroker.class);
+            binder.bind(LookupReferencesManager.class).in(ManageLifecycle.class);
+            LifecycleModule.register(binder, LookupReferencesManager.class);


is binder.bind(..) necessary given that you are doing LifecycleModule.register(..) ?

yes, that's what docs said.

i'm wondering why it is not needed for DruidBroker.class then?

@himanshug as you can see here registering is not enough to create the object, you need to either bind it to a scope or explicitly ask for it. maybe @cheddar can give a better explanation.

@himanshug after testing via my IDE you are actually right, i hope @cheddar can provide a clarification ...

drcrallen · 2016-01-22T19:47:27Z

Is there a reason this needs to be in druid core and not an extension?

cheddar · 2016-02-08T23:07:07Z

processing/src/main/java/io/druid/query/dimension/LookupDimensionSpec.java

+  private final ExtractionFn extractionFn;
+
+  @JsonCreator
+  public LookupDimensionSpec(


What do you think about making this

public LookupDimensionSpec( @JsonProperty("dimension") String dimension, @JsonProperty("outputName") String outputName, @JsonProperty("lookup") LookupExtractor lookupInput, @JsonProperty("name") String name, @JsonProperty("retainMissingValues") final boolean retainMissingValues, @JsonProperty("replaceMissingWith") final String replaceMissingWith, @JacksonInject LookupReferencesManager lookupReferencesManager, )

Where it is expected that you either specify "name" or "lookup", the one you do not specify is null.

If you specify lookup, then said lookup is used when getExtractionFn() is called. If you specify name, then said name is looked up in the manager when getExtractionFn() is called.

Structuring it this way should avoid issues with the lookup not existing on the router and completely eliminate the need for the LookupDelegator

b-slim · 2016-02-09T21:19:07Z

@drcrallen can you please check this one more time ? i have changed the way to create lookups as @cheddar suggested.

drcrallen · 2016-02-09T21:38:15Z

processing/src/main/java/io/druid/query/dimension/LookupDimensionSpec.java

+  {
+    this.retainMissingValues = retainMissingValues;
+    this.replaceMissingWith = Strings.emptyToNull(replaceMissingWith);
+    Preconditions.checkArgument(


Can this just use FunctionalExtraction to do these checks?

cheddar · 2016-02-09T21:45:59Z

I left a few tidying up comments, once addressed I'm 👍

drcrallen · 2016-02-09T21:46:13Z

processing/src/main/java/io/druid/query/dimension/LookupDimensionSpec.java

+      @Override
+      public byte[] getCacheKey()
+      {
+        return lookupExtractor.getCacheKey();


Should include parameters for the query time parameters.

drcrallen · 2016-02-09T23:43:07Z

oh that's a weird failure:

Running io.druid.segment.data.IncrementalIndexTest
Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.122 sec <<< FAILURE! - in io.druid.segment.data.IncrementalIndexTest
testConcurrentAddRead[1](io.druid.segment.data.IncrementalIndexTest)  Time elapsed: 0.067 sec  <<< ERROR!
java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index: 549, Size: 549
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at io.druid.segment.data.IncrementalIndexTest.testConcurrentAddRead(IncrementalIndexTest.java:528)
Caused by: java.lang.IndexOutOfBoundsException: Index: 549, Size: 549
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at io.druid.segment.incremental.OffheapIncrementalIndex.getMetricLongValue(OffheapIncrementalIndex.java:304)
    at io.druid.segment.incremental.IncrementalIndexStorageAdapter$1$1$6.get(IncrementalIndexStorageAdapter.java:479)
    at io.druid.query.aggregation.LongSumAggregator.aggregate(LongSumAggregator.java:60)
    at io.druid.query.timeseries.TimeseriesQueryEngine$1.apply(TimeseriesQueryEngine.java:70)
    at io.druid.query.timeseries.TimeseriesQueryEngine$1.apply(TimeseriesQueryEngine.java:54)
    at io.druid.query.QueryRunnerHelper$1.apply(QueryRunnerHelper.java:80)
    at io.druid.query.QueryRunnerHelper$1.apply(QueryRunnerHelper.java:75)
    at com.metamx.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:39)
    at com.metamx.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:39)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:67)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at com.metamx.common.guava.FilteredSequence.accumulate(FilteredSequence.java:42)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at io.druid.segment.data.IncrementalIndexTest$4.run(IncrementalIndexTest.java:489)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

drcrallen · 2016-02-10T02:18:29Z

docs/content/querying/dimensionspecs.md

@@ -252,7 +252,7 @@ It is illegal to set `retainMissingValue = true` and also specify a `replaceMiss

 A property of `injective` specifies if optimizations can be used which assume there is no combining of multiple names into one. For example: If ABC123 is the only key that maps to SomeCompany, that can be optimized since it is a unique lookup. But if both ABC123 and DEF456 BOTH map to SomeCompany, then that is NOT a unique lookup. Setting this value to true and setting `retainMissingValue` to FALSE (the default) may cause undesired behavior.

-A property `optimize` can be supplied to allow optimization of lookup based extraction filter (by default `optimize = false`). 
+A property `optimize` can be supplied to allow optimization of lookup based extraction filter (by default `optimize = true`). 


this is an incompatible change for configs that had relied on defaulting to false. Can you explain more on why this won't impact configs that were relying on that behavior?

@drcrallen lookups is an experimental feature, so changes like that are expected to happen.
I have set this to true, after had been tested.

LookupExtractor is not listed as experimental, and neither is the "optimize" flag (as far as I can tell).

Looks like it was added in 032d3bf which is in 0.9.0 As such it can change just fine, but the default for an experimental feature should be legacy behavior.

well... it can be changed in 0.9.0 (which is not yet released). Does the default behavior need to change before we release 0.9.0?

b-slim · 2016-02-10T15:03:15Z

@drcrallen please check the new changes !

b-slim · 2016-02-11T13:24:45Z

@drcrallen more comments ?

drcrallen · 2016-02-11T15:26:13Z

@b-slim I'm still not sure about #2291 (diff) and if changing the default from false to true is going to be problematic. I'd like a second opinion from one of the other committers (maybe @cheddar since he's already in this PR) and will stand by whichever way they decide. 👍 after second opinion.

b-slim · 2016-02-11T15:35:59Z

@drcrallen if [https://github.com//pull/2291#discussion-diff-52407458R74] is a blocker i will reverted and send a separate PR to make it true by default. What do you think ?

cheddar · 2016-02-11T17:19:01Z

Fwiw, I think that given that the lookups are still pretty experimental, the change from the default to optimize doesn't seem so bad to me. The worst risk is that the optimization is broken and it breaks people who move up to this version.

Hopefully their tests will show that it is broken and we will have a chance to fix before it's too horrible. So, given the risks and the fact that this is experimental, I'm fine with the change.

drcrallen · 2016-02-11T17:44:17Z

cool, 👍 then

b-slim · 2016-02-11T18:12:55Z

@drcrallen and @cheddar thanks for the review i will merge after build pass.!

Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object.

b-slim added Feature Discuss labels Jan 19, 2016

jon-wei reviewed Jan 19, 2016
View reviewed changes

drcrallen reviewed Jan 19, 2016
View reviewed changes

b-slim assigned cheddar Jan 20, 2016

b-slim reviewed Jan 20, 2016
View reviewed changes

b-slim mentioned this pull request Jan 20, 2016

Query time lookups need a tutorial #1573

Closed

b-slim closed this Jan 20, 2016

b-slim reopened this Jan 20, 2016

b-slim force-pushed the lookupManager branch from 026703e to 4b6c461 Compare January 20, 2016 21:25

b-slim removed the Discuss label Jan 20, 2016

b-slim added this to the 0.9.0 milestone Jan 21, 2016

b-slim force-pushed the lookupManager branch 2 times, most recently from b0f6908 to 55d5186 Compare January 21, 2016 19:55

himanshug reviewed Jan 21, 2016
View reviewed changes

cheddar reviewed Feb 8, 2016
View reviewed changes

drcrallen reviewed Feb 9, 2016
View reviewed changes

drcrallen closed this Feb 9, 2016

drcrallen reopened this Feb 9, 2016

drcrallen reviewed Feb 10, 2016
View reviewed changes

b-slim force-pushed the lookupManager branch from c9d4d96 to ee904ff Compare February 10, 2016 16:23

Adding lookup ref manager and lookup dimension spec impl

4e119b7

b-slim force-pushed the lookupManager branch from ee904ff to 4e119b7 Compare February 11, 2016 18:12

b-slim added a commit that referenced this pull request Feb 11, 2016

Merge pull request #2291 from druid-io/lookupManager

368988d

Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object.

b-slim merged commit 368988d into master Feb 11, 2016

drcrallen deleted the lookupManager branch February 12, 2016 18:25

navis mentioned this pull request Feb 16, 2016

delete System.out.println() in test code #2474

Merged

clambertus unassigned cheddar Jul 6, 2018

Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object. #2291

Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object. #2291

Conversation

b-slim commented Jan 19, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

drcrallen commented Jan 19, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

drcrallen commented Jan 21, 2016

b-slim commented Jan 21, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

drcrallen commented Jan 22, 2016

Choose a reason for hiding this comment

b-slim commented Feb 9, 2016

Choose a reason for hiding this comment

cheddar commented Feb 9, 2016

Choose a reason for hiding this comment

drcrallen commented Feb 9, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

b-slim commented Feb 10, 2016

b-slim commented Feb 11, 2016

drcrallen commented Feb 11, 2016

b-slim commented Feb 11, 2016

cheddar commented Feb 11, 2016

drcrallen commented Feb 11, 2016

b-slim commented Feb 11, 2016