Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object. #2291

Merged
merged 1 commit into from
Feb 11, 2016

Conversation

b-slim
Copy link
Contributor

@b-slim b-slim commented Jan 19, 2016

Currently druid doesn't have any reference manager to register or delete LookupExtractor objects.
Also Currently the only way to use Lookup extraction type user has to wrap it around an ExtractionFn, this is very verbose and make optimization very painful (Lookup exposes unapply and extraction function does not).
This PR:
1 - Introduces a LookupExtractorFactory instance manager called LookupReferencesManager allowing basic operations to register/un-register/listAll or remove LookupExtractorFactory instances.
2 - Provides an implementation of LookupExtractor that delegates the lookup functionality to a registered lookup. This implementation is set to be by default, so any query that comes with actual namespace it will try to use the LookupReferencesManager
3 - Defines a new way to use Lookup directly via an implementation of DimensionSpec called LookupDimensionSpec
4 - LookupExtractorFactory will manage the lifecycle and the state of LookupExtractor.
5 - Adds to LookupExtractor the property isOneToOne to enable optimization at the broker level.
6 - Does not introduce any performance changes.
FYI: We decided to move away from LookupExtractionFn in favor of implementing DimensionSpec and use the lookup delegator to do the apply/unapply.
This has couple of advantages, lookups become less verbose, and optimaztion more easy to check for.
The fact that LookupExtractor exposes methods that are not included at the ExtractionFn API it doesn't make sense to use lookups via ExtractionFN API.
Here is a overview of the overall roadmap of QTL development

@JsonCreator
public MapLookupExtractor(
@JsonProperty("map") Map<String, String> map
@JsonProperty("map") Map<String, String> map,
@JsonProperty("isOneToOne") boolean isOneToOne
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any tests included that use MapLookupExtractor with isOneToOne == true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, would it be better to call this property "isInjective", to be more consistent with LookupExtractionFn and such?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just injective is appropriate. isInjective() is the proper bean convention for extracting value of boolean field injective

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jon-wei and @drcrallen i think it is more clear to use OneToOne:
First the actual property of extraction type is already called ExtractionType.ONE_TO_ONE
Second i think OneToOne is more clear than the mathematical term injective.

@drcrallen
Copy link
Contributor

Hi b-slim can you add more flavor to the master comment on how this differs from the dimension extraction approach, and what the performance ramifications are?

"type":"lookup"
"dimension":"dimensionName"
"outputName":"dimensionOutputName"
"lookup":{"type": "map", "map":{"key":"value"}, "isOneToOne":false}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not "injective" like the extraction function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah its injective in LookupExtractionFn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen we decided to move away from LookupExtractionFn in favor of implementing DimensionSpec and use the lookup delegator to do the apply/unapply.
This has couple of advantages, lookups become less verbose, and optimaztion more easy to check for.
The fact that LookupExtractor exposes methods that are not included at the ExtractionFn API it doesn't make sense to use lookups via ExtractionFN API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

{
lookupDimSpec = new LookupDimensionSpec("dimName", "outputName", new MapLookupExtractor(
ImmutableMap.<String, String>of("key", "value"),
true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jon-wei the test is here

@drcrallen
Copy link
Contributor

@b-slim / @cheddar This is not tagged for 0.9.0 yet. As such it'll probably be lower priority in review compared to getting the 0.9.0 stuff merged.

@b-slim b-slim added this to the 0.9.0 milestone Jan 21, 2016
@b-slim
Copy link
Contributor Author

b-slim commented Jan 21, 2016

@drcrallen this need to merged ASAP, it is blocking the development of QTL. Even tho i think QTL won't be done for 0.9 but having this merged is very important.

@b-slim b-slim force-pushed the lookupManager branch 2 times, most recently from b0f6908 to 55d5186 Compare January 21, 2016 19:55
@@ -105,6 +107,8 @@ public void configure(Binder binder)
Jerseys.addResource(binder, ClientInfoResource.class);
LifecycleModule.register(binder, QueryResource.class);
LifecycleModule.register(binder, DruidBroker.class);
binder.bind(LookupReferencesManager.class).in(ManageLifecycle.class);
LifecycleModule.register(binder, LookupReferencesManager.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is binder.bind(..) necessary given that you are doing LifecycleModule.register(..) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's what docs said.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm wondering why it is not needed for DruidBroker.class then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@himanshug as you can see here registering is not enough to create the object, you need to either bind it to a scope or explicitly ask for it. maybe @cheddar can give a better explanation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@himanshug after testing via my IDE you are actually right, i hope @cheddar can provide a clarification ...

@drcrallen
Copy link
Contributor

Is there a reason this needs to be in druid core and not an extension?

private final ExtractionFn extractionFn;

@JsonCreator
public LookupDimensionSpec(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about making this

public LookupDimensionSpec(
  @JsonProperty("dimension") String dimension,
  @JsonProperty("outputName") String outputName,
  @JsonProperty("lookup") LookupExtractor lookupInput,
  @JsonProperty("name") String name,
  @JsonProperty("retainMissingValues") final boolean retainMissingValues,
  @JsonProperty("replaceMissingWith") final String replaceMissingWith,
  @JacksonInject LookupReferencesManager lookupReferencesManager,
)

Where it is expected that you either specify "name" or "lookup", the one you do not specify is null.

If you specify lookup, then said lookup is used when getExtractionFn() is called. If you specify name, then said name is looked up in the manager when getExtractionFn() is called.

Structuring it this way should avoid issues with the lookup not existing on the router and completely eliminate the need for the LookupDelegator

@b-slim
Copy link
Contributor Author

b-slim commented Feb 9, 2016

@drcrallen can you please check this one more time ? i have changed the way to create lookups as @cheddar suggested.

{
this.retainMissingValues = retainMissingValues;
this.replaceMissingWith = Strings.emptyToNull(replaceMissingWith);
Preconditions.checkArgument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just use FunctionalExtraction to do these checks?

@cheddar
Copy link
Contributor

cheddar commented Feb 9, 2016

I left a few tidying up comments, once addressed I'm 👍

@Override
public byte[] getCacheKey()
{
return lookupExtractor.getCacheKey();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should include parameters for the query time parameters.

@drcrallen
Copy link
Contributor

oh that's a weird failure:

Running io.druid.segment.data.IncrementalIndexTest
Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.122 sec <<< FAILURE! - in io.druid.segment.data.IncrementalIndexTest
testConcurrentAddRead[1](io.druid.segment.data.IncrementalIndexTest)  Time elapsed: 0.067 sec  <<< ERROR!
java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index: 549, Size: 549
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at io.druid.segment.data.IncrementalIndexTest.testConcurrentAddRead(IncrementalIndexTest.java:528)
Caused by: java.lang.IndexOutOfBoundsException: Index: 549, Size: 549
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at io.druid.segment.incremental.OffheapIncrementalIndex.getMetricLongValue(OffheapIncrementalIndex.java:304)
    at io.druid.segment.incremental.IncrementalIndexStorageAdapter$1$1$6.get(IncrementalIndexStorageAdapter.java:479)
    at io.druid.query.aggregation.LongSumAggregator.aggregate(LongSumAggregator.java:60)
    at io.druid.query.timeseries.TimeseriesQueryEngine$1.apply(TimeseriesQueryEngine.java:70)
    at io.druid.query.timeseries.TimeseriesQueryEngine$1.apply(TimeseriesQueryEngine.java:54)
    at io.druid.query.QueryRunnerHelper$1.apply(QueryRunnerHelper.java:80)
    at io.druid.query.QueryRunnerHelper$1.apply(QueryRunnerHelper.java:75)
    at com.metamx.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:39)
    at com.metamx.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:39)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:67)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at com.metamx.common.guava.FilteredSequence.accumulate(FilteredSequence.java:42)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at io.druid.segment.data.IncrementalIndexTest$4.run(IncrementalIndexTest.java:489)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

@drcrallen drcrallen closed this Feb 9, 2016
@drcrallen drcrallen reopened this Feb 9, 2016
@@ -252,7 +252,7 @@ It is illegal to set `retainMissingValue = true` and also specify a `replaceMiss

A property of `injective` specifies if optimizations can be used which assume there is no combining of multiple names into one. For example: If ABC123 is the only key that maps to SomeCompany, that can be optimized since it is a unique lookup. But if both ABC123 and DEF456 BOTH map to SomeCompany, then that is NOT a unique lookup. Setting this value to true and setting `retainMissingValue` to FALSE (the default) may cause undesired behavior.

A property `optimize` can be supplied to allow optimization of lookup based extraction filter (by default `optimize = false`).
A property `optimize` can be supplied to allow optimization of lookup based extraction filter (by default `optimize = true`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an incompatible change for configs that had relied on defaulting to false. Can you explain more on why this won't impact configs that were relying on that behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen lookups is an experimental feature, so changes like that are expected to happen.
I have set this to true, after had been tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LookupExtractor is not listed as experimental, and neither is the "optimize" flag (as far as I can tell).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it was added in 032d3bf which is in 0.9.0 As such it can change just fine, but the default for an experimental feature should be legacy behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well... it can be changed in 0.9.0 (which is not yet released). Does the default behavior need to change before we release 0.9.0?

@b-slim
Copy link
Contributor Author

b-slim commented Feb 10, 2016

@drcrallen please check the new changes !

@b-slim
Copy link
Contributor Author

b-slim commented Feb 11, 2016

@drcrallen more comments ?

@drcrallen
Copy link
Contributor

@b-slim I'm still not sure about #2291 (diff) and if changing the default from false to true is going to be problematic. I'd like a second opinion from one of the other committers (maybe @cheddar since he's already in this PR) and will stand by whichever way they decide. 👍 after second opinion.

@b-slim
Copy link
Contributor Author

b-slim commented Feb 11, 2016

@drcrallen if [https://github.com//pull/2291#discussion-diff-52407458R74] is a blocker i will reverted and send a separate PR to make it true by default. What do you think ?

@cheddar
Copy link
Contributor

cheddar commented Feb 11, 2016

Fwiw, I think that given that the lookups are still pretty experimental, the change from the default to optimize doesn't seem so bad to me. The worst risk is that the optimization is broken and it breaks people who move up to this version.

Hopefully their tests will show that it is broken and we will have a chance to fix before it's too horrible. So, given the risks and the fact that this is experimental, I'm fine with the change.

@drcrallen
Copy link
Contributor

cool, 👍 then

@b-slim
Copy link
Contributor Author

b-slim commented Feb 11, 2016

@drcrallen and @cheddar thanks for the review i will merge after build pass.!

b-slim added a commit that referenced this pull request Feb 11, 2016
Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object.
@b-slim b-slim merged commit 368988d into master Feb 11, 2016
@drcrallen drcrallen deleted the lookupManager branch February 12, 2016 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants