Pattern acceleration #85

Closed
wants to merge 33 commits into from

@elliVM elliVM commented Sep 9, 2024

Pattern acceleration feature that activates bloom filtering when a search term matches a set regex pattern. The goal is to limit bloom filtering to certain patterns, such as UUIDs.

Flow:

  1. The walker walks the XML and, for each search term, checks the tokenized term by regex against the tables in bloomdb, excluding the filtertype table
  2. Collect the list of tables that have a pattern match record, using SQL metadata
  3. Create a temp table for each of those tables
  4. For each temp table, generate a condition that compares the temp table filter against the parent table filter using the bloommatch UDF
  5. Combine the conditions from each table; the combined condition is also true if all filters were null
  6. Combine the conditions from each search term
  7. Left join the tables to NestedTopNQuery and use the final combined condition in the query

Notes: The tokenizer max token count is set to 0 to get only major tokens, since that is what dpf_03 currently uses to tokenize the bloom filter tables.
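
Steps 4 to 6 of the flow above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this PR: plain SQL strings stand in for the real query builder, the `term_<id>_<table>` temp table naming is hypothetical, and it assumes that per-table conditions are joined with OR while per-term conditions are joined with AND. Only the `bloommatch` UDF name and the "all filters null" rule come from the description.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collectors;

// Illustrative sketch only: SQL strings stand in for the real query builder.
final class PatternAccelerationSketch {

    // Step 4: compare the temp table filter against the parent table filter.
    // The temp table naming scheme is a hypothetical choice for this sketch.
    static String tableCondition(String parentTable, long bloomTermId) {
        String tempTable = "term_" + bloomTermId + "_" + parentTable;
        return "bloommatch(" + tempTable + ".filter, " + parentTable + ".filter)";
    }

    // Step 5: combine the per-table conditions of one search term.
    // Assumption: any matching table is enough (OR), and a row also passes
    // when every parent table filter is NULL.
    static String termCondition(List<String> parentTables, long bloomTermId) {
        if (parentTables.isEmpty()) {
            throw new IllegalArgumentException(
                    "no pattern matched tables for term <" + bloomTermId + ">");
        }
        String anyMatch = parentTables.stream()
                .map(table -> tableCondition(table, bloomTermId))
                .collect(Collectors.joining(" OR "));
        String allNull = parentTables.stream()
                .map(table -> table + ".filter IS NULL")
                .collect(Collectors.joining(" AND "));
        return "(" + anyMatch + " OR (" + allNull + "))";
    }

    // Step 6: combine the conditions of all search terms.
    // Assumption: every term must pass (AND); the list index is used as term id.
    static String queryCondition(List<List<String>> tablesPerTerm) {
        StringJoiner joined = new StringJoiner(" AND ");
        for (int termId = 0; termId < tablesPerTerm.size(); termId++) {
            joined.add(termCondition(tablesPerTerm.get(termId), termId));
        }
        return joined.toString();
    }
}
```

The resulting string would then be the condition used in step 7 once the tables are left joined to the query.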

elliVM added 30 commits August 27, 2024 10:02
…n, primary constructor for IndexStatementCondition explanatory comments for LatestCondition and EarliestCondition
… testing for ConditionWalker bloom search terms
@elliVM elliVM self-assigned this Sep 9, 2024
@elliVM elliVM changed the title from "Pattern acceleration refactor" to "Pattern acceleration" Sep 9, 2024
@elliVM elliVM linked an issue Sep 9, 2024 that may be closed by this pull request

public BloomFilterTempTable(DSLContext ctx, Table<?> parentTable, long bloomTermId, Set<Token> tokenSet) {
    this.ctx = ctx;
    this.parentTable = parentTable;

would be ideal if the primary constructor had no code
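
One way to read this suggestion, as a minimal sketch: the primary constructor only assigns fields, secondary constructors only delegate to it, and any real work moves into methods. The class and its members below are hypothetical stand-ins with plain types, not the `BloomFilterTempTable` from this PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in class, used only to illustrate the constructor layout.
final class TempTableSketch {
    private final String parentTable;
    private final long bloomTermId;
    private final Set<String> tokens;

    // Secondary constructor: converts its arguments and delegates, nothing else.
    TempTableSketch(String parentTable, long bloomTermId, String... tokens) {
        this(parentTable, bloomTermId, new HashSet<>(Arrays.asList(tokens)));
    }

    // Primary constructor: field assignment only, no logic and no I/O.
    TempTableSketch(String parentTable, long bloomTermId, Set<String> tokens) {
        this.parentTable = parentTable;
        this.bloomTermId = bloomTermId;
        this.tokens = tokens;
    }

    // Derived values are computed on demand instead of in the constructor.
    String name() {
        return "term_" + bloomTermId + "_" + parentTable;
    }

    int tokenCount() {
        return tokens.size();
    }
}
```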

    filterBAOS.close();
}
catch (IOException e) {
    throw new UncheckedIOException(e);

perhaps it would be worth it to add some extra context to this exception in addition to the original exception to help debugging in the future.
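
A minimal sketch of what that could look like, using the two-argument `UncheckedIOException(String, IOException)` constructor from the JDK so the original exception is kept as the cause. The helper names, the message wording, and the choice of context fields (parent table name and bloom term id) are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, used only to illustrate the exception wrapping.
final class FilterBytesSketch {

    // Wraps the original exception and adds context that helps debugging.
    static UncheckedIOException withContext(String parentTable, long bloomTermId, IOException cause) {
        return new UncheckedIOException(
                "Failed to write bloom filter bytes for parent table <" + parentTable
                        + ">, bloom term id <" + bloomTermId + ">",
                cause);
    }

    // Stand-in for the serialization step; the payload is just copied through.
    static byte[] filterBytes(String parentTable, long bloomTermId, byte[] payload) {
        try (ByteArrayOutputStream filterBAOS = new ByteArrayOutputStream()) {
            filterBAOS.write(payload);
            return filterBAOS.toByteArray();
        }
        catch (IOException e) {
            throw withContext(parentTable, bloomTermId, e);
        }
    }
}
```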


Set<Token> tokenSet = new PatternMatch(ctx, "test").tokenSet();
BloomFilterTempTable tempTable = new BloomFilterTempTable(ctx, table, 0L, tokenSet);
Assertions.assertThrows(RuntimeException.class, tempTable::generateCondition);

check that RuntimeException.getMessage() is of expected value?
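
For illustration, a sketch of the pattern this comment points at. JUnit 5's `Assertions.assertThrows` returns the thrown exception, so its message can be asserted afterwards. To keep the example free of dependencies, the block below uses a small stand-in `assertThrows` with the same return behaviour; the `generateCondition` stub and its message text are hypothetical.

```java
// Hypothetical stand-ins, used only to illustrate asserting on the message.
final class MessageAssertionSketch {

    // Minimal stand-in for JUnit 5's Assertions.assertThrows: returns the
    // thrown exception so the caller can inspect it further.
    static <T extends Throwable> T assertThrows(Class<T> expected, Runnable action) {
        try {
            action.run();
        }
        catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Unexpected exception type: " + t.getClass().getName(), t);
        }
        throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
    }

    // Stub for tempTable.generateCondition(); the message is made up.
    static void generateCondition() {
        throw new RuntimeException("Parent table has no filter column");
    }

    // The check itself: capture the exception, then compare its message.
    static void messageTest() {
        RuntimeException e = assertThrows(RuntimeException.class, MessageAssertionSketch::generateCondition);
        if (!"Parent table has no filter column".equals(e.getMessage())) {
            throw new AssertionError("Unexpected message: " + e.getMessage());
        }
    }
}
```

With JUnit 5 the same shape would be `RuntimeException e = Assertions.assertThrows(RuntimeException.class, tempTable::generateCondition);` followed by `Assertions.assertEquals(expectedMessage, e.getMessage());`, where `expectedMessage` is whatever the real method reports.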

}

@Test
void singleTalbePatternMatchTest() {

typo

@kortemik kortemik left a comment


Please rebase this on the latest main branch. It has commits that do not belong here, i.e. license changes.

@elliVM elliVM commented Sep 12, 2024

Moving to another branch

@elliVM elliVM closed this Sep 12, 2024