Term accelerated searches using bloomfilter #30

elliVM · 2024-03-12T08:14:31Z

Allow searches for a string pattern to be accelerated without using a global bloomfilter.

elliVM · 2024-03-18T11:24:39Z

Added a pattern table to bloomdb that can be used to select a specific filter. Each pattern is stored as its own bloom filter byte array. When a search term is included in the saved patterns (checked with the bloommatch UDF), it activates bloom search using the filter of that pattern.

For simplicity, when a filter is created it is assigned a single pattern that the search term is matched against. Later this can be changed to multiple patterns per filter and vice versa.

elliVM · 2024-03-20T10:36:09Z

Changing to support multiple patterns per filter

elliVM · 2024-03-25T10:52:19Z

Implemented a schema with a pattern table and a junction table between patterns and filters. The condition walker selects only the filters whose pattern matches the search term and runs the bloommatch UDF against temp table filters generated from the filter types and the search term.

Add testing
Disable bloom if no filters found

elliVM · 2024-03-27T11:44:37Z

Changes to be made:

Only one pattern per filter is needed, so remove the junction table and move the pattern to filtertype.
Change to use regex for matching instead of the UDF; start without tokenization.
Update the schema: move the pattern to the filtertype table as a datatype that can use regex.
Later tokenize search term before matching.
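
A minimal sketch of the planned regex selection, for illustration only. The class `FilterTypeSelection`, its methods, and the in-memory map standing in for rows of the filtertype table are all hypothetical; the thread does not show the real query. It also assumes a pattern may match anywhere in the untokenized search term (`find()`), which the thread does not confirm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in: filter type id -> stored regex pattern.
final class FilterTypeSelection {

    // Returns the ids of filter types whose pattern is found in the search term.
    static List<Long> matchingFilterTypes(Map<Long, String> patternsById, String searchTerm) {
        List<Long> matches = new ArrayList<>();
        for (Map.Entry<Long, String> row : patternsById.entrySet()) {
            // Assumption: a match anywhere in the term is enough.
            if (Pattern.compile(row.getValue()).matcher(searchTerm).find()) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    // Bloom is only worth enabling when at least one filter type matched.
    static boolean bloomEnabled(List<Long> matches) {
        return !matches.isEmpty();
    }

    public static void main(String[] args) {
        Map<Long, String> patterns = new LinkedHashMap<>();
        patterns.put(1L, "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}");
        patterns.put(2L, "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");
        List<Long> matches = matchingFilterTypes(patterns, "c3468f80-4273-4867-9b66-3f470787c365");
        System.out.println(matches);               // [1]
        System.out.println(bloomEnabled(matches)); // true
    }
}
```

The `bloomEnabled` check mirrors the earlier note about disabling bloom when no filters are found.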

elliVM · 2024-04-04T07:02:19Z

Testing version with pattern matching against tokenized search terms

elliVM · 2024-04-12T05:28:29Z

New changes to be made:

Create a database table for each stored regex pattern with a bloom filter
Join all filter tables that have a regex pattern match with incoming archive search term
Run bloommatch UDF for each logfile and select those that match any of the joined filters
To run bloommatch, create a temp table for each bloom filter table that has a pattern match
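
The steps above can be sketched in memory as follows. This is an illustration under stated assumptions, not the project's implementation: it assumes bloommatch behaves like a bitwise subset check between two filter byte arrays of equal size, and it reuses one term filter for every joined table, whereas the plan above creates a temp table per matching bloom filter table. `BloomMatchSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BloomMatchSketch {

    // Assumed semantics: true when every bit set in the term filter
    // is also set in the logfile filter.
    static boolean bloomMatch(byte[] termFilter, byte[] logfileFilter) {
        if (termFilter.length != logfileFilter.length) {
            return false; // filters of different sizes are not comparable
        }
        for (int i = 0; i < termFilter.length; i++) {
            if ((termFilter[i] & logfileFilter[i]) != termFilter[i]) {
                return false;
            }
        }
        return true;
    }

    // logfileFilters: logfile id -> filters collected from the joined filter tables.
    // A logfile is selected once if any of its filters matches.
    static List<Long> selectLogfiles(Map<Long, List<byte[]>> logfileFilters, byte[] termFilter) {
        List<Long> selected = new ArrayList<>();
        for (Map.Entry<Long, List<byte[]>> entry : logfileFilters.entrySet()) {
            for (byte[] filter : entry.getValue()) {
                if (bloomMatch(termFilter, filter)) {
                    selected.add(entry.getKey());
                    break; // one match is enough, avoids listing the logfile twice
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        byte[] term = {0b0000_0101, 0};
        Map<Long, List<byte[]>> logfiles = new LinkedHashMap<>();
        logfiles.put(10L, List.of(new byte[] {0b0000_0111, 0}));
        logfiles.put(11L, List.of(new byte[] {0b0000_0001, 0}));
        logfiles.put(12L, List.of(new byte[] {0, 0}, new byte[] {0b0101_0101, 1}));
        System.out.println(selectLogfiles(logfiles, term)); // [10, 12]
    }
}
```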

elliVM · 2024-04-22T07:35:19Z

Created a new walker that finds all dynamic bloomfilter tables that have a pattern match with the tokenized search term. It will be used to select the tables to join with the main query. (Combined with Condition Walker)

elliVM · 2024-04-29T09:54:33Z

Created classes to hold dynamic tables and temp tables

elliVM · 2024-05-07T06:38:46Z

Internal PR

elliVM · 2024-06-03T09:49:24Z

Updates to the filtertype table: pattern varchar length increased to 2048 and pattern added to the unique composite index.

elliVM · 2024-07-05T09:55:32Z

Fixed issues with filter size selection in the temp tables generated for the bloommatch condition. Limited tokenizers to use only major tokens to match with dpf_03.

Filtering is working in QA (pth-07 5.3.0-22-gbd5da88a).

Test example:
`index=alert_examples earliest=-999d "c3468f80-4273-4867-9b66-3f470787c365"`

Without bloom: 16-18 s
With bloom: 3-6 s

elliVM · 2024-07-08T06:50:42Z

Fixing an issue where table pattern match filtering from metadata was fetching the whole table's data into Java memory. Limited the fetch to check only 1 row and only the PK field.

elliVM · 2024-07-08T10:02:04Z

Duplicate rows appear on multiple pattern matches when multiple tables are joined. Testing a fix using group by logfile.id.

Update: group by is too slow. The false positives may be caused by a null-on-null bloommatch check when a joined pattern match table has no matching logfiles for the index.

elliVM · 2024-07-15T12:51:07Z

Adding a null check after the bloommatch condition for bloom filters fixed the duplicate issue and sped up queries with multiple joined tables.

elliVM · 2024-07-17T10:00:43Z

Fixed bug with multiple search terms, tested and working in QA.

elliVM · 2024-07-22T09:50:14Z

Updated old tests to company standards

elliVM · 2024-07-24T10:10:38Z

Refactoring: move all tokenization to the PatternMatch class and move all bloommatch condition generation steps to the BloomFilterTempTable class.

elliVM · 2024-08-05T10:07:35Z

QA testing showed good results with a small number of matches. The performance gain fell gradually as the number of matches increased, but all queries were still faster with bloom enabled.
Both a single large table and multiple tables performed well.

elliVM · 2024-08-26T10:11:09Z

will split the refactoring into another PR and implement the changes requested in review

elliVM · 2024-09-12T08:17:12Z

rebased to main

elliVM · 2024-09-16T05:07:54Z

After the logic review meeting, refactoring to make the code clearer to review:

Check single responsibility of classes
Split classes into smaller pieces where possible
Reduce coupling with use of interfaces
Better naming of classes and methods

elliVM · 2024-09-18T09:52:27Z

Refactoring

New classes:

| Class | Responsibility |
| --- | --- |
| PatternMatchTables | Finds bloomdb tables that match a pattern condition |
| CategoryTableImpl | Temp table from a bloom filter table that can return a `CategoryTableCondition` |
| Created | `CategoryTable` that has been created in the database |
| WithFilterTypes | `CategoryTable` with its filters inserted |
| TableFilters | Inserts the filters of a `CategoryTable` |
| TableFilterTypesFromMetadata | Fetches the different filter types of a table from metadata |
| CategoryTableCondition | Condition that compares the category table's filter bytes against the bloom filter table's filter bytes with the bloommatch UDF; selects the same size and bloom term id |
| PatternMatchCondition | Condition that checks if any of the given tokens match `bloomdb.filtertype.pattern` |

Other changes:

Many method naming changes
Added interfaces CategoryTable, TableRecords, BloomQueryCondition
Added missing withoutFilters option and implementation to IndexStatementCondition
Equality methods for all new classes
Tests for all new classes

elliVM · 2024-09-23T09:36:40Z

Fixing tokenization of the incoming query search term. Tokenization worked with the old bloom filters, which contained every token, but it does not work with the new bloom filter tables, which contain only pattern-filtered tokens, since some of the tokens generated from the search term are not present in them.
The search term will not be tokenized for now. Multiple values can still be searched, but they have to be split into separate search terms, and each search term has to regex match a pattern.
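
The problem described above can be illustrated with a small sketch. It is not the project's code: a plain `Set` stands in for the bloom filter (so there are no false positives), the token lists are written by hand instead of coming from the real tokenizer, and the class `PatternFilteredTokens` with its methods is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class PatternFilteredTokens {

    static final Pattern UUID = Pattern.compile("[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}");

    // Stand-in for a pattern-filtered bloom filter: keeps only tokens matching the pattern.
    static Set<String> buildFilter(List<String> eventTokens, Pattern pattern) {
        Set<String> filter = new HashSet<>();
        for (String token : eventTokens) {
            if (pattern.matcher(token).find()) {
                filter.add(token);
            }
        }
        return filter;
    }

    // Tokenized lookup: every token generated from the search term must be present.
    static boolean allTokensPresent(Set<String> filter, List<String> searchTokens) {
        return filter.containsAll(searchTokens);
    }

    // Untokenized lookup: the search term is checked as a whole.
    static boolean termPresent(Set<String> filter, String searchTerm) {
        return filter.contains(searchTerm);
    }

    public static void main(String[] args) {
        String uuid = "c3468f80-4273-4867-9b66-3f470787c365";
        // Hand-written tokens of one event; only the full UUID matches the pattern.
        Set<String> filter = buildFilter(List.of("id", "=", uuid, "c3468f80", "4273"), UUID);

        // Tokenizing the search term adds sub-tokens that were never stored.
        System.out.println(allTokensPresent(filter, List.of(uuid, "c3468f80", "4273"))); // false
        // The whole term is found.
        System.out.println(termPresent(filter, uuid)); // true
    }
}
```

With the tokenized lookup the event would be skipped even though it contains the searched value, which is why the search term is left whole for now.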

elliVM self-assigned this Mar 12, 2024

ronja-ui mentioned this issue Jun 7, 2024

Feedback survey results teragrep/teragrep#7

Open

elliVM linked a pull request Aug 12, 2024 that will close this issue

Pattern acceleration support #65

Closed

elliVM mentioned this issue Aug 14, 2024

bloomfilter use reported to be very slow #26

Open

ronja-ui added the review Issues or pull requests waiting for a review label Aug 21, 2024

q22u removed the review Issues or pull requests waiting for a review label Aug 22, 2024

ronja-ui added review Issues or pull requests waiting for a review and removed review Issues or pull requests waiting for a review labels Aug 26, 2024

elliVM removed a link to a pull request Sep 9, 2024

Pattern acceleration support #65

Closed

StrongestNumber9 mentioned this issue Sep 9, 2024

Imput regex pattern for bloomfilter operations in DPL teragrep/pth_10#264

Open

elliVM linked a pull request Sep 9, 2024 that will close this issue

Pattern acceleration #85

Closed

elliVM mentioned this issue Sep 9, 2024

Create command to save a pattern for bloom search teragrep/pth_10#235

Closed

elliVM removed a link to a pull request Sep 12, 2024

Pattern acceleration #85

Closed

elliVM added the review Issues or pull requests waiting for a review label Sep 13, 2024

elliVM removed the review Issues or pull requests waiting for a review label Sep 16, 2024

elliVM added review Issues or pull requests waiting for a review and removed review Issues or pull requests waiting for a review labels Sep 30, 2024

elliVM closed this as completed Oct 7, 2024
