M3Query improvements #1590

arnikola · 2019-04-26T12:56:45Z

General

Remove hand-rolled json writers in favor of using a common library (Consider using [easyjson](https://github.com/mailru/easyjson) for query API JSON serialization #1144)
Remove use of global Zap logger usage in favor of src/x/instrument logger which uses Zap and use src/x/test/NewLogger in tests
Potentially refactor the physical/execution plan and controller to simplify it and be able to more easily add tracing in a single place (rather than adding it on a per-function level)
Refactor executor.Engine to be an interface to allow us to mock it in tests.

Add benchmarker and correctness tester

Look at Prombench again, if possible make use of its query generation engine to build a tool to generate valid queries and hit them against both Prom and m3q to verify correctness
Run prombench against m3query to identify areas to improve performance
Potentially add hooks into our CI process that would run diffs vs prombench to ensure correctness and no perf degradation

Introduce pooling to intermediary blocks

Ensure block closing logic will not cause blocks to be double-closed or to hang around for too long
Add pools to ColumnBlockBuilder
investigate adding a RowBlockBuilder which would be optimized for series-wise traversal vs row-based traversal, possibly with some of the downstream

Series consolidation

Improve options for series consolidation after fetches (functionality here may change based on how indexing will look - may have different paths for dealing with a centralized index compared with index per namespace)

Function improvements

Translate function arguments to parameters in the parsing step rather than sending interface{} and making functions
Remove NaNs from output during processing
Make OffsetBlock into a more generic DeferredBlock (something like: take in a row transform, a column transform, and a time transform) ([query] Update offset to accept opts #1638)
Refactor remaining lazy functions to use this DeferredBlock
Enable multi-target querying (potentially re-use fetch results)

Block splitting logic

Enable block splitting logic
Ensure that nowhere matches series by order across block boundaries
Remove forced padding for block splitting logic
Refactor series consolidation at the end of processing to be a separate step (maybe as a function)

Future work

Dynamic function optimization (less essential with DeferredBlock)
Reuse fetches (potentially do something clever about combining into a single fetch? not sure it's useful though)
Optimized remote function execution (e.g. functions like sum run on multiple DCs can be run remotely first, then have post-aggregated results sent rather than just performing the fetch remotely)
Centralized index support
Support M3QL

The text was updated successfully, but these errors were encountered:

robskillington · 2019-04-29T18:45:27Z

One thing interesting about the JSON writers, the major reason in the past we did that is because the default JSON marshaller allocates a lot, however if we have JSON write methods for bytes/strings/whatever other concrete types we can avoid allocations when returning responses (which is what we have today with the current common library streamjson writer).

arnikola · 2019-04-30T14:45:35Z

Sounds good; was going to add a few benchmarks around the handrolled JSON writer since figured that we were using it on our legacy stack for a reason :p

andrewmains12 · 2019-04-30T18:52:10Z

One thing interesting about the JSON writers, the major reason in the past we did that is because the default JSON marshaller allocates a lot, however if we have JSON write methods for bytes/strings/whatever other concrete types we can avoid allocations when returning responses (which is what we have today with the current common library streamjson writer).

For sure; I think the ideal is to use a codegenned marshaller (or other speedy JSON implementation). #1144 has some suggestions from another contributor.

benraskin92 · 2019-04-30T19:50:02Z

Do we want to add tracing to more paths?

benraskin92 · 2019-04-30T19:54:43Z

Also, as an aspirational goal, how about removing time from blocks and just using step size?

asafm · 2020-10-11T18:38:58Z

@gibbscullen Is this issue tracked in another issue?

gibbscullen · 2020-10-12T01:29:19Z

Closed as part of effort to clean up stale issues. Re-opening as still interest in tracking issue.

arnikola added the area:query All issues pertaining to query label Apr 26, 2019

arnikola self-assigned this Apr 26, 2019

arnikola mentioned this issue Apr 29, 2019

[query] add list tags endpoint #1565

Merged

gibbscullen closed this as completed Sep 18, 2020

gibbscullen reopened this Oct 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

M3Query improvements #1590

M3Query improvements #1590

arnikola commented Apr 26, 2019 •

edited

Loading

robskillington commented Apr 29, 2019

arnikola commented Apr 30, 2019

andrewmains12 commented Apr 30, 2019

benraskin92 commented Apr 30, 2019

benraskin92 commented Apr 30, 2019

asafm commented Oct 11, 2020

gibbscullen commented Oct 12, 2020

M3Query improvements #1590

M3Query improvements #1590

Comments

arnikola commented Apr 26, 2019 • edited Loading

General

Add benchmarker and correctness tester

Introduce pooling to intermediary blocks

Series consolidation

Function improvements

Block splitting logic

Future work

robskillington commented Apr 29, 2019

arnikola commented Apr 30, 2019

andrewmains12 commented Apr 30, 2019

benraskin92 commented Apr 30, 2019

benraskin92 commented Apr 30, 2019

asafm commented Oct 11, 2020

gibbscullen commented Oct 12, 2020

arnikola commented Apr 26, 2019 •

edited

Loading