Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Matrix Agg RunningStats performance by using field ordinals as hash keys #18691

Closed
nknize opened this issue Jun 1, 2016 · 10 comments
Closed
Labels
:Analytics/Aggregations Aggregations >enhancement feedback_needed good first issue low hanging fruit Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@nknize
Copy link
Contributor

nknize commented Jun 1, 2016

This improvement comes from feedback in PR #18300 where it was suggested to use ordinals as hash keys instead of field names.

@markharwood
Copy link
Contributor

cc @elastic/es-search-aggs

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@imotov
Copy link
Contributor

imotov commented Jun 25, 2020

@nknize is this something that we still want to do?

@nknize nknize added the good first issue low hanging fruit label Jul 23, 2020
@nknize
Copy link
Contributor Author

nknize commented Jul 23, 2020

Yes. I'm marking this as a good first issue since it's not all that difficult.

@santhosh96
Copy link

Hi there! I am new to open source contribution :) Would like to work on this issue. Any context or guidance would be great!

@imrigp
Copy link

imrigp commented Oct 1, 2020

@nik9000
Copy link
Member

nik9000 commented Mar 2, 2021

I think this isn't done. I'd expect all these to be arrays if it were:

    /** per field sum of observations */
    protected HashMap<String, Double> fieldSum;
    /** counts */
    protected HashMap<String, Long> counts;
    /** mean values (first moment) */
    protected HashMap<String, Double> means;
    /** variance values (second moment) */
    protected HashMap<String, Double> variances;
    /** skewness values (third moment) */
    protected HashMap<String, Double> skewness;
    /** kurtosis values (fourth moment) */
    protected HashMap<String, Double> kurtosis;
    /** covariance values */
    protected HashMap<String, HashMap<String, Double>> covariances;

@SheldonWang3000
Copy link

Hi there! I'm new to elasticsearch and open source and I'd like to contribute. Could I start with this one?
And to verify the requirements, do we want to create a new HashMap<String, Integer> to store all field names and the index, then modify all maps from HashMap<String, Double> to HashMap<Integer, Double>? Or maybe modify all maps to ArrayList? And do we want the back compatibility to support reading old structures from the stream?

@ghost
Copy link

ghost commented Aug 16, 2023

Not sure whether it will improve anything, because:

  • we somehow do need to map names to their values, so there will be at least one HashMap
  • the String class' hash() method under the hood caches the calculated values
  • as mappings are added to all HashMaps at once, there will be no duplicate String objects

@yokeshwaran1
Copy link

Hi everyone,
I reviewed the discussions on this issue, Im interested in contributing. Has there been any recent progress by anyone? If not, I like work at improving the Matrix Agg RunningStats performance with the use of field ordinals as hash keys. I understand the concerns of @aabyodj , I like to do some performance tests to verify the improvements. Any additional context or suggestions will be helpful.

@wchaparro
Copy link
Member

Closing as not planned, focus on ES|QL development

@wchaparro wchaparro closed this as not planned Won't fix, can't repro, duplicate, stale Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement feedback_needed good first issue low hanging fruit Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

10 participants