Optimizing use the QueryCache #14028

kkewwei · 2024-12-02T10:19:06Z

Description

In my use case, I found that the utilization of the QueryCache is extremely low: it has a capacity of 3GB, but only 50MB is used. Most of the queries look like this:

POST index1/_search
{
   "size": 300,
   "query": {
      "bool": {
         "filter": [
            {
               "terms": {
                  "user_type": [0, 1, 3, 5, 4, 6]
               }
            }
         ],
         "should": [
            {
               "match": {
                  "name_ik": {
                     "query": "ab cd ed gh",
                     "operator": "OR",
                     "analyzer": "ik_max_word"
                  }
               }
            }
         ]
      }
   }
}

The should/match clause matches over 500 documents, and its query text changes from request to request. Because of the should clause, it is not cached by the QueryCache.
The filter/terms clause matches over 100,000,000 documents, and user_type has only a few fixed values. It is not cached either, because of skipCacheFactor (100,000,000 / skipCacheFactor > 500).

lucene/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java

Line 773 in 067b472

if (cost / skipCacheFactor > leadCost) {
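
To make the arithmetic concrete, here is a small standalone Java sketch of the quoted condition applied to the numbers from this issue. The class and method names are hypothetical, and the skipCacheFactor values are example inputs chosen for illustration, not confirmed defaults.

```java
// Hypothetical helper; only the comparison itself comes from the quoted Lucene line.
final class SkipCacheCheck {
    // Caching is skipped when cost / skipCacheFactor > leadCost.
    static boolean skipsCaching(long cost, float skipCacheFactor, long leadCost) {
        return cost / skipCacheFactor > leadCost;
    }

    public static void main(String[] args) {
        long cost = 100_000_000L; // documents matched by the terms filter
        long leadCost = 500L;     // documents matched by the should/match clause

        // Example factor of 250 (an assumed value): 400,000 > 500, so caching is skipped.
        System.out.println(skipsCaching(cost, 250f, leadCost));     // prints true

        // The filter only becomes cacheable once the factor reaches cost / leadCost = 200,000.
        System.out.println(skipsCaching(cost, 200_000f, leadCost)); // prints false
    }
}
```

In other words, with these costs the filter is only considered for caching if skipCacheFactor is at least 200,000.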

There seem to be several points for optimization:

When the QueryCache is far from fully utilized, can we loosen the restrictions so that queries that do not meet the current conditions can also be cached?
Users could be allowed to dynamically modify the skipCacheFactor of the QueryCache. Alternatively, this parameter could be deprecated: if a query meets minFrequencyToCache, meaning it is frequent, it should be placed into the QueryCache.

Furthermore, it would help if users were able to dynamically adjust maxSize and maxRamBytesUsed in the QueryCache, considering that the overhead of such adjustments is rather small.
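
As a rough illustration of why resizing can be cheap, here is a minimal sketch of an LRU map whose entry limit can be changed at runtime. It is plain Java built on LinkedHashMap and is not Lucene's LRUQueryCache. The class name and methods are hypothetical, and it tracks only an entry count, not RAM usage.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Lucene's implementation.
final class ResizableLruCache<K, V> {
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
    private int maxSize;

    ResizableLruCache(int maxSize) {
        this.maxSize = maxSize;
    }

    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
        evictIfNeeded();
    }

    // Growing the limit costs nothing; shrinking evicts the least recently used entries.
    synchronized void setMaxSize(int newMaxSize) {
        if (newMaxSize < 0) {
            throw new IllegalArgumentException("maxSize must be >= 0");
        }
        this.maxSize = newMaxSize;
        evictIfNeeded();
    }

    synchronized int size() {
        return entries.size();
    }

    private void evictIfNeeded() {
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        while (entries.size() > maxSize && it.hasNext()) {
            it.next();
            it.remove();
        }
    }
}
```

The only work done on a resize is evicting the entries that no longer fit, which is the same work a normal insertion into a full cache already does.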


kkewwei · 2024-12-05T11:44:52Z

@jpountz, could you please take a look when you are free? I would really appreciate your feedback.

msokolov · 2024-12-05T12:56:28Z

I believe query cache is highly customizable: QueryCache is just an interface; you can implement whatever you want. And the main impl LRUQueryCache has many extension points, esp. including QueryCachingPolicy. Have you tried to implement what you want using these tools?

As a side note, it's generally not effective or appreciated to call out individuals in public here or on the mailing list unless there is some ongoing dialog. Not sure if Adrien is already engaged with you on this?

kkewwei · 2024-12-05T13:35:18Z

Thank you very much for the reminder. I apologize for the offense and will not do it again.

It seems that QueryCachingPolicy does not fulfill all of my requirements:

It provides the shouldCache method, which can be used to relax the requirements when a large amount of cache space is available.
I also hope that the cache size can be adjusted dynamically, considering that the overhead of such adjustments is rather small for the QueryCache.
If a query meets minFrequencyToCache, meaning it is highly frequent, it should be placed into the QueryCache. Since skipCacheFactor was introduced to avoid slowing queries down too much, we could simply increase minFrequencyToCache to address that problem instead.
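
One possible reading of "relaxing the requirements when cache space is available" is sketched below in plain Java. This is not a Lucene QueryCachingPolicy implementation: the class, its parameters, and the utilization threshold are all hypothetical. A real policy would have to obtain the query frequency and cache usage from Lucene itself.

```java
// Hypothetical sketch; names and thresholds are illustrative and not part of Lucene.
final class UtilizationAwarePolicy {
    private final int minFrequencyToCache; // normal requirement
    private final int relaxedMinFrequency; // requirement while the cache is mostly empty
    private final double lowUtilization;   // utilization below this counts as "mostly empty"

    UtilizationAwarePolicy(int minFrequencyToCache, int relaxedMinFrequency, double lowUtilization) {
        this.minFrequencyToCache = minFrequencyToCache;
        this.relaxedMinFrequency = relaxedMinFrequency;
        this.lowUtilization = lowUtilization;
    }

    // Decide from the observed frequency alone, with a lower bar when little of the cache is used.
    boolean shouldCache(int observedFrequency, long ramBytesUsed, long maxRamBytesUsed) {
        double utilization =
            maxRamBytesUsed <= 0 ? 1.0 : (double) ramBytesUsed / maxRamBytesUsed;
        int required = utilization < lowUtilization ? relaxedMinFrequency : minFrequencyToCache;
        return observedFrequency >= required;
    }
}
```

With the figures from this issue (50MB used out of 3GB), utilization is under 2%, so such a policy would apply the relaxed frequency requirement.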

kkewwei added the type:enhancement label Dec 2, 2024
