offset modifier causing OOM #678
Comments
Update: I can reproduce it with a local repo. The process used 3.19 GiB with
Without
And with
I think that's the problem. It lies in the rewriting of the query at the AST level, but I don't understand the rationale behind the design. Would anyone please help?
Hi @jacksontj, I think I've fixed it in #681. Could you take a look when you have time? Thanks a lot! Also, thanks for creating such a tool! It's really amazing!
Thanks for the report; this is actually a VERY interesting issue. Your PR was SUPER helpful in parsing your report here, thanks! I have now opened #683, which I believe will fix the issue. A longer description is in the PR, but TL;DR: when I added subquery support I failed to update the offset selection for subqueries, so this should cover it. I did some testing and it seems fine (and I added a test for it); mind giving that PR build a spin to see if it alleviates your issue? If not, we can grab some new data and continue from there.
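To make the bug class concrete, here is a small illustrative sketch, not promxy's actual code (promxy is written in Go, and all node names here are hypothetical): an offset-rewriting pass that shifts plain vector selectors but forgets subquery nodes leaves the subquery's own evaluation window anchored at "now", so the query ends up covering far more data than intended.

```python
# Illustrative sketch (NOT promxy's real implementation) of how forgetting to
# propagate an offset into subquery nodes leaves part of the AST un-shifted.
from dataclasses import dataclass

@dataclass
class VectorSelector:
    metric: str
    offset: int = 0  # seconds

@dataclass
class Subquery:
    expr: VectorSelector
    range_s: int
    offset: int = 0  # seconds

def apply_offset_buggy(node, offset):
    """Shifts plain selectors only; subquery nodes keep offset=0."""
    if isinstance(node, VectorSelector):
        node.offset += offset
    elif isinstance(node, Subquery):
        # Inner selector is shifted, but node.offset is never updated -> bug.
        apply_offset_buggy(node.expr, offset)

def apply_offset_fixed(node, offset):
    """Shifts subquery nodes too, keeping the whole tree consistent."""
    if isinstance(node, VectorSelector):
        node.offset += offset
    elif isinstance(node, Subquery):
        node.offset += offset
        apply_offset_fixed(node.expr, offset)

q1 = Subquery(VectorSelector("http_requests_total"), range_s=3600)
apply_offset_buggy(q1, 86400)
print(q1.offset)   # 0 -> subquery still evaluated relative to "now"

q2 = Subquery(VectorSelector("http_requests_total"), range_s=3600)
apply_offset_fixed(q2, 86400)
print(q2.offset)   # 86400 -> whole subquery shifted consistently
```

The point is only that an offset must be applied uniformly to every node type that carries one; any node type missed by the rewrite evaluates over the wrong window.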
Thanks for the fix!
No problem! I've tested it. The logs look good!
I think #683 is good to go. BTW, could you elaborate more on the reason for the
So this is a bit tricky -- and you do raise a good point here. To start off, let's talk about why we do query pushdown -- and it pretty much all comes down to serialization and bandwidth. The initial PoC implementation of promxy (prior to open source) was actually without NodeReplacer -- but the performance was SUPER SLOW. After doing a lot of digging, the issue basically boiled down to the downstream APIs having to serialize large amounts of data and ship it across the wire. So the "query pushdown" is intended to offload some compute, but primarily to reduce the amount of data we have to move around. So with that in mind, assuming there are "no holes", pushing down a
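The bandwidth argument above can be made concrete with a back-of-the-envelope calculation. The numbers below are assumptions chosen for illustration, not measurements from promxy:

```python
# Rough illustration (assumed numbers, not promxy internals) of why query
# pushdown saves bandwidth: letting each downstream aggregate before
# serializing ships far fewer points across the wire.
SHARDS = 4
SERIES_PER_SHARD = 10_000
POINTS_PER_SERIES = 240  # e.g. 1h of data at 15s resolution

# No pushdown: every raw point from every series is serialized and sent,
# and the proxy computes the aggregation (e.g. sum) itself.
points_without_pushdown = SHARDS * SERIES_PER_SHARD * POINTS_PER_SERIES

# With pushdown: each shard computes the sum locally and returns one series.
points_with_pushdown = SHARDS * 1 * POINTS_PER_SERIES

print(points_without_pushdown)  # 9600000
print(points_with_pushdown)     # 960
```

A four-orders-of-magnitude reduction in serialized points is the kind of gap that explains why the pre-pushdown PoC was so slow.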
You do bring up an excellent point here -- and TBH this is a bit of a tricky one. The initial design for promxy was an aggregating proxy in front of some sharded prometheus hosts. In that model (at least historically), if there are holes it is generally "all series" with holes (i.e. a prometheus restart/crash). But there could conceivably be scenarios where a single prometheus host has an issue getting data from a specific endpoint (i.e. a network policy that applies to only 1 replica of prometheus) -- and in that case we would see issues with the data calculation. Fundamentally, promxy is trying to strike a balance between correctness and performance -- and so far this mix has been sufficiently good. That being said, now that I'm thinking through it a bit more (and describing it out here), we could in theory make the pushdown have some
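The correctness hazard being discussed can be shown with a toy example (assumed data, hypothetical names): when one replica has a "hole", a pushed-down aggregation computed on that replica silently omits the missing series, whereas merging raw series across replicas first fills the hole before aggregating.

```python
# Toy example (assumed data) of the pushdown-vs-holes hazard.
# Two replicas are supposed to hold identical data for two series at some
# timestamp, but replica_b is missing series2 (e.g. a scrape failure that
# affected only that one replica).
replica_a = {"series1": 10, "series2": 20}  # complete
replica_b = {"series1": 10}                 # hole: series2 missing

# Pushdown: each replica computes sum() locally. The holey replica reports
# 10 instead of 30, and the hole is invisible in the aggregated result.
sum_a = sum(replica_a.values())  # 30 (correct)
sum_b = sum(replica_b.values())  # 10 (wrong, and indistinguishable from right)

# Without pushdown: the proxy merges raw series across replicas first,
# filling the hole from the healthy replica, then aggregates once.
merged = dict(replica_b)
merged.update(replica_a)  # replica_a contributes the missing series2
print(sum(merged.values()))  # 30
```

This is exactly the trade-off described above: pushdown is safe when holes affect all series uniformly, but a per-series hole on one downstream can skew a pushed-down aggregate without any visible error.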
Hi!
I have a promxy server to form a HA query frontend against two identical VictoriaMetrics. It worked perfectly!
However, I found that some special queries will make promxy OOM. And I just found a strange thing: when I removed the offset modifier in one of the queries, the OOM was gone. Here's the original query:
Promxy has 4 GiB to use and normally uses ~200 MiB, so I guess there's some magic behind the offset. I also saw this PR that handles the offset specifically.
I'd like to dig into the code later when I have time, but I think you folks are more familiar with it and can definitely provide more insight, so I opened this issue first. I will post more findings if any.