[BUG] Memory usage steady growing over time #597

Thorsieger · 2024-07-01T07:55:05Z

Describe the bug
I am experiencing a slow but steady memory leak which forces a service restart every week or so.

Logs
Memory usage over time on the physical server :

pprof (on one instance) :

~ 20 minutes after startup :

Showing nodes accounting for 1917.94MB, 99.01% of 1937.09MB total
Dropped 34 nodes (cum <= 9.69MB)
Showing top 10 nodes out of 45
      flat  flat%   sum%        cum   cum%
 1025.26MB 52.93% 52.93%  1025.26MB 52.93%  github.com/dgryski/go-trigram.NewIndex
  618.16MB 31.91% 84.84%   618.16MB 31.91%  strings.(*Builder).grow (inline)
  101.44MB  5.24% 90.08%   101.44MB  5.24%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
  101.05MB  5.22% 95.29%   101.05MB  5.22%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).updateFileList.func3
      22MB  1.14% 96.43%   138.94MB  7.17%  github.com/go-graphite/go-carbon/receiver/tcp.(*TCP).HandleConnection
   17.23MB  0.89% 97.32%    17.23MB  0.89%  github.com/go-graphite/go-carbon/cache.(*Cache).makeQueue
      15MB  0.77% 98.09%       15MB  0.77%  github.com/go-graphite/go-carbon/points.OnePoint (inline)
   14.16MB  0.73% 98.82%    14.16MB  0.73%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
       2MB   0.1% 98.93%    37.12MB  1.92%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).fetchWithCache.func1
    1.62MB 0.084% 99.01%    35.12MB  1.81%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).prepareDataProto

~ 15 hours after startup :

      flat  flat%   sum%        cum   cum%
 2481.68MB 21.47% 21.47%  2481.68MB 21.47%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
 1952.18MB 16.89% 38.36%  1952.18MB 16.89%  github.com/go-graphite/protocol/carbonapi_v3_pb.(*FetchRequest).UnmarshalVT
 1882.28MB 16.28% 54.64%  1882.28MB 16.28%  strings.(*Builder).grow
 1313.07MB 11.36% 66.00%  3921.19MB 33.92%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
 1205.57MB 10.43% 76.43%  1205.57MB 10.43%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
  931.52MB  8.06% 84.49%   931.52MB  8.06%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
  561.05MB  4.85% 89.35%   561.05MB  4.85%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
  471.03MB  4.08% 93.42%   471.03MB  4.08%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
  184.90MB  1.60% 95.02%   184.90MB  1.60%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
     161MB  1.39% 96.41%   767.56MB  6.64%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie

Today (~ 90 hours after startup) :

Showing nodes accounting for 42091.68MB, 94.62% of 44485.60MB total
Dropped 171 nodes (cum <= 222.43MB)
Showing top 10 nodes out of 44
      flat  flat%   sum%        cum   cum%
12231.80MB 27.50% 27.50% 12231.80MB 27.50%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
 9761.41MB 21.94% 49.44%  9761.41MB 21.94%  github.com/go-graphite/protocol/carbonapi_v3_pb.(*FetchRequest).UnmarshalVT
 9495.70MB 21.35% 70.78%  9495.70MB 21.35%  strings.(*Builder).grow
 2663.25MB  5.99% 76.77%  2663.25MB  5.99%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
 2278.12MB  5.12% 81.89%  2278.12MB  5.12%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
 2016.61MB  4.53% 86.43%  6359.81MB 14.30%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
 1363.03MB  3.06% 89.49%  1363.03MB  3.06%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
  830.51MB  1.87% 91.36%  4875.24MB 10.96%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie
  749.19MB  1.68% 93.04%   751.19MB  1.69%  github.com/go-graphite/go-carbon/carbonserver.newGlobState
  702.04MB  1.58% 94.62%   702.04MB  1.58%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)

Go-carbon Configuration:

[common]
user = "carbon"
graph-prefix = "carbon.agents.{host}"
metric-endpoint = "tcp://10.254.0.36:2003"
metric-interval = "1m0s"
max-cpu = 6

[whisper]
data-dir = "/var/lib/graphite/whisper"
schemas-file = "/etc/go-carbon/storage-schemas.conf"
aggregation-file = "/etc/go-carbon/storage-aggregation.conf"
workers = 8
max-updates-per-second = 10000
max-creates-per-second = 500
hard-max-creates-per-second = false
sparse-create = false
flock = true
enabled = true
hash-filenames = true
compressed = false
remove-empty-file = false

[cache]
max-size = 50000000
write-strategy = "noop"

[udp]
listen = ":2003"
enabled = false
buffer-size = 0

[tcp]
listen = ":2003"
enabled = true
buffer-size = 0

[pickle]
listen = ":2004"
max-message-size = 67108864
enabled = false
buffer-size = 0

[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"

[grpc]
listen = "127.0.0.1:7003"
enabled = false

[tags]
enabled = false
tagdb-url = "http://127.0.0.1:8000"
tagdb-chunk-size = 32
tagdb-update-interval = 100
local-dir = "/var/lib/graphite/tagging/"
tagdb-timeout = "1s"

[carbonserver]
listen = "0.0.0.0:8080"
enabled = true
buckets = 10
metrics-as-counters = false
read-timeout = "60s"
write-timeout = "60s"
query-cache-enabled = false
query-cache-size-mb = 40960
find-cache-enabled = true
trigram-index = false
scan-frequency = "5m0s"
trie-index = true
file-list-cache = ""
concurrent-index = false
realtime-index = 0
cache-scan = false
max-globs = 600
fail-on-max-globs = false
max-metrics-globbed  = 30000
max-metrics-rendered = 1000
empty-result-ok = false
internal-stats-dir = ""
stats-percentiles = [99, 98, 95, 75, 50]

[dump]
enabled = false
path = "/var/lib/graphite/dump/"
restore-per-second = 0

[pprof]
listen = "localhost:7007"
enabled = true

[[logging]]
logger = ""
file = "stdout"
level = "info"
encoding = "json"
encoding-time = "iso8601"
encoding-duration = "seconds"

Metric retention and aggregation schemas
N/A

Simplified query (if applicable)
N/A

Additional context
I have a graphite infrastructure that handle 2.4M metrics/minutes. The storage part is composed of 4 go-carbon instances behind a carbon-c-relay. This 4 storages nodes are on a single physical server : 32 cpu/512GB ram/NVME storage.

go-carbon version : ghcr.io/go-graphite/go-carbon:0.17.3

After checking existing issues, I tried both trie and/or trigram for indexes with no effect. I enabled pprof, the output is above.

may be related to #579

The text was updated successfully, but these errors were encountered:

Thorsieger · 2024-07-03T13:01:15Z

and today :

Showing nodes accounting for 66.14GB, 94.50% of 70GB total
Dropped 165 nodes (cum <= 0.35GB)
Showing top 10 nodes out of 52
      flat  flat%   sum%        cum   cum%
   20.40GB 29.14% 29.14%    20.40GB 29.14%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
   16.35GB 23.36% 52.50%    16.35GB 23.36%  github.com/go-graphite/protocol/carbonapi_v3_pb.(*FetchRequest).UnmarshalVT
   15.97GB 22.81% 75.31%    15.97GB 22.81%  strings.(*Builder).grow
    4.29GB  6.14% 81.45%     4.29GB  6.14%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
    2.87GB  4.10% 85.55%     2.87GB  4.10%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
    2.22GB  3.17% 88.72%     7.32GB 10.45%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
    1.44GB  2.06% 90.78%     1.44GB  2.06%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
    1.36GB  1.95% 92.73%     6.24GB  8.92%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie
    0.78GB  1.12% 93.85%     0.78GB  1.12%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
    0.45GB  0.65% 94.50%     4.88GB  6.97%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).query

If you need more information, please ask ;)

deniszh · 2024-07-05T15:51:48Z

Hi @Thorsieger
Yes, pprofs is quite convincing - looks like there's memory leak in getExpandedGlobs and polssibly in UnmarshalVT. Need to be investigated.
Maybe less noticable for us because we're doing deploy every month, at least.

deniszh · 2024-07-05T15:52:27Z

And looks like big "max-metrics-globbed" is main grow driver too.

cxfcxf · 2024-07-09T01:13:57Z

i can confirm this happening as well

after upgrade, it slowly leaking memory

we have huge memory pool left, so it can take a while to do it.

upgraded from v0.15.6, and that version does not have memory leak since we have been using it for years.

deniszh · 2024-07-12T09:09:16Z

Another thing which I suspect is carbonapi_v3_pb. carbonapi_v3_pb support in go-carbon is quite raw and I'm not sure if bug-free. We're still using carbonapi_v2_pb in prod.

Thorsieger · 2024-07-15T09:22:38Z

I'm actually using carbonapi v0.16.1 with auto backend's protocol (either carbonapi_v2_pb or carbonapi_v3_pb).

I'm not aware of differences between this two protocols, are they drop-in replacement ?
In that case I could try to switch my configuration to try.

deniszh · 2024-07-15T22:01:31Z

Auto means v3 if supported, otherwise v2. V3 supports some metadata and different requests, like Multiglob, which very likely causing that leak - as you can see major usage coming from glob function and v3 unmarshal.
But I'm afraid you need to test v2 first - it would work, but speed can be different.

Thorsieger · 2024-07-30T14:53:55Z

I tried to replicate the issue in a staging environment (only ~ 40k metrics/mins) without success. Replaying similar requests on this env did not result in the same problem.

As API V2 and V3 do not support the same requests, I will not be able to test directly in production. If my customers are using Multiglob functions, I cannot break this way of operating for debugging purpose.

Is their anything else I can do to help pinpoint the source of the problem ?

deniszh · 2024-07-31T11:21:21Z

As API V2 and V3 do not support the same requests, I will not be able to test directly in production. If my customers are using Multiglob functions, I cannot break this way of operating for debugging purpose. What do you mean? There should be no difference in functionality with go-carbon and carbonapi in use of v2 or v3 Вт, 30 июля 2024 г. в 16:54, Thorsieger ***@***.***>:

…

I tried to replicate the issue in a staging environment (only ~ 40k metrics/mins) without success. Replaying similar requests on this env did not result in the same problem. As API V2 and V3 do not support the same requests, I will not be able to test directly in production. If my customers are using Multiglob functions, I cannot break this way of operating for debugging purpose. Is their anything else I can do to help pinpoint the source of the problem ? — Reply to this email directly, view it on GitHub <#597 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJLTVT3CSHP4UEKK5XLGO3ZO6SJRAVCNFSM6AAAAABKFAQ5RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJYGU2TEMRWGE> . You are receiving this because you commented.Message ID: ***@***.***>

cxfcxf · 2024-07-31T23:43:57Z

Auto means v3 if supported, otherwise v2. V3 supports some metadata and different requests, like Multiglob, which very likely causing that leak - as you can see major usage coming from glob function and v3 unmarshal. But I'm afraid you need to test v2 first - it would work, but speed can be different.

i can run some test, but currently we reverted to go-carbon 0.15.6 with carbonapi 0.16.0~1

we have the same memory leak issue with go-carbon 0.17.3, but when i reverted to 0.15.6 it no longer leaks

should both of them go through carbonapi_v3_pb as well? wondering why go-carbon 0.15.6 is not leaking with carbonapi_v3_pb, or you mean when it compiled with problem carbonapi_v3_pb, causing it to leak in 0.17.3?

deniszh · 2024-08-01T07:38:32Z

@cxfcxf : well, if it's not leaking with 0.15.6 but leaking on 0.17.3 with same carbonapi and using v3 then my hypothesis is wrong and it's code change in go-carbon instead.
Unfortunately, between these versions is too many commits to find bad commit manually. If you able to quickly reproduce issue then you can find bad commit using git bisect, but it would require 8-10 steps, which is not easy.

Thorsieger · 2024-09-02T07:22:53Z

Hello, I try to downgrade to 0.17.1 (cannot got further because it contain a bugfix I need) and the memory leak is still present.

I hope that help. That's still 278 commit and 1014 files changes 😬.

deniszh · 2024-09-02T07:34:05Z

Hi Thorsieger, Thanks for an attempt but I doubt it help. I suspecting that issue lies in carbonapi v3 implementation. Did you try graphiteapi with explicit v2 protocol instead? Пн, 2 сент. 2024 г. в 09:23, Thorsieger ***@***.***>:

…

Hello, I try to downgrade to 0.17.1 (cannot got further because it contain a bugfix I need) and the memory leak is still present. I hope that help. That's still 278 commit and 1014 files changes 😬. — Reply to this email directly, view it on GitHub <#597 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJLTVVA5AE7WR2ZK4HW7E3ZUQG6FAVCNFSM6AAAAABKFAQ5RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMRTHE4TQOJTGM> . You are receiving this because you commented.Message ID: ***@***.***>

Thorsieger · 2024-09-13T11:09:11Z

Hi deniszh,

I have setup yesterday graphiteapi using backend protocol carbonapi_v2_pb and then restart my gocarbon ( 0.17.3 ).

here the first result for memory usage :

pprof just after restart :

Showing nodes accounting for 6218.28MB, 94.23% of 6598.91MB total
Dropped 107 nodes (cum <= 32.99MB)
Showing top 10 nodes out of 45
      flat  flat%   sum%        cum   cum%
 1445.08MB 21.90% 21.90%  4070.69MB 61.69%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
 1121.06MB 16.99% 38.89%  1121.06MB 16.99%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
 1017.52MB 15.42% 54.31%  1017.52MB 15.42%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
  664.67MB 10.07% 64.38%   664.67MB 10.07%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
  534.02MB  8.09% 72.47%   534.02MB  8.09%  strings.(*Builder).grow
  511.82MB  7.76% 80.23%   513.34MB  7.78%  net/textproto.(*Reader).ReadLine
  487.03MB  7.38% 87.61%   487.03MB  7.38%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
  189.05MB  2.86% 90.47%   191.05MB  2.90%  github.com/go-graphite/go-carbon/carbonserver.newGlobState
  127.51MB  1.93% 92.41%   127.51MB  1.93%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
  120.51MB  1.83% 94.23%   120.51MB  1.83%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath

pprof now (~ 16 hours after startup) :

Showing nodes accounting for 20587.76MB, 95.93% of 21460.81MB total
Dropped 123 nodes (cum <= 107.30MB)
Showing top 10 nodes out of 43
      flat  flat%   sum%        cum   cum%
 5221.39MB 24.33% 24.33%  5221.39MB 24.33%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
 4189.95MB 19.52% 43.85%  4191.46MB 19.53%  net/textproto.(*Reader).ReadLine
 3908.99MB 18.21% 62.07%  3908.99MB 18.21%  strings.(*Builder).grow
 1989.61MB  9.27% 71.34%  5801.28MB 27.03%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
 1742.09MB  8.12% 79.46%  1742.09MB  8.12%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
 1384.03MB  6.45% 85.91%  1384.03MB  6.45%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
  943.59MB  4.40% 90.30%   943.59MB  4.40%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
  685.54MB  3.19% 93.50%   685.54MB  3.19%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
  305.54MB  1.42% 94.92%  1504.26MB  7.01%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie
  217.03MB  1.01% 95.93%   217.03MB  1.01%  github.com/go-graphite/go-carbon/cache.(*Cache).Add

memory usage seems to still grow over time :/

Thorsieger · 2024-09-16T07:57:24Z

if that help, the last pprof from today :

Showing nodes accounting for 78.15GB, 94.55% of 82.66GB total
Dropped 182 nodes (cum <= 0.41GB)
Showing top 10 nodes out of 44
      flat  flat%   sum%        cum   cum%
   25.28GB 30.58% 30.58%    25.28GB 30.58%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
   20.34GB 24.60% 55.18%    20.35GB 24.61%  net/textproto.(*Reader).ReadLine
   19.43GB 23.51% 78.69%    19.43GB 23.51%  strings.(*Builder).grow
    4.51GB  5.46% 84.15%     4.51GB  5.46%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
    2.40GB  2.91% 87.06%     2.40GB  2.91%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
    1.78GB  2.16% 89.22%     6.02GB  7.28%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
    1.43GB  1.72% 90.95%     8.02GB  9.70%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie
    1.17GB  1.41% 92.36%     1.17GB  1.41%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
    1.06GB  1.28% 93.64%     1.06GB  1.29%  github.com/go-graphite/go-carbon/carbonserver.newGlobState
    0.75GB   0.9% 94.55%     6.59GB  7.97%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).query

deniszh · 2024-09-17T14:44:15Z

@Thorsieger : thanks for your data. Unfortunately, I still have no ideas why it's behaves like this in your case but not in our:
:(

(pprof) top
Showing nodes accounting for 6312.24MB, 81.07% of 7785.89MB total
Dropped 195 nodes (cum <= 38.93MB)
Showing top 10 nodes out of 75
      flat  flat%   sum%        cum   cum%
 1402.58MB 18.01% 18.01%  3780.18MB 48.55%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
 1042.02MB 13.38% 31.40%  1042.02MB 13.38%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
 1038.74MB 13.34% 44.74%  1038.74MB 13.34%  github.com/go-graphite/protocol/carbonapi_v2_pb.(*MultiFetchResponse).MarshalVT
  848.05MB 10.89% 55.63%   848.05MB 10.89%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
  526.79MB  6.77% 62.40%   686.33MB  8.82%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
  487.53MB  6.26% 68.66%   487.53MB  6.26%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
  291.65MB  3.75% 72.40%   291.65MB  3.75%  github.com/go-graphite/protocol/carbonapi_v2_pb.(*GlobResponse).MarshalVT
  251.04MB  3.22% 75.63%   251.04MB  3.22%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
  239.55MB  3.08% 78.71%  1104.07MB 14.18%  github.com/go-graphite/go-carbon/receiver/tcp.(*TCP).HandleConnection
  184.29MB  2.37% 81.07%  1319.65MB 16.95%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).prepareDataProto

Thorsieger · 2024-09-18T12:18:09Z

The main difference is maybe that we have more api request than you ? We have ~ 9k requests/minutes pic each day

api requests from last 7 days

I found that some of our clients are making globed render requests. Example :

avg(stats.timers.product.api.*.median)
sumSeriesWithWildcards(stats.counters.product.api.jobs.*.*.count, 7)

Maybe it has a link to getExpandedGlobs calls ?

deniszh · 2024-09-18T12:30:49Z

The main difference is maybe that we have more api request than you ? We have ~ 9k requests/minutes pic each day

Server (from which I posted yesterday stats) have 1.5K requests per second, so around 90K/min. But we're using own fork of carbonapi, not sure how that could be related to getExpandedGlobs call.
Probably that's only reasonable explanation.

I found that some of our clients are making globed render requests

That's absolutely normal behaviour of carbonapi, getExpandedGlobs doing exactly that. The question why it holds memory and not releasing it back. :(

Thorsieger · 2024-09-18T13:03:16Z

I there anything else I can provide you to help to find out ?

deniszh · 2024-09-19T09:30:37Z

@Thorsieger : I'm afraid not. :(
Without ability to reproduce it I can't bisect bad commit and do anything with it.

Thorsieger · 2024-10-09T14:49:02Z

Hello,

I may have found the bad commit : 676cb0e#diff-9f4dfc723a0b5109df1e60ae2ae68dd14bc00ee1fb515362f9bcc63e81d81bbfR261

It has been merged en v0.17.0 and relate to expandedGlobsCache. It states that there is no memory limit for the cache and that it can't be disabled.

Can you look a way to disable globscache maybe ? Or if you find any problem with this code ?

Additionally we found that there is no cleanup call for expandedGlobsCache or find request , it only exists for query cache :

go-carbon/carbonserver/carbonserver.go

Line 1927 in a5c9c55

    
           go listener.queryCache.ec.StoppableApproximateCleaner(10*time.Second, listener.exitChan)

Maybe adding this cleanup for other caches could help ?

deniszh · 2024-10-10T09:18:38Z

Yes, disabling it probably would not work, but adding cleanup and configurable size limit should do work.
Thanks a lot for finding this! Will prepare a fix

deniszh · 2024-10-11T16:56:03Z

Should be fixed in v0.18.0

Thorsieger · 2024-10-17T15:42:07Z

After 3 days running v0.18.0 we have this memory usage :

pprof at start:

(pprof) text
Showing nodes accounting for 6218.28MB, 94.23% of 6598.91MB total
Dropped 107 nodes (cum <= 32.99MB)
Showing top 10 nodes out of 45
      flat  flat%   sum%        cum   cum%
 1445.08MB 21.90% 21.90%  4070.69MB 61.69%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
 1121.06MB 16.99% 38.89%  1121.06MB 16.99%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
 1017.52MB 15.42% 54.31%  1017.52MB 15.42%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
  664.67MB 10.07% 64.38%   664.67MB 10.07%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
  534.02MB  8.09% 72.47%   534.02MB  8.09%  strings.(*Builder).grow
  511.82MB  7.76% 80.23%   513.34MB  7.78%  net/textproto.(*Reader).ReadLine
  487.03MB  7.38% 87.61%   487.03MB  7.38%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
  189.05MB  2.86% 90.47%   191.05MB  2.90%  github.com/go-graphite/go-carbon/carbonserver.newGlobState
  127.51MB  1.93% 92.41%   127.51MB  1.93%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
  120.51MB  1.83% 94.23%   120.51MB  1.83%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath

pprof now :

(pprof) text
Showing nodes accounting for 1.88GB, 91.77% of 2.05GB total
Dropped 128 nodes (cum <= 0.01GB)
Showing top 10 nodes out of 55
      flat  flat%   sum%        cum   cum%
    0.54GB 26.32% 26.32%     1.44GB 70.09%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
    0.36GB 17.77% 44.10%     0.36GB 17.77%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
    0.35GB 17.18% 61.27%     0.35GB 17.18%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
    0.18GB  8.81% 70.09%     0.18GB  8.81%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
    0.14GB  6.76% 76.84%     0.14GB  6.76%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
    0.09GB  4.57% 81.42%     0.09GB  4.57%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
    0.07GB  3.39% 84.80%     0.07GB  3.39%  strings.(*Builder).grow
    0.07GB  3.31% 88.12%     0.07GB  3.31%  github.com/go-graphite/go-carbon/carbonserver.newGlobState
    0.06GB  2.74% 90.86%     0.06GB  2.74%  net/textproto.(*Reader).ReadLine
    0.02GB  0.91% 91.77%     0.11GB  5.27%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).query

it's quite convincing that the memory is correctly released

Thank's a lot for the support !

Thorsieger added the bug label Jul 1, 2024

deniszh mentioned this issue Oct 10, 2024

Find/glob cache fixes #643

Merged

Thorsieger closed this as completed Oct 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Memory usage steady growing over time #597

[BUG] Memory usage steady growing over time #597

Thorsieger commented Jul 1, 2024

Thorsieger commented Jul 3, 2024

deniszh commented Jul 5, 2024

deniszh commented Jul 5, 2024

cxfcxf commented Jul 9, 2024 •

edited

Loading

deniszh commented Jul 12, 2024

Thorsieger commented Jul 15, 2024

deniszh commented Jul 15, 2024

Thorsieger commented Jul 30, 2024

deniszh commented Jul 31, 2024 via email

cxfcxf commented Jul 31, 2024

deniszh commented Aug 1, 2024

Thorsieger commented Sep 2, 2024

deniszh commented Sep 2, 2024 via email

Thorsieger commented Sep 13, 2024

Thorsieger commented Sep 16, 2024

deniszh commented Sep 17, 2024

Thorsieger commented Sep 18, 2024

deniszh commented Sep 18, 2024

Thorsieger commented Sep 18, 2024

deniszh commented Sep 19, 2024

Thorsieger commented Oct 9, 2024 •

edited

Loading

deniszh commented Oct 10, 2024

deniszh commented Oct 11, 2024

Thorsieger commented Oct 17, 2024

[BUG] Memory usage steady growing over time #597

[BUG] Memory usage steady growing over time #597

Comments

Thorsieger commented Jul 1, 2024

Thorsieger commented Jul 3, 2024

deniszh commented Jul 5, 2024

deniszh commented Jul 5, 2024

cxfcxf commented Jul 9, 2024 • edited Loading

deniszh commented Jul 12, 2024

Thorsieger commented Jul 15, 2024

deniszh commented Jul 15, 2024

Thorsieger commented Jul 30, 2024

deniszh commented Jul 31, 2024 via email

cxfcxf commented Jul 31, 2024

deniszh commented Aug 1, 2024

Thorsieger commented Sep 2, 2024

deniszh commented Sep 2, 2024 via email

Thorsieger commented Sep 13, 2024

Thorsieger commented Sep 16, 2024

deniszh commented Sep 17, 2024

Thorsieger commented Sep 18, 2024

deniszh commented Sep 18, 2024

Thorsieger commented Sep 18, 2024

deniszh commented Sep 19, 2024

Thorsieger commented Oct 9, 2024 • edited Loading

deniszh commented Oct 10, 2024

deniszh commented Oct 11, 2024

Thorsieger commented Oct 17, 2024

cxfcxf commented Jul 9, 2024 •

edited

Loading

Thorsieger commented Oct 9, 2024 •

edited

Loading