Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] chip-tool memory leak for interactive mode when run cluster command #34221

Closed
wilbur-se opened this issue Jul 6, 2024 · 13 comments
Closed
Labels
bug Something isn't working chip-tool needs triage

Comments

@wilbur-se
Copy link
Contributor

Reproduction steps

1.run chip-tool with interactive server/start mode
2.run any cluster command like "any command-by-id 6 0 '{}' 1 1" or "onoff on 1 1" many times (e.g. 100)
3.the memory usage for chip-tool is creasing especially for subcommand "any"

no effect for "read" and "subscribe"

Bug prevalence

always

GitHub hash of the SDK that was being used

all

Platform

other

Platform Version(s)

No response

Anything else?

found two memory issue after debugging:

  1. for class CustomArgument, it will has 4k memory alloc for each time when parse command augument with json data. And the object for this class will never deconstruct except for chip-tool exist

  2. it will call mCommissionerStorage.Init() for each time when run cluster command (for the case that no augurment "commissioner-nodeid")

CHIPCommand::RunQueuedCommand() -> EnsureCommissionerForIdentity() -> GetIdentityNodeId() -> mCommissionerStorage.Init()

@wilbur-se wilbur-se added bug Something isn't working needs triage labels Jul 6, 2024
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jul 23, 2024
We were not properly cleaning up the buffer in a CustomArgument when a command
using it finished.

Addresses the main part of project-chip#34221
@bzbarsky-apple
Copy link
Contributor

@wilbur-se Sorry for the lag here, but I finally got a chance to look into this. The first issue you list is absolutely an issue! Created #34461 to fix that.

For the second bit (the multiple init of mCommissionerStorage), I am not seeing an obvious leak in there. While it does re-init the object, it should just be throwing away the old data and re-reading the data from disk as needed. Do you have more information on what memory leaks you are seeing in there?

mergify bot pushed a commit that referenced this issue Jul 23, 2024
We were not properly cleaning up the buffer in a CustomArgument when a command
using it finished.

Addresses the main part of #34221
@wilbur-se
Copy link
Contributor Author

@bzbarsky-apple Thank you for the fix.

If "--commissioner-nodeid" exist for chip-tool command, will skip mCommissionerStorage.Init (CHIPCommand::GetIdentityNodeId), no momery usage continuously increase.
If no this option, momery usage will continuously increase when execute chip-tool command like 1000 times.

Sorry, I am not C++ expert. Maybe "mConfig.parse(ifs)" (PersistentStorage::Init) allocate memory again? I am not sure

@bzbarsky-apple
Copy link
Contributor

If no this option, momery usage will continuously increase when execute chip-tool command like 1000 times.

That's the part I am having trouble reproducing. I'll try again....

@bzbarsky-apple
Copy link
Contributor

@wilbur-se Which exact memory use statistic are you measuring that is increasing? I've just tried running onoff on 1 1 a large number of times, and the total memory malloc'ed oscillates some but does not seem to be regularly growing....

@wilbur-se
Copy link
Contributor Author

@bzbarsky-apple here is my test step and environment

matter SDK 1.3.0.0 and build on ubuntu 22.04

I monitor the VmRSS in file /proc//status.

  1. start chip-tool: chip-tool interactive start

  2. commission matter lighting to chip-tool fabric (on the same machine)

  3. run script to send "onoff on 1 1" 1000 times

  • first run script: 44288 kB to 56704 kB (12416 kB)
  • second run script: 56704 kB to 69120 kB (12416 kB)
  • third run script: 69120 kB to 81408 kB (12288 kB)

change the command in step 3 to "onoff on 1 1 --commissioner-nodeid 112233"

  • first run script: 81408 kB to 81536 kB (128 kB)
  • second run script: 81536 kB to 81536 kB (0 kB)
  • third run script: 81536 kB to 81536 kB (0 kB)
  • fourth run script: 81536 kB to 81536 kB ()

You can see it has significant different for my setup.
I put the script here that is run for secureCRT

`
#$language = "VBScript"
#$interface = "1.0"

crt.Screen.Synchronous = Flase

Sub Main
TestCountMax = 1000
i = 1
Do While i <= TestCountMax
crt.Screen.Send "onoff on 1 1"
'crt.Screen.Send "onoff on 1 1 --commissioner-nodeid 112233"
crt.Sleep 100
crt.Screen.Send chr(13)
crt.Sleep 200
i = i+1
Loop
End Sub
`

@bzbarsky-apple
Copy link
Contributor

OK. It's possible this is something Linux-specific; I am testing on Mac and I am not seeing increases in memory allocated, but I am not measuring a metric that would include fragmentation (which VmRSS probably measures).

@andy31415

This comment was marked as outdated.

@andy31415
Copy link
Contributor

andy31415 commented Jul 25, 2024

My script was buggy because >>> occurs in the regular output of the program. Here is the fixed version:

#!/usr/bin/env python3

# This test assumes:
#   - `rm /tmp/chip_*` done for a clean start
#   - App running on the same machine (e.g lighting app)

import click
import logging
import pexpect
import sys


@click.command()
@click.option("--chip-tool", default="out/linux-x64-chip-tool-boringssl/chip-tool", show_default=True)
@click.option("--log-output", default=None, help="Where to write the log for the session. May use 'console'")
@click.option("--iterations", default=10000, help="number of commands to run")
def main(chip_tool, log_output, iterations):
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)-12s %(levelname)-8s %(message)s",
    )

    logfile = None
    if log_output == "console":
        logfile = sys.stdout.buffer
    else:
        logfile = open(log_output, 'wb')

    # Step 0: start up chip-tool
    chip_tool = pexpect.spawn(f"{chip_tool} interactive start", logfile=logfile)
    chip_tool.sendline("")
    chip_tool.expect_exact("\n>>> ")
    logging.info("Chip-tool ready. Starting test.")


    # Step 1: commission
    chip_tool.sendline("pairing onnetwork 1 20202021")
    chip_tool.expect_exact("\n>>> ")
    logging.info("Pairing complete.")

    for i in range(iterations):
        logging.info("Running iteration %d", i+1)
        chip_tool.sendline("onoff toggle 1 1 --commissioner-nodeid 112233")
        chip_tool.expect_exact("\n>>> ")

    logging.info("Terminating.")
    chip_tool.terminate(force=True)


if __name__ == "__main__":
    main()

@andy31415
Copy link
Contributor

No changes after 1500 toggles:

image

@andy31415
Copy link
Contributor

This was with commissionernodeid. Trying without...

@andy31415
Copy link
Contributor

@wilbur-se I cannot reproduce this on ToT (4423b3a)

Your script does not seem to wait for the reply from chip-tool (the '\n>>> '). Could it be that the I/O buffering somehow increases? Althought I don't think that should be it.

In any case, for me even after 1000 runs (tried both on and toggle) the VmRSS stays stable. Could you re-try the test with my script?

@wilbur-se
Copy link
Contributor Author

wilbur-se commented Jul 26, 2024

@andy31415 , I add delay to start next command in my script, and the resposne is very fast (10ms).

I tested chip-tool(1.3.0.0) by your python script, it has same result with my previous test.

@andy31415 @bzbarsky-apple I checked the master branch and compare the code, it has bellow changes:
1721964411689

After apply the changes to 1.3.0.0, no memory growing again. So maybe we can close this.

#26028
#33838

@bzbarsky-apple
Copy link
Contributor

Ah! #33838 indeed fixed memory growth due to re-parsing the config file that was described above. So looks like this is fixed now.

j-ororke pushed a commit to j-ororke/connectedhomeip that referenced this issue Jul 31, 2024
We were not properly cleaning up the buffer in a CustomArgument when a command
using it finished.

Addresses the main part of project-chip#34221
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working chip-tool needs triage
Projects
None yet
Development

No branches or pull requests

3 participants