Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak issue in ConfigDBConnector. #655

Merged
merged 6 commits into from
Aug 1, 2022

Conversation

liuh-80
Copy link
Contributor

@liuh-80 liuh-80 commented Jul 27, 2022

Why I did it

Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

How I did it

Change DBConnector::pubsub() to return a smart pointer.

How to verify it

Pass all test case.

Run following code in python and validate there is no epoll and socket leak:

import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@@ -174,7 +174,7 @@ class DBConnector : public RedisContext
/* Create new context to DB */
DBConnector *newConnector(unsigned int timeout) const;

PubSub *pubsub();
std::shared_ptr<PubSub> pubsub();
Copy link
Contributor Author

@liuh-80 liuh-80 Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a draft PR to validate this change will not break existed code:
sonic-net/sonic-buildimage#11540

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build image PR passed all test.

@liuh-80 liuh-80 marked this pull request as ready for review July 28, 2022 00:12
@liuh-80 liuh-80 requested a review from qiluo-msft July 28, 2022 00:12
common/dbconnector.h Outdated Show resolved Hide resolved
@@ -24,7 +24,7 @@ void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bo
if (wait_for_init)
{
auto& client = get_redis_client(m_db_name);
auto pubsub = client.pubsub();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change here because pubsub() mark as deprecated.

qiluo-msft
qiluo-msft previously approved these changes Jul 28, 2022
@liuh-80
Copy link
Contributor Author

liuh-80 commented Jul 28, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft
Copy link
Contributor

qiluo-msft commented Jul 28, 2022

Please check this unit test failure

tests/test_countertable.py .swig/python detected a memory leak of type 'std::string *', no destructor found.

In reply to: 1198490986

@liuh-80
Copy link
Contributor Author

liuh-80 commented Jul 29, 2022

Please check this unit test failure

tests/test_countertable.py .swig/python detected a memory leak of type 'std::string *', no destructor found.

Fixed, False alert because UT time out. add a new UT to cover the changed code.

@liuh-80 liuh-80 merged commit 2247dbe into sonic-net:master Aug 1, 2022
zbud-msft added a commit to renukamanavalan/sonic-swss-common that referenced this pull request Aug 3, 2022
* Fix memory leak issue in ConfigDBConnector. (sonic-net#655)

#### Why I did it
Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

#### How I did it
Change DBConnector::pubsub() to return a smart pointer.

#### How to verify it
Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```
#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

#### Description for the changelog
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)


Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>

* Transfer organization from Azure to sonic-net (sonic-net#656)

Transfer organization from Azure to sonic-net

* Add docker-mux related table names  (sonic-net#627)

This PR is to add table name definitions for database entries used by docker mux processes.

Sign-off: Jing Zhang [email protected]

* Add libzmq dependency

* Add test logs

* Add libzmq3-dev dependency

* Add libboost-serialization and uuid-dev dependencies

* Add boost and uuid

* Add installation before building docker

Co-authored-by: Hua Liu <[email protected]>
Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
Co-authored-by: Liu Shilong <[email protected]>
Co-authored-by: Jing Zhang <[email protected]>
Co-authored-by: Ubuntu <zain@zb-dev-vm.022x1jpnpm4u1iy2d325acts3c.yx.internal.cloudapp.net>
@SuvarnaMeenakshi
Copy link

@liuh-80 should change be included in 202205 branch?

@qiluo-msft
Copy link
Contributor

@liuh-80 Do we need to backport to older branches like 202012?

@yxieca
Copy link
Contributor

yxieca commented Sep 21, 2022

@liuh-80 this change cannot be cherry-picked to 202205 branch cleanly, can you raise a separate PR?

@abdosi abdosi added the Chassis label Oct 22, 2022
@abdosi
Copy link
Contributor

abdosi commented Oct 22, 2022

@liuh-80 please create PR for 2022205
cc @SuvarnaMeenakshi for viz.

liuh-80 added a commit to liuh-80/sonic-swss-common that referenced this pull request Nov 8, 2022
Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

Change DBConnector::pubsub() to return a smart pointer.

Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
lukasstockner pushed a commit to genesiscloud/sonic-swss-common that referenced this pull request Nov 10, 2022
Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

Change DBConnector::pubsub() to return a smart pointer.

Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
@abdosi
Copy link
Contributor

abdosi commented Nov 11, 2022

@liuh-80 can we have PR for 2022205. This is very critical for Chassis/multi-asic environment.

cc @rlhui for viz

@SuvarnaMeenakshi
Copy link

@liuh-80 can we have PR for 2022205. This is very critical for Chassis/multi-asic environment.

cc @rlhui for viz

@abdosi There is a PR for 202205 #706.

liuh-80 added a commit that referenced this pull request Nov 14, 2022
Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

Change DBConnector::pubsub() to return a smart pointer.

Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants