Fix issue #5157 by identifying the dependency among objects and avoiding releasing an object still being referenced #1440
…ing releasing an object still being referenced

The issue is caused by the OA receiving notifications in a different order from the one in which they were sent. The OA doesn't have any dependency check and tries to notify sai-redis to release an object which is still being referenced, which causes sai-redis to complain and the object to leak. The idea is to introduce a mechanism to identify the dependency, thus preventing a referenced object from being released.

1. Introduce a new type representing the dependency among various types of objects, including the following fields:
   - m_objsDependingOnMe, a set representing the objects that reference the current object. E.g. BUFFER_PROFILE.ingress_lossless_profile references BUFFER_POOL.ingress_lossless_pool
   - m_objsReferencingByMe, a map from a field of the current object to the name of the object it references.
2. When a field of an object A has been updated to reference another object B,
   - obj[A.m_objsReferencingByMe[field name]].m_objsDependingOnMe.remove(A)
   - A.m_objsReferencingByMe[field name] = B
3. When an object A is about to be removed,
   - if A.m_objsDependingOnMe isn't an empty set, return task_need_retry
   - else execute the normal remove flow.

Signed-off-by: Stephen Sun <[email protected]>
This pull request introduces 1 alert when merging a0bcda7 into 5b0f7be - view on LGTM.com new alerts:
retest vs, please
Hi @neethajohn,
@stephenxs can you fix this:
I get this error on master with all errors enabled. I'm not sure why this doesn't pop up on the Jenkins build. My gcc: gcc version 9.1.0 (GCC)
Hi @kcudnik,
OK, can you then add this option to configure?
As per the latest update in the DPB doc, fixed this bug. Previously we had a string value in the "breakout_modes" key, so it was not matching the whole string. But after the update via, now "breakout_modes" contains a dictionary where the key is the breakout_mode and the value is the alias. So we can easily check whether the key is present or not. Signed-off-by: Sangita Maity <[email protected]> Co-authored-by: Guohan Lu <[email protected]>
What I did
The issue is caused by the orchagent receiving notifications in a different order from the one in which they were sent.
orchagent doesn't have any dependency check and tries to notify sai-redis to release an object which is still being referenced,
which causes sai-redis to complain and the object to leak.
The idea is to introduce a mechanism to identify the dependency, thus preventing a referenced object from being released.
Signed-off-by: Stephen Sun [email protected]
Why I did it
Details if related
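Introduce a new type representing the dependency among various types of objects, including the following fields:
- m_objsDependingOnMe, a set representing the objects that reference the current object.
- m_objsReferencingByMe, a map from a field of the current object to the name of the object it references.

E.g. given the following dependencies,
- BUFFER_PROFILE.ingress_lossless_profile references BUFFER_POOL.ingress_lossless_pool,
- BUFFER_PROFILE.pg_lossless_25000_5m_profile references BUFFER_POOL.ingress_lossless_pool,

the objects should be like this: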
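As an illustration, the state for these two dependencies can be sketched in Python. This is a minimal sketch, not the orchagent code: the class name `Dependency`, the dictionary `objs`, and the use of `pool` as the referencing field name are assumptions made for illustration; only the two member names come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Dependency:
    # Names of the objects that reference this object.
    m_objsDependingOnMe: set = field(default_factory=set)
    # Field name of this object -> name of the object it references.
    m_objsReferencingByMe: dict = field(default_factory=dict)


POOL = "BUFFER_POOL.ingress_lossless_pool"
PROFILES = [
    "BUFFER_PROFILE.ingress_lossless_profile",
    "BUFFER_PROFILE.pg_lossless_25000_5m_profile",
]

# Hypothetical registry of all tracked objects, keyed by object name.
objs = {POOL: Dependency()}
for name in PROFILES:
    # "pool" as the referencing field name is an assumption.
    objs[name] = Dependency(m_objsReferencingByMe={"pool": POOL})
    objs[POOL].m_objsDependingOnMe.add(name)

for name, dep in objs.items():
    print(name, sorted(dep.m_objsDependingOnMe), dep.m_objsReferencingByMe)
```

The pool ends up with both profiles in its m_objsDependingOnMe set, while each profile records the pool in its m_objsReferencingByMe map and has nothing depending on it.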
When a field of an object A which referenced B now references B', we will have the following steps:
1. obj[A.m_objsReferencingByMe[field name]].m_objsDependingOnMe.remove(A), where obj[A.m_objsReferencingByMe[field name]] should be B
2. A.m_objsReferencingByMe[field name] = B'
3. B'.m_objsDependingOnMe.add(A)
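The three update steps can be sketched as a single helper. This is a hedged illustration only: the function `set_reference`, the two module-level dictionaries, and the field name `pool` are hypothetical stand-ins for the members described above, not the names used in the actual change.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two members described above.
depending_on_me = defaultdict(set)     # object name -> names of objects referencing it
referencing_by_me = defaultdict(dict)  # object name -> {field name: referenced object name}


def set_reference(a, field_name, new_target):
    """Point a.field_name at new_target, keeping both sides consistent."""
    old_target = referencing_by_me[a].get(field_name)
    if old_target is not None:
        # Step 1: the old target (B) no longer has A depending on it.
        depending_on_me[old_target].discard(a)
    # Step 2: record the new reference (B').
    referencing_by_me[a][field_name] = new_target
    # Step 3: B' now has A depending on it.
    depending_on_me[new_target].add(a)


set_reference("A", "pool", "B")   # A.pool references B
set_reference("A", "pool", "B'")  # A.pool now references B'
```

After the second call, B has no dependants left and can be removed, while B' is held by A.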
When an object A is about to be removed,
- if A.m_objsDependingOnMe isn't an empty set, return task_need_retry;
- otherwise, execute the normal remove flow and, if A is referencing someone else, for each field in A.m_objsReferencingByMe: obj[A.m_objsReferencingByMe[field]].m_objsDependingOnMe.remove(A)

Reference count vs. set
When it comes to recording the references, the reference count approach is the solution a lot of developers come up with.
A frequently asked question is why the set is used to record dependency instead of the reference count.
So I'd like to compare these two approaches.
As we don't have a large number of objects to track and want to provide more detailed information, I prefer the set approach.
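To make the trade-off concrete, here is a minimal sketch of the remove path built on sets. It is an illustration under assumptions: `remove_object`, the two dictionaries, and the `task_success` status name are hypothetical; only `task_need_retry` and the object names are taken from this description. The point it shows is that a set can name the objects blocking a removal, whereas a bare counter could only report how many there are.

```python
TASK_SUCCESS = "task_success"        # hypothetical status name
TASK_NEED_RETRY = "task_need_retry"  # status named in the description

# object name -> names of the objects referencing it
depending_on_me = {
    "BUFFER_POOL|ingress_test_pool": {"BUFFER_PROFILE|ingress_test_profile"},
    "BUFFER_PROFILE|ingress_test_profile": set(),
}
# object name -> {field name: referenced object name}
referencing_by_me = {
    "BUFFER_POOL|ingress_test_pool": {},
    "BUFFER_PROFILE|ingress_test_profile": {"pool": "BUFFER_POOL|ingress_test_pool"},
}


def remove_object(name):
    blockers = depending_on_me[name]
    if blockers:
        # A set can name the blockers; a counter would only know len(blockers).
        print(f"{name} is still referenced by {sorted(blockers)}")
        return TASK_NEED_RETRY
    # Drop this object from the dependants of everything it references.
    for target in referencing_by_me.pop(name).values():
        depending_on_me[target].discard(name)
    del depending_on_me[name]
    return TASK_SUCCESS


first = remove_object("BUFFER_POOL|ingress_test_pool")         # blocked by the profile
second = remove_object("BUFFER_PROFILE|ingress_test_profile")  # nothing depends on it
third = remove_object("BUFFER_POOL|ingress_test_pool")         # now unblocked
```

The first call is refused and logs the blocking profile by name; once the profile is gone, the retried removal of the pool goes through.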
How I verified it
1. Create a buffer pool, a buffer profile and two buffer PGs in order, then remove the profile, then the pool, and finally the two PGs. In this case, the profile and the pool can't be removed at first because they are still referenced.
   - hset BUFFER_POOL|ingress_test_pool type ingress mode dynamic size 14542848
   - hset BUFFER_PROFILE|ingress_test_profile dynamic_th 1 pool [BUFFER_POOL|ingress_test_pool] size 0
   - hset BUFFER_PG|Ethernet8|1 profile [BUFFER_PROFILE|ingress_test_profile]
   - hset BUFFER_PG|Ethernet16|2 profile [BUFFER_PROFILE|ingress_test_profile]
   - hdel BUFFER_PROFILE|ingress_test_profile dynamic_th pool size
     It can't be removed because of the dependency.
   - hdel BUFFER_POOL|ingress_test_pool type mode size
     It can't be removed because of the dependency.
   - hdel BUFFER_PG|Ethernet8|1 profile
     The reference count of the buffer profile is reduced to 1.
   - hdel BUFFER_PG|Ethernet16|2 profile
     Now the reference count of the profile drops to 0, and then that of the pool. Eventually, all objects are removed.
2. Create the same objects and destroy the buffer PGs first, then the buffer profile, and finally the buffer pool. In this case the objects are destroyed in order.
   - hset BUFFER_POOL|ingress_test_pool type ingress mode dynamic size 14542848
   - hset BUFFER_PROFILE|ingress_test_profile dynamic_th 1 pool [BUFFER_POOL|ingress_test_pool] size 0
   - hset BUFFER_PG|Ethernet8|1 profile [BUFFER_PROFILE|ingress_test_profile] and hset BUFFER_PG|Ethernet16|2 profile [BUFFER_PROFILE|ingress_test_profile]
   - hdel BUFFER_PG|Ethernet16|2 profile and hdel BUFFER_PG|Ethernet8|1 profile
   - hdel BUFFER_PROFILE|ingress_test_profile dynamic_th pool size
   - hdel BUFFER_POOL|ingress_test_pool type mode size
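The expected outcome of the first scenario can be replayed with a small simulation. This is a sketch under stated assumptions, not the test that was run: the `replay` function and its retry queue are a hypothetical model of how a removal returning task_need_retry is attempted again later, and the dependants are recomputed by scanning rather than kept in a stored set, purely to keep the example short.

```python
from collections import deque

POOL = "BUFFER_POOL|ingress_test_pool"
PROFILE = "BUFFER_PROFILE|ingress_test_profile"
PG8 = "BUFFER_PG|Ethernet8|1"
PG16 = "BUFFER_PG|Ethernet16|2"

# object name -> name of the single object it references (None for the pool)
references = {POOL: None, PROFILE: POOL, PG8: PROFILE, PG16: PROFILE}


def dependants(name):
    """Objects that still reference `name` (the m_objsDependingOnMe role)."""
    return {obj for obj, target in references.items() if target == name}


def replay(removal_order):
    """Return the order in which the removals actually succeed."""
    pending = deque(removal_order)
    removed = []
    stalled = 0  # consecutive retries without progress
    while pending and stalled < len(pending):
        name = pending.popleft()
        if dependants(name):
            pending.append(name)  # task_need_retry: try again later
            stalled += 1
        else:
            del references[name]
            removed.append(name)
            stalled = 0
    return removed


# Removal requested in the order profile, pool, PG 8, PG 16.
order = replay([PROFILE, POOL, PG8, PG16])
print(order)
```

Even though the profile and the pool are requested first, they are only released after both PGs are gone, which matches the behaviour described for the first test.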