
[dualtor] Add script to verify consistency between kernel and ASIC #2840

Merged (19 commits) on Jul 14, 2023


@lolyu (Contributor) commented May 17, 2023

Work item tracking

  • Microsoft ADO (number only): 22571694

What I did

Add the script dualtor_neighbor_check.py to verify neighbor consistency
based on the mux state. Its output looks like this:

NEIGHBOR       MAC                PORT        MUX_STATE    IN_MUX_TOGGLE    NEIGHBOR_IN_ASIC    TUNNERL_IN_ASIC    HWSTATUS
-------------  -----------------  ----------  -----------  ---------------  ------------------  -----------------  ----------
192.168.0.2    ee:86:d8:46:7d:01  Ethernet4   standby      False            no                  yes                consistent
192.168.0.3    86:73:c2:22:bf:02  Ethernet8   standby      False            no                  yes                consistent
192.168.0.24   56:a6:bf:c5:dd:17  Ethernet92  active       False            yes                 no                 consistent
192.168.0.25   3a:18:56:f5:02:18  Ethernet96  active       False            yes                 no                 consistent
192.168.0.100  00:00:00:00:00:00  N/A         N/A          N/A              no                  yes                consistent

Signed-off-by: Longxiang Lyu [email protected]

How I did it

The workflow of the script:

  1. For each neighbor with a non-zero MAC in APPL_DB NEIGH_TABLE, use the ASIC_DB FDB entries to find the mux port the neighbor belongs to.
  2. Check whether the neighbor is consistent with the mux state:
    • If the mux state is active, the neighbor is consistent only if the neighbor entry is present in ASIC_DB and the tunnel route is absent.
    • If the mux state is standby, the neighbor is consistent only if the tunnel route is present in ASIC_DB and the neighbor entry is absent.
  3. If there are any inconsistent neighbors and the mux port is currently in toggle, the script returns a negative (non-zero) value and writes error messages to the log.
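Steps 1 and 2 above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the script's actual code: it assumes the relevant APPL_DB and ASIC_DB contents have already been read into plain dicts and sets, and every name in it (check_neighbors, CheckResult, the parameter names) is hypothetical. Zero-MAC neighbors are reported but left unchecked, because step 1 only covers non-zero-MAC neighbors.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

ZERO_MAC = "00:00:00:00:00:00"


@dataclass
class CheckResult:
    neighbor: str
    mac: str
    port: Optional[str]           # None when no mux port was found (shown as N/A)
    mux_state: Optional[str]
    in_mux_toggle: Optional[bool]
    neighbor_in_asic: bool
    tunnel_in_asic: bool
    consistent: Optional[bool]    # None when the neighbor was not checked


def _ip_sort_key(ip: str) -> Tuple[int, ipaddress._BaseAddress]:
    # Sort IPv4 before IPv6, then numerically (so .2 sorts before .100).
    addr = ipaddress.ip_address(ip)
    return addr.version, addr


def check_neighbors(
    neighbors: Dict[str, str],        # hypothetical: APPL_DB NEIGH_TABLE, IP -> MAC
    fdb_mac_to_port: Dict[str, str],  # hypothetical: ASIC_DB FDB entries, MAC -> port
    mux_states: Dict[str, str],       # hypothetical: mux port -> "active" / "standby"
    ports_in_toggle: Set[str],        # hypothetical: mux ports currently toggling
    asic_neighbors: Set[str],         # IPs that have an ASIC_DB neighbor entry
    asic_tunnel_routes: Set[str],     # IPs that have an ASIC_DB tunnel route
) -> List[CheckResult]:
    # Compare MACs case-insensitively in case the two DBs format them differently.
    fdb = {mac.lower(): port for mac, port in fdb_mac_to_port.items()}
    results = []
    for ip in sorted(neighbors, key=_ip_sort_key):
        mac = neighbors[ip]
        neighbor_in_asic = ip in asic_neighbors
        tunnel_in_asic = ip in asic_tunnel_routes

        # Step 1: map a non-zero-MAC neighbor to its mux port via the FDB.
        port = None if mac == ZERO_MAC else fdb.get(mac.lower())
        if port is None or port not in mux_states:
            results.append(CheckResult(ip, mac, None, None, None,
                                       neighbor_in_asic, tunnel_in_asic, None))
            continue

        # Step 2: the expected ASIC_DB entry depends on the mux state.
        state = mux_states[port]
        if state == "active":
            consistent = neighbor_in_asic and not tunnel_in_asic
        elif state == "standby":
            consistent = tunnel_in_asic and not neighbor_in_asic
        else:
            consistent = False  # any other state is treated as inconsistent here
        results.append(CheckResult(ip, mac, port, state, port in ports_in_toggle,
                                   neighbor_in_asic, tunnel_in_asic, consistent))
    return results
```

A caller could then apply step 3 to the returned list, for example by collecting the entries whose consistent field is False together with their in_mux_toggle flag.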

How to verify it

Ran unit tests and verified on a testbed.

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

@prsunny requested a review from @Ndancejic May 18, 2023 22:08
@Ndancejic previously approved these changes May 19, 2023
@lolyu force-pushed the add_neighbor_check branch from 2765ec3 to 4639182 May 24, 2023 13:17
@lolyu marked this pull request as ready for review May 24, 2023 13:17
@yxieca (Contributor) commented May 24, 2023

@lolyu, are you able to pass the coverage checker?

@bingwang-ms (Contributor) commented:

Discussed with Guohan offline, is it better if we integrate this script with existing route_checker.py? Both scripts are for consistency checking.

@lolyu (Contributor Author) commented May 25, 2023

> Discussed with Guohan offline, is it better if we integrate this script with existing route_checker.py? Both scripts are for consistency checking.

It is hard to integrate this with route_check.py.
First, dualtor_neighbor_check.py reads more tables and uses a Lua script to read the DB, whereas route_check.py uses swsscommon table subscriptions to read table contents and their updates; the two scripts have different designs.
Second, mux states change much more frequently than routes, especially when a mux is flapping. Because route_check.py reads the tables through swsscommon table subscriptions, it is hard for it to capture a snapshot of the database and produce an accurate check result.
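The snapshot argument can be illustrated with a small sketch. Redis executes a Lua script atomically, so a single EVAL that reads every matching key sees one consistent view of the DB, whereas a subscription-based reader observes changes one at a time. The Lua script, the read_snapshot helper, and the key pattern are hypothetical and are not the script dualtor_neighbor_check.py actually uses; client is assumed to be a redis-py Redis object (its eval(script, numkeys, *args) method).

```python
from typing import Any, Dict

# Hypothetical Lua script: returns [key1, {field, value, ...}, key2, {...}, ...]
# for every key matching ARGV[1]. Redis runs it without interleaving other commands.
SNAPSHOT_SCRIPT = """
local result = {}
for _, key in ipairs(redis.call('KEYS', ARGV[1])) do
    table.insert(result, key)
    table.insert(result, redis.call('HGETALL', key))
end
return result
"""


def _to_str(value: Any) -> str:
    # redis-py returns bytes unless the client was created with decode_responses=True.
    return value.decode() if isinstance(value, bytes) else value


def read_snapshot(client: Any, pattern: str) -> Dict[str, Dict[str, str]]:
    """Return {key: {field: value}} for every hash key matching `pattern`."""
    flat = client.eval(SNAPSHOT_SCRIPT, 0, pattern)
    snapshot = {}
    for key, fields in zip(flat[0::2], flat[1::2]):
        # HGETALL yields a flat [field1, value1, field2, value2, ...] list.
        snapshot[_to_str(key)] = {
            _to_str(f): _to_str(v) for f, v in zip(fields[0::2], fields[1::2])
        }
    return snapshot
```

For example, read_snapshot(redis.Redis(db=0), "NEIGH_TABLE:*") would read the neighbor table, assuming APPL_DB is Redis database 0 and uses ":" as its key separator. KEYS scans the whole keyspace while the script holds the server, which is tolerable for a one-off diagnostic but not for a hot path.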

@lolyu force-pushed the add_neighbor_check branch from 709f676 to 9b3f440 July 10, 2023 05:47
@lolyu force-pushed the add_neighbor_check branch from 58f1e71 to a6678a7 July 13, 2023 06:59
lolyu added a commit that referenced this pull request Jul 13, 2023
What I did
As the subject.
This PR leaves VLAN neighbor route checking to the dualtor_neighbor_check.py script: #2840

Signed-off-by: Longxiang Lyu [email protected]

How I did it
On a dualtor device, ignore any VLAN neighbor routes that are reported as missing.
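The route_check.py change can be pictured with the following sketch. It assumes the misses are given as prefix strings and that a VLAN neighbor route is a host route (/32 or /128) inside one of the VLAN subnets; filter_vlan_neighbor_misses and its parameters are hypothetical and do not reflect route_check.py's actual code.

```python
import ipaddress
from typing import Iterable, List


def filter_vlan_neighbor_misses(
    missed_routes: Iterable[str],  # hypothetical: prefixes route_check reported as missing
    vlan_subnets: Iterable[str],   # hypothetical: VLAN interface subnets on the device
    is_dualtor: bool,
) -> List[str]:
    missed = list(missed_routes)
    if not is_dualtor:
        return missed  # only dualtor devices defer these routes to the neighbor check
    subnets = [ipaddress.ip_network(s, strict=False) for s in vlan_subnets]
    kept = []
    for route in missed:
        net = ipaddress.ip_network(route, strict=False)
        is_host_route = net.prefixlen == net.max_prefixlen
        # subnet_of() raises TypeError across IP versions, so compare like with like.
        in_vlan = any(net.subnet_of(s) for s in subnets if s.version == net.version)
        if is_host_route and in_vlan:
            continue  # VLAN neighbor route: left to dualtor_neighbor_check.py
        kept.append(route)
    return kept
```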

How to verify it
Ran unit tests and verified on a testbed.
yxieca pushed a commit that referenced this pull request Jul 13, 2023
@lolyu lolyu requested a review from bingwang-ms July 14, 2023 09:25
@StormLiangMS (Contributor) commented:

@lolyu, do we have an ADO item for this? Please add tags for the branches that need this fix.

StormLiangMS pushed a commit that referenced this pull request Jul 19, 2023
yxieca pushed a commit that referenced this pull request Jul 21, 2023
rajkumar38 pushed a commit to rajkumar38/sonic-utilities that referenced this pull request Jul 25, 2023
rajkumar38 pushed a commit to rajkumar38/sonic-utilities that referenced this pull request Jul 25, 2023
StormLiangMS pushed a commit that referenced this pull request Aug 6, 2023
pdhruv-marvell pushed a commit to pdhruv-marvell/sonic-utilities that referenced this pull request Aug 23, 2023
pdhruv-marvell pushed a commit to pdhruv-marvell/sonic-utilities that referenced this pull request Aug 23, 2023
@lolyu (Contributor Author) commented Nov 30, 2023

Hi @yxieca, could you please help cherry-pick this into the 202211 branch?

yxieca pushed a commit that referenced this pull request Nov 30, 2023

5 participants