Errors during SAI discovery #7895

Open
stepanblyschak opened this issue Jun 16, 2021 · 13 comments
Assignees
Labels
Triaged this issue has been triaged

Comments

@stepanblyschak
Collaborator

stepanblyschak commented Jun 16, 2021

Description

SAI discovery tries to query all attributes on an object, even attributes that are not valid for that particular object.
E.g., querying SAI_BRIDGE_PORT_ATTR_PORT_ID on a bridge port whose SAI_BRIDGE_PORT_ATTR_TYPE is SAI_BRIDGE_PORT_TYPE_1Q_ROUTER.
In other words, SAI discovery does not respect an attribute's "conditions".

Steps to reproduce the issue:

  1. Boot the system
  2. show log | grep ERR

Describe the results you received:

Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_BRIDGE.ERR] mlnx_sai_bridge.c[1844]- mlnx_bridge_port_lag_or_port_get: Invalid port type - 2
Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2216]- get_dispatch_attribs_handler: Failed getting attrib SAI_BRIDGE_PORT_ATTR_PORT_ID
Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2334]- sai_get_attributes: Failed attribs dispatch
Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_BRIDGE.ERR] mlnx_sai_bridge.c[1411]- mlnx_bridge_1d_oid_to_data: Unexpected bridge type 0 is not 1D
Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2216]- get_dispatch_attribs_handler: Failed getting attrib SAI_BRIDGE_ATTR_UNKNOWN_UNICAST_FLOOD_GROUP
Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2334]- sai_get_attributes: Failed attribs dispatch
Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_BRIDGE.ERR] mlnx_sai_bridge.c[1411]- mlnx_bridge_1d_oid_to_data: Unexpected bridge type 0 is not 1D
Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2216]- get_dispatch_attribs_handler: Failed getting attrib SAI_BRIDGE_ATTR_UNKNOWN_MULTICAST_FLOOD_GROUP
Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2334]- sai_get_attributes: Failed attribs dispatch
Apr 17 18:10:26.221212 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_BRIDGE.ERR] mlnx_sai_bridge.c[1411]- mlnx_bridge_1d_oid_to_data: Unexpected bridge type 0 is not 1D
Apr 17 18:10:26.226601 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2216]- get_dispatch_attribs_handler: Failed getting attrib SAI_BRIDGE_ATTR_BROADCAST_FLOOD_GROUP
Apr 17 18:10:26.226601 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2334]- sai_get_attributes: Failed attribs dispatch

Describe the results you expected:

No errors

Output of show version:

                                                                                                                                                              
SONiC Software Version: SONiC.202012.96-eddce4d5_Internal                                                                                                     
Distribution: Debian 10.9                                                                                                                                     
Kernel: 4.19.0-12-2-amd64
Build commit: eddce4d5
Build date: Wed Jun  2 01:10:46 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci03

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700-D48C8
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1811X06311
Uptime: 14:00:17 up 39 min,  2 users,  load average: 8.81, 9.00, 9.48

Docker images:
REPOSITORY                    TAG                           IMAGE ID            SIZE
docker-syncd-mlnx             202012.96-eddce4d5_Internal   80883fee84fd        658MB
docker-syncd-mlnx             latest                        80883fee84fd        658MB
docker-snmp                   202012.96-eddce4d5_Internal   663b6805d255        442MB
docker-snmp                   latest                        663b6805d255        442MB
docker-teamd                  202012.96-eddce4d5_Internal   5400fd87e82b        411MB
docker-teamd                  latest                        5400fd87e82b        411MB
docker-nat                    202012.96-eddce4d5_Internal   cc6f1c7e9350        414MB
docker-nat                    latest                        cc6f1c7e9350        414MB
docker-sonic-mgmt-framework   202012.96-eddce4d5_Internal   42d48abb091b        621MB
docker-sonic-mgmt-framework   latest                        42d48abb091b        621MB
docker-router-advertiser      202012.96-eddce4d5_Internal   b1ae04066636        401MB
docker-router-advertiser      latest                        b1ae04066636        401MB
docker-platform-monitor       202012.96-eddce4d5_Internal   6f18d42d19bb        683MB
docker-platform-monitor       latest                        6f18d42d19bb        683MB
docker-lldp                   202012.96-eddce4d5_Internal   3ef06d2267dc        441MB
docker-lldp                   latest                        3ef06d2267dc        441MB
docker-database               202012.96-eddce4d5_Internal   1dea721237af        400MB
docker-database               latest                        1dea721237af        400MB
docker-orchagent              202012.96-eddce4d5_Internal   7996e49811c5        429MB
docker-orchagent              latest                        7996e49811c5        429MB
docker-sonic-telemetry        202012.96-eddce4d5_Internal   23e922307476        490MB
docker-sonic-telemetry        latest                        23e922307476        490MB
docker-fpm-frr                202012.96-eddce4d5_Internal   3002411b0dd8        429MB
docker-fpm-frr                latest                        3002411b0dd8        429MB
docker-dhcp-relay             202012.96-eddce4d5_Internal   66573837b291        407MB
docker-dhcp-relay             latest                        66573837b291        407MB
docker-sflow                  202012.96-eddce4d5_Internal   3905c2645941        412MB
docker-sflow                  latest                        3905c2645941        412MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

Cannot attach dump (some github issue), will retry later.

@stepanblyschak
Collaborator Author

@kcudnik @lguohan Could you please take a look?

@kcudnik
Contributor

kcudnik commented Jun 16, 2021

Yes, discovery doesn't respect conditions; it just queries all attributes. This should not be harmful, it is only noise in syslog when a query fails. We could enhance discovery to skip conditional attributes.

@kcudnik
Contributor

kcudnik commented Jun 22, 2021

@stepanblyschak After thinking about this for a while, I see there is a problem here, since the current discovery logic looks like this:

  • create_switch, then query all "OID" attributes on that switch,
  • then for each discovered OID (if not processed yet) query its object type,
  • query all OID attributes of that TYPE for that OID,
  • mark the OID as processed,
  • repeat until no OIDs are left.
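
A minimal sketch of the loop described above, for illustration only. The three callables are hypothetical stand-ins for the real SAI queries, and the toy OIDs and attribute names are made up; this is not the syncd implementation.

```python
from collections import deque


def discover(switch_oid, get_object_type, oid_attrs_for_type, get_oid_attr):
    """Walk every object reachable from the switch through OID attributes.

    Hypothetical callables (assumptions, not SAI API):
      get_object_type(oid)      -> object type name
      oid_attrs_for_type(type)  -> list of OID-valued attribute names
      get_oid_attr(oid, attr)   -> list of OIDs, or None if the query fails
    """
    processed = set()
    pending = deque([switch_oid])
    while pending:
        oid = pending.popleft()
        if oid in processed:
            continue
        obj_type = get_object_type(oid)
        for attr in oid_attrs_for_type(obj_type):
            # Queried unconditionally: conditions are not consulted here.
            found = get_oid_attr(oid, attr)
            if found is None:
                continue  # the vendor SAI may already have logged an error
            pending.extend(o for o in found if o not in processed)
        processed.add(oid)
    return processed


# Toy object graph (made-up OIDs and attribute names, not real SAI data).
TYPES = {"sw": "SWITCH", "br": "BRIDGE", "bp": "BRIDGE_PORT", "p": "PORT"}
OID_ATTRS = {
    "SWITCH": ["BRIDGE_ID"],
    "BRIDGE": ["PORT_LIST"],
    "BRIDGE_PORT": ["PORT_ID", "BRIDGE_ID"],
    "PORT": [],
}
VALUES = {
    ("sw", "BRIDGE_ID"): ["br"],
    ("br", "PORT_LIST"): ["bp"],
    ("bp", "PORT_ID"): ["p"],
    ("bp", "BRIDGE_ID"): ["br"],
}

reachable = discover(
    "sw",
    TYPES.__getitem__,
    OID_ATTRS.__getitem__,
    lambda oid, attr: VALUES.get((oid, attr)),
)
```

Note that the walk only ever sees OIDs, which is why it has nothing to evaluate a condition against without issuing extra queries.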

The problem is that for a given OID, say a BRIDGE or BRIDGE_PORT or any other type, we only have its OID; we don't have the non-OID attributes from which we could deduce whether the condition on the next OID attribute query evaluates to true or false. To figure out whether we need to query a specific conditional OID attribute, we would first need to query all of its condition attributes (non-OIDs) and then, if the condition is false, skip the query for that OID attribute. That would potentially prevent many of the error logs you mentioned.

For example, if we have 1 conditional OID attribute on BRIDGE_PORT and its condition involves 2 other attributes, we need to query 2 attributes instead of 1 if the condition is false, and 3 attributes in total if the condition is true. We would need to do that for each BRIDGE, since each bridge could have each condition in a different state. Overall, across the whole switch, we would potentially query more attributes than there are OID attributes. This would cost us some time, and we want this operation to be as fast as possible.
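
To make the query count concrete, here is a small sketch, assuming a hypothetical `get_attr` callable and made-up attribute names. It fetches the condition attributes first and only then decides whether to issue the OID query, returning how many queries were spent.

```python
def get_if_valid(oid, attr, cond_attrs, predicate, get_attr):
    """Query `attr` only when its condition holds.

    Returns (value_or_None, number_of_queries_issued).
    `get_attr` and `predicate` are hypothetical helpers, not SAI API.
    """
    # Every condition attribute has to be fetched before the condition
    # can be evaluated, whether or not it turns out to be true.
    values = {name: get_attr(oid, name) for name in cond_attrs}
    queries = len(cond_attrs)
    if not predicate(values):
        return None, queries
    return get_attr(oid, attr), queries + 1


# Toy data: two bridge ports, one of which is not backed by a port.
STORE = {
    ("bp1", "TYPE"): "PORT",
    ("bp1", "ADMIN_STATE"): True,
    ("bp1", "PORT_ID"): "p1",
    ("bp2", "TYPE"): "1Q_ROUTER",
    ("bp2", "ADMIN_STATE"): True,
}


def toy_get(oid, name):
    return STORE[(oid, name)]


def is_port(values):
    return values["TYPE"] == "PORT"


valid = get_if_valid("bp1", "PORT_ID", ["TYPE", "ADMIN_STATE"], is_port, toy_get)
skipped = get_if_valid("bp2", "PORT_ID", ["TYPE", "ADMIN_STATE"], is_port, toy_get)
```

With a two-attribute condition, the valid case costs 3 queries and the skipped case costs 2, against 1 query per object for the unconditional approach.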

Also note that some OID attributes may not be conditional at all and may simply not be implemented by the platform. In that scenario there is no way to tell in advance whether the attribute is supported on a given object type, so the actual query is needed. And even if an attribute is conditional and its condition is true, the current platform may still not support that attribute yet.

With SAI adding more and more attributes, the number of OID attributes (conditional and non-conditional) will grow each time the SAI headers are updated to a new version, so there will potentially be new and more errors each time that happens.

I can't confidently interpret the mlnx errors, i.e. whether particular attributes such as SAI_BRIDGE_ATTR_UNKNOWN_UNICAST_FLOOD_GROUP are not supported/not implemented, since the message logged right before them is "Unexpected bridge type 0 is not 1D". For not supported/not implemented attributes, in my opinion this should be syslog level WARNING rather than ERROR, since not implemented/not supported could be intentional behavior; or maybe they left it at ERR just so that it gets noticed.
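
The log-level suggestion above could look roughly like the following sketch. The status names are the ones defined in SAI's `saistatus.h`; the mapping to log levels is only an assumption for illustration and does not describe what any vendor SAI actually does.

```python
import logging

# Statuses that can be an intentional outcome of a discovery query.
# Treating them as warnings is the assumption being illustrated here.
EXPECTED_DURING_DISCOVERY = {
    "SAI_STATUS_NOT_SUPPORTED",
    "SAI_STATUS_NOT_IMPLEMENTED",
}


def log_level_for(status):
    """Pick a log level for the result of a get-attribute query."""
    if status == "SAI_STATUS_SUCCESS":
        return logging.DEBUG
    if status in EXPECTED_DURING_DISCOVERY:
        return logging.WARNING
    return logging.ERROR
```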

For your example, could you provide the syslog, or a count of how many error logs are produced during a single switch_create and discovery, and which attributes are reported? We could then check how many of them are conditional and see how we could prevent at least the conditional ones from being queried.
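
One way to get such a tally from a syslog is sketched below. It assumes the "Failed getting attrib <NAME>" message format seen in the log excerpt in the issue description; the sample lines are copied from that excerpt.

```python
import re
from collections import Counter

# Matches the attribute name in lines such as
# "... get_dispatch_attribs_handler: Failed getting attrib SAI_..."
FAILED_ATTR = re.compile(r"Failed getting attrib (\w+)")


def count_failed_attrs(lines):
    """Count how often each attribute appears in a failed-get message."""
    counts = Counter()
    for line in lines:
        match = FAILED_ATTR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2216]- get_dispatch_attribs_handler: Failed getting attrib SAI_BRIDGE_PORT_ATTR_PORT_ID",
    "Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2334]- sai_get_attributes: Failed attribs dispatch",
    "Apr 17 18:10:26.219615 ptr-sonic-n2-t3 ERR syncd#SDK: [SAI_UTILS.ERR] mlnx_sai_utils.c[2216]- get_dispatch_attribs_handler: Failed getting attrib SAI_BRIDGE_ATTR_UNKNOWN_UNICAST_FLOOD_GROUP",
]

sample_counts = count_failed_attrs(SAMPLE)
```

To run it over a whole file, pass an open file object, since iterating a file yields lines.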

@stepanblyschak
Collaborator Author

stepanblyschak commented Jun 23, 2021

syslog.1.gz

@stepanblyschak
Collaborator Author

stepanblyschak commented Jun 23, 2021

  1. SAI_BRIDGE_PORT_ATTR_PORT_ID
  2. SAI_BRIDGE_ATTR_UNKNOWN_UNICAST_FLOOD_GROUP
  3. SAI_BRIDGE_ATTR_UNKNOWN_MULTICAST_FLOOD_GROUP
  4. SAI_BRIDGE_ATTR_BROADCAST_FLOOD_GROUP

These attributes cause errors. The first one is OK to check via its condition, but 2-4 are valid only for a 1D bridge, and that does not seem to be in the metadata, just in a comment in SAI: https://github.com/opencomputeproject/SAI/blob/master/inc/saibridge.h#L502

@kcudnik kcudnik self-assigned this Jun 23, 2021
@kcudnik
Contributor

kcudnik commented Jun 23, 2021

The "valid only for this bridge type" restriction is in a comment because SAI does not support mixed conditions yet (I'm working on that now, and it will be supported later). The condition should then look like: @validonly bridge_attr_type == 1D and ( ... )
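
As a rough illustration of what evaluating a mixed (and/or) condition involves, here is a sketch that represents a condition as a nested tuple tree. The tree format and `HYPOTHETICAL_ATTR` are assumptions made for this example; they are not the SAI metadata representation, and the elided part of the condition above is left as a placeholder.

```python
def evaluate(cond, attrs):
    """Evaluate a nested condition against known attribute values.

    cond is one of:
      ("eq", attr_name, expected_value)
      ("and", cond, cond, ...)
      ("or", cond, cond, ...)
    """
    op = cond[0]
    if op == "eq":
        _, name, expected = cond
        return attrs.get(name) == expected
    if op == "and":
        return all(evaluate(c, attrs) for c in cond[1:])
    if op == "or":
        return any(evaluate(c, attrs) for c in cond[1:])
    raise ValueError(f"unknown operator: {op!r}")


# "bridge type is 1D and ( ... )", with a placeholder for the elided part.
FLOOD_GROUP_VALID = (
    "and",
    ("eq", "SAI_BRIDGE_ATTR_TYPE", "SAI_BRIDGE_TYPE_1D"),
    ("or", ("eq", "HYPOTHETICAL_ATTR", "A"), ("eq", "HYPOTHETICAL_ATTR", "B")),
)

on_1q = evaluate(
    FLOOD_GROUP_VALID,
    {"SAI_BRIDGE_ATTR_TYPE": "SAI_BRIDGE_TYPE_1Q", "HYPOTHETICAL_ATTR": "A"},
)
on_1d = evaluate(
    FLOOD_GROUP_VALID,
    {"SAI_BRIDGE_ATTR_TYPE": "SAI_BRIDGE_TYPE_1D", "HYPOTHETICAL_ATTR": "B"},
)
```

With a condition like this in the metadata, discovery would have what it needs to skip the flood group attributes on a 1Q bridge.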

Thanks for attaching the syslog. From it I see that only those 4 attributes on BRIDGE produce errors, since only 1 bridge is used in SONiC (the default one), and no other errors are produced by querying OID attributes. I will keep this on my to-do list, but with low priority since it's not critical, and I will fix it at some point.

@zhangyanzhao zhangyanzhao added the Triaged this issue has been triaged label Jun 23, 2021
@kcudnik
Contributor

kcudnik commented Jun 25, 2021

@stepanblyschak I added support for SAI mixed conditions in opencomputeproject/SAI#1255, so the validonly condition for bridge 1D can now be fixed here: https://github.com/opencomputeproject/SAI/blob/master/inc/saibridge.h#L502

@kcudnik
Contributor

kcudnik commented Jul 4, 2021

I corrected the bridge 1D validonly condition now that mixed conditions support has been added: opencomputeproject/SAI#1271

@stepanblyschak
Collaborator Author

@kcudnik Is there a plan to fix this in syncd?

@kcudnik
Contributor

kcudnik commented Sep 15, 2021

@kcudnik Is there a plan to fix this in syncd?

Yes, I already replied here: #7895 (comment)

@stepanblyschak
Collaborator Author

@kcudnik thanks!

@liat-grozovik
Collaborator

@kcudnik, @stepanblyschak where do we stand with this issue on SONiC master? Is it still open? Will it be targeted for the next SONiC/SAI releases?

@kcudnik
Contributor

kcudnik commented Jul 29, 2022

I think this is still open.

roy-sror added a commit to roy-sror/sonic-mgmt that referenced this issue Oct 19, 2023
bingwang-ms pushed a commit to sonic-net/sonic-mgmt that referenced this issue Dec 1, 2023

#10396)

* Skip error messages for sonic-net/sonic-buildimage#7895
wangxin pushed a commit to sonic-net/sonic-mgmt that referenced this issue Feb 1, 2024

Summary: Skip error messages for the known issues:
sonic-net/sonic-buildimage#7895
sonic-net/sonic-sairedis#582
This is a PR to fix the cherry-pick conflict for #10396

4 participants