[nvidia] Skip SAI discovery on ports on fast-boot #1416

Open: wants to merge 4 commits into base: master
3 changes: 2 additions & 1 deletion saiasiccmp/SaiSwitchAsic.cpp
@@ -272,7 +272,8 @@ std::set<sai_object_id_t> SaiSwitchAsic::getWarmBootDiscoveredVids() const

 void SaiSwitchAsic::onPostPortCreate(
         _In_ sai_object_id_t port_rid,
-        _In_ sai_object_id_t port_vid)
+        _In_ sai_object_id_t port_vid,
+        _In_ bool discoverPortObjects)
 {
     SWSS_LOG_ENTER();

3 changes: 2 additions & 1 deletion saiasiccmp/SaiSwitchAsic.h
@@ -66,7 +66,8 @@ namespace saiasiccmp

             virtual void onPostPortCreate(
                     _In_ sai_object_id_t port_rid,
-                    _In_ sai_object_id_t port_vid) override;
+                    _In_ sai_object_id_t port_vid,
+                    _In_ bool discoverPortObjects = true) override;

             virtual void postPortRemove(
                     _In_ sai_object_id_t portRid) override;

4 changes: 4 additions & 0 deletions syncd/Makefile.am
@@ -83,6 +83,10 @@ if SONIC_ASIC_PLATFORM_BROADCOM
 libSyncd_a_CXXFLAGS += -DMDIO_ACCESS_USE_NPU
 endif

+if SONIC_ASIC_PLATFORM_MELLANOX
+libSyncd_a_CPPFLAGS += -DSKIP_SAI_PORT_DISCOVERY_ON_FAST_BOOT
+endif
+
 libSyncdRequestShutdown_a_SOURCES = \
 	RequestShutdown.cpp \
 	RequestShutdownCommandLineOptions.cpp \

56 changes: 30 additions & 26 deletions syncd/SaiSwitch.cpp
@@ -948,46 +948,50 @@ void SaiSwitch::redisUpdatePortLaneMap(

 void SaiSwitch::onPostPortCreate(
         _In_ sai_object_id_t port_rid,
-        _In_ sai_object_id_t port_vid)
+        _In_ sai_object_id_t port_vid,
+        _In_ bool discoverPortObjects)
Collaborator

I feel like this entire change in this function is overcomplicated; it should be something like this:

if (object_type(oid) == SAI_OBJECT_TYPE_PORT && shouldSkipPorts)
  continue;

Contributor Author

@kcudnik
Regarding object_type(oid) == SAI_OBJECT_TYPE_PORT: the function is called onPostPortCreate, so unless someone calls it on an object other than a port, I don't think this check is needed.

Do you mean early return? Like:

redisUpdatePortLaneMap(port_rid);

if (!discoverPortObjects)
{
    return;
}

...

Collaborator

Ah, I thought you also modified the discovery process, since it will also discover all objects on all ports. So I guess on cold boot you only need onPostPortCreate, but this could still crash on the next fast boot.

Please do a couple of fast-boot to fast-boot reboots with your patch to see if this works.

Collaborator

Looking at that code, you only need to modify the SaiDiscovery process with a flag to ignore port discovery; no other code needs to be changed.

Collaborator

And if you look at master, in SaiDiscovery.cpp at line 34, you can actually pass a new flag (do not discover port objects) through the VendorSaiOptions class, so you don't have to forward a bool argument all the way down to port discovery, if you want.

Collaborator

Yes, since we don't want to skip port detection on platforms other than yours.

Collaborator

And you can disable those ports in the init script for your platform only.

Contributor Author

@kcudnik The platform is known at compile time; syncd is compiled differently for different platforms. My change as it is right now should not affect other platforms. The purpose of this change is to improve startup time, but doing platform detection in a script would add some CPU cycles. Even though that cost is very small, init script execution time is worse on some lower-end systems, which is why I am leaning towards doing all of this at compile time if possible.

Collaborator

If it's known at compile time, put this:

#ifdef nvidia
if (object_type(oid) == SAI_OBJECT_TYPE_PORT)
  continue;
#endif

in SaiDiscovery.

Collaborator

Also add a log warning message saying that port discovery was disabled on the Nvidia platform.

 {
     SWSS_LOG_ENTER();

-    SaiDiscovery sd(m_vendorSai);
+    if (discoverPortObjects)
+    {
+        SaiDiscovery sd(m_vendorSai);

-    auto discovered = sd.discover(port_rid);
+        auto discovered = sd.discover(port_rid);

-    auto defaultOidMap = sd.getDefaultOidMap();
+        auto defaultOidMap = sd.getDefaultOidMap();

-    // we need to merge default oid maps
+        // we need to merge default oid maps

-    for (auto& kvp: defaultOidMap)
-    {
-        for (auto& it: kvp.second)
+        for (auto& kvp: defaultOidMap)
         {
-            m_defaultOidMap[kvp.first][it.first] = it.second;
+            for (auto& it: kvp.second)
+            {
+                m_defaultOidMap[kvp.first][it.first] = it.second;
+            }
         }
-    }

-    SWSS_LOG_NOTICE("discovered %zu new objects (including port) after creating port VID: %s",
-            discovered.size(),
-            sai_serialize_object_id(port_vid).c_str());
+        SWSS_LOG_NOTICE("discovered %zu new objects (including port) after creating port VID: %s",
+                discovered.size(),
+                sai_serialize_object_id(port_vid).c_str());

-    m_discovered_rids.insert(discovered.begin(), discovered.end());
+        m_discovered_rids.insert(discovered.begin(), discovered.end());

-    SWSS_LOG_NOTICE("putting ALL new discovered objects to redis for port %s",
-            sai_serialize_object_id(port_vid).c_str());
+        SWSS_LOG_NOTICE("putting ALL new discovered objects to redis for port %s",
+                sai_serialize_object_id(port_vid).c_str());

-    for (sai_object_id_t rid: discovered)
-    {
-        /*
-         * We also could thing of optimizing this since it's one call to redis
-         * per rid, and probably this should be ATOMIC.
-         *
-         * NOTE: We are also storing read only object's here, like default
-         * virtual router, CPU, default trap group, etc.
-         */
+        for (sai_object_id_t rid: discovered)
+        {
+            /*
+             * We also could thing of optimizing this since it's one call to redis
+             * per rid, and probably this should be ATOMIC.
+             *
+             * NOTE: We are also storing read only object's here, like default
+             * virtual router, CPU, default trap group, etc.
+             */

-        redisSetDummyAsicStateForRealObjectId(rid);
+            redisSetDummyAsicStateForRealObjectId(rid);
+        }
     }

     redisUpdatePortLaneMap(port_rid);
5 changes: 3 additions & 2 deletions syncd/SaiSwitch.h
@@ -184,11 +184,12 @@ namespace syncd
              *
              * Performs actions needed after port creation. Will discover new
              * queues, ipgs and scheduler groups that belong to new created port,
-             * and updated ASIC DB accordingly.
+             * and updated ASIC DB accordingly when discoverPortObjects is true.
              */
             virtual void onPostPortCreate(
                     _In_ sai_object_id_t port_rid,
-                    _In_ sai_object_id_t port_vid) override;
+                    _In_ sai_object_id_t port_vid,
+                    _In_ bool discoverPortObjects = true) override;

             /**
              * @brief Post port remove.
3 changes: 2 additions & 1 deletion syncd/SaiSwitchInterface.h
@@ -89,7 +89,8 @@ namespace syncd

             virtual void onPostPortCreate(
                     _In_ sai_object_id_t port_rid,
-                    _In_ sai_object_id_t port_vid) = 0;
+                    _In_ sai_object_id_t port_vid,
+                    _In_ bool discoverPortObjects = true) = 0;
Collaborator

This is very specific to ports; if we later decide to do something similar for other objects, this is not an optimal solution.

Contributor Author

The function is meant to be used on ports. Given the current approach, I assume there will be onPostXCreate() functions for other object types; if needed, they can accept a boolean flag in the same way. This is simple and gives the required granularity.


             virtual void postPortRemove(
                     _In_ sai_object_id_t portRid) = 0;
24 changes: 22 additions & 2 deletions syncd/Syncd.cpp
@@ -2023,7 +2023,7 @@ sai_status_t Syncd::processBulkOidCreate(

                 if (objectType == SAI_OBJECT_TYPE_PORT)
                 {
-                    m_switches.at(switchVid)->onPostPortCreate(objectRids[idx], objectVids[idx]);
+                    m_switches.at(switchVid)->onPostPortCreate(objectRids[idx], objectVids[idx], shouldDiscoverPortObjects());
                 }
             }
         }
@@ -3152,7 +3152,7 @@ sai_status_t Syncd::processOidCreate(

         if (objectType == SAI_OBJECT_TYPE_PORT)
         {
-            m_switches.at(switchVid)->onPostPortCreate(objectRid, objectVid);
+            m_switches.at(switchVid)->onPostPortCreate(objectRid, objectVid, shouldDiscoverPortObjects());
         }
     }

@@ -5338,3 +5338,23 @@ syncd_restart_type_t Syncd::handleRestartQuery(

     return RequestShutdownCommandLineOptions::stringToRestartType(op);
 }
+
+bool Syncd::shouldDiscoverPortObjects() const
+{
+    SWSS_LOG_ENTER();
+
+#ifdef SKIP_SAI_PORT_DISCOVERY_ON_FAST_BOOT
+    const bool discoverPortObjectsInFastBoot = false;
+#else
+    const bool discoverPortObjectsInFastBoot = true;
+#endif
Comment on lines +5346 to +5350
Collaborator

Fast boot can be initiated after the code was compiled, in which case this check will be hardcoded.

Collaborator

Also, there are no tests for this code.

Contributor Author

stepanblyschak, Nov 26, 2024

> Fast boot can be initiated after the code was compiled, in which case this check will be hardcoded.

This was the intention: on Nvidia, skip discovery on ports during fast boot. The runtime check for fast boot is done in the condition below.

Collaborator

This should be a runtime check.

Contributor Author

@kcudnik What is the benefit of a runtime check here? Syncd is compiled per platform, and on Nvidia we do not want to run discovery. We know this at compile time.

+
+    // Comparing with m_veryFirstRun, so that we only skip discovery when switch is fast booting
+    // and not after it finished fast boot (e.g. port breakout after fast-reboot).
+    if ((m_commandLineOptions->m_startType == SAI_START_TYPE_FAST_BOOT) && m_veryFirstRun)
+    {
+        return discoverPortObjectsInFastBoot;
+    }
+
+    return true;
+}
2 changes: 2 additions & 0 deletions syncd/Syncd.h
@@ -334,6 +334,8 @@ namespace syncd
             syncd_restart_type_t handleRestartQuery(
                     _In_ swss::NotificationConsumer &restartQuery);

+            bool shouldDiscoverPortObjects() const;
+
         private:

             /**