Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CMIS]Improved 400G link bring up sequence #254

Merged
merged 11 commits into from
Jun 28, 2022
21 changes: 16 additions & 5 deletions sonic-xcvrd/xcvrd/xcvrd.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import time
import datetime
import subprocess
import argparse

from sonic_py_common import daemon_base, device_info, logger
from sonic_py_common import multi_asic
Expand Down Expand Up @@ -936,15 +937,15 @@ class CmisManagerTask:
CMIS_STATE_REMOVED = 'REMOVED'
CMIS_STATE_FAILED = 'FAILED'

def __init__(self, port_mapping):
def __init__(self, port_mapping, skip_cmis_mgr=False):
self.task_stopping_event = multiprocessing.Event()
self.task_process = None
self.port_dict = {}
self.port_mapping = copy.deepcopy(port_mapping)
self.xcvr_table_helper = XcvrTableHelper()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If XcvrTableHelper is implemented as a class member, it should be initialized in task_worker instead of __init__.
This is because CmisManagerTask is a subprocess wrapper, which means a subprocess will be forked once task_run is called.
If it is initialized in __init__, the flow is

  • parent creates the XcvrTableHelper along with all the low level objects, like the sockets based on which the tables are accessed
  • and then the child is forked, all resources opened by the parent are duplicated
  • now there are two copies of all the resources but the parent's one will not really be used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, initializing it in __init__ and then duplicating it from parent to child should also work. But it looks like taking a detour to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs what is the concern here? only the worker thread is going to use the socket, not the parent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xcvr_table_helper is initialized in the parent and then duplicated in the client. but it will never be used by the parent.
so the resource allocated in the parent is wasted.

  • if it is initialized in __init__:
    • xcvr_table_helper initialized in parent. resource allocated for it but not used.
    • duplicated in the child during fork
  • if it is initialized in task_worker
    • xcvr_table_helper is initialized in the child before it starts to work.
    • no resource was allocated in the parent.

as I mentioned, it also works in the current way. so I'm ok if you do not decide to change it.

self.isPortInitDone = False
self.isPortConfigDone = False

self.skip_cmis_mgr = skip_cmis_mgr

def log_notice(self, message):
helper_logger.log_notice("CMIS: {}".format(message))
Expand Down Expand Up @@ -1444,6 +1445,10 @@ def task_run(self):
self.log_notice("Platform chassis is not available, stopping...")
return

if self.skip_cmis_mgr:
self.log_notice("Skipping CMIS Task Manager")
return

self.task_process = multiprocessing.Process(target=self.task_worker)
if self.task_process is not None:
self.task_process.start()
Expand Down Expand Up @@ -1816,6 +1821,7 @@ def task_run(self, sfp_error_event):
if self.task_stopping_event.is_set():
return


self.task_process = multiprocessing.Process(target=self.task_worker, args=(
self.task_stopping_event, sfp_error_event))
self.task_process.start()
Expand Down Expand Up @@ -1980,10 +1986,11 @@ def retry_eeprom_reading(self):


class DaemonXcvrd(daemon_base.DaemonBase):
def __init__(self, log_identifier):
def __init__(self, log_identifier, skip_cmis_mgr=False):
super(DaemonXcvrd, self).__init__(log_identifier)
self.stop_event = threading.Event()
self.sfp_error_event = multiprocessing.Event()
self.skip_cmis_mgr = skip_cmis_mgr

# Signal handler
def signal_handler(self, sig, frame):
Expand Down Expand Up @@ -2126,7 +2133,7 @@ def run(self):
port_mapping_data, retry_eeprom_set = self.init()

# Start the CMIS manager
cmis_manager = CmisManagerTask(port_mapping_data)
cmis_manager = CmisManagerTask(port_mapping_data, self.skip_cmis_mgr)
cmis_manager.task_run()

# Start the dom sensor info update thread
Expand Down Expand Up @@ -2214,7 +2221,11 @@ def get_state_port_tbl(self, asic_id):


def main():
xcvrd = DaemonXcvrd(SYSLOG_IDENTIFIER)
parser = argparse.ArgumentParser()
parser.add_argument('--skip_cmis_mgr', action='store_true')
prgeor marked this conversation as resolved.
Show resolved Hide resolved

args = parser.parse_args()
xcvrd = DaemonXcvrd(SYSLOG_IDENTIFIER, args.skip_cmis_mgr)
xcvrd.run()


Expand Down