-
Notifications
You must be signed in to change notification settings - Fork 672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947) #2349
Changes from 5 commits
3134557
716fa37
8f91400
b1e41e8
88c9af4
1c9c0f4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1112,6 +1112,61 @@ def run_firmware(port_name, mode): | |
|
||
return status | ||
|
||
def is_fw_switch_done(port_name): | ||
""" | ||
Make sure the run_firmware cmd is done | ||
@port_name: | ||
Returns 1 on success, and exit_code = -1 on failure | ||
""" | ||
status = 0 | ||
physical_port = logical_port_to_physical_port_index(port_name) | ||
sfp = platform_chassis.get_sfp(physical_port) | ||
|
||
try: | ||
api = sfp.get_xcvr_api() | ||
except NotImplementedError: | ||
click.echo("This functionality is currently not implemented for this platform") | ||
sys.exit(ERROR_NOT_IMPLEMENTED) | ||
|
||
try: | ||
MAX_WAIT = 600 # 60s timeout. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we read from the module what is advertised instead of hardcoding here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @CliveNi why 60 secs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For more compatible with different module, timeout time should be the worst case. 60s is calculated for different implementation.
Case 1 and 3 will spend the most time to switch image. It will need to copy one image to non-volatile buffer, then interchange two images with each other. That will need 3 cycles erasing and programming on non-volatile memory. We can roughly assume all of fw component in module around 1MBytes. A common M0~M4 MCU needs around 40ms per page(2KBytes) for non-volatile erasing and programming, so that the calculation as below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just curious, if a firmware size is >> 1MB (e.g. close to 2 MB), does that mean 60 sec wait time would not be enough? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Exactly, 1MB is a roughly number, but I think the margin is enough for different DSP solutions now. Actually, it only uses around 500KB totally in our 400/800G implementation. |
||
fw_info = api.get_module_fw_info() | ||
cnt = 0 | ||
is_busy = 1 if (fw_info['status'] == False) and (fw_info['result'] is not None) else 0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you show examples of fw_info 1) when a module is busy and 2) when a module shows faulty status? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "fw_info" is returned by "get_module_fw_info()" which is in cmis.py of "sonic-platform-common". It will return 3 combination patterns of status and result as below.
The unit test function "test_is_fw_switch_done" in sfputil_test.py will also test all these cases. |
||
while is_busy and cnt < MAX_WAIT: | ||
time.sleep(0.1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. May not need to check this frequently. It also brings a lot of workload to the module. Checking every 1 or 2 seconds should be OK without causing too much time delay. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree, already revised to check every 2 seconds. |
||
fw_info = api.get_module_fw_info() | ||
is_busy = 1 if (fw_info['status'] == False) and (fw_info['result'] is not None) else 0 | ||
cnt += 1 | ||
|
||
if fw_info['status'] == True: | ||
(ImageA, ImageARunning, ImageACommitted, ImageAInvalid, | ||
ImageB, ImageBRunning, ImageBCommitted, ImageBInvalid) = fw_info['result'] | ||
|
||
if (ImageARunning == 1) and (ImageAInvalid == 1): # ImageA is running, but also invalid. | ||
click.echo("FW info error : ImageA shows running, but also shows invalid!") | ||
status = -1 # Abnormal status. | ||
elif (ImageBRunning == 1) and (ImageBInvalid == 1): # ImageB is running, but also invalid. | ||
click.echo("FW info error : ImageB shows running, but also shows invalid!") | ||
status = -1 # Abnormal status. | ||
elif (ImageARunning == 1) and (ImageACommitted == 0): # ImageA is running, but not committed. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ImageACommitted check is NOT correct here because a running image need NOT be committed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, so that (Running == 1) and (Committed == 0) will return success, isn't it? |
||
click.echo("FW images switch successful : ImageA is running") | ||
status = 1 # run_firmware is done. | ||
elif (ImageBRunning == 1) and (ImageBCommitted == 0): # ImageB is running, but not committed. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. commit_firmware() happens after the run_firmware() so why are you checking for ImageBCommitted? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is_fw_switch_done() executes between run_firmware() and commit_firmware() in upgrade() and assuming the running and committed image is same before upgrade(). run_firmware() will change the running image, so that checking running and committed image are different will be a easy way for judging image switch done. |
||
click.echo("FW images switch successful : ImageB is running") | ||
status = 1 # run_firmware is done. | ||
else: # No image is running, or running and committed image is same. | ||
click.echo("FW info error : Failed to switch into uncommitted image!") | ||
status = -1 # Failure for Switching images. | ||
else: | ||
click.echo("FW switch : Timeout!") | ||
status = -1 # Timeout or check code error or CDB not supported. | ||
|
||
except NotImplementedError: | ||
click.echo("This functionality is not applicable for this transceiver") | ||
|
||
return status | ||
|
||
def commit_firmware(port_name): | ||
status = 0 | ||
physical_port = logical_port_to_physical_port_index(port_name) | ||
|
@@ -1280,6 +1335,10 @@ def upgrade(port_name, filepath): | |
|
||
click.echo("Firmware run in mode 1 successful") | ||
|
||
if is_fw_switch_done(port_name) != 1: | ||
click.echo('Failed to switch firmware images!') | ||
sys.exit(EXIT_FAIL) | ||
|
||
Comment on lines
+1336
to
+1339
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are you checking for committed image inside this function when the commit_firmware() happens at line 1342 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above reply. |
||
status = commit_firmware(port_name) | ||
if status != 1: | ||
click.echo('Failed to commit firmware! CDB status: {}'.format(status)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why this API is required because there is already a return status from run_firmware() to indicate a failure. An imageA could be running while the committed Image could be Image B. For eg.
Now the running image is B but the committed image is A
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"run_firmware" indicates the status which read before reset to target image. This status only means module received and done the judgement for "run_firmware" cmd, but not means module already jumped into target image.
"run_firmware" will maybe involved non-volatile memory operation in module according to module implementation, so that it will maybe spend several or tens of seconds. That will cause commit failed if sends "commit_firmware" once "run_firmware" done without "is_fw_switch_done".