Implement Scrapli 'core' and platform migration #297

Merged 50 commits on Jan 17, 2025

@kaelemc kaelemc commented Dec 20, 2024

This PR implements Scrapli in the 'core' functions and uses Scrapli drivers for some platforms.

Changes

vrnetlab base image

A new vrnetlab 'base' image is added. This image should be used as the base for all migrated platforms.

It contains common packages preinstalled, as well as Scrapli and Scrapli Community.

Scrapli 'core'

The core now uses Scrapli's Driver for serial console and qemu monitor connections.

Scrapli is imported in vrnetlab.py but will not throw an error if not found, which means scrapli and telnetlib may co-exist.

Setting use_scrapli to True in the class initializer will enable Scrapli for the serial console and qemu monitor connections.
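
As a minimal sketch of the optional import and the opt-in flag described above (not the actual vrnetlab.py code): the `VM` class below is a hypothetical stand-in, and falling back to telnetlib when Scrapli is missing is an assumption made for illustration.

```python
import importlib.util

# True when the scrapli package is importable in this environment.
HAS_SCRAPLI = importlib.util.find_spec("scrapli") is not None


class VM:
    """Hypothetical stand-in for the vrnetlab VM base class."""

    def __init__(self, use_scrapli=False):
        # Assumption: only honour the flag when scrapli is installed,
        # otherwise stay on the telnetlib code path.
        self.use_scrapli = use_scrapli and HAS_SCRAPLI
```

A node class would then opt in by passing `use_scrapli=True` from its own initializer.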

Core functions wait_write, expect and read_until now have Scrapli equivalents.

wait_write

wait_write can still be used as before. If use_scrapli is set to True, wait_write calls wait_write_scrapli internally instead.

wait_write_scrapli replicates the functionality of wait_write, but using the Scrapli serial console connection instead of telnetlib.

wait_write_scrapli currently doesn't have support for con, clean_buffer or hold args.
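
The dispatch could look roughly like the sketch below. `ConsoleVM`, the `calls` list and the default value of `wait` are assumptions for illustration; only the function names and the `con`, `clean_buffer` and `hold` args come from the description above.

```python
class ConsoleVM:
    """Hypothetical stand-in for a vrnetlab node class."""

    def __init__(self, use_scrapli=False):
        self.use_scrapli = use_scrapli
        self.calls = []  # records which backend handled each call

    def wait_write(self, cmd, wait="#", con=None, clean_buffer=False, hold=""):
        if self.use_scrapli:
            # con, clean_buffer and hold are not supported on this path yet,
            # so they are not forwarded.
            return self.wait_write_scrapli(cmd, wait)
        # Placeholder for the original telnetlib-based behaviour.
        self.calls.append(("telnetlib", cmd, wait))

    def wait_write_scrapli(self, cmd, wait="#"):
        # Placeholder for the Scrapli serial console implementation.
        self.calls.append(("scrapli", cmd, wait))
```

Existing callers keep calling `wait_write`; only the flag decides which backend runs.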

expect

expect was part of telnetlib. The scrapli compatible version is the con_expect function.

con_expect aims to replicate the functionality of telnetlib.expect

It accepts a list of byte-strings which are used for regex matching something on the console.

Like telnetlib.expect it will return the list index of the first thing that it matched, the re match object and the read console buffer up until the match (or timeout).

Unlike telnetlib.expect, con_expect does not block forever when no timeout is given. The optional timeout instead sets how long the call should block.

My personal recommendation is not to use the timeout (blocking) at all, as it makes behaviour less reliable.
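
To illustrate the return shape, here is a small sketch of the matching step only, run against a buffer that has already been read. `match_first` is a hypothetical helper, not the con_expect implementation; trying patterns in list order and returning `(-1, None, buf)` on no match mirror telnetlib's documented behaviour.

```python
import re


def match_first(patterns, buf):
    """Return (index, match, data) for the first pattern found in buf.

    Patterns are byte-string regexes tried in list order. When nothing
    matches, (-1, None, buf) is returned.
    """
    for idx, pattern in enumerate(patterns):
        match = re.search(pattern, buf)
        if match:
            # Data read up to and including the end of the match.
            return idx, match, buf[: match.end()]
    return -1, None, buf
```

A caller would typically branch on the returned index, for example treating index 0 as a login prompt and index 1 as a boot banner.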

read_until

read_until was another telnetlib function; the Scrapli-compatible equivalent is con_read_until. It was generally not used directly in any nodes, only by the wait_write function.

con_read_until will continuously read and output the serial console buffer to stdout until the string it must match on is matched.

It returns the entire buffer of the console read until the match.

By default con_read_until is blocking; a timeout arg is available if the function should give up after some amount of time.
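
The read loop can be sketched as follows. This is not the con_read_until code: `read_chunk` is a hypothetical callable standing in for the Scrapli console read, and the short sleep on empty reads is an assumption to avoid a busy loop.

```python
import sys
import time


def read_until(read_chunk, match_str, timeout=None):
    """Accumulate console output until match_str appears or the timeout expires.

    read_chunk is a hypothetical callable returning the next bytes from the
    console (possibly empty). timeout=None blocks until a match is found.
    """
    buf = b""
    deadline = None if timeout is None else time.monotonic() + timeout
    while match_str not in buf:
        if deadline is not None and time.monotonic() >= deadline:
            break
        chunk = read_chunk()
        if chunk:
            buf += chunk
            # Echo console output to stdout as it arrives.
            sys.stdout.write(chunk.decode(errors="replace"))
            sys.stdout.flush()
        else:
            time.sleep(0.01)
    return buf
```

The whole buffer read so far is returned in both cases, so a caller can tell a timeout from a match by checking whether `match_str` is in the result.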

Logging

  • Log formatting has been improved -- log levels are now coloured via ANSI escape codes.
  • Common logs are done on vrnetlab side now:
    • Env vars
    • If transparent mgmt intf is in use
    • If scrapli is in use
    • SMP/vCPU and RAM settings
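
One way to colour log levels with ANSI escape codes is a custom `logging.Formatter`, sketched below. The class name and the colour palette are assumptions and may differ from what vrnetlab actually uses.

```python
import logging

# ANSI colour codes per level; the exact palette is an assumption.
LEVEL_COLOURS = {
    logging.DEBUG: "\x1b[36m",       # cyan
    logging.INFO: "\x1b[32m",        # green
    logging.WARNING: "\x1b[33m",     # yellow
    logging.ERROR: "\x1b[31m",       # red
    logging.CRITICAL: "\x1b[1;31m",  # bold red
}
RESET = "\x1b[0m"


class ColourFormatter(logging.Formatter):
    """Wrap the level name in an ANSI colour before formatting."""

    def format(self, record):
        colour = LEVEL_COLOURS.get(record.levelno, "")
        original = record.levelname
        record.levelname = f"{colour}{original}{RESET}" if colour else original
        try:
            return super().format(record)
        finally:
            # Restore so other handlers see the plain level name.
            record.levelname = original
```

The formatter would be attached to a handler with `handler.setFormatter(ColourFormatter("%(asctime)s: %(module)-14s %(levelname)s %(message)s"))` or a similar format string.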

Misc

These are fairly opinionated additions:

write_to_stdout has been added. It's a simple helper function to write something to the stdout and flush the buffer so everything is written.

The main usage is to print console output; the justification is that console output looks uglier when printed by the logger.

Another addition is the format_bool_color function. This is simply used to return text which is ANSI formatted in green or red depending on if some boolean is true or false.
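
A minimal sketch of what these two helpers might look like. The signatures (including the optional `stream` argument and the two text arguments) are assumptions, not the actual vrnetlab definitions.

```python
import sys


def write_to_stdout(data, stream=None):
    """Write bytes or text to stdout and flush immediately."""
    stream = sys.stdout if stream is None else stream
    if isinstance(data, bytes):
        data = data.decode("utf-8", errors="replace")
    stream.write(data)
    stream.flush()


def format_bool_color(value, text_if_true, text_if_false):
    """Return green text when value is truthy, red text otherwise."""
    if value:
        return f"\x1b[32m{text_if_true}\x1b[0m"
    return f"\x1b[31m{text_if_false}\x1b[0m"
```

For example, `format_bool_color(use_scrapli, "Yes", "No")` could be embedded in the startup log line that reports whether Scrapli is in use.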

Migrated platforms

  • Cisco CSR1kv -- Uses CVAC (mounted ISO)
  • Cisco Cat8kv -- Uses CVAC (mounted ISO)
  • Cisco Cat9kv -- Uses CVAC (mounted ISO)
  • Cisco vIOS -- IOSXEDriver
  • Cisco NX-OS -- NXOSDriver
  • Cisco Nexus 9000v -- NXOSDriver
  • Cisco IOS-XRv -- IOSXRDriver
  • Cisco IOS-XRv9k -- IOSXRDriver
  • Nokia SR-OS -- nokia_sros platform from Scrapli Community

I plan to implement other platforms later, time permitting.

Migration steps

There are two ways you can migrate:

  • Maintain wait_write functionality but use Scrapli as the telnet backend.
  • Migrate everything to Scrapli and use the platform implementation for config management.

Steps

  • Migrate the Dockerfile to ghcr.io/srl-labs/vrnetlab-base as the base image.
  • In the class initializer, set use_scrapli to True.
  • Change self.tn.expect() to self.con_expect() and remove the timeout.
  • Change the console buffer printing to use self.write_to_stdout() instead of trace or debug logging.
  • Change the telnet close from self.tn.close() to self.scrapli_tn.close()

Extra steps if migrating to Scrapli platform/driver

  • Create the scrapli device configuration and open the connection:
    • You can close the self.scrapli_tn connection, then open the new connection either manually or with the context manager (see XRv9k).
    • (RECOMMENDED) You can commandeer the existing self.scrapli_tn connection, so the new driver uses the existing transport (see vIOS)
  • (OPTIONAL) Implement the SCRAPLI_TIMEOUT env var to let the user control the driver timeout.
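
For the optional SCRAPLI_TIMEOUT step, reading the env var might look like the sketch below. The helper name and the default of 3600 seconds are assumptions; only the variable name comes from the steps above.

```python
import os

DEFAULT_SCRAPLI_TIMEOUT = 3600  # seconds; hypothetical default


def scrapli_timeout(env=None):
    """Return the driver timeout from SCRAPLI_TIMEOUT, or the default."""
    env = os.environ if env is None else env
    raw = env.get("SCRAPLI_TIMEOUT", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric values fall back to the default.
        return DEFAULT_SCRAPLI_TIMEOUT
    return value if value > 0 else DEFAULT_SCRAPLI_TIMEOUT
```

The resulting value could then be passed to one of the driver's timeout settings (Scrapli exposes options such as `timeout_ops`) when building the device configuration; which setting the migrated platforms use is not stated here.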

ssasso and others added 25 commits December 16, 2024 10:11
* backdoor to reset VR

* option to reset specific VMs
- Implement scrapli for telnet console and qemu monitor
- Add scrapli for core funcs (wait_write, read_until, expect)
- Add conditional use of scrapli via 'use_scrapli' var. Default is disabled
- Add colours to logging
- Log env vars
- Log if transparent mgmt intf is in use
- Log if scrapli is in use
- Log overlay image creation
- Log defined SMP and RAM
- Use Scrapli IOSXEDriver for config
- Update install VM var name to 'cat8kv' from 'csr'
- Fix installer class init so overlay image is only created once
- Remove license check
- Send bootstrap config via day0/CVAC config (mounted file to cdrom)
- Send startup config via Scrapli IOSXEDriver
- Use Scrapli IOSXEDriver for sending bootstrap and startup configs
- Use Scrapli IOSXRDriver to send bootstrap and startup configs
- Converts the qcow2 image into required vmdk format for vrnetlab via qemu-img.
- Use Scrapli IOSXRDriver for bootstrap and startup configs
- Change class names to 'XRv9k' instead of 'XRv'
- Explicitly wait for SDR baking to complete in install process
- Remove call home/LC check
- Use NXOSDriver for bootstrap and startup configs
- Use NXOSDriver for bootstrap and startup configs
- Use IOSXEDriver for bootstrap and startup configs
- vios, csr, cat8kv, cat9kv -- add configuration saving
- XRv, XRv9k -- log configuration saving
- Use scrapli community 'nokia_sros' platform
- Remove wait_write clean_buffer override
- Check if tftpboot config exists *before* opening Scrapli connection
- Log command outputs with 'DEBUG_SCRAPLI' env var (defaults to false)

kaelemc commented Dec 20, 2024

I rebased my branch to add the /reset functionality with Scrapli.

There is currently a caveat with SROS. Please use my fork/branch of scrapli community. I have made some minor changes to the regex to allow for the BOF configuration prompt.

Once cloned, please rebuild the base image with the Dockerfile in this PR branch.

Once all has been tested and issues are ironed out I will make a PR to get this added in scrapli community.

Of course as always I am open to/want feedback. If you want any explanations for anything please let me know!

@kaelemc kaelemc mentioned this pull request Dec 20, 2024
35 tasks
@tjbalzer

Did some tests on:

  • Cisco cat9kv
  • Cisco csr1kv
  • Cisco cat8kv
  • Cisco vIOS

Everything worked as expected:

  • bootstrap config loaded OK
  • user provided startup-config loaded OK

All tests were done with fully functional labs up to six nodes.

Looks good, no Scrapli related issues so far (the log is a little chattier than before... ;-)).

kaelemc commented Dec 22, 2024

@tjbalzer Thanks for testing!

For SROS there is an env var, DEBUG_SCRAPLI, which controls whether we show the result of each command; it is disabled by default. In your opinion, would something like this be better for the Cisco nodes too, or should we hide the channel input logging from Scrapli instead?
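
As a rough sketch of how such a flag could gate command-result logging: `env_flag` and `format_cmd_result` are hypothetical helpers, and the accepted truthy strings are an assumption. The "CHANNEL INPUT" / "OUTPUT" wording follows the debug lines seen in the container logs in this thread.

```python
import os


def env_flag(name, default=False, env=None):
    """Interpret an environment variable as a boolean."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def format_cmd_result(channel_input, result, debug):
    """Return the lines to log for one command, or [] when debug is off."""
    if not debug:
        return []
    return [f"CHANNEL INPUT: {channel_input}", f"OUTPUT: {result}"]
```

With `debug = env_flag("DEBUG_SCRAPLI")`, the command results stay out of the log unless the user opts in.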

If the startup-configuration provided is classic then the default configuration engine will be set to classic mode.

In this case the scrapli device variant should also be set to classic so the scrapli magic can do its thing with the correct prompt matching.

jcpvdm commented Jan 14, 2025

Found two new issues in my setup (using SROS 24.10.R1):

failing to apply default configs

When "Applying basic SR OS configuration" runs, the session is still in bof exclusive mode, hence it fails.

2025-01-14 10:17:13,459: sync_channel   INFO sending channel input: commit; strip_prompt: False; eager: False
2025-01-14 10:17:13,971: sync_channel   INFO sending channel input: /; strip_prompt: False; eager: False
2025-01-14 10:17:14,015: launch         DEBUG CHANNEL INPUT: commit
2025-01-14 10:17:14,015: launch         DEBUG OUTPUT:
(ex:bof)[/]
A:admin@vSIM#
2025-01-14 10:17:14,015: launch         DEBUG CHANNEL INPUT: /
2025-01-14 10:17:14,015: launch         DEBUG OUTPUT:
(ex:bof)[/]
A:admin@vSIM#
2025-01-14 10:17:14,015: launch         DEBUG Applying basic SR OS configuration...
(...)
A:admin@vSIM#
2025-01-14 10:17:15,878: launch         DEBUG CHANNEL INPUT: /configure system name sr1
2025-01-14 10:17:15,878: launch         DEBUG OUTPUT:
               ^^^^^^^^^
MINOR: CLI #2069: Operation not allowed - currently in exclusive mode

(ex:bof)[/]

Fix suggestion: use quit-config instead of /.

diff --git a/sros/docker/launch.py b/sros/docker/launch.py
index 04a6d16..9d5c116 100755
--- a/sros/docker/launch.py
+++ b/sros/docker/launch.py
@@ -1129,14 +1129,14 @@ class SROS_vm(vrnetlab.VM):
         """Commit configuration. No-op for SR OS version <= 22"""
         if SROS_VERSION.major <= 22 or SROS_VERSION.magc:
             return
-        res = self.sros_con.send_configs(["commit", "/"], strip_prompt=False)
+        res = self.sros_con.send_configs(["commit", "quit-config"], strip_prompt=False)
         self.log_scrapli_cmd_res(res)
 
     def commitBofConfig(self):
         """Commit configuration. No-op for SR OS version <= 22"""
         if SROS_VERSION.major <= 22 or SROS_VERSION.magc:
             return
-        res = self.sros_con.send_configs(["commit", "/"], strip_prompt=False)
+        res = self.sros_con.send_configs(["commit", "quit-config",], strip_prompt=False)
         self.log_scrapli_cmd_res(res)

bof commands fail when startup-config is classic

2025-01-14 10:46:31,429: base_driver    INFO attempting to acquire 'exec' privilege level
2025-01-14 10:46:31,430: sync_channel   INFO sending channel input: environment no more; strip_prompt: True; eager: False
2025-01-14 10:46:31,514: sync_channel   INFO sending channel input: edit-config bof exclusive; strip_prompt: False; eager: False
2025-01-14 10:46:31,600: launch         DEBUG CHANNEL INPUT: edit-config bof exclusive
2025-01-14 10:46:31,600: launch         DEBUG OUTPUT:
       ^
Error: Bad command.
A:sr1#
(...)

kaelemc commented Jan 14, 2025

@jcpvdm Can you manually revert the change from this commit and try again?

jcpvdm commented Jan 14, 2025

@kaelemc reverting f42e1 fixed "failing to apply default configs" only. I believe self.sros_con.send_commands(cmds, strip_prompt=False) in persistBofAndConfig is triggering an implicit quit-config from bof exclusive.

A:admin@vSIM#
2025-01-14 11:23:03,342: launch         DEBUG CHANNEL INPUT: /
2025-01-14 11:23:03,342: launch         DEBUG OUTPUT:
(ex:bof)[/]
A:admin@vSIM#
2025-01-14 11:23:03,384: base_driver    INFO attempting to acquire 'exec' privilege level
2025-01-14 11:23:03,384: sync_channel   INFO sending channel input: quit-config; strip_prompt: True; eager: False
2025-01-14 11:23:03,516: base_driver    INFO attempting to acquire 'exec' privilege level
2025-01-14 11:23:03,517: sync_channel   INFO sending channel input: /admin save bof; strip_prompt: False; eager: False
2025-01-14 11:23:04,112: sync_channel   INFO sending channel input: /admin save; strip_prompt: False; eager: False
2025-01-14 11:23:04,728: launch         DEBUG CHANNEL INPUT: /admin save bof
2025-01-14 11:23:04,728: launch         DEBUG OUTPUT:
Writing configuration to cf3:/bof.cfg
Saving configuration OK
Completed.

[/]

kaelemc commented Jan 14, 2025

@jcpvdm I'm not seeing this "failing to apply default configs". If you are still seeing it, let me know if quit-config does the trick in persistBofAndConfig() and we'll add it in.

Newest commit should fix some stuff when a classic startup config is used.

Let me know how it works out.

P.S. Thanks for testing! It's always appreciated.

kaelemc commented Jan 14, 2025

@hellt In the latest commit I've also reverted persistBofAndConfig() back to its old place. Let me know if you want to do something different.

Edit: Force pushed as I had to fix the commit message :)

As classic startup configurations are now supported for versions that default to MD-CLI, a classic startup config means the default config engine is classic on node boot. In this commit, all logic that determined when to send or not send config for classic versions is replaced with a single 'classic_cfg' global variable.

Most of the logic across the code had repeated statements checking if the version was <= 22 or magc. 'classic_cfg' is set to True in this case. Otherwise it is False.
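
A sketch of the consolidation this commit message describes, assuming a version object with `major` and `magc` attributes (the diff earlier in this thread reads `SROS_VERSION.major` and `SROS_VERSION.magc`). The `Version` tuple and `is_classic_cfg` helper are hypothetical, and detection of a classic startup config is not shown.

```python
from collections import namedtuple

# Hypothetical shape for the parsed version.
Version = namedtuple("Version", ["major", "magc"])


def is_classic_cfg(version):
    """True when the version check that used to be repeated everywhere holds."""
    return version.major <= 22 or bool(version.magc)


# Computed once, then reused instead of repeating the version check.
classic_cfg = is_classic_cfg(Version(major=24, magc=False))
```

Call sites then reduce to `if classic_cfg: return` rather than re-evaluating the version each time.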

jcpvdm commented Jan 14, 2025

@kaelemc

> Newest commit should fix some stuff when a classic startup config is used.
> Let me know how it works out.

I see both issues are fixed in the last commit, but it may have introduced a new one, as @hellt mentioned above.

> @jcpvdm I'm not seeing this "failing to apply default configs".

You don't see the issue even before reverting commit f42e1? Can you share the output from container logs? (make sure there is no configuration file present when booting)

kaelemc commented Jan 15, 2025

@jcpvdm I was mistaken, I observed the exact same behaviour as you reported.

kaelemc commented Jan 17, 2025

@jcpvdm Hopefully everything is fixed! It'd be appreciated if you could test and let me know if there's anything I've broken/missed 😄.

Regarding the 16.x MD-CLI - went with @hellt's solution where we won't enforce MD-CLI for the older versions.

jcpvdm commented Jan 17, 2025

Looks good @kaelemc 👍 , my test cases passed

kaelemc commented Jan 17, 2025

Cool, thanks @jcpvdm !

@hellt hellt merged commit 06c32d8 into hellt:scrapli-dev Jan 17, 2025
1 check failed