Final adjustments and promotion to prod for JET and OC lessons #218
base: master
Signed-off-by: Matt Oswalt <[email protected]>
I just tried the JET lab on the PTR site, and it is using The reason I asked is that when I went through the lesson, there was no output in stage 2 after the interface was added. Debugging further, the root cause is that the PFE isn't up at that moment. Shall we have some mechanism to check the PFE status before returning the page to the user?
Ah, I knew I forgot about something. We probably can't/shouldn't adjust things on the Syringe side, but we can play around with the image. Either boot the PFE first and add a big honkin' sleep before booting the rest, or maybe add a PFE check inside the image and block SSH until it's up (which would effectively delay Syringe). I'll play with it and let you know.
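The "block SSH until it's up" option could be a small readiness gate in the image's entrypoint. The sketch below is Python and assumes the entrypoint can run it; `check` is a placeholder for whatever PFE health test turns out to work, and the 900-second default is an assumption loosely based on the ~15-minute nested boot times reported in this thread. `clock` and `sleep` are injectable only so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 900.0,
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Hypothetical readiness gate: `check` is a placeholder PFE health test.
    Returns True if the check passed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

In an entrypoint this might look like `if wait_until_ready(pfe_is_up): start_sshd()`, where `pfe_is_up` and `start_sshd` are hypothetical helpers. Holding SSH back this way delays Syringe without changing Syringe itself.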
@mwiget What's the best way from within the container image to verify that the PFE is up and running? I can telnet to port 3000 immediately after the lesson starts, but Junos doesn't quite see it. So I'm hoping there's some other way I can detect PFE health within the container image.
Looks like the PFE is detected but goes through a testing phase? I wonder if it would even be useful, then, to delay the VCP, since it looks like Junos detects the PFE right away but has to do a bunch of testing before it makes those interfaces available...
Disregard my last. If the entry shows up in So my question is, how can we get visibility into the cosim boot process? I'm poking around at the logs in the container, but they're significantly lacking in useful data. And as I mentioned before, I can telnet to ports 3000 and 3001 right away, so that's not useful as a valid health check.
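To make the point about the telnet test concrete: the probe below is a Python sketch (not something from the image) that is roughly equivalent to telnetting to ports 3000/3001. It returns True as soon as the cosim socket accepts a TCP connection, which says nothing about whether Junos has finished detecting and testing the PFE. The `port_is_open` name and the 2-second timeout are illustrative assumptions.

```python
import socket


def port_is_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Illustrative only: this proves the cosim socket is listening,
    not that Junos sees a healthy PFE.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

This is why a port check alone passes long before the lesson is usable: listening and ready are different states.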
@Mierdin I just ran some tests on GCP with nested KVM active, and I'm surprised how long it takes to boot. I see a total of 15 minutes (6 minutes for the Junos VCP alone). Connectivity between the VCP and cosim is possible (checked via telnet to 169.254.0.1 port 3000 from the VCP) long before the PFE gets detected. As for what to check to know the PFE is ready, I use something like this:
Basically, log into Junos and check for memory on FPC 0. On non-nested KVM, PFEs come up within seconds of being able to log into Junos. I'm not hopeful about finding a workaround for the delay. There must be some code within the Junos VM that pulls KVM into emulation mode.
Looking at the Junos messages, I see many of these warnings when running nested, which I never see on baremetal:
186 messages, each reporting at least 4 seconds, eat up about 12 minutes, which explains the overall delay.
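As a quick check of that figure, treating 4 seconds as a lower bound per message:

```math
186 \times 4\,\text{s} = 744\,\text{s} = 12.4\,\text{min}
```

So these warnings alone account for at least 12.4 of the roughly 15 minutes of total boot time observed on GCP.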
I think it's time to start looking at baremetal hosting. I really want this content published ASAP (these two lessons are really good), but I feel like it won't have nearly the right impact if we don't give it the performance it needs. So I'm sorry it's taking so long to get this content published, but I think I'm going to push this to the next release, likely 0.4.0. That will give me time to mull over options and come up with a better game plan for the infra ops side of things. @valjeanchan @jnpr-raylam, are you good with this? I just want to make sure this content is shown in the best light possible, and it's starting to look like nested virt is just not going to cut it.
We're fine with this, and it's good to know you're going to investigate baremetal hosting. Without nested virt, we can try adding vMX and developing some telemetry courses, and it's also possible to develop some Contrail content.
Looks like the majority of the image issues have been sorted. This PR will take care of some last-minute cleanups in the JET and OpenConfig lessons, including the addition of a second image in the JET lesson so that the ping tests in stage 4 will work.
In addition, the two lessons will be promoted to production, so they'll show up on the main site in the next release (currently targeted for later this week).
/cc @valjeanchan @jnpr-raylam