TDK Lambda Genesys IOC: Investigate (and fix?) why the IOC can't connect easily #1594

KathrynBaker · 2016-09-15T12:22:01Z

During the Muon Front End testing it was realised that something exceedingly odd was occurring in relation to the Genesys PSUs.

Using the IOC to connect was failing, unless another program (HTERM in this instance) had already connected. This is not desirable behaviour, as connecting to 14 PSUs manually does take time.

The following scenarios should help narrow down where the problem lies. It is unclear whether it is specific to the Genesys, whether Server 2012 and NPort are part of the problem, or whether it is something else entirely.

These are the scenarios tested, that I can recall, and their results.
The PSU and Computer have both just been power cycled, start the IOC - comms will not initialise (you will see multiple timeout messages)
On the computer, stop the IOC, connect to the PSU via HTERM, disconnect in HTERM (do NOT send any commands), start the IOC - everything should be fine
Power cycle the PSU - everything should continue to work
Stop the IOC, start the IOC (any way you like) - everything should continue to work

I can't remember if these were explicitly tested or their results if they were:
Stop the IOC, power cycle the PSU, start the IOC
Power cycle the computer

This will need access to a Genesys, may need a Server 2012 system, and may or may not require a Moxa to be included in the testing. As yet, a test Genesys has not been sourced, nor has there been an opportunity to explore and test with the existing MUONFE system. Some time was spent on this, but a workaround was used instead, with the awareness that it may need to be redone between now and any fix for the problem.

ChrisM-S · 2016-09-28T14:21:07Z

I had a random conversation at lunch today (with someone talking about comms. issues with some hardware, their system didn't use flow control and got occasional overwrites which brought a cryo-magnet down! - also controlled from EPICS). It rang a small bell in my head about this problem. Is it possible that flow control needs to be enabled (software or hardware) for this power supply to respond? (and also to work reliably?)

Two scenarios I can think of are:

The device assumes software flow-control pre-emptively. ie it requires an XON to be sent before it will communicate - but then will continue to communicate unless it detects an overrun?
The device assumes/requires hardware flow control and requires RTS, DCD or some other signal handshake before communication can start. In a no-flow control situation the signal may just remain asserted?

The one thing that I imagine HTERM will do is establish flow control first, even if it then works without it. Is it possible to just try enabling flow control in the EPICS parameters (hardware or software, unless the device appears to have a preference) and see if this works?

The other possible lesson is to check whether we should be regularly using flow control in some scenarios where traffic could get heavy. In this case, a buffer was overwritten and a query command ending with a ? turned into the same command without it, which set a setpoint to 0 :-)

KathrynBaker · 2016-09-28T14:31:42Z

I know that HTERM is not set up with flow control, and I can’t find any reference to use a flow control other than “none” in the genesys manuals. It might be worth a try, but if that is the solution then there is something more fundamentally wrong with the system.

Typically we don’t turn flow control on because the devices can’t handle it, sending random flow control signals to something that can’t deal with it is as bad as not sending them to something that does.

ChrisM-S · 2016-09-28T14:50:29Z

Yes, it is a bit of a mine field!

For what it’s worth, I really needed it on the Thurlby PS for the binary ICE (but didn’t have it enabled or even think of it) because the thing crashed if you wrote too fast before it was ready and overwrote its buffer. In the end, I made sure of my own flow control by ensuring I read the response from the previous command before sending another (and only send complete commands which produced output).
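The "read the response before sending the next command" approach can be sketched roughly as below. This is only an illustration, not the code used on the Thurlby: the `Transport` interface, `send_in_lockstep` and the `EchoLink` stand-in are hypothetical names, and it assumes every command produces exactly one line of reply.

```python
from typing import Iterable, List, Protocol


class Transport(Protocol):
    """Hypothetical minimal serial-link interface (an assumption, not a real API)."""

    def write(self, data: bytes) -> None: ...

    def readline(self) -> bytes: ...


def send_in_lockstep(link: Transport, commands: Iterable[bytes]) -> List[bytes]:
    """Send commands one at a time, waiting for each reply before the next write."""
    replies = []
    for command in commands:
        link.write(command)
        reply = link.readline()  # block until the device has answered
        if not reply:
            # An empty read is treated as a timeout; nothing further is sent.
            raise TimeoutError(f"no reply to {command!r}")
        replies.append(reply)
    return replies


class EchoLink:
    """Stand-in device for demonstration: records writes, always answers OK."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def write(self, data: bytes) -> None:
        self.sent.append(data)

    def readline(self) -> bytes:
        return b"OK\r"
```

Any object with `write` and `readline` methods could be passed as `link`; the point is only that the sender never has two commands outstanding, so the device buffer cannot be overrun by the host.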

When you next get a chance to look at the Lambda Genesys, if you widen out the HTERM screen there are four green indicators for CTS, DSR, RI and DCD. It would be very interesting to see if any of these change/flash when you “connect” to the port.

KathrynBaker · 2016-11-09T15:19:40Z

For reference, I have yet to find a solution, but I have tried the following in the asyn setup, none of which has given any success:

Parity as none, even and odd
clocal as Y and N
crtscts as N and Y
ixon/ixoff as NN, YY, YN and NY

Where an option was not being varied or explicitly specified, it was left in its default mode (the first option listed above), and so far no joy talking to the Lambda Genesys directly from EPICS.

I have even tried clocal and crtscts in each configuration.

I haven't (yet) gone through every permutation of ALL the values, as that feels like clutching at straws.

KathrynBaker · 2016-11-09T18:17:52Z

Things to try next time:
Stopping and starting Nport
Accessing the Moxa ports without NPort

Possible other solutions:
Write a script to run during computer startup that just does that connect/disconnect

Because this can't be fixed this time around, I'm sending it back to the backlog, and will need to remember to repropose it when the next shutdown is coming around.

ChrisM-S · 2016-11-10T11:34:46Z

If HTERM and LabVIEW are both able to talk to the Lambda Genesys with everything as is, there must be some instruction or setup which is different/missing (or differently timed) in the EPICS driver. It might be worth using Sysinternals “Portmon” (or even the breakout box?) to check the signals (DCD, RTS etc.) to see if there is something there. If not chased down, it is pretty likely to rear its head again at some point.

KathrynBaker · 2016-11-10T12:01:58Z

Hence returning this ticket to the backlog. There wasn’t time to chase this down this shutdown to any greater extent, at least not without losing the ability to do other work which was also necessary. As such, we don’t have a solution, there are some alternatives to try next time around which may or may not provide answers, including the use of port monitors (a second person is likely needed from a safety perspective with a physical one, as it requires messing around in the back of a PSU).

KathrynBaker · 2017-01-19T09:58:57Z

One thing I haven't tested, that might be worth a try for this is to use the MOXA via the IP address and port number rather than the COM port, as that would remove nport on Server 2012 as a potential issue

KathrynBaker · 2017-01-30T10:39:35Z

There is now a spare TDK connected to COM20 (Port 16 on the first MOXA) on NDEMUONFE. Due to location and safety concerns, ideally we would be in the vicinity of the PSU (on the Mezzanine, in front of a Danfysik) while testing. We should not leave it plugged into the mains and turned on, and under no circumstances can we turn the output power on.

There are various tests we can attempt, and as this is a spare, we could potentially keep testing during cycle.

To be tried:

Connect to port on MOXA 'directly'
Borrow the port and try to connect from a Windows 7 LabVIEW system (just for sanity)
Borrow the port and try to connect from a Windows 7 EPICS system
Try the various connection options with a sniffer on the lines to see just what lines are going up and down each time

KathrynBaker · 2017-04-20T15:21:22Z

James has confirmed that the PSUs are currently on (no current, don't apply one), so a restart would give us a number of them to try and test in different ways. I may go down tomorrow morning; otherwise I will be down there on Monday.

KathrynBaker · 2017-04-24T14:26:37Z

A quick update to add as I've discovered a few more things (no solution yet) - but if anyone has any ideas, please weigh in.

I haven't been able to try any of the alternatives yet; adding the ports to other locations has not been successful. However, in these comparisons I did start noticing some differences in the Async-Settings when looking at the server of the Moxa, between the ports that had and hadn't been talked to via HTERM. I realised that there may be an issue with the RTS/CTS that I hadn't noticed before. I have gone through those settings again and, rather than fixing the problem, I've realised that I can use them to recreate it.

e.g.
Connect to COM17 (port 13 if looking at the Moxa directly) via HTERM
Start the IOC
GENESYS_01:1 starts and connects, the others will continue to show the timeout issues
Stop the IOC
Include a setting of crtscts to N (don't use hardware flow control - should be the setting on the MOXA)
Start the IOC
Nothing connects now, they all get timeouts
Stop the IOC, do the connection via HTERM, restart the IOC, and any that have had a HTERM connect will now be working again.

My next stop is to try variations of N and No in that setting of crtscts (and double check that parameter name), but I'm not convinced it isn't a red herring. At least we can now put this back into the error state without having to restart the computer - although just what is having this effect I'm not sure.

ChrisM-S · 2017-04-24T17:00:31Z

If you can do a mode command on the port, it might look different before and after lockup/freeing, e.g.

C:\Users\gamekeeper\Documents> mode com17

Status for device COM1:
-----------------------
    Baud:            9600
    Parity:          None
    Data Bits:       8
    Stop Bits:       1
    Timeout:         ON
    XON/XOFF:        OFF
    CTS handshaking: OFF
    DSR handshaking: OFF
    DSR sensitivity: OFF
    DTR circuit:     ON
    RTS circuit:     ON

C:\Users\gamekeeper\Documents>

FreddieAkeroyd · 2017-04-24T18:51:35Z

Looking at the asyn source code, you need to use single letters (Y and N), but they are case insensitive. It uses the Win32 SetCommConfig() function, and the command Chris mentions prints out most of the relevant parameters.

FreddieAkeroyd · 2017-04-24T23:03:40Z

I've pushed a mod to asyn that will print out a lot of extra information via the dbior IOC command - you'd need to rebuild/redeploy to get it

KathrynBaker · 2017-04-25T09:46:12Z

For reference, using the mode com17 command as suggested by Chris:

In the working mode (no setting of crtscts in the st.cmd)

Status for device COM17:
------------------------
    Baud:            9600
    Parity:          None
    Data Bits:       8
    Stop Bits:       1
    Timeout:         ON
    XON/XOFF:        OFF
    CTS handshaking: OFF
    DSR handshaking: OFF
    DSR sensitivity: OFF
    DTR circuit:     OFF
    RTS circuit:     OFF

Setting crtscts to “N” (as Freddie pointed out, the capitals are the valid responses – I am clutching at straws to get this working!)

Status for device COM17:
------------------------
    Baud:            9600
    Parity:          None
    Data Bits:       8
    Stop Bits:       1
    Timeout:         ON
    XON/XOFF:        OFF
    CTS handshaking: OFF
    DSR handshaking: OFF
    DSR sensitivity: OFF
    DTR circuit:     OFF
    RTS circuit:     ON

Setting crtscts to “Y”

Status for device COM17:
------------------------
    Baud:            9600
    Parity:          None
    Data Bits:       8
    Stop Bits:       1
    Timeout:         ON
    XON/XOFF:        OFF
    CTS handshaking: ON
    DSR handshaking: OFF
    DSR sensitivity: OFF
    DTR circuit:     OFF
    RTS circuit:     HANDSHAKE

Looking at the Serial Control Fields, there is a flow control option, the default is “Unknown”, there are also choices of “None” and “Hardware” for the FCTL field within the ASYN record (http://www.aps.anl.gov/epics/modules/soft/asyn/R4-16/asynRecord.html), and looking at the ASYN driver information (http://www.aps.anl.gov/epics/modules/soft/asyn/R4-23/asynDriver.html#drvAsynSerialPort), the crtscts parameter is the one to be setting.

I will try again the various combinations of crtscts and clocal (also involved in some of these lines), after which I can consider a build with the extra info Freddie added.
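Comparing these listings by eye is error-prone, so a small helper that parses the `mode` output and reports only the fields that changed may be useful. This is a minimal sketch: the function names are made up for illustration, the sample text is abbreviated from the listings above, and it assumes the English "Key: Value" layout of `mode`.

```python
def parse_mode_status(text):
    """Turn the 'Key: Value' lines of `mode COMn` output into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip the header ("Status for device COM17:") and the dashed rule,
        # which have no value after the colon or no colon at all.
        if sep and value.strip():
            status[key.strip()] = value.strip()
    return status


def diff_status(before, after):
    """Return {field: (before, after)} for every field whose value differs."""
    fields = sorted(set(before) | set(after))
    return {
        field: (before.get(field), after.get(field))
        for field in fields
        if before.get(field) != after.get(field)
    }


# Abbreviated samples taken from the listings in this thread.
WORKING = """Status for device COM17:
------------------------
    CTS handshaking: OFF
    DTR circuit:     OFF
    RTS circuit:     OFF
"""

CRTSCTS_N = """Status for device COM17:
------------------------
    CTS handshaking: OFF
    DTR circuit:     OFF
    RTS circuit:     ON
"""

if __name__ == "__main__":
    changed = diff_status(parse_mode_status(WORKING), parse_mode_status(CRTSCTS_N))
    print(changed)  # {'RTS circuit': ('OFF', 'ON')}
```

Run against the working state and the crtscts "N" state, the only difference reported is the RTS circuit going from OFF to ON, which matches what the listings show.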

ChrisM-S · 2017-04-25T10:28:58Z

The option with “None” looks good to me if you can set it. From what you show, I suspect the Power supply _is_ using/obeying hardware flow control, I’m guessing that when RTS circuit is OFF on COM17, what it actually does is asserts the RTS (Request to send) line permanently so the PSU will always be able to send data.

KathrynBaker · 2017-04-25T10:45:42Z

There seems to be no way from Stream device to set that to None, which is a shame. I’ve tried the four combinations of crtscts and clocal, and none of them work; none of them set the DTR and RTS circuits to off. Not setting them leaves these parameters in the state they were in beforehand.

Looking at a port that hasn’t been touched yet with this most recent round of testing gives the following information:

Status for device COM14:
------------------------
    Baud:            1200
    Parity:          None
    Data Bits:       8
    Stop Bits:       1
    Timeout:         OFF
    XON/XOFF:        OFF
    CTS handshaking: OFF
    DSR handshaking: OFF
    DSR sensitivity: OFF
    DTR circuit:     ON
    RTS circuit:     ON

Those two values of ON are what is causing the inability to speak on a computer restart. There is no way in EPICS to set these to OFF that I can see. The Moxa ports are set with the appropriate defaults in Nport, and on the firmware in the Moxa, so there is something in HTERM that is turning these settings off.

I’m just trying a restart of the system with different settings in the Moxa, to see if that makes a difference. This looks to be something relating to this model of Moxa (I have the same “on” values on my desktop one), so I’ll see what I can sort out.

KathrynBaker · 2017-04-25T10:48:52Z

Changing the settings in the Moxa has no effect, it looks like this might be how Server2012 is initialising the ports, so I'm going to go down that path to see if I can find a solution from that side, as there isn't one available easily in software

KathrynBaker · 2017-04-25T12:38:19Z

Next update. I tried all the combinations, and they will turn the dtr and rts circuits to on or handshaking from the st.cmd, but not one will set them to OFF.

I'll hold off using the extras from Freddie for a little while. However, I did try a couple of other things (including installing a new version of nport administrator!)

I used the same mode command to set the two specific fields to OFF, and that allows the port to communicate easily. However, even via that setting method, it isn't maintained during a restart. But, at least using that method we can write a bat script that can be called on instrument restart to get things back into working order.

Or, can we call that line from the st.cmd? Which would be even more reliable, and adaptable to other instruments.
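As a rough sketch of the restart script idea, the snippet below builds and runs a `mode` command per port. The exact arguments used in the test above are not recorded in this thread, so `dtr=off rts=off` is an assumption based on the documented `mode COMn` parameters, and the helper names and port list are placeholders.

```python
import subprocess


def build_mode_command(port_number, dtr="off", rts="off"):
    """Build the argument list for e.g. `mode COM17 dtr=off rts=off`."""
    return ["mode", f"COM{port_number}", f"dtr={dtr}", f"rts={rts}"]


def release_ports(port_numbers, run=subprocess.run):
    """Run the mode command for each port and return {port: exit code}.

    `run` is injectable so the logic can be exercised without real COM ports.
    """
    results = {}
    for port_number in port_numbers:
        completed = run(
            build_mode_command(port_number), capture_output=True, text=True
        )
        results[port_number] = completed.returncode
    return results


if __name__ == "__main__":
    # Placeholder port numbers: replace with the COM ports the PSUs are on.
    print(release_ports([14, 17]))
```

Because the setting is not preserved across a restart, something like this would have to run on every instrument start, before the IOC opens the ports.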

FreddieAkeroyd · 2017-04-25T15:35:08Z

The asyn code is:

else if (epicsStrCaseCmp(key, "crtscts") == 0) {
    if (epicsStrCaseCmp(val, "Y") == 0) {
        tty->commConfig.dcb.fOutxCtsFlow = TRUE;
        tty->commConfig.dcb.fRtsControl = RTS_CONTROL_HANDSHAKE;
    }
    else if (epicsStrCaseCmp(val, "N") == 0) {
        tty->commConfig.dcb.fOutxCtsFlow = FALSE;
        tty->commConfig.dcb.fRtsControl = RTS_CONTROL_ENABLE;
    }

So setting it to N sets RTS_CONTROL_ENABLE, and there is currently no way to set RTS_CONTROL_DISABLE. If crtscts wasn't specified in st.cmd, I'd expect the settings to remain unchanged, but maybe Windows applies some default? The previous mods I did just printed more information, and I think we have worked out what we need to do. I'm not sure if RTS_CONTROL_ENABLE or RTS_CONTROL_DISABLE is correct above or whether another option is required. I can add a special option for us (D for disable) at the moment and then query the correct setting with the EPICS mailing list or Mark Rivers.

FreddieAkeroyd · 2017-04-25T21:17:36Z

In EPICS disabling CRTSCTS or DTR flow control via the asyn setoptions sets the line to ENABLE and high permanently, there is no option to "disable" the mechanism. I've now pushed a mod to asyn to allow you to set both crtscts and clocal options to the value "D" to turn off the mechanisms.
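To make the three cases concrete, here is a small Python model of the crtscts mapping. The "Y" and "N" branches follow the asyn excerpt quoted above. The "D" branch is an assumption about what the modification does, since the patch itself is not shown in this thread. The numeric constants are the Win32 DCB values from winbase.h, and the labels are the ones `mode` printed in the earlier listings.

```python
# Win32 DCB fRtsControl values (winbase.h).
RTS_CONTROL_DISABLE = 0
RTS_CONTROL_ENABLE = 1
RTS_CONTROL_HANDSHAKE = 2

# How `mode COMn` labelled the RTS circuit in the listings in this thread.
MODE_LABEL = {
    RTS_CONTROL_DISABLE: "OFF",
    RTS_CONTROL_ENABLE: "ON",
    RTS_CONTROL_HANDSHAKE: "HANDSHAKE",
}


def crtscts_to_dcb(value):
    """Model of the crtscts option: returns the DCB fields it would set."""
    option = value.upper()  # asyn compares case-insensitively
    if option == "Y":
        return {"fOutxCtsFlow": True, "fRtsControl": RTS_CONTROL_HANDSHAKE}
    if option == "N":
        return {"fOutxCtsFlow": False, "fRtsControl": RTS_CONTROL_ENABLE}
    if option == "D":
        # Assumed behaviour of the new "D" value: RTS line held off.
        return {"fOutxCtsFlow": False, "fRtsControl": RTS_CONTROL_DISABLE}
    raise ValueError(f"unsupported crtscts value: {value!r}")


if __name__ == "__main__":
    for option in ("Y", "N", "D"):
        fields = crtscts_to_dcb(option)
        print(option, MODE_LABEL[fields["fRtsControl"]])
```

This lines up with the observations: "N" gave RTS circuit ON and "Y" gave HANDSHAKE, so only a third value can produce the OFF state that HTERM leaves behind.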

FreddieAkeroyd · 2017-04-25T22:09:34Z

BTW @KathrynBaker @ChrisM-S good detective work :-)

KathrynBaker · 2017-04-27T12:07:49Z

Given the proximity to beam start, I'm not going to make any changes to NDEMUONFE this shutdown. As Freddie has handled the modification elsewhere, I'm going to mark this ticket as ready for review and create a new one for modifying the setup on MUONFE before next cycle.

KathrynBaker added the bug label Sep 15, 2016

KathrynBaker added the proposal label Oct 26, 2016

kjwoodsISIS added 0.5 ready and removed proposal labels Nov 3, 2016

kjwoodsISIS added this to the SPRINT_2016_11_03 milestone Nov 3, 2016

kjwoodsISIS assigned KathrynBaker Nov 3, 2016

KathrynBaker added in progress and removed ready labels Nov 8, 2016

KathrynBaker removed their assignment Nov 9, 2016

KathrynBaker removed this from the SPRINT_2016_11_03 milestone Nov 9, 2016

KathrynBaker removed the in progress label Nov 9, 2016

KathrynBaker added the proposal label Jan 3, 2017

kjwoodsISIS added 1 and removed 0.5 proposal labels Jan 12, 2017

kjwoodsISIS added this to the SPRINT_2017_01_12 milestone Jan 12, 2017

kjwoodsISIS added ready and removed ready labels Jan 12, 2017

kjwoodsISIS added awaiting and removed ready labels Feb 9, 2017

kjwoodsISIS modified the milestones: SPRINT_2017_04_06, SPRINT_2017_03_09 Apr 6, 2017

KathrynBaker added the in progress label Apr 21, 2017

AdrianPotter assigned KathrynBaker Apr 26, 2017

KathrynBaker added review and removed in progress labels Apr 27, 2017

KathrynBaker mentioned this issue Apr 27, 2017

TDK Lambda Genesys IOC: Modify the setup on MUONFE to use the disable options added in ASYN #2276

Closed

KathrynBaker removed the awaiting label Apr 27, 2017

AdrianPotter added under review completed and removed review under review labels May 17, 2017

kjwoodsISIS added the fixed label Jun 1, 2017

kjwoodsISIS closed this as completed Jun 1, 2017

kjwoodsISIS removed the completed label Jun 1, 2017
