TDK Lambda Genesys IOC: Investigate (and fix?) why the IOC can't connect easily #1594
Comments
I had a random conversation at lunch today (with someone talking about comms. issues with some hardware, their system didn't use flow control and got occasional overwrites which brought a cryo-magnet down! - also controlled from EPICS). It rang a small bell in my head about this problem. Is it possible that flow control needs to be enabled (software or hardware) for this power supply to respond? (and also to work reliably?) Two scenarios I can think of are:
The one thing that I imagine HTERM will do is establish flow control first, even if it then works without it. Is it possible to just try enabling flow control in the EPICS parameters (hardware or software), unless the device appears to have a preference, and see if this works? The other possible lesson is to check whether we should be regularly using flow control in scenarios where traffic could get heavy. In this case, a buffer was overwritten and a query command ending with a ? turned into the same command without it, which set a setpoint to 0 :-) |
I know that HTERM is not set up with flow control, and I can’t find any reference to using a flow control setting other than “none” in the Genesys manuals. It might be worth a try, but if that is the solution then there is something more fundamentally wrong with the system. Typically we don’t turn flow control on because the devices can’t handle it; sending random flow control signals to something that can’t deal with them is as bad as not sending them to something that needs them. |
Yes, it is a bit of a minefield! For what it’s worth, I really needed it on the Thurlby PS for the binary ICE (but didn’t have it enabled or even think of it) because the thing crashed if you wrote too fast before it was ready and overwrote its buffer. In the end, I made sure of my own flow control by ensuring I read the response from the previous command before sending another (and by only sending complete commands which produced output). When you next get a chance to look at the Lambda Genesys, if you widen out the HTERM screen there are four green indicators for CTS, DSR, RI and DCD. It would be very interesting to see if any of these change/flash when you “connect” to the port. |
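The "read the previous response before sending another command" approach described above can be sketched roughly as follows. This is only an illustration: `LockStepLink` and `FakeDevice` are hypothetical names, the `\r` terminator and the `OK:` reply format are assumptions, and the fake device stands in for a real serial port object with `write()` and `read()` methods.

```python
class FakeDevice:
    """Hypothetical stand-in for a serial port: replies to each complete command."""

    def __init__(self, terminator=b"\r"):
        self.terminator = terminator
        self._pending = b""      # bytes received but not yet terminated
        self._outbox = b""       # reply bytes waiting to be read
        self.commands_seen = []

    def write(self, data):
        self._pending += data
        while self.terminator in self._pending:
            command, self._pending = self._pending.split(self.terminator, 1)
            self.commands_seen.append(command)
            # Assumed reply format, purely for the demonstration.
            self._outbox += b"OK:" + command + self.terminator
        return len(data)

    def read(self, size=1):
        chunk, self._outbox = self._outbox[:size], self._outbox[size:]
        return chunk             # empty bytes means "nothing arrived in time"


class LockStepLink:
    """Send one complete command, then wait for its full reply before the next."""

    def __init__(self, port, terminator=b"\r"):
        self.port = port
        self.terminator = terminator

    def query(self, command):
        # Only complete, terminated commands are ever written.
        self.port.write(command + self.terminator)
        reply = bytearray()
        while not reply.endswith(self.terminator):
            byte = self.port.read(1)
            if not byte:
                raise TimeoutError(f"no complete reply to {command!r}")
            reply += byte
        return bytes(reply[:-len(self.terminator)])


link = LockStepLink(FakeDevice())
replies = [link.query(cmd) for cmd in (b"CMD1?", b"CMD2?")]
```

Because `query()` blocks until the terminator arrives, the second command cannot be written while the device is still handling the first, which is the property that protected the buffer in the case described.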
For reference, I have yet to find a solution, but I have tried the following in the asyn setup, none of which has given any success: parity as none, even and odd, with everything not explicitly specified left in its default mode (first option listed above). So far, no joy talking to the Lambda Genesys directly from EPICS. I have even tried clocal and crtscts in each configuration. I haven't (yet) gone through every permutation of ALL the values, as that feels like clutching at straws. |
Things to try next time: Possible other solutions: Because this can't be fixed this time around, I'm sending it back to the backlog, and will need to remember to repropose it when the next shutdown is coming around. |
If HTERM and LabVIEW are both able to talk to the Lambda Genesys with everything as is, there must be some instruction or setup which is different/missing (or differently timed) in the EPICS driver. It might be worth using Sysinternals “Portmon” (or even the breakout box?) to check the signals (DCD, RTS etc.) to see if there is something there. If not chased down, it is pretty likely to rear its head again at some point. |
Hence returning this ticket to the backlog. There wasn’t time to chase this down any further this shutdown, at least not without losing the ability to do other work which was also necessary. As such, we don’t have a solution, but there are some alternatives to try next time around which may or may not provide answers, including the use of port monitors (a second person is likely needed from a safety perspective with a physical one, as it requires messing around in the back of a PSU). |
One thing I haven't tested, that might be worth a try for this is to use the MOXA via the IP address and port number rather than the COM port, as that would remove nport on Server 2012 as a potential issue |
There is now a spare TDK connected to COM20 (Port 16 on the first MOXA) on NDEMUONFE. Due to location and safety concerns, ideally we would be in the vicinity of the PSU (on the Mezzanine, in front of a Danfysik) while testing; we should not leave it plugged into the mains and turned on, and under no circumstances can we turn the output power on. There are various tests we can attempt, and as this is a spare, we could potentially keep testing during cycle. To be tried:
|
James has confirmed that the PSUs are currently on (no current, don't apply one), so a restart would give us a number of them to try and test in different ways. I may go down tomorrow morning; otherwise I will be down there on Monday |
A quick update to add, as I've discovered a few more things (no solution yet) - but if anyone has any ideas, please weigh in. I haven't been able to try any of the alternatives yet - adding the ports to other locations has not been successful. However, in these comparisons I did start noticing some differences in the Async-Settings when looking at the server of the Moxa, and between the ports that had and hadn't been talked to via HTERM. I realised that there may be an issue with the RTS/CTS that I hadn't noticed before. As such, I have gone through those settings again, and rather than fixing the problem I've realised that I can use it to recreate the problem. e.g. My next stop is to try variations of N and No in that setting of crtscts (and double check that parameter name), but I'm not convinced it isn't a red herring. At least we can now put this back into the error state without having to restart the computer - although just what is having this effect I'm not sure. |
If you can run a mode command on the port, the output might look different before and after lockup/freeing, e.g.
C:\Users\gamekeeper\Documents> mode com17
Status for device COM1:
-----------------------
Baud: 9600
Parity: None
Data Bits: 8
Stop Bits: 1
Timeout: ON
XON/XOFF: OFF
CTS handshaking: OFF
DSR handshaking: OFF
DSR sensitivity: OFF
DTR circuit: ON
RTS circuit: ON
C:\Users\gamekeeper\Documents>
|
Looking at the asyn source code you need to use single letters (Y and N) but it is case insensitive. It uses the win32 SetCommConfig() function and the command Chris mentions prints out most of the relevant parameters |
I've pushed a mod to asyn that will print out a lot of extra information via the dbior IOC command - you'd need to rebuild/redeploy to get it |
For reference, using the mode com17 command as suggested by Chris:
In the working mode (no setting of crtscts in the st.cmd)
Status for device COM17:
------------------------
Baud: 9600
Parity: None
Data Bits: 8
Stop Bits: 1
Timeout: ON
XON/XOFF: OFF
CTS handshaking: OFF
DSR handshaking: OFF
DSR sensitivity: OFF
DTR circuit: OFF
RTS circuit: OFF
Setting crtscts to “N” (as Freddie pointed out, the capitals are the valid responses – I am clutching at straws to get this working!)
Status for device COM17:
------------------------
Baud: 9600
Parity: None
Data Bits: 8
Stop Bits: 1
Timeout: ON
XON/XOFF: OFF
CTS handshaking: OFF
DSR handshaking: OFF
DSR sensitivity: OFF
DTR circuit: OFF
RTS circuit: ON
Setting crtscts to “Y”
Status for device COM17:
------------------------
Baud: 9600
Parity: None
Data Bits: 8
Stop Bits: 1
Timeout: ON
XON/XOFF: OFF
CTS handshaking: ON
DSR handshaking: OFF
DSR sensitivity: OFF
DTR circuit: OFF
RTS circuit: HANDSHAKE
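To compare these dumps without reading them line by line, something like the following could be used. It is a rough sketch: `parse_mode_status` and `diff_status` are hypothetical helper names, and the parser only assumes the `Name: value` layout shown in the dumps above. The two sample dumps are copied from the working state and the crtscts “N” state.

```python
WORKING = """Status for device COM17:
------------------------
Baud: 9600
Parity: None
Data Bits: 8
Stop Bits: 1
Timeout: ON
XON/XOFF: OFF
CTS handshaking: OFF
DSR handshaking: OFF
DSR sensitivity: OFF
DTR circuit: OFF
RTS circuit: OFF
"""

CRTSCTS_N = """Status for device COM17:
------------------------
Baud: 9600
Parity: None
Data Bits: 8
Stop Bits: 1
Timeout: ON
XON/XOFF: OFF
CTS handshaking: OFF
DSR handshaking: OFF
DSR sensitivity: OFF
DTR circuit: OFF
RTS circuit: ON
"""


def parse_mode_status(text):
    """Turn the output of `mode COMx` into a {setting: value} dict."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # The "Status for device" header has nothing after the colon and the
        # dashed rule has no colon at all, so both are skipped here.
        if sep and value:
            settings[name.strip()] = value
    return settings


def diff_status(before, after):
    """Return only the settings whose values differ, as (before, after) pairs."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


changes = diff_status(parse_mode_status(WORKING), parse_mode_status(CRTSCTS_N))
print(changes)
```

For these two dumps the only difference reported is the RTS circuit going from OFF to ON.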
Looking at the Serial Control Fields, there is a flow control option, the default is “Unknown”, there are also choices of “None” and “Hardware” for the FCTL field within the ASYN record (http://www.aps.anl.gov/epics/modules/soft/asyn/R4-16/asynRecord.html), and looking at the ASYN driver information (http://www.aps.anl.gov/epics/modules/soft/asyn/R4-23/asynDriver.html#drvAsynSerialPort), the crtscts parameter is the one to be setting. I will try again the various combinations of crtscts and clocal (also involved in some of these lines), after which I can consider a build with the extra info Freddie added.
|
The option with “None” looks good to me if you can set it. From what you show, I suspect the power supply _is_ using/obeying hardware flow control. I’m guessing that when RTS circuit is OFF on COM17, what it actually does is assert the RTS (Request to Send) line permanently, so the PSU will always be able to send data.
|
There seems to be no way from Stream device to set that to None, which is a shame. I’ve tried the four combinations of crtscts and clocal, and none of them work: none of them set the DTR and RTS circuits to off. Not setting them leaves these parameters in the state they were in beforehand. Looking at a port that hasn’t been touched yet in this most recent round of testing gives the following information:
Status for device COM14:
------------------------
Baud: 1200
Parity: None
Data Bits: 8
Stop Bits: 1
Timeout: OFF
XON/XOFF: OFF
CTS handshaking: OFF
DSR handshaking: OFF
DSR sensitivity: OFF
DTR circuit: ON
RTS circuit: ON
Those two values of ON are what is causing the inability to speak on a computer restart. There is no way in EPICS to set these to OFF that I can see. The Moxa ports are set with the appropriate defaults in Nport, and on the firmware in the Moxa – so there is something in the HTERM that is turning these settings off.
I’m just trying a restart of the system with different settings in the Moxa, see if that makes a difference. This looks to be something relating to this model of Moxa (I have the same “on” values on my desktop one), so I’ll see what I can sort out.
|
Changing the settings in the Moxa has no effect. It looks like this might be how Server 2012 is initialising the ports, so I'm going to go down that path to see if I can find a solution from that side, as there isn't one easily available in software |
Next update. I tried all the combinations, and they will turn the dtr and rts circuits to on or handshaking from the st.cmd, but not one will set them to OFF. I'll hold off using the extras from Freddie for a little while. However, I did try a couple of other things (including installing a new version of nport administrator!) I used the same mode command to set the two specific fields to OFF, and that allows the port to communicate easily. However, even via that setting method, it isn't maintained during a restart. But, at least using that method we can write a bat script that can be called on instrument restart to get things back into working order. Or, can we call that line from the st.cmd? Which would be even more reliable, and adaptable to other instruments. |
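A restart helper along those lines might look like the sketch below. It assumes the `dtr=off` and `rts=off` arguments of the Windows `mode` command are what was used to free the port; `build_mode_command` and `release_ports` are hypothetical names, and the port list is only an example. The command is routed through `cmd /c` so that Windows resolves `mode` in the usual way.

```python
import subprocess


def build_mode_command(port, dtr="off", rts="off"):
    """Build the argument list for `mode <port> dtr=<..> rts=<..>`."""
    if dtr not in ("on", "off", "hs"):
        raise ValueError(f"unsupported dtr value: {dtr!r}")
    if rts not in ("on", "off", "hs", "tg"):
        raise ValueError(f"unsupported rts value: {rts!r}")
    return ["cmd", "/c", "mode", port, f"dtr={dtr}", f"rts={rts}"]


def release_ports(ports, run=subprocess.run):
    """Set the DTR and RTS circuits to OFF on every listed port.

    `run` is injectable so the helper can be exercised without touching
    real hardware; by default it calls subprocess.run and raises if the
    mode command reports a failure.
    """
    for port in ports:
        run(build_mode_command(port), check=True)


# Example (Windows only, port names are illustrative):
#     release_ports(["COM14", "COM17"])
```

Calling this before the IOC starts would reproduce by script what was done by hand. Whether it can be invoked from the st.cmd itself is a separate question that this sketch does not answer.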
The asyn code is:
So setting it to N sets RTS_CONTROL_ENABLE and there is no way to set RTS_CONTROL_DISABLE currently. If crtscts wasn't specified in st.cmd, I'd expect the settings to remain unchanged, but maybe Windows applies some default? The previous mods I did just printed more information, and I think we have worked out what we need to do. I'm not sure if RTS_CONTROL_ENABLE or RTS_CONTROL_DISABLE is correct above or whether another option is required. I can add a special option for us (D for disable) at the moment and then query the correct setting with the epics mailing list or Mark Rivers. |
In EPICS disabling CRTSCTS or DTR flow control via the asyn setoptions sets the line to ENABLE and high permanently, there is no option to "disable" the mechanism. I've now pushed a mod to asyn to allow you to set both crtscts and clocal options to the value "D" to turn off the mechanisms. |
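As a summary of the behaviour described in this thread, the crtscts letters can be tabulated against the Win32 `fRtsControl` values. This is an illustrative sketch, not the asyn source: the numeric constants are the standard ones from `winbase.h`, `rts_control_for` is a hypothetical helper, and the `mode` text for “D” is assumed to match the OFF state seen on a working port.

```python
# Win32 DCB fRtsControl values, as defined in winbase.h.
RTS_CONTROL_DISABLE = 0x00
RTS_CONTROL_ENABLE = 0x01
RTS_CONTROL_HANDSHAKE = 0x02

# crtscts letter -> (fRtsControl value, "RTS circuit" text reported by `mode`).
CRTSCTS_TO_RTS = {
    "Y": (RTS_CONTROL_HANDSHAKE, "HANDSHAKE"),  # hardware flow control
    "N": (RTS_CONTROL_ENABLE, "ON"),            # RTS held high permanently
    "D": (RTS_CONTROL_DISABLE, "OFF"),          # the added "disable" option
}


def rts_control_for(option):
    """Look up the RTS setting for a crtscts letter (case insensitive)."""
    key = option.strip().upper()
    if key not in CRTSCTS_TO_RTS:
        raise ValueError(f"crtscts must be one of Y, N or D, got {option!r}")
    return CRTSCTS_TO_RTS[key]


for letter in ("Y", "N", "D"):
    value, shown = rts_control_for(letter)
    print(f"crtscts={letter}: fRtsControl={value}, mode shows RTS circuit: {shown}")
```

The table makes the original gap visible: before the “D” option existed, neither letter could reach the OFF state that the working port showed.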
BTW @KathrynBaker @ChrisM-S good detective work :-) |
Given the proximity to beam start, I'm not going to make any changes to NDEMUONFE this shutdown. As Freddie has handled the modification elsewhere, I'm going to mark this ticket as ready for review and create a new one for modifying the setup on MUONFE before next cycle |
During the Muon Front End testing it was realised that something exceedingly odd was occurring in relation to the Genesys PSUs.
Using the IOC to connect was failing, unless another program (HTERM in this instance) had already connected. This is not desirable behaviour, as connecting to 14 PSUs manually does take time.
The following scenarios should help to think about where the problem might be, as it is unclear whether this is related to the Genesys specifically, whether Server 2012 and Nport are part of the problem, or whether it is something else entirely.
These are the scenarios tested, that I can recall, and their results.
The PSU and Computer have both just been power cycled, start the IOC - comms will not initialise (you will see multiple timeout messages)
On the computer, stop the IOC, connect to the PSU via HTERM, disconnect in HTERM (do NOT send any commands), start the IOC - everything should be fine
Power cycle the PSU - everything should continue to work
Stop the IOC, start the IOC (any way you like) - everything should continue to work
I can't remember if these were explicitly tested or their results if they were:
Stop the IOC, power cycle the PSU, start the IOC
Power cycle the computer
This will need access to a Genesys, and may need a Server 2012 system, and may or may not require a Moxa to be included in the testing. As yet, a test Genesys has not been sourced, nor has there been an opportunity to explore and test with the existing MUONFE system. Some time was spent on this, but a workaround was used instead, with the awareness that it may need to be redone between now and any fix for the problem.