Random segmentation fault (core dumped) in ARM64 #151
Comments
Hi @su600, this isn't C++, it's Python. Can you do me a favor and print out the Also post the full stack trace when it happens. |
I know pylogix is a Python project, so it is weird that this happens. My code above is just an example to test this problem, the |
Maybe the segfault comes from the IP stack? Since your code is running the read in a tight loop you may be overwhelming the device you are talking to. If this happens it might send garbage back. Put something like time.sleep(0.020) in your loop and see if this prevents the segfault. |
pylogix is a pure Python project. One problem I see with your example: each time you call rockwellread(), you create a new instance of the driver without closing it. The PLC will eventually flush the connections, but you are creating new connections faster than the PLC will flush them, and it will eventually get tired of this. Why read each tag individually and not just read the whole list at once? It would be much faster to read the list.
|
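A minimal sketch of the single-connection, read-the-whole-list approach described above. `read_all`, `poll_plc`, the cycle count and the polling period are hypothetical names and values made up for illustration; the sketch assumes pylogix's `PLC` context manager and that `Read` returns a list of responses with `TagName`, `Value` and `Status` when given a list.

```python
def read_all(comm, tags):
    """Read all tags in one request; return {tag name: (value, status)}."""
    responses = comm.Read(list(tags))  # a list in gives a list of responses back
    return {r.TagName: (r.Value, r.Status) for r in responses}


def poll_plc(ip_address, tags, cycles=3, period_s=1.0):
    """Open ONE connection, reuse it for every cycle, and close it on exit."""
    import time
    from pylogix import PLC  # third-party package: pip install pylogix

    with PLC() as comm:  # leaving the with-block closes the connection
        comm.IPAddress = ip_address
        for _ in range(cycles):
            for name, (value, status) in read_all(comm, tags).items():
                print(name, value, status)
            time.sleep(period_s)
```

Calling something like `poll_plc("192.168.1.10", ["Temp1", "Temp2"])` (address and tag names are placeholders) would then reuse one driver instance instead of creating a new one per read.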
@evaldes2015 He already has a time.sleep(1), one second, in the initial example. 0.2 is milliseconds, 0.02 is what, microseconds? The point is it will make it faster, not slower. I agree, he might as well add a big timeout of at least 10s; if the segmentation error still occurs, then it is most likely an issue with the container. That's of course after trying Dustin's previous example. @dmroeder Nice catch on the number of PLC instances without closing. |
@TheFern2 If you look at his code, there's a tight loop inside his readrockwell function where he reads a bunch of tags. He only sleeps between calls to the function, not between reads. If his device is an ENBT, he might overload it by doing this. |
I think @evaldes2015 was suggesting putting a 20ms delay between each read. That would be a little different than the 1 second call to start the read process. I agree with you @TheFern2 , the segmentation fault is likely some container issue. I don't work with containers, so it's a little hard for me to troubleshoot. I don't believe this to be a pylogix issue, though I'm willing to be wrong. Of course, how the connection is handled is still an issue. |
@evaldes2015 Ah gotcha, yes another sleep inside the readrockwell, I misunderstood. I would make all time sleeps 5 or 10s just for testing purposes, and slowly decrease them. Just to ensure the seg error doesn't occur. Because if it does occur even with timeouts then we'd be out of ideas as far as pylogix goes. |
Sleep is in seconds. 0.2 seconds = 200ms. 0.02 seconds = 20ms |
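To make the units concrete, here is a small sketch of the 20 ms pause between individual reads suggested earlier in the thread. `read_throttled` and `read_one` are hypothetical names; `read_one` stands in for whatever performs a single tag read (for example a bound `comm.Read`).

```python
import time


def read_throttled(read_one, tags, delay_s=0.020):
    """Read tags one at a time, pausing delay_s seconds after each read."""
    results = []
    for tag in tags:
        results.append(read_one(tag))
        time.sleep(delay_s)  # time.sleep takes seconds: 0.020 s is 20 ms
    return results
```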
@su600 post details on the image and container if @dmroeder's and @evaldes2015's suggestions don't work. Oh nvm, it is in the initial post. I don't have an arm64; that's not a Raspberry Pi, is it? |
My hardware is based on an NXP ARM Cortex-A53. My original code is like this. I have used
I have also set the Maybe it is a problem with Docker or something else; I'll do more tests with the Docker container or another Python version. |
One more thing, my taglist is like this:
The top 4 tag values are temperatures, type is |
So for your last post, you say that if you read those tags as a list... for example...
... you have a value of None returned for the BOOL's but the REAL's will have the correct value? I just tried this and the BOOL's return True/False edit: when the value is None, is the status "Success", or something else? |
Value=None, status=path destination unknown. |
So Path Destination Unknown is the PLC's response, meaning it cannot find the tag you are requesting. I would triple check that the tag is spelled right when you are pulling it from the CSV. I've seen where people don't strip the whitespace, leaving a space at the beginning or end when they are parsing a CSV/TXT file. It's not always obvious. When they type the tag name themselves, they get it right and it works. |
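A small sketch of stripping whitespace while loading tag names from a CSV, as described above. It assumes the tag name sits in the first column of each row, which is a guess about the file layout; `load_tag_names` is a hypothetical helper.

```python
import csv
import io


def load_tag_names(csv_file):
    """Return tag names from the first CSV column, without stray whitespace."""
    names = []
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        name = row[0].strip()  # drop leading/trailing spaces and tabs
        if name:
            names.append(name)
    return names


# Usage with an in-memory file; a real file would be opened with newline="".
sample = io.StringIO("Temp1 ,REAL\n Door_Open,BOOL\n\n")
print(load_tag_names(sample))
```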
I think we have multiple layers here. First we need to establish that your code that reads tags works fine from a laptop without using Docker. I'm talking just pylogix and reading a list of tags; do not add any db, opc, or any other logic. If that code works fine without Docker, next is to try it on Docker, again with just pylogix. If that gives you a seg fault, then it is definitely a Docker issue. https://dev.to/mizutani/how-to-get-core-file-of-segmentation-fault-process-in-docker-22ii If pylogix works fine on both laptop and container, then add the other logic little by little, first db, then opc, etc. Containers are super fragile; for example, I've had a container that wouldn't stay up just because a db wasn't named properly. Edit: I wrote all this before the last two responses. |
Btw I would suggest you wrap the code where you read tags in a try/except and print tag.TagName and tag.Status in the except block, for easy debugging of which tags are failing |
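A rough sketch of that kind of wrapper. `read_and_report` is a hypothetical helper, and `comm` is assumed to behave like a pylogix `PLC` whose `Read` returns a response with `TagName` and `Status`. Since a failed read usually comes back as a non-"Success" status rather than an exception, the sketch reports both cases.

```python
def read_and_report(comm, tags, log=print):
    """Read tags one by one; log and return the ones that failed."""
    failed = []
    for tag in tags:
        try:
            response = comm.Read(tag)
        except Exception as exc:  # e.g. a socket error raised during the read
            log(f"{tag}: exception {exc!r}")
            failed.append(tag)
            continue
        if response.Status != "Success":
            log(f"{response.TagName}: {response.Status}")
            failed.append(tag)
    return failed
```

Passing a file-writing function as `log` would give a record of how often each tag fails.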
@TheFern2 you are right, there are two problems going on here. Reading and the segmentation fault. Segmentation fault suggests that the python interpreter crashed. From my experience, this happens when a program which binds to some other language crashes. For example, OpenCV, where they have a python layer that binds to C. I don't believe pylogix is causing the segmentation fault, something else is crashing. |
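Because a segmentation fault kills the interpreter rather than raising a Python exception, one way to see where it happened is the standard-library `faulthandler` module, which dumps the Python traceback of every thread when the process receives a fatal signal such as SIGSEGV. This is only a sketch; `enable_crash_trace` and the log file name are made up for illustration.

```python
import faulthandler


def enable_crash_trace(path="segfault_trace.log"):
    """Dump Python tracebacks to `path` if the process crashes with a fatal signal."""
    trace_file = open(path, "w")  # must stay open for as long as the handler is enabled
    faulthandler.enable(file=trace_file, all_threads=True)
    return trace_file
```

Calling `enable_crash_trace()` once at program start, before the read loop, should leave a traceback in the log file after a crash, which can then be posted alongside the core dump.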
Yes, we are doing the same thing as you said, and it needs time. |
The tag name is right, and nothing in the code changed; I just updated pylogix from 0.6.4 to 0.7.7. |
Interesting. I read a list of 10 tags or so where the first few were REAL and the rest were individual BOOL's of an array. I'll do some more experiments to see if maybe the number of tags matter. |
Can confirm this same issue ("None" returned for BOOLs when a list of tags is read) and had to downgrade to 0.6.7 before it would work. Single tag reads worked though. Edit: I keep the tag lists below ten items to prevent issues |
Curious, what controller are you working with? |
ControlLogix, but I don't know the model off the top of my head. I had used 0.6.7 on a different project with the same PLC. I switched to Python 3.9 in between projects and upgraded pylogix at that time. I had a heck of a time trying to get this to work and went back to the version I had used previously, so I couldn't tell you if it worked on any version between 0.6.7 and 0.7.x |
@dmroeder I've added a todo for me on the project to add list read for boolean to |
@dmroeder I am about to push a new test to master, testing BaseBOOLArray, at first glance it looks like latest code can only read 4 bool tags with Success, 5 and above all return |
Happy new year! I found this issue, Sorry, I accidentally touched the close button on my phone. |
@su600 I am not tracking what the Pillow issue has to do with this? Are you suggesting that pylogix has deep stack usage and therefore runs into problems on Alpine? There is an informative post about a similar problem with Alpine/musl. |
@dmroeder did a bit of troubleshooting this morning, I can confirm latest 0.7.7 there's a bug for boolean list read. #154 will def prevent this in the future as far as testing goes.
However the same test passes fine in 0.7.5:
|
I pushed a commit that should take care of this BOOL array in list issue. |
@dmroeder @TheFern2
site-packages/pylogix/eip.py:1627 is This segmentation fault happens randomly and has nothing to do with the tag list; even when I just read 1 tag in a loop, it also occurred. Could you analyze the code and provide a solution to avoid this? Thank you |
You're assuming that the issue is in pylogix. It may be network hardware or a driver. Have you done anything to rule that out? |
I mean it is an issue with the Python socket. I'm trying to do a test on a Raspberry Pi. |
@su600, do you experience this when you run your code outside of docker? |
When you say you ran one tag in a loop, did you happen to put a time.sleep in the loop? Please put a time.sleep(1) in and monitor; if the issue occurs, keep increasing the sleep by 1s until the stack trace doesn't occur. This looks more like the container network resources are throttling the connection, but that's just an educated guess; I don't deal with Docker much for pylogix. |
The segmentation fault occurred at line 1627 six times in a row; after more tests, the faulting line is also random, but mostly related to the socket. I have put sleep(5) in each loop, and even sleep(0.1), and use My hardware OS is built by ourselves, and it is complex to prepare the environment outside of Docker. |
I would check if you can run a network monitoring tool for the container and see if it's hitting its limits. Also, did you put a try/except around the tag reads? This will certainly prevent the program from crashing. When the except block runs, log to a file and see how many times it is happening. You might also want to log cpu_percent and virtual_memory from psutil. You could also log these two all the time and then compare with the log file from the except block to see if there's a huge change. |
I have tested on Raspberry Pi 3B, armv7l, nothing goes wrong, both in Docker and outside Docker. |
Hi
My program reads tag values in a loop, once per second.
This program runs in Docker (base image is Python 3.7.4) on an ARM64 platform. I have tested many times, and this
Segmentation fault (core dumped)
happens randomly. pylogix==0.6.4 and 0.7.7 both have this problem.
My code is like this.
With this code, random.random() won't cause a Segmentation fault (core dumped), but once I use
comm.Read
it happens randomly, maybe after dozens or hundreds of loops. I am not familiar with C/C++ and don't know how to solve this. Please help.