Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initialise_event_info_struct: initialise failed! #111

Closed
roxkamx opened this issue Sep 26, 2017 · 15 comments
Closed

initialise_event_info_struct: initialise failed! #111

roxkamx opened this issue Sep 26, 2017 · 15 comments
Assignees
Milestone

Comments

@roxkamx
Copy link

roxkamx commented Sep 26, 2017

Hi,

I have a project where I attach 32 devices grouped in 2 physical hubs ( 16 each ) to a Centos7 server.
With 16 devices my application works just fine.

When I start the application with both hubs attached ( i.e. 32 devices attached ) at the 24th device my application stops with the following error:

initialise_event_info_struct: initialise failed!

This message looks to be from RXTX ( as found on http://rxtx.qbang.org/pub/rxtx/rxtx-2.1-7r2/src/SerialImp.c )

Any suggestions how to overcome this issue? is there any limitation?
The same thing happened on 2 other different PCs, so I would exclude for now the USB driver.

Thanks

@madhephaestus
Copy link
Member

try 5.0.2

@madhephaestus
Copy link
Member

unfortunately as i try to understand how this is happening, i have run into the 'goto' problem. Fail can be reached by 2 ways, one is id this statement returns true:

if (eis->fd < FD_SETSIZE && eis->fd > 0) 

and the other

if(eis->send_event == NULL)

madhephaestus added a commit that referenced this issue May 22, 2020
so we know which of the 3 possible exit contidions it actually is.
@madhephaestus
Copy link
Member

Now the errors will be :

initialise_event_info_struct: eis->send_event == NULL!
initialise_event_info_struct: eis->fd >= FD_SETSIZE!
initialise_event_info_struct: eis->fd <= 0!

@wborn
Copy link
Contributor

wborn commented May 22, 2020

For me when the error occurs it results in:

initialise_event_info_struct: eis->fd >= FD_SETSIZE!

@madhephaestus
Copy link
Member

This issue seems to be related to this:

https://access.redhat.com/solutions/488623

@madhephaestus
Copy link
Member

It seems we would need to transition away from SELECT() and switch to poll()

http://man7.org/linux/man-pages/man2/select.2.html

to

http://man7.org/linux/man-pages/man2/poll.2.html

This will result in lots of opening and closing code needing to be checked when its changed, and a lot of changes to the data structures used to keep track of the serial port events.

@MrDOS
Copy link
Contributor

MrDOS commented May 22, 2020

I agree that switching to poll(2) would be preferable for many reasons, but at the same time, FD_SETSIZE is 1024 just about everywhere. That's a lot of fds, even with 32 physical ports. @roxkamx and @wborn, do your applications churn through opening/closing the serial port many times, or do you typically acquire a port instance once and then hold it open for as long as you can? I'm trying to understand if fd churn is happening due to library or application behaviour. Regardless, this is clearly bad behaviour on the part of the library, but I wonder if we can't find an application-level workaround in the meantime.

@wborn
Copy link
Contributor

wborn commented May 23, 2020

It seems the library is keeping thousands of file descriptors to temporary files, e.g. lsof -p PID lists these lines:

java    15241 wouter  955u   REG   0,26         0      947 /run/lock/tmpj1rrF2 (deleted)
java    15241 wouter  956u   REG   0,26         0      948 /run/lock/tmpPwuMWV (deleted)
java    15241 wouter  957u   REG   0,26         0      949 /run/lock/tmpgPk8dP (deleted)
java    15241 wouter  958u   REG   0,26         0      950 /run/lock/tmp356uvI (deleted)
java    15241 wouter  959u   REG   0,26         0      951 /run/lock/tmpKAGSMB (deleted)
java    15241 wouter  960u   REG   0,26         0      952 /run/lock/tmpE64g4u (deleted)
java    15241 wouter  961u   REG   0,26         0      953 /run/lock/tmpxsjGlo (deleted)
java    15241 wouter  962u   REG   0,26         0      954 /run/lock/tmpKDm6Ch (deleted)
java    15241 wouter  963u   REG   0,26         0      955 /run/lock/tmpmLexUa (deleted)
java    15241 wouter  964u   REG   0,26         0      956 /run/lock/tmpuMVYb4 (deleted)
java    15241 wouter  965u   REG   0,26         0      957 /run/lock/tmpSNrrtX (deleted)
java    15241 wouter  966u   REG   0,26         0      958 /run/lock/tmp8FMUKQ (deleted)
java    15241 wouter  967u   REG   0,26         0      959 /run/lock/tmpLbYo2J (deleted)
java    15241 wouter  968u   REG   0,26         0      960 /run/lock/tmpKxYTjD (deleted)
java    15241 wouter  969u   REG   0,26         0      961 /run/lock/tmpGWOpBw (deleted)
java    15241 wouter  970u   REG   0,26         0      962 /run/lock/tmpLTtWSp (deleted)
java    15241 wouter  971u   REG   0,26         0      963 /run/lock/tmpcAWtaj (deleted)
java    15241 wouter  972u   REG   0,26         0      964 /run/lock/tmptSf2rc (deleted)
java    15241 wouter  973u   REG   0,26         0      965 /run/lock/tmps2oBJ5 (deleted)
java    15241 wouter  974u   REG   0,26         0      966 /run/lock/tmp5pfb1Y (deleted)
java    15241 wouter  975u   REG   0,26         0      967 /run/lock/tmp6fWLiS (deleted)
java    15241 wouter  976u   REG   0,26         0      968 /run/lock/tmpGatnAL (deleted)

It's easily reproduced by running the example program in #180 and not inserting the USB stick. If I insert it after there are thousands of these file descriptors I immediately see the error. I think I've even seen it cause issues with other programs because of this.

@MrDOS
Copy link
Contributor

MrDOS commented Jun 4, 2020

This is really curious. Retrieving a specific port identifier by name (CommPortIdentifier.getPortIdentifier(String)) creates (and deletes) six lock files. And on my test system with six serial ports, 18 are created while scanning for port availability (CommPortIdentifier.getPortIdentifiers()). Then another one is created when opening the port, and another 12 are created while closing it.

Detecting ports opens each one, I think, to confirm that the user has access to the file. (I don't think there's a good reason for doing that instead of calling access(2), but that's the way it's done right now.) So that's one lock file per port detected, assuming the detection delegates out to the same routine for opening the port as when opening them regularly. On my test box, that should be six total, not 18. And if retrieving a port identifier also checks readability, then that should be another one, not six. And one would think that closing a port would not involve any open operations, just a close(2) for the port, and an unlink(2) for the lock file. So really, opening the port is the only operation which opens a sensible number of files. Wild. I'm going to dig in further.

@MrDOS MrDOS self-assigned this Jun 4, 2020
@MrDOS
Copy link
Contributor

MrDOS commented Jun 4, 2020

I can confirm that the problem only exists when using locking. @roxkamx, if you want a quick-and-dirty workaround, you can add #define DISABLE_LOCKFILES to somewhere near the top of src/main/c/SerialImp.h (anywhere above line 374) and recompile the natives for your platform.

@zyliu1985
Copy link

I can confirm that the problem only exists when using locking. @roxkamx, if you want a quick-and-dirty workaround, you can add #define DISABLE_LOCKFILES to somewhere near the top of src/main/c/SerialImp.h (anywhere above line 374) and recompile the natives for your platform.

Hi, is there any plan to fix this issue? Thank you.

@MrDOS
Copy link
Contributor

MrDOS commented Sep 12, 2020

Yes, I'd like to replace all of the locking with something more modern. But I don't have a timeline for getting to do it. Are you able to work around the issue for now by disabling lockfiles?

@MrDOS
Copy link
Contributor

MrDOS commented May 4, 2021

@roxkamx and @zyliu1985, can you please test again with this build of the library (from this workflow run) to see if #216 fixed this issue for you?

@wborn
Copy link
Contributor

wborn commented May 7, 2021

Many thanks for having a look at this @MrDOS! I gave the build with the changes a test and it seems to fix the issue for me. 👍

All the temporary lock files are gone from the output of lsof when testing it with my example of #180 (and I don't insert the USB device).

It also worked fine with openHAB. 🙂

@MrDOS MrDOS closed this as completed May 7, 2021
@farfade
Copy link

farfade commented Aug 19, 2021

Hello, wouldn't worth it releasing a stable version of the library with this fix because it seems to be blocking, especially when dealing with debian bullseye (see openhab problems) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants