-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
event: Error from accept() call: Too many open files #870
Comments
Thanks for a thorough error report. I tried updating my shairport-sync to see if I could reproduce, but unfortunately couldn't. I also didn't see shairport-sync make those dacp requests. Do you have any pointers on steps to reproduce? In any case you are right that forked-daapd should have some sort of protection againt log file spamming. I will look into that. |
To reproduce
/tmp/fo-da/sbin/forked-daapd -f
[ LOG] lib: Library init scan completed in 4 sec (0 changes)
lsof -p $(pidof forked-daapd) |wc -l
256 # sometimes there are 255
systemctl start shairport-sync.service
# find output name and number
mpc outputs
mpc enable 38252
mpc -q --wait load file:/home/media/playlists/radio
mpc -q play 3
lsof -p $(pidof forked-daapd) |wc -l Other findingsShairport-sync repository: https://github.com/mikebrady/shairport-sync struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = 500000;
if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, (const char *)&tv, sizeof tv) == -1)
debug(1, "dacp_send_command: error %d setting receive timeout.", errno); In wireshark I can see that the RST-packet sends 0.5 sec after shairport-sync receives the ACK for GET /ctrl-int/1/playstatusupdate?revision-number=2 HTTP/1.1 One can wonder why forked-daapd does not answer this GET in less then 0.5 seconds? |
Because playstatusupdate with revision number 2+ is not a normal request, it is a long poll. The idea is that if there is play status change then the request is replied. So the request can hang for a pretty long time. Having a 0.5 second timeout on that request doesn't make sense, so not sure why that is. I am pretty sure the same applies to iTunes, though I haven't tested that in a while. In any case I was still unable to reproduce, for some reason my shairport-sync is not making those requests at all. However, by mocking the requests with curl, cancelling with ^C + using Valgrind I can see that a memory leak is reported. So like your results it would indicate that there is some cleanup that isn't done. Debugging that requires some extra effort, since forked-daapd uses libevent for this stuff, but I am looking into it. I can probably do something to avoid the memleak, and then you can test if it also solves your issue. |
And here is branch where I have made some protection against repeated logging: |
I have now added a commit 4ecef8f to the same branch with a possible fix for the memleak. Let me know if it also fixes the issue with the open files. |
Thanks for your quick effort! Sorry, I was not prepared for that. I am away over the weekend and will test during next week. |
I'm encountering the same problem. Using the branch mentioned, the opened sockets is still increasing. Once they fill up I'll report back with the logging situation. EDIT: Log doesn't explode but CPU does go to 100% and the web UI and remotes are unresponsive. |
logavoidrepeat1 does not log anything when it reaches the "Too many open files"-case above. The CPU goes up to 100% and the open file count holds at 1205 as before. I would expect at least one error message. If I then try to change the song the next CPU goes up to 100% and I get an error in the log:
So yes the logging has changed to the better. |
Here is the issue I opened on shairport-sync: |
I suppose the CPU issue is because some other part of forked-daapd starts looping when the open file limit is reached. So it would be interesting to know what part that is. I've added a commit to the branch that also has log protection on debug level, so if you could try with that + set the log level to debug it might be possible to identify. I suggest you keep an eye on it while testing, because the log limiter won't stop the logging if multiple lines start repeating. The main task is of course still to figure out why those files are not closed. I'm not sure how to do that, except by me taking another stab at reproducing. My shairport-sync isn't making those requests, but maybe I just need to change some setup/config to enable them. |
This isn't with the latest debug log protection, but this is all the logs show:
If it is any help to reproducing this problem on your end, this is my shairport-sync config:
And my forked-daapd config:
|
Neither Here follow two cycles of debug output ( Hope this helps. Tom
|
Thanks to @tomgadow for bringing this to my attention. Let me look into the problem over the next few days. |
Thanks indeed to @tomgadow for such an exemplary bug report!
Shairport Sync will query the dacp status if it is built with support for the The 0.5 second timeout used by Shairport Sync was chosen somewhat arbitrarily. It was long enough not to cause problems with iTunes or iOS (and also the newer macOS Music app) and short enough to allow for a quick reaction to the disappearance of the player from the net. It would be safe to change that to, say, 3 seconds, as the setting does not affect any other connections. I did not try it with forked-daapd, I'm afraid. If you are going to experiment with Shairport Sync, I suggest you switch to the latest |
Thanks for the info, I will try with those build settings + the dev branch. I think it is high priority to have forked-daapd work with shairport-sync, so it is great to hear from you directly @mikebrady.
I don't think that will help with this problem. forked-daapd (and iTunes, I think?) treat I'm not sure what the indended use of playstatusupdate is for shairport-sync, but if you need to make standard request for playback status you just need to use revision number 1 instead of 2. Or I think you can also just not use any revision number. |
Thanks @ejurgensen. Agreed on the priority – I am a big admirer of your work, and apologies for not testing against it. I was not aware of the special significance of revision number If nothing had changed, some other code (403 perhaps?) was returned and one used the same revision number for the next query. |
@ejurgensen, is it possible that forked-daaapd could return a |
Same to you, shairport-sync is great! Yes, forked-daapd will indeed return a cmsr of 2 or more. You can see how these requests are handled here: https://github.com/ejurgensen/forked-daapd/blob/master/src/httpd_dacp.c#L2229 It's been a while since I worked with this part, but here is how I recall remotes normally use playstatusupdate:
The reply is always a 200 and the fields in the reply are the same (see make_playstatusupdate). How is shairport-sync using playstatusupdate? Is it polling at regular intervals? |
Thanks for the clarification. Yes indeed, Shairport Sync polls at regular intervals, as I didn't know the significance of the Okay, so let me do a little bit of checking and revert to you. |
Thanks to your instructions I was now able to reproduce the problem. It seems clear that the connection is not cleaned up. @tomgadow, you can probably also see that there is no logging of "Update request: client closed connection", and that means that libevent didn't register the disconnect. Drilling down, it looks like there are two parts to the issue:
* Not sure if it keeps making them 24/7? If so that means a quarter of million (60 x 60 x 24 x 3) daily requests are made by each shairport-sync speaker. |
Thanks for the update. Shairport Sync polls each second while playing and every three seconds when playing has stopped and while the remote control port does not completely reject requests. Furthermore, it times out after a period of inactivity, maybe 15 minutes. The long poll available for revision number 2 is certainly something to look at – it has the potential to greatly simplify this whole arrangement. Meantime, the quick-fix conservative approach for Shairport Sync is to avoid sending a I have not yet checked, but do iOS or iTunes/Music return a |
It is not just the value 2, it is all values above 1. There is no special treatment of 2 in itself. Both iTunes and forked-daapd can and will return cmsr values higher than 2. I will test with iTunes later to confirm. |
Hmm. Just a quick check using macOS Music as the client using a slightly hacked case 'cmsr': // revision number
t = sp - item_size;
debug(1," Revision Number sent: %u, received: %u", revision_number, ntohl(*(uint32_t *)(t)));
if (revision_number == 1) {
revision_number = 2;
debug (1,"Setting revision number to 2.");
} else
revision_number = ntohl(*(uint32_t *)(t));
break; No sign of either the long poll or any timeout problem.
I'm afraid I'm somewhat mystified. Maybe I'm doing something really dense – I'll check later. |
I have now tested with iTunes (I assume it behaves like Music) using Wireshark, and I think I can explain what you are observing, and why it is in a way working with iTunes and Music. So, as you know, the response to your initial request from both servers has a cmsr value that you are supposed to use as a revision number for the next request. This value can be 2 or something else, that doesn't make a difference to what happens next. shairport-sync will make the next request, and what happens then depends on the server: iTunes will return a 403 Forbidden, and forked-daapd will not reply within the timeout because it is a long poll. Note that the 403 doesn't mean "nothing changed", it actually means forbidden. That is because iTunes doesn't accept the request. You can check that by dumping the traffic between Remote and iTunes: Remote doesn't get a 403, iTunes accepts the request and iTunes "hangs", just like forked-daapd. However, despite the 403's it still works for shairport-sync and I think that is due to the below. shairport-sync will continue polling with the same revision number, and it will get 403's/timeouts. For iTunes, however, at some point it will get an actual reply with a new cmsr. If I understand correctly, shairport-sync interprets this as a status update, and indeed it is, because the reason iTunes now gave an answer was that the server side revision number increased due to an event. The thing is that iTunes only authorizes the request if you call it with the correct, current revision number. If you call it with any other revision number it will return which one to use (via the cmsr tag). And that is what happened after the server side event, the revision number was no longer the current, so iTunes returned a new one. I don't know how you would get iTunes to accept the request, but since there are non-Apple Android remotes I suppose it must be possible. |
Thanks for that. It explains the significance of the revision numbers but it does not explain the absence of the long poll, which is the crux of the problem here. For that, I have a conjecture – the player has two different modes. For a regular remote control (e.g. the Remote app) to work with iTunes, it has to be PIN validated, but Shairport Sync does not use PIN validation. Instead, Therefore, it may be that the client has a mode for interacting with PIN-validated remotes like Remote and this mode includes the long poll behaviour. The client has another mode for interacting with the output device, e.g. Shairport Sync, but this mode does not provide the long poll. Further, I'm guessing that Forked Daapd has the same behaviour whether the remote device is PIN-validated or not. If I'm right, then two things follow:
|
I just pushed an update to the |
Yes I agree, you could remove the timeout. I haven't tested your commit, but is shairport-sync ok with the dacp thread blocking on dacp_send_command()? I imagine that could be a problem, so perhaps you would need to implement async/non-blocking requests? I don't think it is likely that iTunes/Music has different modes of operation, that would entail bad design for number of reasons:
You are right that iTunes does authorize Remotes and speakers in different ways. Remotes are auth'ed with pairing + session-id's, and speakers with active remote. I haven't come across other speakers that use playstatusupdate. If such exist it would be interesting to catch the traffic from them. If I was you I would still try using rev number 1. That will also avoid the 403's from iTunes, and I don't suppose you are interested in them? |
Just referring back to this; perhaps I should have explained. Each of these line 627 messages was output when a status request was accepted. The revision numbers being sent were always the correct ones. It's just that when nothing had changed -- and thus the change the revision number referred to hadn't happened yet -- the player sent back a 403 rather then entering a long poll. As I outlined in the previous post, I guess the difference in behaviour between immediately returning a 403 or entering a long poll is due to different modes of operation within the player because of the different styles of access of the remote status requester. |
Yes, I understood the output, I saw the same behavior. I don't know if you saw the post I just made, but as you can see I don't believe that long polling is only for remotes. |
Hmm. I'd really like to avoid this. You're right though. Long polling is blocking. It means that Shairport Sync could not send commands to Forked Daapd, so that's not going to work.
Well, they don't prevent it – it seems to be available, but it seems it is available only if PIN-validated. And long polling means you can't, for example, use a single thread to deal with traffic. You are right about the lag. The network issue is moot – iTunes generates lots of repetitive traffic, e.g. the OPTIONS message. Any by comparison with the overall traffic it is a non-issue.
Code 403 can be used to indicate that the server is refusing to take action, which is exactly the case here. Remember that the source has to be the output device itself, so authentication isn't an issue. If the source of the request has a different IP number, is does not get back 403 (offhand, I can't remember what it is, but it's different). Anyway, authentication issues would be signalled with with a 401 code, I'd have thought. My guess is that it's a legacy behaviour.
Revision Number 1 always responds with a complete status, whether there have been changes or not. This means that the remote device has to sift through it to see if any actual change has occurred. It's not a big deal to accommodate this in something like Shairport Sync but would be an energy issue on a power-constrained device like a battery powered remote.
Well it does explain what is actually out there. It does seem that the endpoint acts differently depending on how you interact with it.
I haven't come across them either – I'm just trying to make sense, and take advantage, of what's out there now.
Yes. Overall, it seems as if Forked Daapd behaves differently from iOS/iTunes/Music. What I might have to do is make a special case for Forked Daapd only – use Revision Number 1 for it always and continue to use the existing system for all other players.
As I mentioned above, I think that remotes get long polling because they are PIN-validated. |
Ok, maybe we have different viewpoints here :-)
I don't it is plausible that Apple wants speakers to PIN validate (they often have no UI) just to do long polling. So I still don't believe in the two modes of operation. I think the reason for the 403 is very simple, either there's some additional data iTunes wants, or there is simply a bug in iTunes that isn't a priority because speakers using playstatusupdate are rare.
You can absolutely use a single thread, but you would need an event loop.
I don't really agree that it is the case here. In any case you are not supposed to try again if you get a 403, which is what shairport-sync is doing.
That is not necessary, you can just check if the cmsr changed. |
Great stuff, thanks.
You may be right, of course, but the behaviour is there a long time.
That's a good point.
Sure, but you have to parse it first. Sigh. It would be great to have a spec... :) |
I just pushed another update to the Hopefully it addresses the immediate problem. It works nicely for me, and I'd be grateful if you would give it a try. |
I have tried with your update, and it works very nicely. I have tested it against both forked-daapd master and the branch logavoidrepeat1 I made to fix the issues reported by @tomgadow, and it works with both. I have also added some commits to the branch, one includes the libevent workaround mentioned above that should protect against missing cleanup (too many open files), and the other makes sure that forked-daapd increases cmsr after playback events, even if there are no current update requests (so like it appears iTunes does). So I'm optimistic the issue is about to be solved, but it would of course be great if you could confirm @tomgadow |
Summary aka TLDR;Both fixes fixed the "Too many open files"-issue for me. The long story@mikebrady wrote:
This seems to match. Debian uses all three config flags. Myself I turned off mqtt. I wasn't to keen on installing mqtt-dev libraries. @ejurgensen wrote:
This is correct. There is no string
It is interesting that you mention libevent. I checked and had three libevent packages installed on my Debian testing.
and
I checked forked-daapd before and after removing libevent v2.0.21 and it used in both cases
which is linked to
which is part of Debian package I checked my log files and found the first The latest forked-daapd (v26.4) in Debian testing is from 2019-01-07. Shairport-sync v3.1.7 came 2018-01-26 and v3.2.1 2018-09-15. I found out that v3.2 is the first version which has this issue but was never in Debian testing so I take v3.2.1 as a reference. You see the dates don't add upp. I should have experienced the problems att least since somewhere in 2018 and not in late 2019. Libevent v2.1.8 came 2017-08-06 and v2.1.11 2019-11-17. I found in the dpkg-logfile that I installed libevent v2.1.11 on 2019-11-24. The same day my problems began. So libevent can play a role as culprit in this case. There are maybe some changes in connection handling between v2.1.8 and v2.1.11? My testing resultsshairport-sync mikebrady/shairport-sync@adbba7d against 04ccb4dI tested the latest development shairport-sync against logavoidrepeat1 forked-daapd. Result: The "Too many open files"-issue has gone. Shairport-sync seems to ask playstatusupdate every time with revision-number=1 and gets an answer directly:
The open file count is stable at 265 files. shairport-sync mikebrady/shairport-sync@b1a5056 against 59d7db1I tested shairport-sync v3.3.5 against the latest logavoidrepeat1 forked-daapd. Result: The "Too many open files"-issue has gone:
Inspite of shairport-sync's 0.5 sec timeout during playstatusupdate paket with revision-number=5 the open file count stays stable ±1 file. shairport-sync mikebrady/shairport-sync@adbba7d against 59d7db1I tested the latest development shairport-sync against the latest logavoidrepeat1 forked-daapd. Result: The "Too many open files"-issue has gone. This was expected :-). Thanks a lot guys! You are great! Now we have all time in the world to decide if shairport-sync should keep "revision-number=1" in the future or should go back to 1+ and save a tiny bit of bandwith and delay. As I understand it is not easy to decide whats right or wrong without a written specification. I hope some speaker device in the future will show us the way :-). Greetings! |
Thanks for excellent feedback, that is very much appreciated. I have also tested the changes with the remotes I had available, and I didn't find any regressions. So I have now merged the fixes in the branch into forked-daapd's master. Also many thanks to you @mikebrady, hope to get in touch with you again some other time, we probably have many shared interests and issues. Airplay 2 for instance :-) |
Thanks @tomgadow and @ejurgensen. It was a pleasure and a learning experience to help to resolve this. |
Hi!
TLDR;
shairport-sync (at least version 3.3.5) can block forked-daapd. Forked-daapd will in this case fill the logfile and the harddisk with GBs of the same error message in a couple of hours.
The long story
My machine runs Debian testing with all updates and forked-daapd is installed in version 26.4. I tested the latest version (27.0), too with the same problem.
Sometimes forked-daapd seems to "crash". It uses 100% CPU time and in the logfile one can found:
In this case it is not possible to connect to forked-daapd with new commands but if I catch the error early I can stop the running song with an already connected client. Everything is then back to normal.
If I did not catch the error in time my connected client will loose the connection and I only can kill the forked-daapd process.
The warning repeats a couple of million times per minute so it's easy to fill a 100 GB harddisk within a couple of hours.
The open file limit for the process is:
And during the "crash" I can see the process has really reached this limit:
I can see that the file count goes up with about 40 files every minute and resets with every new song playing to about 250 open files. Because of this reset at the start of a new media file I only see the problem if I am listening to an internet stream.
Then the lsof list has about 1000 entries like this one:
I got the latest forked-daapd (27.0) from GitHub and tested:
And started as root:
27.0 has the same problem - the open files counted up. Strace did not help either. The output was quiet after everything had startet but the open files counted up.
I have three clients in the network.
After I stopped shairport-sync 3.3.5 everything worked fine. The open files count stopped completely.
I checked with tcpdump and can see that shairport-sync 3.3.5 opens 3 new connections about every 1-2 seconds:
shairport-sync opens a new connection for every question - I don't see this behaviour with 2.8.6. That version seems to reuse the connections - The first two connections get answered and closed via FIN, ACK-sequence.
The third connection gets canceled every time via RST, ACK sent from shairport-sync.
I would say the third connection is the culprit. Maybe forked-daapd does not catch the reset flag and keeps the connection/file open?
Can you confirm this?
Would it be possible for you to implement that
This would be great :-)
Thanks for your help!
Tom
PS: I will investigate further on the shairport-sync side and keep you updated.
The text was updated successfully, but these errors were encountered: