This repository has been archived by the owner on Oct 10, 2019. It is now read-only.

Fetch feeds in parallel ? #581

Open
kaihendry opened this issue Jul 18, 2017 · 24 comments

Comments

@kaihendry

newsbeuter 2.9 - http://www.newsbeuter.org/
Copyright (C) 2006-2010 Andreas Krennmair

newsbeuter is free software and licensed under the MIT/X Consortium License.
Type `newsbeuter -vv' for more information.

Compilation date/time: Mar  8 2017 21:56:57
System: Linux 4.11.9-1-ARCH (x86_64)
Compiler: g++ 6.3.1 20170306
ncurses: ncurses 6.0.20170527 (compiled with 6.0)
libcurl: libcurl/7.54.1 OpenSSL/1.1.0f zlib/1.2.11 libpsl/0.17.0 (+libicu/59.1) libssh2/1.8.0 nghttp2/1.23.1 (compiled with 7.53.1)
SQLite: 3.19.3 (compiled with 3.17.0)
libxml2: compiled with 2.9.4

It would appear at least from the UI that refreshing feeds goes one by one. Why? It should do them all in parallel, no?

@polyzen

polyzen commented Jul 18, 2017

reload-threads (parameters: ; default value: 1)
The number of parallel reload threads that shall be started when all feeds are reloaded. (example: reload-threads 3)
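To illustrate what this option bounds, here is a minimal Python sketch of a reload that uses at most `reload_threads` workers. It is an illustration only, not newsbeuter's actual C++ implementation; `reload_all` and `fetch` are hypothetical names introduced for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def reload_all(feed_urls, fetch, reload_threads=1):
    """Fetch every feed with at most `reload_threads` worker threads.

    `fetch` is a hypothetical callable that takes a URL and returns the
    feed body. Results come back in the same order as `feed_urls`.
    """
    workers = max(1, reload_threads)  # never start a pool with zero workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() queues all feeds; only `workers` of them run at a time.
        return list(pool.map(fetch, feed_urls))
```

With the default of 1 the feeds are fetched one after another, which would match the one-by-one behaviour seen in the UI.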

@kaihendry
Author

Why is this a parameter? If I have 20 feeds, I expect 20 reload threads, no?

@polyzen

polyzen commented Jul 18, 2017

It depends on your network's capabilities.

@kaihendry
Author

I'm pretty sure my network can handle many hundreds of parallel connections like any other network...

@polyzen

polyzen commented Jul 18, 2017 via email

@Minoru
Collaborator

Minoru commented Jul 18, 2017

I really like the reload-threads 0 idea, though the default should still be kept low (perhaps 20?) because each thread takes up memory and not all of us have gigabytes of it lying around.

And we should limit the number of connections per host like browsers do, though in our case we can set it to 1. I wager that:

  • it's rare to fetch multiple feeds from one host; and
  • in such cases, the host itself is well-hosted (Google's Feedburner, BBC etc.), so it'll respond quickly enough anyway.
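A rough Python sketch of the per-host limit described above, assuming a thread pool plus one semaphore per hostname. Nothing here is newsbeuter code; `PerHostLimiter`, `reload_all_per_host`, `fetch`, and the `per_host` parameter are hypothetical names made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


class PerHostLimiter:
    """Keeps one semaphore per host so that each host sees at most
    `limit` concurrent requests."""

    def __init__(self, limit=1):
        self._limit = limit
        self._guard = threading.Lock()  # protects the dict below
        self._slots = {}

    def slot_for(self, url):
        host = urlsplit(url).hostname or ""
        with self._guard:
            if host not in self._slots:
                self._slots[host] = threading.BoundedSemaphore(self._limit)
            return self._slots[host]


def reload_all_per_host(feed_urls, fetch, reload_threads=20, per_host=1):
    """Fetch all feeds with a global thread cap and a per-host cap."""
    limiter = PerHostLimiter(per_host)

    def guarded_fetch(url):
        # Block until this URL's host has a free slot, then fetch.
        with limiter.slot_for(url):
            return fetch(url)

    with ThreadPoolExecutor(max_workers=max(1, reload_threads)) as pool:
        return list(pool.map(guarded_fetch, feed_urls))
```

One trade-off of this simple design: a worker waiting for a busy host still occupies a thread, so many feeds on one host can stall the pool. A real implementation would more likely queue feeds per host.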

Anyone willing to draw up a PR?

The issue fixed by this commit appears to be in another bugtracker.

Yes, it's on Google Code. No details there, though.

@polyzen

polyzen commented Jul 18, 2017

should limit the number of connections per host like browsers do, though in our case we can set it to 1

An option like reload-threads-per-server would be nice for this. My feeds include ~20 YouTube feeds and ~30 GitHub feeds. Firefox still defaults to 6 here, but newsbeuter seems happy fetching these all simultaneously.

Edit: I do see a quick CPU spike across my 4 cores when reloading all feeds.

Anyone willing to draw up a PR?

If I could. :)

@Minoru
Collaborator

Minoru commented Jul 18, 2017

An option like reload-threads-per-server would be nice for this.

I think users will just jack it up real high in hopes of getting their feeds a second earlier. That might not even happen, i.e. it could end up slower, because of TCP slow start.

newsbeuter seems happy fetching these all simultaneously

I consider this a bug :)

@polyzen

polyzen commented Jul 18, 2017

FWIW, it appears to check them all in under a second.

@polyzen

polyzen commented Aug 4, 2017

Actually I've found it misses several feeds when fetching all at once.

@Minoru
Collaborator

Minoru commented Aug 4, 2017

What do you mean by "misses"? How do you check that?

@polyzen

polyzen commented Aug 5, 2017

I had enabled delete-read-articles-on-quit, which I didn't know would cause most (if not all) of my feeds to re-add articles, and some were missing until I lowered reload-threads to 30. Have not played with reload-threads since then.

@polyzen

polyzen commented Aug 5, 2017

After launching just now (with auto-reload enabled), I've found at least three more feeds that "weren't previously fetched".

Minoru added a commit that referenced this issue Aug 9, 2017
This hides the articles from the user (effectively deleting them), but
still keeps the article in the cache so that future updates don't bring
them back. The articles will be deleted for real only when they fall out
of the feed.

Kudos to @polyzen for bringing this up in #581 (comment)

@Minoru
Collaborator

Minoru commented Aug 9, 2017

I had enabled delete-read-articles-on-quit, which I didn't know would cause most (if not all) of my feeds to re-add articles

I just pushed a fix to master, can you please test? Commit ID is cc5d03e

I'm not sure what's up with articles missing if you have reload-threads set to some high(-ish) value. If you enable error-log, would you see any errors there? (I assume the problem is intermittent and running with -dnewsbeuter.log -l6 isn't an option due to the size of the log.)

@polyzen

polyzen commented Aug 10, 2017

@Minoru, thank you, it does work.

Will keep an eye on the error log.

@polyzen

polyzen commented Aug 10, 2017

[2017-08-10 15:28:22] Error while retrieving https://www.youtube.com/feeds/videos.xml?channel_id=UCR-QYzXrZF8yFarK8wZbHog: HTTP response code said error
[2017-08-10 15:28:22] Error while retrieving https://protonmail.com/blog/feed/: SSL connect error
[2017-08-10 15:28:22] Error while retrieving https://www.youtube.com/feeds/videos.xml?channel_id=UC2DjFE7Xf11URZqWBigcVOQ: HTTP response code said error
[2017-08-10 15:28:23] Error while retrieving https://www.youtube.com/feeds/videos.xml?channel_id=UC7pp40MU_6rLK5pvJYG3d0Q: HTTP response code said error

Edit: Got basically the same results at the next auto-reload. ProtonMail apparently borked SSL for their feed. No errors when I :set reload-threads 30 and manually reload-all.

Edit2: 8 YT feed errors after setting reload-threads to 30 in the config and restarting newsbeuter. None after lowering it to 20 and a subsequent restart. Only 1 error after setting it back to 30 and restarting. Not getting errors for ProtonMail anymore.

Edit3: Another auto-reload: 2 more YT errors and the ProtonMail error is back.

Edit4: I receive YT errors nearly every auto-reload. reload-threads has been set to 10 for the last two auto-reloads.
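To see at a glance which hosts fail and how, excerpts like the one above can be tallied with a short script. This is only a sketch in Python: the regular expression assumes the `[timestamp] Error while retrieving <url>: <message>` layout seen in these excerpts, and `tally_errors` is a hypothetical helper, not part of newsbeuter.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed layout, taken from the excerpt above:
# [YYYY-MM-DD HH:MM:SS] Error while retrieving <url>: <message>
LINE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] Error while retrieving (?P<url>\S+): (?P<message>.*)$"
)


def tally_errors(lines):
    """Count error-log lines per (host, message) pair.

    Lines that do not match the assumed layout are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        host = urlsplit(match["url"]).hostname
        counts[(host, match["message"])] += 1
    return counts


sample = [
    "[2017-08-10 15:28:22] Error while retrieving https://www.youtube.com/feeds/videos.xml?channel_id=UCR-QYzXrZF8yFarK8wZbHog: HTTP response code said error",
    "[2017-08-10 15:28:22] Error while retrieving https://protonmail.com/blog/feed/: SSL connect error",
    "[2017-08-10 15:28:22] Error while retrieving https://www.youtube.com/feeds/videos.xml?channel_id=UC2DjFE7Xf11URZqWBigcVOQ: HTTP response code said error",
    "[2017-08-10 15:28:23] Error while retrieving https://www.youtube.com/feeds/videos.xml?channel_id=UC7pp40MU_6rLK5pvJYG3d0Q: HTTP response code said error",
]

for (host, message), count in tally_errors(sample).most_common():
    print(f"{count:3d}  {host}  {message}")
```

For the four lines above this reports three "HTTP response code said error" entries for www.youtube.com and one "SSL connect error" for protonmail.com.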

@Minoru
Collaborator

Minoru commented Aug 11, 2017

I'll try to add the actual HTTP code to the log tomorrow; right now it's quite useless. Knowing what kind of errors these are will help us understand what we can/should do about them. I wager it's 429 because you're fetching a lot of stuff simultaneously.

@polyzen

polyzen commented Aug 12, 2017

Today with reload-threads set to 30:

[2017-08-11 18:33:36] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates
[2017-08-11 19:33:36] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates
[2017-08-11 20:33:37] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates
[2017-08-11 21:33:36] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates
[2017-08-11 22:33:36] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates

@Minoru
Collaborator

Minoru commented Aug 12, 2017

(Edited your comment to move the log excerpt from external Pastebin into the comment itself. Had enough Paste- and imagebin links expire on me to not trust any. Nothing short of GitHub's death should take this issue tracker down :)

SSL errors are easily explained if you look at LWN's certificate: it was issued yesterday, on August 11th. Apparently they forgot to do it and HTTPS was down for a few hours. No worries.

@Minoru
Collaborator

Minoru commented Aug 12, 2017

@polyzen, I just added the HTTP code to the error message. Let's see what causes those fetch failures for you!

@polyzen

polyzen commented Aug 13, 2017

The error log was empty all day, until three 404s from YT about an hour ago.

@Minoru
Collaborator

Minoru commented Aug 13, 2017

Hmm. A fluke on YouTube's site? That's weird, but I can't do anything about it.

I'm not even sure what we're looking for anymore. Apparently reload-threads doesn't produce a ton of errors, so... problem solved?

@polyzen

polyzen commented Aug 13, 2017

Thank you. Will continue to play around with the setting.

@polyzen

polyzen commented Aug 14, 2017

Not seeing any noteworthy errors while fetching all feeds at once.
