Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aimd pipelining in Segment fetcher #24

Merged
merged 2 commits into from
Jul 8, 2019

Conversation

dev-ritik
Copy link
Contributor

Updates SegmentFetcher.java to use AIMD pipe-lining as described here

The updated code is similar to this c++ implementation of segment-fetcher

@jefft0
Copy link
Contributor

jefft0 commented Mar 28, 2019

Can you provide backwards compatible versions of fetch, so that we don't break existing applications?

@jefft0
Copy link
Contributor

jefft0 commented Mar 28, 2019

The name of the Options class too general. It appears to be very specific to the AIMD algorithm. Please either

  • Rename it to something more specific like AimdOptions, or
  • Make it an inner class of SegmentFetcher
    public class SegmentFetcher {
        public static class Options {
        }
    }

@agawande
Copy link

agawande commented Mar 28, 2019

I too like the idea of inner class similar to cxx. Maybe you can add a simple unit or integration test.

Another thing I realized is whether we should look into getting rid of selectors (ndn-cxx will switch sometime in the future, some relevant info here: https://redmine.named-data.net/issues/4556). Never mind I was looking at the old code.

@jefft0
Copy link
Contributor

jefft0 commented Apr 3, 2019

The existing programs that use the SegmentFetcher are the examples that get information from NFD, because NFD sends the response as segmented. These example require NFD running on the local computer. In the root of original jNDN code, I do:

mvn install
cd examples
mvn test -DclassName=TestListRib

This prints:

Express interest /localhost/nfd/rib/list
RIB:
  /localhop/nfd/localhop/nfd route={faceId=264 (origin=0 cost=0 ChildInherit)}
  /localhost/nfd/localhost/nfd route={faceId=264 (origin=0 cost=0 ChildInherit)}

But when I do the same with your code, the response is empty:

Express interest /localhost/nfd/rib/list
RIB:

The SegmentFetcher should be backwards compatible with existing code. Here is where TestListRib uses the SegmentFetcher:

@dev-ritik dev-ritik changed the title Aimd pipelining in Segment fetcher [WIP]Aimd pipelining in Segment fetcher Apr 13, 2019
@dev-ritik
Copy link
Contributor Author

@jefft0 , on running the snippet you provided above, on the current HEAD of the branch, this was printed,

Express interest /localhost/nfd/rib/list
RIB:
  /ndn/ndn route={faceId=263 (origin=255 cost=0 ChildInherit)}
  /localhost/nfd/localhost/nfd route={faceId=261 (origin=0 cost=0 ChildInherit)}

I have added the complete output of the last command here

@jefft0
Copy link
Contributor

jefft0 commented Apr 15, 2019

Still doesn't work for me. What platform are you running? For example, macOS 10.13 or Ubuntu 16.04?

@jefft0
Copy link
Contributor

jefft0 commented Apr 15, 2019

... (I ask because I can try it on the same platform.)

@dev-ritik
Copy link
Contributor Author

@jefft0 I have modified the code to fix the bug. Now it prints,

Express interest /localhost/nfd/rib/list
RIB:
  /ndn/ndn route={faceId=265 (origin=255 cost=0 ChildInherit)}
  /localhost/nfd/localhost/nfd route={faceId=258 (origin=0 cost=0 ChildInherit)}

To my surprise, it was not working for as well me in the same branch today. Sorry for the troubles.
I hope the present state of code would pass the tests.

@jefft0
Copy link
Contributor

jefft0 commented Apr 15, 2019

The TestListRib example works for me now, too. Thanks.

@dev-ritik dev-ritik changed the title [WIP]Aimd pipelining in Segment fetcher Aimd pipelining in Segment fetcher May 14, 2019
@Pesa
Copy link
Member

Pesa commented Jun 4, 2019

It's really hard to review this as a sequence of 10 incremental changes on top of each other. Can you squash all the commits together into one and force-push?

@jefft0
Copy link
Contributor

jefft0 commented Jun 4, 2019

I ran a few tests and the updated SegmentFetcher is working.

@dev-ritik
Copy link
Contributor Author

@Pesa I squashed the changes into 1.

@dev-ritik
Copy link
Contributor Author

The current implementation leaves the validation part of the packet to the user as earlier, unlike the ndn-cxx. I shall update the PR with the Validator soon.

src/net/named_data/jndn/util/RttEstimator.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/RttEstimator.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/RttEstimator.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/RttEstimator.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
@dev-ritik
Copy link
Contributor Author

@Pesa Thanks for the review. I have resolved them, PTAL!

Copy link
Member

@Pesa Pesa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash the commits together again.

src/net/named_data/jndn/util/RttEstimator.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
@@ -855,6 +862,9 @@ private void clean() {
private final Face face_;
private RttEstimator rttEstimator_;

private ScheduledThreadPoolExecutor scheduledThreadPoolExecutor_ =
(ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(5);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why you need threads to solve this particular problem... I admit I don't know the internals of jndn and how it normally schedules events, but it sounds odd to me that it doesn't have an event loop or other async facility...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally, jNDN uses callLater to schedule when to check for a timed-out interest.
https://github.com/named-data/jndn/blob/master/src/net/named_data/jndn/Node.java#L604

This not part of the "supported" public API, but from within the library it is OK to call it. It is a public method of Face. If the application is using the normal face with an eventLoop, then the callLater mechanism uses this. If the application is using a subclass of Face which has an ThreadPool, then it overrides callLater to use that. So you can use callLater.
https://github.com/named-data/jndn/blob/master/src/net/named_data/jndn/Face.java#L1433

Copy link
Member

@Pesa Pesa Jun 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jefft0, that sounds like a much better idea.

I have another question: if a thread pool is used, does that mean that scheduled callbacks may be executed in another thread and therefore concurrently with the rest of the fetcher? If so, the fetcher internals need to be thread-safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Java is not like Boost asio where the idea is to dispatch everything to run on the same thread. Instead, in Java you dispatch to run on "some other" thread, and you never know which one. Especially, there is no way to get the async socket code to run on a particular thread. (I wish it wasn't like this, but that's how they designed Java.) So yes, the internals in a callback need to be thread safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, FTR, Asio doesn't force you to run everything on the same thread, in fact, it leaves complete control on where callbacks are executed to the application.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what will happen when we want to port this to NDN-CPP. What is the equivalent of ScheduledThreadPoolExecutor in C++. Will an application which simply wants to fetch some segmented content be forced to install Boost's thread handling library? The same question in PyNDN? Will an application which simply wants to fetch some simple segmented content be forced to install something like Trollius, which may be incompatible with the existing application. This is what we wanted to avoid by using this simplest call later mechanism possible. If you do complicate the SegmentFetcher in this want, you need to make sure that the API makes it easy to avoid the complicated thread management.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ritik, no, forget about the ScheduledThreadPoolExecutor, what I meant is to use the callLater mechanism to schedule the global periodic RTO check. This check would fire every few milliseconds and go through all pending Interests and check which ones have timed out (if any), then reschedule itself.

@jefft0, unless I'm missing something, this has no impact on the public API whatsoever, it's just an implementation detail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. In the case of Java, the complicated Threading support is already part of the standard library, so it is no extra burden for the application to include it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and for ndn-cxx and pyndn, I assume they have a similar callLater internal mechanism that we can use for this..?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, the only problem is cpp, I'm sure python already includes threading and async stuff in the standard library.

src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
@dev-ritik
Copy link
Contributor Author

@Pesa @jefft0 PTAL at the updated code. If everything is fine, I will resolve other reviews.


private void afterNackOrTimeout(Interest interest) {
private void afterNack(Interest interest) {
if (System.currentTimeMillis() >= timeLastSegmentReceived_ + options_.maxTimeout) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about this logic? now it's not being checked anymore on timeouts (only on nacks), or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For interest lifetime timeouts, I am giving a timeout error to the user and ending the process! Should I implement something else? (The fetcher is based on cxx fetcher which doesn't implement interest lifetime timeout. Should I change the logic and go with that in the chunks?)
There is some problem with method naming and organisation and I shall fix it. I will include this check in the timeout too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For interest lifetime timeouts, I am giving a timeout error to the user and ending the process

That's not good. You can't abort the process at the first timeout. The fetcher has to keep retransmitting until it hits the maxTimeout value specified in the options (same as in the Nack case).

The fetcher is based on cxx fetcher which doesn't implement interest lifetime timeout. Should I change the logic and go with that in the chunks?

No, that part is acceptable(*). I was talking about maxTimeout and the fact that it seems to be applied only to Nacks but not to RTO timeouts in the latest code.

(*) well, there is one case that the current ndn-cxx code doesn't handle in the best possible way: the ndn-cxx fetcher assumes that the interest lifetime is greater than the estimated RTO, which is true in the normal case. However, if the app sets a very short lifetime, it's possible for the lifetime to expire before the RTO, but the fetcher won't react to this event (e.g. decrement nSegmentsInFlight) until the RTO timeout triggers later on. You don't need to replicate this limitation in the jndn version, in fact you should implement a better logic if you can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohk very well.
So is there any more expected difference in implementation of the RTO and interest lifetime timeouts apart from slightly different error message?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhm yeah the logic should be essentially identical (unless I'm forgetting about something), even the error message can be the same.

@dev-ritik
Copy link
Contributor Author

@Pesa @jefft0 PTAL

Copy link
Member

@Pesa Pesa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't know when I'll be able to take another more detailed look at this, I got busy with other things. I only have a couple of inline comments, plus a general feeling that error handling is still somewhat messy and hard to follow and reason about. More test coverage, especially for corner cases, would help... dunno what jndn polices are wrt this.

@@ -77,6 +84,7 @@
* - `SEGMENT_VERIFICATION_FAILED`: if any retrieved segment fails
* the user-provided VerifySegment callback or KeyChain verifyData.
* - `IO_ERROR`: for I/O errors when sending an Interest.
* - 'NACK_ERROR': Unknown network error occurred,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace trailing comma with period, also the description isn't really accurate... it's not "unknown", we know there was a NACK with an unhandled reason, so something like "unknown/unhandled NACK received"

* Interest: /{prefix}/{version}/{segment=(N+1))}
*
* 6. Call the OnComplete callback with a blob that concatenates the content
* 4. Call the OnComplete callback with a blob that concatenates the content
* from all the segmented objects.
*
* If an error occurs during the fetching process, the OnError callback is called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(adding this comment on a semi-random line due to github limitations)

I just realized that the declared semantics for the INTEREST_TIMEOUT error code is completely different from the ndn-cxx segment fetcher. The current jndn semantics are unusable (and useless in most cases) from an application perspective. The app wants to know when the transfer as a whole failed due to the global timeout (which is usually several seconds or minutes), it doesn't care about any single Interest timing out. That's why the fetcher exists in the first place, to do retransmissions and hide these details from apps.

@jefft0 how do you want to proceed here? maybe you want to fix it after this PR is merged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is OK to fix it after the PR is merged.

src/net/named_data/jndn/util/SegmentFetcher.java Outdated Show resolved Hide resolved
@Pesa
Copy link
Member

Pesa commented Jun 30, 2019

I don't see any major problems with the latest patch. But in any case I don't have time to do a more detailed review at the moment and I don't want to hold this up forever, so @jefft0 feel free to merge if you're satisfied with it.

My only concern (but you can ignore me) is that the tests are rather minimal. I wish they were covering a lot more cases, especially the weird edge cases, because the internal pipelining logic is non-trivial and it's easy to break something when touching the code in the future.

In cases like the test file, where response data is sent immediately as and when interest is received, updating the pendingSegments_ after sending
interest led to error.
@jefft0
Copy link
Contributor

jefft0 commented Jul 4, 2019

I'll look at this when I return to the office on Monday.

@jefft0
Copy link
Contributor

jefft0 commented Jul 8, 2019

For me, TestSegmentFetcher runs, and the SegmentFetcher is backward compatible with the existing examples. I'm ready to merge the code. We can make further fixes later. Sound good?

@Pesa
Copy link
Member

Pesa commented Jul 8, 2019

Sure

@jefft0 jefft0 merged commit 64538a3 into named-data:master Jul 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants