-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aimd pipelining in Segment fetcher #24
Conversation
Can you provide backwards compatible versions of fetch, so that we don't break existing applications?
|
The name of the Options class too general. It appears to be very specific to the AIMD algorithm. Please either
|
I too like the idea of inner class similar to cxx. Maybe you can add a simple unit or integration test.
|
The existing programs that use the SegmentFetcher are the examples that get information from NFD, because NFD sends the response as segmented. These example require NFD running on the local computer. In the root of original jNDN code, I do:
This prints:
But when I do the same with your code, the response is empty:
The SegmentFetcher should be backwards compatible with existing code. Here is where TestListRib uses the SegmentFetcher:
|
@jefft0 , on running the snippet you provided above, on the current HEAD of the branch, this was printed,
I have added the complete output of the last command here |
Still doesn't work for me. What platform are you running? For example, macOS 10.13 or Ubuntu 16.04? |
... (I ask because I can try it on the same platform.) |
@jefft0 I have modified the code to fix the bug. Now it prints,
To my surprise, it was not working for as well me in the same branch today. Sorry for the troubles. |
The TestListRib example works for me now, too. Thanks. |
It's really hard to review this as a sequence of 10 incremental changes on top of each other. Can you squash all the commits together into one and force-push? |
I ran a few tests and the updated SegmentFetcher is working. |
4638ab6
to
e242900
Compare
@Pesa I squashed the changes into 1. |
The current implementation leaves the validation part of the packet to the user as earlier, unlike the ndn-cxx. I shall update the PR with the Validator soon. |
@Pesa Thanks for the review. I have resolved them, PTAL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please squash the commits together again.
a6e4b0a
to
bdaeeb4
Compare
@@ -855,6 +862,9 @@ private void clean() { | |||
private final Face face_; | |||
private RttEstimator rttEstimator_; | |||
|
|||
private ScheduledThreadPoolExecutor scheduledThreadPoolExecutor_ = | |||
(ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why you need threads to solve this particular problem... I admit I don't know the internals of jndn and how it normally schedules events, but it sounds odd to me that it doesn't have an event loop or other async facility...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internally, jNDN uses callLater to schedule when to check for a timed-out interest.
https://github.com/named-data/jndn/blob/master/src/net/named_data/jndn/Node.java#L604
This not part of the "supported" public API, but from within the library it is OK to call it. It is a public method of Face. If the application is using the normal face with an eventLoop, then the callLater mechanism uses this. If the application is using a subclass of Face which has an ThreadPool, then it overrides callLater to use that. So you can use callLater.
https://github.com/named-data/jndn/blob/master/src/net/named_data/jndn/Face.java#L1433
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jefft0, that sounds like a much better idea.
I have another question: if a thread pool is used, does that mean that scheduled callbacks may be executed in another thread and therefore concurrently with the rest of the fetcher? If so, the fetcher internals need to be thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Java is not like Boost asio where the idea is to dispatch everything to run on the same thread. Instead, in Java you dispatch to run on "some other" thread, and you never know which one. Especially, there is no way to get the async socket code to run on a particular thread. (I wish it wasn't like this, but that's how they designed Java.) So yes, the internals in a callback need to be thread safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, FTR, Asio doesn't force you to run everything on the same thread, in fact, it leaves complete control on where callbacks are executed to the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what will happen when we want to port this to NDN-CPP. What is the equivalent of ScheduledThreadPoolExecutor in C++. Will an application which simply wants to fetch some segmented content be forced to install Boost's thread handling library? The same question in PyNDN? Will an application which simply wants to fetch some simple segmented content be forced to install something like Trollius, which may be incompatible with the existing application. This is what we wanted to avoid by using this simplest call later mechanism possible. If you do complicate the SegmentFetcher in this want, you need to make sure that the API makes it easy to avoid the complicated thread management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ritik, no, forget about the ScheduledThreadPoolExecutor
, what I meant is to use the callLater
mechanism to schedule the global periodic RTO check. This check would fire every few milliseconds and go through all pending Interests and check which ones have timed out (if any), then reschedule itself.
@jefft0, unless I'm missing something, this has no impact on the public API whatsoever, it's just an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. In the case of Java, the complicated Threading support is already part of the standard library, so it is no extra burden for the application to include it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and for ndn-cxx and pyndn, I assume they have a similar callLater
internal mechanism that we can use for this..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, the only problem is cpp, I'm sure python already includes threading and async stuff in the standard library.
|
||
private void afterNackOrTimeout(Interest interest) { | ||
private void afterNack(Interest interest) { | ||
if (System.currentTimeMillis() >= timeLastSegmentReceived_ + options_.maxTimeout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about this logic? now it's not being checked anymore on timeouts (only on nacks), or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For interest lifetime timeouts, I am giving a timeout error to the user and ending the process! Should I implement something else? (The fetcher is based on cxx fetcher which doesn't implement interest lifetime timeout. Should I change the logic and go with that in the chunks?)
There is some problem with method naming and organisation and I shall fix it. I will include this check in the timeout too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For interest lifetime timeouts, I am giving a timeout error to the user and ending the process
That's not good. You can't abort the process at the first timeout. The fetcher has to keep retransmitting until it hits the maxTimeout
value specified in the options (same as in the Nack case).
The fetcher is based on cxx fetcher which doesn't implement interest lifetime timeout. Should I change the logic and go with that in the chunks?
No, that part is acceptable(*). I was talking about maxTimeout
and the fact that it seems to be applied only to Nacks but not to RTO timeouts in the latest code.
(*) well, there is one case that the current ndn-cxx code doesn't handle in the best possible way: the ndn-cxx fetcher assumes that the interest lifetime is greater than the estimated RTO, which is true in the normal case. However, if the app sets a very short lifetime, it's possible for the lifetime to expire before the RTO, but the fetcher won't react to this event (e.g. decrement nSegmentsInFlight
) until the RTO timeout triggers later on. You don't need to replicate this limitation in the jndn version, in fact you should implement a better logic if you can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohk very well.
So is there any more expected difference in implementation of the RTO and interest lifetime timeouts apart from slightly different error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhm yeah the logic should be essentially identical (unless I'm forgetting about something), even the error message can be the same.
2067b9c
to
4537cfe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't know when I'll be able to take another more detailed look at this, I got busy with other things. I only have a couple of inline comments, plus a general feeling that error handling is still somewhat messy and hard to follow and reason about. More test coverage, especially for corner cases, would help... dunno what jndn polices are wrt this.
@@ -77,6 +84,7 @@ | |||
* - `SEGMENT_VERIFICATION_FAILED`: if any retrieved segment fails | |||
* the user-provided VerifySegment callback or KeyChain verifyData. | |||
* - `IO_ERROR`: for I/O errors when sending an Interest. | |||
* - 'NACK_ERROR': Unknown network error occurred, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace trailing comma with period, also the description isn't really accurate... it's not "unknown", we know there was a NACK with an unhandled reason, so something like "unknown/unhandled NACK received"
* Interest: /{prefix}/{version}/{segment=(N+1))} | ||
* | ||
* 6. Call the OnComplete callback with a blob that concatenates the content | ||
* 4. Call the OnComplete callback with a blob that concatenates the content | ||
* from all the segmented objects. | ||
* | ||
* If an error occurs during the fetching process, the OnError callback is called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(adding this comment on a semi-random line due to github limitations)
I just realized that the declared semantics for the INTEREST_TIMEOUT
error code is completely different from the ndn-cxx segment fetcher. The current jndn semantics are unusable (and useless in most cases) from an application perspective. The app wants to know when the transfer as a whole failed due to the global timeout (which is usually several seconds or minutes), it doesn't care about any single Interest timing out. That's why the fetcher exists in the first place, to do retransmissions and hide these details from apps.
@jefft0 how do you want to proceed here? maybe you want to fix it after this PR is merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is OK to fix it after the PR is merged.
e759c2f
to
5743d96
Compare
I don't see any major problems with the latest patch. But in any case I don't have time to do a more detailed review at the moment and I don't want to hold this up forever, so @jefft0 feel free to merge if you're satisfied with it. My only concern (but you can ignore me) is that the tests are rather minimal. I wish they were covering a lot more cases, especially the weird edge cases, because the internal pipelining logic is non-trivial and it's easy to break something when touching the code in the future. |
In cases like the test file, where response data is sent immediately as and when interest is received, updating the pendingSegments_ after sending interest led to error.
I'll look at this when I return to the office on Monday. |
For me, TestSegmentFetcher runs, and the SegmentFetcher is backward compatible with the existing examples. I'm ready to merge the code. We can make further fixes later. Sound good? |
Sure |
Updates
SegmentFetcher.java
to use AIMD pipe-lining as described hereThe updated code is similar to this c++ implementation of segment-fetcher