-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document Track Processor Interface #1613
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very good, thank you!
It's basically LGTM from me (I only looked at the advanced.rst
after Controlling Task Execution Behavior With Custom Schedulers
, I assume the rest are the same). The only additional ask is whether we could have a high level description of DocumentSetPreparator
(can also be done in a follow up PR, since this has been in the making for a while).
docs/advanced.rst
Outdated
self.track = None | ||
|
||
def on_after_load_track(self, track): | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the three dots here make me think that we are deliberately skipping code to make the example more readable. The actual codebase as of now though doesn't defined this method at all, so maybe we should just replace ...
with pass
to prevent other readers of these docs wondering the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sorry, there is some leftover here. I intended to document the duck-typed version of the default processor, but you can see it is also still subclassing here. I had pickling issues at the point where I tried to make TrackProcessor
an ABC, and so partially reverted that change
docs/advanced.rst
Outdated
|
||
In this case, you can see by default we do nothing here for ``on_after_load_track`` to mutate the track, but yield a tuple of the ``prepare_docs`` | ||
function and its parameters for each corpus in the track ``corpora``. After this is called, these tuples are given to each TrackProcessor worker actor | ||
to be executed in parallel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we explain a bit more what the expected params typically are? e.g. here we see the config object, the track, which are easy to understand, but it's unclear what the DocumentSetPreparator
does (passed via preparator
). A high level explanation would be beneficial in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thank you!
Closes #1534
In this PR, I have done the following:
Template Language
becomesTemplating Track Files with Jinja2
)A note for reviewers, the "meat" of the change here is at the bottom of
advanced.rst
, which by default is collapsed by GH as being too large. The rest is mostly extracted/moved content and some cleaned up typing in doc strings.