Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly short circuit core worker Get() on exception #5672

Merged
merged 3 commits into from
Sep 12, 2019

Conversation

edoakes
Copy link
Contributor

@edoakes edoakes commented Sep 10, 2019

Why are these changes needed?

We should be short circuiting calls to Get() in the core worker when we see an exception, but this wasn't propagated out of the plasma store provider (was causing some Java test errors).

Checks

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16926/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16936/
Test FAILed.

@zhijunfu
Copy link
Contributor

LGTM, thanks. I left a minor comment.

@@ -30,7 +30,8 @@ Status CoreWorkerMemoryStoreProvider::Put(const RayObject &object,
Status CoreWorkerMemoryStoreProvider::Get(
const std::unordered_set<ObjectID> &object_ids, int64_t timeout_ms,
const TaskID &task_id,
std::unordered_map<ObjectID, std::shared_ptr<RayObject>> *results) {
std::unordered_map<ObjectID, std::shared_ptr<RayObject>> *results,
bool *got_exception) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got_exception can be removed since it can be determined from the result map when there's an exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but that would require iterating over the full list of results even in success cases, which I would want to avoid especially because the IsException check loops over a list of types itself.

std::string error_string = std::to_string(ray::rpc::TASK_EXECUTION_EXCEPTION);
char error_buffer[error_string.size()];
size_t len = error_string.copy(error_buffer, error_string.size(), 0);
buffers_with_exception.emplace_back(
Copy link
Contributor

@pcmoritz pcmoritz Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the above just a very intricate way of doing error_string.data() below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not, we need a non-const reference to the data. We could also do const_cast if you'd prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no, let's not do that. Probably a const version of the constructor would be in order

LocalMemoryBuffer(uint8_t *data, size_t size, bool copy_data = false)
and an immutable version of the buffer, let's leave as is for this PR.

const TaskID &task_id,
std::unordered_map<ObjectID, std::shared_ptr<RayObject>> *results) = 0;
/// \param[out] got_exception Set to true if any of the fetched results were an
/// exception. \return Status.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline before \return Status.

@pcmoritz pcmoritz merged commit 0bf79cf into ray-project:master Sep 12, 2019
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16985/
Test FAILed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants