Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UI: Gracefully handle stat errors #4833

Merged
merged 6 commits into from
Nov 9, 2018

Conversation

DingoEatingFuzz
Copy link
Contributor

Sometimes requests error. That's just life, man.

This adds a check before appending stat frames to see if the response is ok. If the response isn't ok, then no frame is appended (since there is no frame to append).

That's enough to make sure there are no runtime errors, but it isn't very informative since it doesn't surface information about the underlying problem.

To address the lack of insight, frame misses are tracked as well. So if X consecutive stats requests fail in a row, then the tracker "pauses".

Pausing is a pre-existing tracker feature that adds a null data frame so any visualization of the data can account for missing data. The line charts do this by showing gaps.

Put this all together and you get a tracker that continuously polls every 2 seconds or so and handles errors like a champ. However, if too many errors occur in a row, the user is informed that the data isn't just incredibly consistent, it's actually not updating.

1. Check if the response is a 4xx/5xx
2. If it is, skip the append step and track a frame miss
3. If enough frame misses occur in a row, treat it as a pause

A "pause" is when a null data frame is added, which shows up as a
gap in line charts.
Copy link
Contributor

@johncowen johncowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey,

I've had a brief look over the code, looks nice and clean! I've added a few trivial-ish comments, one is actually related to earlier today.

I'm guessing I should see a break in the graphs when I get 500's? I was just wondering what the best way to see this was? I'm assuming I need to use mirage not the demo site here? (I like the new USE_MIRAGE setting by the way, handy!)

When I use mirage I just don't see anything in the allocation graphs?

Please shout me offline if that's easier, appreciate I've done a quite a bit of reviewing and asked a lot of questions of you today, thanks for bearing with me

John

// used: 401 * 1024 * 1024,
// percent: 401 / 512,
// };

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you generally don't like lingering comments, so thought I'd check you'd noticed this, not a biggie for me especially as its in tests, maybe it helps to help folks understand what's happening here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yep! This should be deleted. This code is now expected of the caller of the behavior.

.then(res => res.json())
.then(frame => this.append(frame));
.then(jsonWithDefault({ error: true }))
.then(frame => this.handleResponse(frame));
} catch (error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More ember-data adapter like things that normally you might use ember-data for (see my comment from earlier to do on another PR). Actually now I've seen this again, did you say somewhere there was a decision made not to use ember-data for this? Does the same go for the non-ember-data usage in system?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I wrote out the rationale on the PR that introduced stat trackers.

This may sound a lot like what Ember Data does, but this isn't using Ember Data. This data isn't stateful (no writes), and it isn't necessarily globally persistent (created and deleted on demand). Although fetching and normalizing the data does overlap in behavior with adapters and serializers, given the nature of client and allocation stats, I think it makes more sense to keep this data out of the Ember Data store.

And yes, similar rationale for things like leader and regions. They're one-off calls that aren't going to change, have no relationships, have no computed properties, aren't CRUD, etc. If that changes some day, I'll convert them into ED models.

return wait();
})
.then(() => {
assert.equal(tracker.get('frameMisses'), 0, 'Mises reset');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit: missing an s

@DingoEatingFuzz DingoEatingFuzz force-pushed the b-ui-gracefully-handle-stat-errors branch from 564601c to c95821e Compare November 7, 2018 00:19
@DingoEatingFuzz
Copy link
Contributor Author

I'm guessing I should see a break in the graphs when I get 500's? I was just wondering what the best way to see this was? I'm assuming I need to use mirage not the demo site here?

Yes, to using mirage, yes-ish to getting a 500. You need to get 5 500s in a row. Right now, the Mirage configuration for client stats works acceptably well. Basically there's an 80% chance of failure, so you still have to get lucky for a gap to appear, but if you wait long enough it should.

The stats situation in Mirage right now is doubly sad. First, it's live data and neither I nor Mirage have a good story for mutating data over time. Second, current stat utilization is a function of the resources available to a client or allocation. This type of modeling is possible but tricky in Mirage, so I cut corners instead. The result is possible reporting of >100% resource usage which results in a visually funky line chart.

@johncowen
Copy link
Contributor

Hmm weird I can never get the graphs to show anything when using mirage.

I've just done a completely fresh clone, checked out this branch, yarn'ed yarn start'ed. I then tried the allocation graphs and the task group graphs and I get nothing drawn in them, even after waiting for a 'bit'. Things are streaming in in the console, but no visuals?

I don't mind if I need to tweak some things in the code to so I can see it, but let me know what I need to do if so.

@DingoEatingFuzz
Copy link
Contributor Author

The allocation and task group graphs won't work. You have to go to Clients > Client Detail. Those graphs should (kinda) work.

// of the developer.
const jsonWithDefault = defaultResponse => res =>
res.ok ? res.json() : copy(defaultResponse, true);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice, I just realized it depends on the res having an ok property. So, tiny super nit here, but would naming it differently or putting it in a subfolder to make it clearer that the res needs an ok property and json method.

Isn't that a Response?

https://developer.mozilla.org/en-US/docs/Web/API/Response/ok

So maybe responseWithDefault or maybe put it in utils/http/response/... or similar?

The commenting here helps, but if I'm in a file that uses this I might think that this just requires a json blob, whereas it needs a Response

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I considered this interface gotcha, but decided ultimately that I was probably overthinking it, and it isn't as big of a footgun as it seems under a microscope.

If it comes up again, I'll considering moving it.

@johncowen
Copy link
Contributor

Cool, great stuff I can see it now, sorry!

I added one more comment, but another super minor suggestion thing more than anything, and I'm going to approve this.

The only thing I noticed is some strange artifacts maybe at the top of the graph, the tooltip heads up there also. Not sure if this is due to mirage data, or whether you need to clamp things somewhere? Guessing it's best to do that in another PR though.

screen_shot_2018-11-08_at_10_13_22

@DingoEatingFuzz
Copy link
Contributor Author

Thanks for your diligence in making sure to see the feature on the screen. As for the visual glitches outside of the chart area, I'm aware of it and at first figured it wasn't worth addressing because it requires data >100% to have happen. However, I plan on addressing it in a future PR now that I know it's possible for some constraints to have soft limits 😬

@DingoEatingFuzz DingoEatingFuzz merged commit c45efcc into master Nov 9, 2018
@DingoEatingFuzz DingoEatingFuzz deleted the b-ui-gracefully-handle-stat-errors branch November 9, 2018 22:35
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants