-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UI: Gracefully handle stat errors #4833
Conversation
1. Check if the response is a 4xx/5xx 2. If it is, skip the append step and track a frame miss 3. If enough frame misses occur in a row, treat it as a pause A "pause" is when a null data frame is added, which shows up as a gap in line charts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey,
I've had a brief look over the code, looks nice and clean! I've added a few trivial-ish comments, one is actually related to earlier today.
I'm guessing I should see a break in the graphs when I get 500's? I was just wondering what the best way to see this was? I'm assuming I need to use mirage not the demo site here? (I like the new USE_MIRAGE
setting by the way, handy!)
When I use mirage I just don't see anything in the allocation graphs?
Please shout me offline if that's easier, appreciate I've done a quite a bit of reviewing and asked a lot of questions of you today, thanks for bearing with me
John
// used: 401 * 1024 * 1024, | ||
// percent: 401 / 512, | ||
// }; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you generally don't like lingering comments, so thought I'd check you'd noticed this, not a biggie for me especially as its in tests, maybe it helps to help folks understand what's happening here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yep! This should be deleted. This code is now expected of the caller of the behavior.
.then(res => res.json()) | ||
.then(frame => this.append(frame)); | ||
.then(jsonWithDefault({ error: true })) | ||
.then(frame => this.handleResponse(frame)); | ||
} catch (error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More ember-data adapter like things that normally you might use ember-data for (see my comment from earlier to do on another PR). Actually now I've seen this again, did you say somewhere there was a decision made not to use ember-data for this? Does the same go for the non-ember-data usage in system
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I wrote out the rationale on the PR that introduced stat trackers.
This may sound a lot like what Ember Data does, but this isn't using Ember Data. This data isn't stateful (no writes), and it isn't necessarily globally persistent (created and deleted on demand). Although fetching and normalizing the data does overlap in behavior with adapters and serializers, given the nature of client and allocation stats, I think it makes more sense to keep this data out of the Ember Data store.
And yes, similar rationale for things like leader and regions. They're one-off calls that aren't going to change, have no relationships, have no computed properties, aren't CRUD, etc. If that changes some day, I'll convert them into ED models.
return wait(); | ||
}) | ||
.then(() => { | ||
assert.equal(tracker.get('frameMisses'), 0, 'Mises reset'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nit: missing an s
564601c
to
c95821e
Compare
Yes, to using mirage, yes-ish to getting a 500. You need to get 5 500s in a row. Right now, the Mirage configuration for client stats works acceptably well. Basically there's an 80% chance of failure, so you still have to get lucky for a gap to appear, but if you wait long enough it should. The stats situation in Mirage right now is doubly sad. First, it's live data and neither I nor Mirage have a good story for mutating data over time. Second, current stat utilization is a function of the resources available to a client or allocation. This type of modeling is possible but tricky in Mirage, so I cut corners instead. The result is possible reporting of >100% resource usage which results in a visually funky line chart. |
Hmm weird I can never get the graphs to show anything when using mirage. I've just done a completely fresh clone, checked out this branch, yarn'ed yarn start'ed. I then tried the allocation graphs and the task group graphs and I get nothing drawn in them, even after waiting for a 'bit'. Things are streaming in in the console, but no visuals? I don't mind if I need to tweak some things in the code to so I can see it, but let me know what I need to do if so. |
The allocation and task group graphs won't work. You have to go to Clients > Client Detail. Those graphs should (kinda) work. |
// of the developer. | ||
const jsonWithDefault = defaultResponse => res => | ||
res.ok ? res.json() : copy(defaultResponse, true); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice, I just realized it depends on the res
having an ok
property. So, tiny super nit here, but would naming it differently or putting it in a subfolder to make it clearer that the res
needs an ok
property and json
method.
Isn't that a Response
?
https://developer.mozilla.org/en-US/docs/Web/API/Response/ok
So maybe responseWithDefault
or maybe put it in utils/http/response/...
or similar?
The commenting here helps, but if I'm in a file that uses this I might think that this just requires a json blob, whereas it needs a Response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I considered this interface gotcha, but decided ultimately that I was probably overthinking it, and it isn't as big of a footgun as it seems under a microscope.
If it comes up again, I'll considering moving it.
Cool, great stuff I can see it now, sorry! I added one more comment, but another super minor suggestion thing more than anything, and I'm going to approve this. The only thing I noticed is some strange artifacts maybe at the top of the graph, the tooltip heads up there also. Not sure if this is due to mirage data, or whether you need to clamp things somewhere? Guessing it's best to do that in another PR though. |
Thanks for your diligence in making sure to see the feature on the screen. As for the visual glitches outside of the chart area, I'm aware of it and at first figured it wasn't worth addressing because it requires data >100% to have happen. However, I plan on addressing it in a future PR now that I know it's possible for some constraints to have soft limits 😬 |
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Sometimes requests error. That's just life, man.
This adds a check before appending stat frames to see if the response is ok. If the response isn't ok, then no frame is appended (since there is no frame to append).
That's enough to make sure there are no runtime errors, but it isn't very informative since it doesn't surface information about the underlying problem.
To address the lack of insight, frame misses are tracked as well. So if X consecutive stats requests fail in a row, then the tracker "pauses".
Pausing is a pre-existing tracker feature that adds a null data frame so any visualization of the data can account for missing data. The line charts do this by showing gaps.
Put this all together and you get a tracker that continuously polls every 2 seconds or so and handles errors like a champ. However, if too many errors occur in a row, the user is informed that the data isn't just incredibly consistent, it's actually not updating.