Metrics for processing #57

Closed
jonnor opened this issue Feb 3, 2016 · 2 comments

Comments

@jonnor
Member

jonnor commented Feb 3, 2016

Right now we don't have any metrics apart from the ones provided by MsgFlo, which only tracks in/outports and durations. Because we don't use error ports, this does not even capture error rates.

Info could look like this.

graph: 'crop'
runtime: 'noflo' | 'imgflo'
client: 'api token id'
requested: Timestamp, when image is requested
completed: Timestamp
error: null or 'some message'

Job end-to-end time becomes completed - requested. Error rate is the share of events with error != null among all events (error == null or error != null). And we should be able to create statistics segmented on graph/runtime/client.
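
A minimal sketch of how these numbers could be derived, assuming the events are available as plain records with timestamps in seconds. The `JobEvent` class and the helper names are hypothetical and not part of imgflo-server; the error rate is computed here as failed events over all events.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JobEvent:
    """Hypothetical record mirroring the fields listed above."""
    graph: str
    runtime: str                  # 'noflo' or 'imgflo'
    client: str                   # API token id
    requested: float              # timestamp, seconds
    completed: float              # timestamp, seconds
    error: Optional[str] = None   # None or 'some message'


def end_to_end(event: JobEvent) -> float:
    """Job end-to-end time: completed - requested."""
    return event.completed - event.requested


def error_rate(events: List[JobEvent]) -> float:
    """Fraction of events that carry an error message."""
    if not events:
        return 0.0
    failed = sum(1 for e in events if e.error is not None)
    return failed / len(events)


def segment(events: List[JobEvent], key: str) -> Dict[str, List[JobEvent]]:
    """Group events by one of 'graph', 'runtime' or 'client'."""
    groups: Dict[str, List[JobEvent]] = defaultdict(list)
    for e in events:
        groups[getattr(e, key)].append(e)
    return dict(groups)


# Made-up sample data, for illustration only.
events = [
    JobEvent('crop', 'noflo', 'token-a', 10.0, 12.5),
    JobEvent('crop', 'noflo', 'token-b', 11.0, 11.5, error='download failed'),
    JobEvent('blur', 'imgflo', 'token-a', 12.0, 13.0),
]
rates = {graph: error_rate(group) for graph, group in segment(events, 'graph').items()}
```

Segmenting first and then applying the same aggregate keeps the per-graph, per-runtime and per-client statistics consistent with the overall ones.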

Ideally we'd also add the following, to be able to determine where time is spent.

started: Timestamp, when processing of the image starts
downloaded: Timestamp, when input image(s) have been downloaded
processed: Timestamp, when image has been processed (but results not yet uploaded/cached)

So the state machine is: -> requested (-> queued) -> started -> downloaded -> processed (-> cached) -> completed. At each step there is also a possibility of going to the error state.
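
The state machine can be sketched as follows, assuming the parenthesized states (queued, cached) are optional steps that may be skipped. All names here are hypothetical; `stage_durations` shows how the intermediate timestamps would tell where time is spent.

```python
from typing import Dict, Set

# States in the order given above; 'queued' and 'cached' are optional.
ORDER = ['requested', 'queued', 'started', 'downloaded',
         'processed', 'cached', 'completed']
OPTIONAL = {'queued', 'cached'}
TERMINAL = {'completed', 'error'}


def allowed_next(state: str) -> Set[str]:
    """States reachable from `state` in a single step."""
    if state in TERMINAL:
        return set()
    i = ORDER.index(state)
    nxt = {ORDER[i + 1], 'error'}      # every step may also fail
    if ORDER[i + 1] in OPTIONAL:
        nxt.add(ORDER[i + 2])          # an optional state can be skipped
    return nxt


def stage_durations(timestamps: Dict[str, float]) -> Dict[str, float]:
    """Time between consecutive recorded states, following ORDER."""
    seen = [s for s in ORDER if s in timestamps]
    return {f'{a}->{b}': timestamps[b] - timestamps[a]
            for a, b in zip(seen, seen[1:])}


# Made-up timestamps in seconds, for illustration only.
job = {'requested': 0.0, 'started': 2.0, 'downloaded': 2.5,
       'processed': 4.0, 'completed': 4.5}
durations = stage_durations(job)
```

In this example the `requested->started` entry is the time spent waiting in the queue.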

Event ImgfloImageComputed would be emitted by workers when they have completed processing an image. Cache hits would be a separate event ImgfloCacheHit, emitted where we check cache (frontend and in worker before processing).
Cache hit/miss ratio then becomes ratio of these two events.

This could be sent using NewRelic by default, though probably the interface used in the code should be metrics-provider agnostic.
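
One way a provider-agnostic interface could look, as a rough sketch: the calling code only depends on a small `record` method, and each metrics provider gets its own backend. `MetricsBackend`, `InMemoryBackend` and `report_computed` are hypothetical names; a NewRelic backend would implement the same method and is not shown here.

```python
from typing import Any, Dict, List, Optional, Protocol, Tuple


class MetricsBackend(Protocol):
    """Hypothetical interface: a provider only needs to accept named events."""

    def record(self, event_type: str, attributes: Dict[str, Any]) -> None:
        ...


class InMemoryBackend:
    """Stand-in backend that keeps events in a list, e.g. for tests."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Dict[str, Any]]] = []

    def record(self, event_type: str, attributes: Dict[str, Any]) -> None:
        self.events.append((event_type, dict(attributes)))


def report_computed(backend: MetricsBackend, graph: str, runtime: str,
                    requested: float, completed: float,
                    error: Optional[str] = None) -> None:
    """Emit an ImgfloImageComputed event through whichever backend is in use."""
    backend.record('ImgfloImageComputed', {
        'graph': graph,
        'runtime': runtime,
        'requested': requested,
        'completed': completed,
        'error': error,
    })


backend = InMemoryBackend()
report_computed(backend, 'crop', 'noflo', requested=1.0, completed=3.0)
```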

@jonnor
Member Author

jonnor commented Feb 3, 2016

Eventually it would be nice to also have the size of the image in pixels and in bytes, so we can evaluate times relative to these, and compare the two against each other to evaluate compression rates.
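
As a rough illustration of what these sizes would enable (hypothetical helper names, made-up numbers):

```python
def bytes_per_pixel(width: int, height: int, size_bytes: int) -> float:
    """Compression measure: encoded bytes per image pixel."""
    return size_bytes / (width * height)


def seconds_per_megapixel(duration: float, width: int, height: int) -> float:
    """Processing time normalized by image size in pixels."""
    return duration / (width * height / 1_000_000)


ratio = bytes_per_pixel(1000, 500, 250_000)         # 0.5 bytes per pixel
speed = seconds_per_megapixel(2.0, 2000, 1000)      # 1.0 second per megapixel
```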

@jonnor
Member Author

jonnor commented Jun 15, 2016

We might not be able to do comparisons between different events (computed, cached) in NewRelic. So it is probably better to have a single ImgfloCacheCheck event, with an indicator of whether it was a hit or not.

  • cached: Should be an enum; initial values can be s3, false. In the future we can add ipfs, ref IPFS cache/CDN #62.
  • by: Which component did the cache check. Important because we check both in the frontend and in the worker (to deduplicate requests while in-flight/queued).
  • error: Set if the check errored.
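
A minimal sketch of the single-event approach, assuming the fields above. The class names are hypothetical, and excluding errored checks from the ratio is an assumption made here, not something decided in this issue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Cached(Enum):
    """Where the image was found; FALSE means a cache miss."""
    S3 = 's3'
    FALSE = 'false'


@dataclass
class CacheCheck:
    """Hypothetical shape of one ImgfloCacheCheck event."""
    cached: Cached
    by: str                       # 'frontend' or 'worker'
    error: Optional[str] = None   # set if the check itself errored


def hit_ratio(checks: List[CacheCheck]) -> float:
    """Hits divided by all checks that completed without error."""
    valid = [c for c in checks if c.error is None]
    if not valid:
        return 0.0
    hits = sum(1 for c in valid if c.cached is not Cached.FALSE)
    return hits / len(valid)


# Made-up sample data, for illustration only.
checks = [
    CacheCheck(Cached.S3, 'frontend'),
    CacheCheck(Cached.FALSE, 'frontend'),
    CacheCheck(Cached.FALSE, 'worker'),
    CacheCheck(Cached.FALSE, 'worker', error='timeout'),
]
frontend_ratio = hit_ratio([c for c in checks if c.by == 'frontend'])
```

Because hit and miss live in the same event type, the ratio needs only one query over ImgfloCacheCheck, filtered by the `by` field where needed.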

jonnor added a commit that referenced this issue Jul 20, 2016
jonnor added a commit that referenced this issue Jul 20, 2016
This covers the basics needed to see total time, including
that spent waiting in the queue.
References #57
@jonnor jonnor closed this as completed in 1e07739 Jul 20, 2016