Add "save a trace" functionality. #1093

Closed
mjbryant opened this issue Apr 19, 2016 · 17 comments

@mjbryant

mjbryant commented Apr 19, 2016

It'd be really nice if individual traces could be tagged to not age out of Cassandra through the UI.

@codefromthecrypt
Member

codefromthecrypt commented Apr 22, 2016 via email

@yurishkuro
Contributor

Since the UI has been decoupled from the server side, another possible solution is to have a Save As, so that the user can save the trace json as a file on local disk and later load it into the UI. It's especially useful considering that tracing ultimately is used to troubleshoot perf issues, so one could save a trace, attach it to a ticket, and someone else can later load the trace and see it in the UI.

We already have the JSON button, so Save As is there, but we don't have a Load function in the UI.

@codefromthecrypt
Member

> We already have the JSON button, so Save As is there, but we don't have a
> Load function in the UI.

"We" are a privileged minority until #1060 :P

@yurishkuro
Contributor

huh, I didn't realize, thought it was in already.

@codefromthecrypt
Member

PS there's now a download button on a trace.

@codefromthecrypt
Member

We had a note in #1222 about just making a separate non-bucketed index for elasticsearch and just move docs to it. There's no TTL support in mysql anyway, so noop there. Cassandra might take some thinking.

We can address this by making it possible to query the "infinite" index routinely, and we could also make a "request save" api, which either moves the trace there or returns a message if unsupported.

cc @openzipkin/elasticsearch @openzipkin/cassandra

@codefromthecrypt
Member

also in mysql I suppose we could double the table-count to provide an "infinite" index separate from the routine one (cc @jcarres-mdsol)

@michaelsembwever
Member

michaelsembwever commented Oct 7, 2016

Cassandra
Simplest approach is to rewrite the data with new TTL=-1

Technical: This will leave some old sstables around and prevent them from being wiped off disk in one go (tombstone compactions will instead be required to clean them out). But I can't see this being a visible issue to the zipkin operator.

@codefromthecrypt
Member

thx @michaelsembwever

We'd likely want some signal to imply that the trace is special. One way is to add a binary annotation (tag) to the saved trace, like..

"representative" -> "fastestest"
"representative" -> "discovery name failure"

We wouldn't care what the values are, but it allows zipkin query of "representative" to include them, and also would allow any UI to distinguish them from something else (cc @rogeralsing)

This would also permit those doing modeling or analysis research to ask folks for representative traces in some easy-to-grab fashion (cc @adrianco @rfonseca)
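As a rough illustration of the tagging idea above, here is a minimal Python sketch. It assumes the trace is a list of spans in the Zipkin v2 JSON shape, where `tags` is a string-to-string map and the root span has no `parentId` (the thread itself predates that format and speaks of binary annotations). The helper names `tag_root_span` and `is_saved` are hypothetical and not part of Zipkin.

```python
import copy


def tag_root_span(spans, key="representative", value="discovery name failure"):
    """Return a copy of the trace with key -> value added to the root span's tags.

    Assumption: spans follow the Zipkin v2 JSON shape and the root span is
    the one without a parentId. If none qualifies, the first span is used.
    """
    if not spans:
        raise ValueError("cannot tag an empty trace")
    tagged = copy.deepcopy(spans)  # leave the caller's trace untouched
    root = next((s for s in tagged if not s.get("parentId")), tagged[0])
    root.setdefault("tags", {})[key] = value
    return tagged


def is_saved(spans, key="representative"):
    """True if any span carries the marker tag, whatever its value is."""
    return any(key in s.get("tags", {}) for s in spans)
```

Only the presence of the key matters for `is_saved`, which mirrors the point that the values themselves would not be interpreted.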

@dangets
Contributor

dangets commented Oct 7, 2016

Moving discussion from #1222. Briefly, I'm trying to design a way for generic saving and eviction of traces, and I proposed adding some methods: SpanStore.setTraceExpiration(long traceId, Date date) and SpanConsumer.setDefaultTraceExpiration(int amount, TimeUnit unit). It would be up to the implementations whether they could actually handle TTL eviction and how granular the units could be. It would be valid for these to be no-ops if the store couldn't support it, though this might be confusing in the UI.

I could see MySQL having a ttl column in the Spans table that could be used, Elasticsearch could just drop daily indexes, etc...

I'm not a huge fan of this as it is backwards incompatible with existing implementations, but throwing it out there.
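To make the proposed semantics concrete, here is a small in-memory sketch in Python. It is only an analogue of the two Java methods named above, not Zipkin code: the class `InMemorySpanStore`, its method names, and the convention that `None` means "never expire" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


class InMemorySpanStore:
    """Hypothetical store showing a default TTL plus a per-trace override."""

    def __init__(self, default_ttl=timedelta(days=7)):
        self._default_ttl = default_ttl
        self._traces = {}  # trace_id -> (spans, expires_at or None)

    def set_default_trace_expiration(self, ttl):
        # Applies to traces accepted after this call.
        self._default_ttl = ttl

    def accept(self, trace_id, spans, now=None):
        if now is None:
            now = datetime.now(timezone.utc)
        self._traces[trace_id] = (spans, now + self._default_ttl)

    def set_trace_expiration(self, trace_id, expires_at):
        """Override one trace's expiry; None (an assumption) means keep forever.

        Returns False when the trace is unknown, so a caller such as a UI
        can report that nothing was saved.
        """
        if trace_id not in self._traces:
            return False
        spans, _ = self._traces[trace_id]
        self._traces[trace_id] = (spans, expires_at)
        return True

    def evict_expired(self, now=None):
        """Drop traces whose expiry has passed and return their ids."""
        if now is None:
            now = datetime.now(timezone.utc)
        expired = [
            trace_id
            for trace_id, (_, expires_at) in self._traces.items()
            if expires_at is not None and expires_at <= now
        ]
        for trace_id in expired:
            del self._traces[trace_id]
        return expired

    def get_trace(self, trace_id):
        entry = self._traces.get(trace_id)
        return entry[0] if entry else None
```

A store that cannot honor expirations could implement `set_trace_expiration` as a no-op returning False, which is one way to surface the "unsupported" case instead of silently ignoring it.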

@codefromthecrypt
Member

I thought about this a bit over the weekend. Here's what that ended up as.

  • once we start worrying about retention policy for "saved traces" we approach the complexity of the span store (again)
  • this is compounded by the need for an api change, which would break folks in order to support a feature that hasn't been requested widely
  • the more complexity we add to our component apis, the harder future changes get

I've an alternative proposal: do it all on the client

Instead of creating a secondary tier storage in our api, simply deploy twice. Ex use a keyspace/DB for the transient trace depot and another for the "permanent" one.

Ex. index=zipkin (for transient) and index=zipkin-4ever for permanent.

The second is fronted by vanilla zipkin-servers that don't run any collectors except http. The act of "saving a trace" is just taking the json from the transient one and POSTing it to the permanent one.

Someone could later change the zipkin-ui (or write some sort of plugin) to query across both and/or create an automatic flow (such as a button which, when clicked, posts to the "permanent" zipkin). cc @rogeralsing @eirslett

This automatically solves any future needs around retention, as the same mechanics can be used. The only difference is that in the case of cassandra, the keyspace should be adjusted prior to use, notably to remove the TTL (or set it to a very long value). The biggest win is that there's no code impact on server components. They remain simple and probably more "microservice" as a result.

thoughts?
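The "take the json and POST it" step could be sketched in Python roughly as follows. This assumes the Zipkin v2 HTTP API paths (`GET /api/v2/trace/{traceId}` and `POST /api/v2/spans`), which postdate this comment. The function name `save_trace`, the `opener` parameter, and the base URLs in the usage note are hypothetical.

```python
import json
import urllib.request


def save_trace(trace_id, transient_base, permanent_base,
               opener=urllib.request.urlopen):
    """Copy one trace from the transient Zipkin to the permanent one.

    `opener` is injectable (an assumption of this sketch) so the flow can
    be exercised without a network. Returns the HTTP status of the POST.
    """
    # 1. Read the trace as a JSON list of spans from the transient server.
    with opener(f"{transient_base}/api/v2/trace/{trace_id}") as resp:
        spans = json.load(resp)

    # 2. POST the same spans, unmodified, to the permanent server's collector.
    request = urllib.request.Request(
        f"{permanent_base}/api/v2/spans",
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request) as resp:
        return resp.status
```

For example, `save_trace("abc", "http://zipkin:9411", "http://zipkin-4ever:9411")` would copy trace `abc`, assuming both servers are reachable at those made-up addresses. Because the spans are reposted as-is, nothing on the server side needs to change.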

@basvanbeek
Member

Sounds good to me if we take the plugin approach for zipkin-ui, that way people have the biggest flexibility to tie this together with their usage concerns and make their own infrastructure choices with the greatest ease of use.

@mansu

mansu commented Oct 10, 2016

@adriancole making favoriting/saving a trace part of the client, and then having the client copy the traces between the 2 stores, is my preferred approach for the following reasons:
(a) the spans will remain immutable
(b) in addition to the UI, other applications can also write to this endpoint to save traces forever
(c) with an additional annotation on the root span we can create a save or favorite feature

However, I prefer making this a feature of the current backend instead of having separate clusters. I think that way the backend would be easier to operate. Multiple clusters increase operational overhead in large organizations.

@codefromthecrypt
Member

Thanks for the feedback. I have one question on your comment.

> However, I prefer making this a feature of the current backend instead of
> having separate clusters though. I think that way the backend would be
> easier to operate. Multiple clusters increase operational overhead in large
> organizations.

If by backend you mean storage, I think this is already possible because
you can use a different index in the same cluster.

If by backend you mean zipkin servers, that would really complicate
configuration, as currently they are designed for a single storage
component. I don't think this feature is worth complicating that, as it is
quite easy to spin up api servers.

@codefromthecrypt
Member

FYI the trace IDs of "permanent traces" will eventually clash, even if that is unlikely for some. While not a strict dependency, this is certainly related to the 128-bit trace ID work in #1262

@codefromthecrypt
Member

ps the original version of zipkin had a "favorite" button (trivia)

@jorgheymans
Contributor

As of Zipkin 2.21, trace archiving is now supported: https://github.com/openzipkin/zipkin/tree/master/zipkin-server#trace-archival

In the screenshot below you can see the 'Archive Trace' button appearing once everything is configured:

[Screenshot: the 'Archive Trace' button in the Zipkin UI]

Note there is an ongoing discussion whether queries should fan out to archival instances.

8 participants