-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should lens fan-out queries to the "archive" zipkin #3023
Comments
I think there are many implications on doing a mixed search, one is
performance, another one is sorting, counting, dedup, also lookback might
not make sense for archived (maybe serviceName either since we won't
retrieve serviceNames/spanNames from archive or will we?). I strongly
believe if we aim to search archived in a zipkin (which can be detected
with the `OPTIONS` call) the easiest is that we tell lens to call one API
or another based on a URI (e.g /zipkin and /zipkin/archived). In such cases
/zipkin/archived will work the same as /zipkin but different zipkin API and
appending the tag query to the call.
…On Thu, 12 Mar 2020, 01:17 Adrian Cole, ***@***.***> wrote:
#3018 <#3018> adds the archive
button which allows someone to POST the current trace to an archive server.
One idea around permalinks was to use a proxy behind the UI, to query
multiple sources when you know the archive is zipkin. However, this is more
infrastructure
@mchandramouli <https://github.com/mchandramouli> suggests:
Zipkin lens [can] search both live and archive cluster (beyond live
cluster time range) so archived traces can appear in the same UI
#3018 (comment)
<#3018 (comment)>
It is possible that the "archive" is write-only, for example a Zipkin
clone or APM endpoint that doesn't support our api. However, an OPTIONS
request could let us know that. It is unlikely we'll know what timerange is
valid on the other source as the "archive" could be just another wider
scoped service, and we don't even know the reason for archival is time
related.
However, we can consider impact of doing this:
- CORS - we have permissive cross-origin settings, so a default zipkin
service will accept cross-origin requests.. lens could query two zipkin
services
- conflicts - it is possible two services has the same data, but we
could prefer the archive on conflict
- performance - the browser will get multiple queries back and that
would double the bandwith on overlap. Some tuning or configuration might be
needed to say when to use the archive.
On the latter, if we do end up with a standard tag, querying archive would
be as easy as looking for the tag "zipkin.archive" or whatever it ends up
named as. This would be a cheap way to reduce volatility on normal queries.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#3023>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAXOYAXLBNLSXWVAJYHOPOLRHASZNANCNFSM4LGBJYSQ>
.
|
Thanks, @jcchavezs I probably wasn't clear. what I meant was that lens in its api code could embed the logic, just calling two apis like mux does. Main difference in simplification is that we are talking about 1+1 apis, not 1+N which I agree is more complex. Any logic about merging applies agree, but I don't think that and counts are complex when there are only 2 sources. Since only 2, conflict resolution logic to pick one trace instead of merging them. Also, if a problem that someone asked for 10 traces and somehow that period had overlap and some from archive, deciding which to throw out is something we can do predictably also (ex. the newest in that period, as we do in cassandra fan-outs in the backend). Concretely, if this is bound only to "archive" then it we get the luxury of simple conflict resolution. lookback isn't an issue really, where it matters is set on the backend. the lookback in lens is just about the default duration of the trace query. For example, you can override this now with 1970 as we do in our testdata readme I think what I caused most confusion with was by mentioning OPTIONS and not explaining why. What I meant was that we can use OPTIONS to try to detect if the archiveUrl is a normal zipkin server, and then no logic happens at all if we know it not to be. That was just a simplication.. again all UI logic is exactly the same except the api handler, it is only moving the code in lua into javascript, which would have exactly the same concerns. These clarifications don't imply I personally think we should do this, just want to help ease this topic as I think path based would be even worse to do in our api as then it messes with internal routing logic, etc. easier for a proxy to just mount the other /zipkin under /zipkin-archive imho In other words, if I have a vote, it is to do the least impact if we are to do something like this, and so far affecting the api javascript code only seems the least impact. |
Some real impacts of proxying regardless of lua or in-browser: pollution of the service/span/auto-complete name list or a choice to preferprobably one thing @jcchavezs implies is that if you merge names, maybe you don't want to use service-span names from a very long time ago. Note: this is the same issue on the archive itself, so effectively if you have a problem there, it escalates the issue here. This is because our names queries do not include a timerange.. they are independent of time. For the "firehose" mode thing, where the source has TTL 1hr and archive has a short 3-7 day TTL. I do not expect this to be a concern. If people are using this to access traces from several years ago, if we did this, they shouldn't fan-out (ex not offer an archiveUrl property only archivePostUrl) |
an alternate to doing this is also not doing this and using existing tech. We've had an environment property for a long time, to differentiate zipkins. Ex. one zipkin can set to "1hr" and another to "7days", they would have different URLS, possibly on the same host if using The other is transparently through proxying the api with mux, however that is additional infrastructure even if small to achieve. My 2p: The only compelling reason to do this in-browser is in attempt to reduce infrastructure, while providing a natural permalink (subject to the TTL of the archive). There could be sites that are unable to run a proxy, but we've not heard of them so far, especially in context of an unreleased UI feature for archive button. That in mind, this issue is probably fair game to let sit, allowing us to focus energy on non-speculative change. |
@adriancole @jcchavezs Maybe I should try to clarify what I meant by |
@mchandramouli I get you but it requires knowledge of this data pushed up into the UI, specific to the idea that this setup is commonly why the archive button appears. It would reduce dupes, but not prevent them, but yeah it is indeed a complexity worth mentioning. thanks! |
@adriancole I was wondering the same - we cannot keep UI clean and not be aware of backend internals (like ttl). Anyways, thought I will put it out on the table. I will watch this issue and see where we go with this. |
I'm initially inclined to this alternative:
-> 2 different instances (active and archive) with
I'd assume sites with this need will have an infrastructure big enough to justify running an archive instance in parallel. Another alternative that is appealing to me is to have this functionality as part of the storage API. This will involve supporting multiple storages plugged to the same instance (e.g. traces->elasticsearch, dependencies->elasticsearch, archive->cassandra). Thinking out loud, in this scenario I'd envision Lens having a query flag to lookup from traces or archive endpoint. It could even show archive/favorite traces as top results if we support to return results from both endpoints. Doing dual-read from Lens would be too complex imho--following your comments above. But other could weight this complexity better than me. |
I'm going to close this issue as it helps the team focus on issues to progress. We can re-open it if it becomes actionable with new information, more hands, more demand. Thanks folks! |
#3018 adds the archive button which allows someone to POST the current trace to an archive server. One idea around permalinks was to use a proxy behind the UI, to query multiple sources when you know the archive is zipkin. However, this is more infrastructure
@mchandramouli suggests:
#3018 (comment)
It is possible that the "archive" is write-only, for example a Zipkin clone or APM endpoint that doesn't support our api. However, an OPTIONS request could let us know that. It is unlikely we'll know what timerange is valid on the other source as the "archive" could be just another wider scoped service, and we don't even know the reason for archival is time related.
However, we can consider impact of doing this:
On the latter, if we do end up with a standard tag, querying archive would be as easy as looking for the tag "zipkin.archive" or whatever it ends up named as. This would be a cheap way to reduce volatility on normal queries.
The text was updated successfully, but these errors were encountered: