Amazon S3 as a storage backend #2633
Comments
Currently what Jaeger defines as "storage backend" must support (a) trace lookup by ID, and (b) indexing & searching. The former is relatively easy to do with S3 (you may even be able to use Grafana Tempo directly for that), but indexing/searching is not. |
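The two requirements above can be illustrated with a minimal sketch. It is not Jaeger's actual storage interface: `ObjectStoreSketch`, `TraceStoreSketch`, the `traces/<id>` key layout, and the tag index are all hypothetical names chosen for illustration. It shows why (a) fits an object store naturally, since a trace ID maps straight to an object key, while (b) needs a separate index that the object store does not provide.

```python
from collections import defaultdict


class ObjectStoreSketch:
    """Stand-in for an S3 bucket: a flat key -> bytes mapping (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body

    def get(self, key):
        # Returns None when the key does not exist.
        return self._objects.get(key)


class TraceStoreSketch:
    """Hypothetical backend covering both requirements."""

    def __init__(self):
        self.objects = ObjectStoreSketch()
        # (tag key, tag value) -> set of trace IDs. This index has to live
        # somewhere; a plain object store offers nothing equivalent.
        self.tag_index = defaultdict(set)

    def write_trace(self, trace_id, payload, tags):
        self.objects.put(f"traces/{trace_id}", payload)
        for key, value in tags.items():
            self.tag_index[(key, value)].add(trace_id)

    def get_trace(self, trace_id):
        # (a) Lookup by ID: a single key fetch.
        return self.objects.get(f"traces/{trace_id}")

    def find_trace_ids(self, tags):
        # (b) Search: intersect the index entries for every requested tag.
        matches = [self.tag_index.get(item, set()) for item in tags.items()]
        return set.intersection(*matches) if matches else set()
```

Usage: after `store.write_trace("abc", b"span-data", {"service": "api"})`, `store.get_trace("abc")` is a direct fetch, whereas `store.find_trace_ids({"service": "api"})` depends entirely on the index being maintained at write time.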
Yeah, I was also looking at Grafana Labs' Tempo storage option for S3, which is why I wanted to ask if such an equivalent existed for Jaeger. |
In Tempo the querier is responsible for inspecting the backend to find the trace. If you have questions about it, it would make more sense to discuss in that repo. |
Reopening this to link to @muhammadn's work mentioned in #638 (comment) |
Thanks @yurishkuro! 🙌 @jkowall So far, testing on my local machine over a 100 Mbit connection to S3, Jaeger UI is slow but bearable, since we don't look at the trace data frequently. We had also implemented Thanos on our infrastructure, which we use to ingest telemetry data, store it on S3, and read it back from S3 into Thanos and Grafana, so we expected the same from the Jaeger-S3 plugin to Jaeger UI. We can live with it since there are a lot of cost savings. I added tag searches to Jaeger-S3 just a few minutes ago, so that's done. Another thing I have to fix is that the time is somehow skewed; I will fix this in a couple of days, and it is the only pending work before we run this in production. Jaeger-S3 should also be able to store trace data in GCS (Google Cloud Storage) and Azure Storage Blobs, and should theoretically support anything that Cortex supports, including Cassandra (Jaeger already has this built in, but we use Cortex and Cortex supports Cassandra), Amazon DynamoDB, and Google BigTable. |
Very cool! If you are looking for a trace id do you pull each db independently and search them? Are you coordinating parallelism somehow or do you do it one at a time? |
Hey @joe-elliott! Thanks! I watched your Tempo video from FOSDEM on YouTube and it's great to hear that you're making progress! I have not gone through the internals of Cortex and how it manages data on object storage, but from what I understand the indexes are stored locally and the data (in chunks) is stored in object storage. Despite that, I am seeing indexes being stored on S3 as well, even though the boltdb-shipper documentation says indexes are local and data is on object storage. The data is stored in very small chunks, ~2KiB per chunk (a file), so my understanding is that Cortex does not pull the whole db but only part of it from the chunks (I haven't checked, but probably by timestamp, which is indexed). The data is usually pulled when I try to find it, but I realised that finding by trace ID is pretty snappy because the data has already been fetched earlier. As for the current working state, I had set the data to be fetched to be 24 hours old but I can reduce it (in Anyway, I've fixed the clock skew issue. I think I will release the plugin binaries in my repo in a couple of days once I can minimise any bugs. Now you can use helm/k8s/jaeger operator (I've removed the plugin's dependency on my Jaeger fork) |
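The "pull only part of it, probably by timestamp" idea in the comment above can be sketched as a time-bucketed chunk index. This is an assumption-laden illustration, not how Cortex or the boltdb-shipper actually lays out data: `ChunkIndexSketch`, the one-hour bucket size, and the chunk key names are all hypothetical.

```python
class ChunkIndexSketch:
    """Hypothetical index mapping time buckets to small chunk objects."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        self.buckets = {}  # bucket number -> list of chunk keys

    def add_chunk(self, chunk_key, timestamp):
        # Record which time bucket the chunk belongs to.
        bucket = timestamp // self.bucket_seconds
        self.buckets.setdefault(bucket, []).append(chunk_key)

    def chunks_between(self, start, end):
        # Return only the chunk keys whose bucket overlaps [start, end].
        # The result is bucket-granular: a chunk from the same bucket but
        # slightly outside the range is still returned and would need to be
        # filtered after download.
        first = start // self.bucket_seconds
        last = end // self.bucket_seconds
        keys = []
        for bucket in range(first, last + 1):
            keys.extend(self.buckets.get(bucket, []))
        return keys
```

With an index like this, a query for a narrow time window only has to download the handful of ~2KiB chunks listed for the matching buckets, rather than the whole data set.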
@csp197 @joe-elliott v1.0.0 of jaeger-s3 is released! 🎉 https://github.com/muhammadn/jaeger-s3/releases/tag/v1.0.0 |
I've created an S3 plugin that uses S3 and Athena https://github.com/johanneswuerbach/jaeger-s3 and we are currently using it successfully in a setup with ~5000 spans/s and 14 days retention. The plugin supports all querying capabilities of the Jaeger UI and can also generate the dependency graph. Feedback welcome :-) |
Are there any updates on this issue? Is it planned that Jaeger will support S3 / object storage natively? |
@sherifkayad I've released version 2. This is a breaking change from version 1: the data is not backwards compatible. It involved a lot of rewriting and rethinking of the jaeger-s3 architecture. It now uses a lot less memory and scales better. I have renamed it to jaeger-objectstorage, as it now supports not just S3 but also Azure Blob and Google GCS. Storing indexes in DynamoDB is also supported. |
Update 2021-04-20: there is a plugin https://github.com/muhammadn/jaeger-s3/
Is it feasible to add Amazon S3 as a storage backend for Jaeger?
Related to #638