ClickHouse as a core storage backend #4196

Open
yurishkuro opened this issue Jan 31, 2023 · 29 comments

Comments

@yurishkuro
Member

yurishkuro commented Jan 31, 2023

Summary

Build first-class support for ClickHouse as an official Jaeger backend. ClickHouse is an open-source, column-oriented database for OLAP use cases. It is highly efficient at high-volume ingestion and search, which makes it well suited to tracing and logging data specifically. It can also compute aggregates very quickly, which will come in handy for several Jaeger features.

Benefits to the users:

  • Efficient backend
  • Powerful search
  • Analytics capability, e.g. the possibility to support the APM function (Monitoring tab in Jaeger) directly from ClickHouse

Background

This is a continuation of #1438. There are currently one or more community-supported plugins for ClickHouse as a Jaeger storage backend. This ticket is about making ClickHouse one of the core backends supported by Jaeger, on the level of Cassandra/Elasticsearch.

Scope

  • Design table schema, document trade-offs
  • Implement plugin/storage/clickhouse module and integrate it with storage.Factory
    • ClickHouse is not optimized for individual inserts, so this needs to be addressed, e.g. by batching writes (cf. OTel Collector ClickHouse exporter)
    • For deployments with Kafka, investigate Kafka Connector for ClickHouse for direct ingestion
  • Address schema creation problem (similar to plugin/storage/cassandra/schema)
  • Address management needs, if any (e.g., ES has index rollover; it is not yet clear whether ClickHouse needs something similar)
  • Add relevant documentation to the Jaeger website

Expected outcomes

  • Design table schema, document trade-offs
  • Implement ClickHouse support as core storage backend
  • Address schema creation problem / tooling
  • Add relevant documentation to the Jaeger website

Stretch goals

@yurishkuro
Member Author

yurishkuro commented Jan 31, 2023

@abhi1287

abhi1287 commented Feb 1, 2023

Suggestion: It would be nice to somehow retain the type information of custom tags.
The OTel implementation you linked above simply stores a vector of string key-value pairs.
Jaeger does the same with Elasticsearch, and that makes it a pain to run aggregate queries over custom tags in spans.
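One way to retain tag types, sketched here in Go purely as an illustration, is to route each tag into a map for its value type. In a table schema this would correspond to separate typed columns (for example a ClickHouse `Map(String, Int64)` column for integer tags) rather than one array of string pairs. `TypedTags` and `SplitTags` are hypothetical names; this is not taken from Jaeger or the OTel exporter.

```go
package main

import "fmt"

// TypedTags keeps each tag in a map matching its value type. In a ClickHouse
// table this would correspond to hypothetical separate columns such as
// Map(String, String), Map(String, Int64), Map(String, Float64) and
// Map(String, Bool), instead of a single array of string pairs.
type TypedTags struct {
	Str   map[string]string
	Int   map[string]int64
	Float map[string]float64
	Bool  map[string]bool
}

// SplitTags routes each tag value into the map for its type. Values of any
// other type fall back to their string form so no tag is dropped.
func SplitTags(tags map[string]any) TypedTags {
	t := TypedTags{
		Str:   map[string]string{},
		Int:   map[string]int64{},
		Float: map[string]float64{},
		Bool:  map[string]bool{},
	}
	for k, v := range tags {
		switch x := v.(type) {
		case string:
			t.Str[k] = x
		case int:
			t.Int[k] = int64(x)
		case int64:
			t.Int[k] = x
		case float64:
			t.Float[k] = x
		case bool:
			t.Bool[k] = x
		default:
			t.Str[k] = fmt.Sprint(x)
		}
	}
	return t
}

func main() {
	spans := []map[string]any{
		{"http.status_code": 200, "cache.hit": true, "db.rows": int64(12)},
		{"http.status_code": 500, "cache.hit": false, "db.rows": int64(30)},
	}
	// A numeric aggregate reads typed values directly; no string parsing.
	var total int64
	for _, s := range spans {
		total += SplitTags(s).Int["db.rows"]
	}
	fmt.Println(total) // 42
}
```

With typed storage, numeric aggregations read `Int` or `Float` values directly instead of parsing strings at query time.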

@yurishkuro
Member Author

@abhi1287 can you open a separate issue for this (it sounds like a cross-storage problem) and give examples of custom tags that need numeric aggregations?

@siddharthsingh025

Is anyone working on this? @yurishkuro

@yurishkuro
Member Author

@siddharthsingh025 this ticket is in support of GSOC program that starts in the summer.

@siddharthsingh025

@yurishkuro, I would like to participate in GSoC 2023 by contributing to Jaeger. I have researched this project extensively and would really like to contribute to it. I am familiar with Golang and SQL database design, and I also have good knowledge of OpenTelemetry. Looking forward to your guidance on this project. Thanks!

@yurishkuro
Member Author

@siddharthsingh025 please see GSoC's guidelines on applying; you can specify the Jaeger project as a preference.

@octonawish-akcodes

I am interested in this project. Are there any resources, and is there a Slack link?

@yanyanran

@yurishkuro Hello! I am very interested in this project and would like to contribute to Jaeger. I am familiar with Golang and love learning about and delving into distributed systems, which I find really interesting. I have also learned a lot about SQL databases, and I can devote myself to the project. Could you give me some advice and guidance? Thank you!

@Reireirei0

Reireirei0 commented Feb 28, 2023

@yurishkuro, I'd like to contribute to Jaeger. It just so happens that I am currently maintaining a ClickHouse query service at ByteDance as an intern. This service is called rigel and is also written in Go. With your guidance, I am confident I can finish this feature. Looking forward to your guidance! Thanks!

@Nandini99-git

Hey @yurishkuro, I am interested in this project for GSoC 2023 and want to work on it. I am a fresher, but I am familiar with the tools and technologies this project requires.

@yurishkuro
Member Author

Applications will be open from March-20 to April-4: https://summerofcode.withgoogle.com/

@GauriBhandari

Hey @yurishkuro, I am a student from India and I would like to contribute to this issue. Is it still available? If so, I would like to write a proposal for it.

@yurishkuro
Member Author

It's available, but applications don't open until tomorrow. Proposals will be selected according to the GSoC timeline.

@james-ryans
Contributor

Hi, I’m James Ryans from Indonesia. I might be late in introducing myself, but I hope I will still be noticed. I'd like to share the references I used to write my proposal, which might help others onboard onto this project:

  1. The idea of Jaeger and its history (https://www.uber.com/blog/distributed-tracing)
  2. The architecture of Jaeger (https://www.jaegertracing.io/docs/1.43/architecture/)
  3. Read the discussion about ClickHouse as a Jaeger storage backend in issue #1438 (ClickHouse as a storage backend #1438)
  4. Learn how the community implemented ClickHouse support, as described in (ClickHouse as a core storage backend #4196 (comment))
  5. Grasp some idea of Cassandra schema (https://github.com/jaegertracing/jaeger/tree/main/plugin/storage/cassandra)
  6. Deep dive into ClickHouse indexing to design the schema (https://clickhouse.com/docs/en/optimize/sparse-primary-indexes)

And hi @yurishkuro, I’ve submitted my proposal on the GSoC platform. Do you mind if I send you the Google Docs link to my proposal on Slack so that we can discuss it there?

@yurishkuro
Member Author

@meneketehe sure, send it

@GauriBhandari

Hey @yurishkuro, is this project still available for GSoC? I am unable to see it on GSoC's organization dashboard. Could you please help?

@ChillOrb
Contributor

ChillOrb commented Mar 30, 2023

Hey @yurishkuro, is this project still available for GSoC? I am unable to see it on GSoC's organization dashboard. Could you please help?

Hey Gauri, it's under CNCF:

  1. Go to CNCF
  2. View Ideas list
  3. Scroll to Jaeger

@ihanwen99

Dear team @yurishkuro, I am a master's student from Shanghai Jiao Tong University and TUM, and I have great interest in this project. Do you still accept GSoC applications for this project? Thank you very much.

@yurishkuro
Member Author

We're following the GSoC timeline; applications are open until Apr 4.

@siddharthsingh025

This comment was marked as resolved.

@jkowall jkowall changed the title [Feature]: ClickHouse as one of core storage backends [Feature]: ClickHouse as a core storage backend Apr 1, 2023
@yurishkuro
Member Author

@haanhvu will be working on this as part of GSoC

@nextrevision

Is one of the goals of the resulting integration to be compatible with the OTel exporter (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter)? It's referenced above, but is that purely a technical reference or a schema compatibility goal?

@yurishkuro
Member Author

@nextrevision it's TBD. There is a blog post from ClickHouse criticizing some design choices in OTEL exporter, we are taking this into consideration.

@nextrevision

Understandable, thanks for the link and looking forward to trying it out

@GetRohitansh

Is this issue still open, or is it nearing completion? I would like to contribute.

@haanhvu
Contributor

haanhvu commented Sep 16, 2023

Is this issue still open, or is it nearing completion? I would like to contribute.

We finished the first stage of benchmarking and making design decisions. We'll publish the benchmark report soon.

@egege

egege commented Jul 11, 2024

Now that we're in the middle of 2024, what is the progress on this effort? I want to be involved.

@jkowall
Contributor

jkowall commented Jul 11, 2024

This will be officially supported in Jaeger v2, which is due in beta before the end of the year. There are lots of good things coming with v2; you can learn more about them from our last KubeCon presentation: https://www.youtube.com/watch?v=WNfesi_T0Bs

@yurishkuro yurishkuro moved this to In Progress in Roadmap Nov 14, 2024
@yurishkuro yurishkuro changed the title [Feature]: ClickHouse as a core storage backend ClickHouse as a core storage backend Nov 15, 2024
Projects
Status: In Progress