Observability stack #11

Open
wants to merge 2 commits into base: main

Conversation

omrishiv
Contributor

Adding the initial implementation of the observability stack. This includes:

  • Prometheus - metrics
  • Loki - logging
    • Promtail - log delivery
  • Opencost - cost accounting
  • Grafana - visualization
  • Alertmanager - alerting

It uses the ref-implementation for SSO, with user1 being able to log in to Grafana as an admin. New users can be created and roles assigned for various purposes.

Addresses #10

The observability stack is built upon:
- Prometheus - metrics
- Loki - logging
- Promtail - log delivery

I'm interested in why Promtail is used rather than something like Fluent Bit or the OpenTelemetry Collector.

Contributor Author

Promtail is really easy to use if only Loki is being used. That being said, I may look at switching to fluent bit as I've used it in another project recently.

Is there a compelling reason to move from promtail to either? Grafana Agent could be another option as well.

Fluent Bit is more popular among EKS end users, and it is also supported when using Fargate.

OpenTelemetry for logs is very new, and not a lot of end users have adopted it.

Contributor Author

It should be a pretty simple swap for Fluent Bit. Let me take a look at it.
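For reference, the swap might look roughly like the Helm values below for the upstream `fluent-bit` chart, using Fluent Bit's `loki` output plugin. This is only a sketch, not what this PR ships: the Loki service hostname, the `job=fluent-bit` label, and the `kube.*` match pattern (the chart's default tail tag) are assumptions to adjust for the actual cluster.

```yaml
# Hypothetical values.yaml fragment for the fluent/fluent-bit Helm chart.
# Assumes Loki is reachable in-cluster at the host/port below.
config:
  outputs: |
    [OUTPUT]
        # Ship container logs to Loki instead of running Promtail
        Name                   loki
        Match                  kube.*
        # Assumed service name and namespace; replace with the real Loki endpoint
        Host                   loki.monitoring.svc.cluster.local
        Port                   3100
        # Static stream label plus labels derived from Kubernetes metadata
        Labels                 job=fluent-bit
        Auto_Kubernetes_Labels on
```

The rest of the Grafana-based stack would stay as is, since Loki only sees a different client pushing log streams.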

@@ -0,0 +1,50 @@
apiVersion: argoproj.io/v1alpha1
kind: Application

I wonder if we could move these to ApplicationSets, now that idpbuilder supports them, to make it easier for folks to adopt this in production?

Contributor Author

That is definitely something Manabu and I spoke about. I was waiting for the example to see how to conform this.
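To make the ApplicationSet idea concrete, a minimal sketch using Argo CD's list generator is shown below. It is not the layout this PR uses: the ApplicationSet name, repository URL, directory paths, and target namespace are all placeholders chosen for illustration.

```yaml
# Hypothetical ApplicationSet that stamps out one Application per component.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: observability            # assumed name
  namespace: argocd
spec:
  generators:
    - list:
        elements:                # one entry per component; dirs are placeholders
          - name: prometheus
            dir: observability/prometheus
          - name: loki
            dir: observability/loki
          - name: grafana
            dir: observability/grafana
  template:
    metadata:
      name: '{{name}}'
    spec:
      project: default
      source:
        repoURL: https://example.com/your-org/stacks.git   # placeholder repo
        targetRevision: HEAD
        path: '{{dir}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: monitoring    # assumed namespace
      syncPolicy:
        automated:
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```

Adding or removing a component then becomes a one-line change to the `elements` list rather than a new Application manifest.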

I don't know if it was already discussed, but I would question Loki too. Loki is AGPL; I would prefer OpenSearch, which is Apache 2.0.

A typical fully open source stack that I see for logs and traces is fluentbit + opensearch.

For metrics I see opentelemetry-collector-daemonset + prometheus.

Contributor Author

@csantanapr thanks for raising this.

I am not a lawyer, so please correct me if I'm wrong: my understanding of AGPL is that we can't modify the source of the application without copyleft applying. I believe many have built observability stacks on top of the Grafana stack, which is AGPL for the core components (Loki, Grafana, Tempo, Mimir). As we are not modifying the source, we should be ok to use it. Again, please please please correct me if I'm wrong.

This is a valid concern and I believe the flexibility of working in stacks allows us to create another implementation that relies on other tooling. What I do believe is that we need to come to an agreement on what our opinionated stack is. If this is not it, I'm ok with that, but let's discuss this during the next community meeting so we can figure out how we want to proceed.

Contributor

We discussed this, and yes: since Grafana is already AGPL, choosing Loki as another AGPL project is less of a concern. That said, I agree with the discussion above that we should also think about using OpenSearch given its popularity. Publishing it as an alternative observability stack sounds good.

I will say we do use the Otel Collector as a daemonset for logs and prometheus metrics and eventually for traces.

I do understand that the community is much more invested in fluentbit for logging so I think a standard of OpenSearch + FluentBit with OpenTelemetry Collector Daemonset for Prometheus Metrics/Otel Traces seems to be a good pattern for me.
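As a rough illustration of the metrics half of that pattern, an OpenTelemetry Collector running as a daemonset could scrape pods with the `prometheus` receiver and forward samples with the `prometheusremotewrite` exporter (both from the contrib distribution). This is a sketch only: the job name, scrape interval, and Prometheus endpoint are assumptions, and the target Prometheus would need its remote-write receiver enabled.

```yaml
# Hypothetical collector config for the metrics pipeline only.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kubernetes-pods      # assumed job name
          scrape_interval: 30s
          kubernetes_sd_configs:
            - role: pod                  # discover scrape targets from pods

exporters:
  prometheusremotewrite:
    # Assumed in-cluster Prometheus address; requires the remote-write receiver
    endpoint: http://prometheus.monitoring.svc.cluster.local:9090/api/v1/write

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
```

Traces and logs would get their own entries under `service.pipelines` once those signals are added.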

Contributor Author

@omrishiv omrishiv Jun 18, 2024

If it makes sense, I suggest:

  1. Swapping Promtail for Fluent Bit and keeping the rest of this stack the same. This gives us a Grafana-based stack to work with (though there's an argument to be made for swapping Promtail for Grafana Agent instead, so we are closer to the LGTM implementation).
  2. Creating another stack based on OpenSearch + Fluent Bit, with OTel/Prometheus, as an alternative. This gives us the opportunity to test how substitutable our stacks are.

We can have both live under /observability and have /observability/grafana-stack and /observability/otel-stack

I'm ok working on all of this and would be happy to put the otel stack together as well. Does that make sense to do?


4 participants