Fix static converters #2270

Open · wants to merge 5 commits into base: main

Changes from 4 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md

@@ -61,6 +61,8 @@ Main (unreleased)

- Fixed an issue where the `otelcol.processor.interval` could not be used because the debug metrics were not set to default. (@wildum)

- Fix conversion of static config to Alloy for `discovery.azure` and `otelcol.exporter.prometheus`. (@wildum)

### Other changes

- Change the stability of the `livedebugging` feature from "experimental" to "generally available". (@wildum)
@@ -25,19 +25,27 @@ func toDiscoveryAzure(sdConfig *prom_azure.SDConfig) *azure.Arguments {
return nil
}

return &azure.Arguments{
args := &azure.Arguments{
Environment: sdConfig.Environment,
Port: sdConfig.Port,
SubscriptionID: sdConfig.SubscriptionID,
OAuth: toDiscoveryAzureOauth2(sdConfig.ClientID, sdConfig.TenantID, string(sdConfig.ClientSecret)),
ManagedIdentity: toManagedIdentity(sdConfig),
RefreshInterval: time.Duration(sdConfig.RefreshInterval),
ResourceGroup: sdConfig.ResourceGroup,
ProxyConfig: common.ToProxyConfig(sdConfig.HTTPClientConfig.ProxyConfig),
FollowRedirects: sdConfig.HTTPClientConfig.FollowRedirects,
EnableHTTP2: sdConfig.HTTPClientConfig.EnableHTTP2,
TLSConfig: *common.ToTLSConfig(&sdConfig.HTTPClientConfig.TLSConfig),
}

// Only one auth method is allowed.
Contributor: If the user supplied a config with both auth methods, isn't that technically a problem with the config rather than with the converters? I'm fine with keeping this check, but I'm not sure whether validating the configuration in the converter is overkill.

Author: I don't think setting two auth methods is possible in static mode, because you must use the authentication_method field to specify your method (default: OAuth). The problem here is that no matter which auth method the user set, the converter would always create both auth blocks, resulting in an invalid config.

switch sdConfig.AuthenticationMethod {
case "ManagedIdentity":
args.ManagedIdentity = toManagedIdentity(sdConfig)
default:
args.OAuth = toDiscoveryAzureOauth2(sdConfig.ClientID, sdConfig.TenantID, string(sdConfig.ClientSecret))
}

return args
}
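To make the fix concrete, here is a simplified, self-contained sketch of the converter's new behavior. The types below are illustrative stand-ins, not the real Alloy or Prometheus SD types:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real converter types.
type oauthBlock struct{ clientID string }
type managedIdentityBlock struct{ clientID string }

type arguments struct {
	OAuth           *oauthBlock
	ManagedIdentity *managedIdentityBlock
}

// toDiscoveryAzure mirrors the fixed converter: exactly one auth block is
// populated, chosen by the static config's authentication_method field.
func toDiscoveryAzure(authMethod, clientID string) arguments {
	var args arguments
	switch authMethod {
	case "ManagedIdentity":
		args.ManagedIdentity = &managedIdentityBlock{clientID: clientID}
	default: // static mode defaults to OAuth
		args.OAuth = &oauthBlock{clientID: clientID}
	}
	return args
}

func main() {
	a := toDiscoveryAzure("ManagedIdentity", "client")
	fmt.Println(a.OAuth == nil, a.ManagedIdentity != nil) // true true

	b := toDiscoveryAzure("", "client")
	fmt.Println(b.OAuth != nil, b.ManagedIdentity == nil) // true true
}
```

Before the fix, both blocks were always populated, which produced an Alloy config that failed validation on load.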

func ValidateDiscoveryAzure(sdConfig *prom_azure.SDConfig) diag.Diagnostics {
@@ -6,21 +6,11 @@ discovery.azure "prometheus1" {
tenant_id = "tenant"
client_secret = "secret"
}

managed_identity {
client_id = "client"
}
}

discovery.azure "prometheus2" {
subscription_id = "subscription"

oauth {
client_id = "client"
tenant_id = "tenant"
client_secret = "secret"
}

managed_identity {
client_id = "client"
}
@@ -15,6 +15,7 @@ scrape_configs:
client_secret: "secret"
proxy_url: "proxy"
follow_redirects: false
authentication_method: "ManagedIdentity"

remote_write:
- name: "remote1"
@@ -6,10 +6,6 @@ discovery.azure "prometheus1" {
tenant_id = "tenant1"
client_secret = "secret1"
}

managed_identity {
client_id = "client1"
}
}

discovery.azure "prometheus1_2" {
@@ -20,10 +16,6 @@ discovery.azure "prometheus1_2" {
tenant_id = "tenant2"
client_secret = "secret2"
}

managed_identity {
client_id = "client2"
}
}

discovery.relabel "prometheus1" {
@@ -6,10 +6,6 @@ discovery.azure "prometheus2" {
tenant_id = "tenant"
client_secret = "secret"
}

managed_identity {
client_id = "client"
}
}

discovery.relabel "prometheus1" {
@@ -6,10 +6,6 @@ discovery.azure "fun" {
tenant_id = "tenant"
client_secret = "secret"
}

managed_identity {
client_id = "client"
}
}

local.file_match "fun" {
@@ -91,13 +91,20 @@ func toremotewriteexporterConfig(cfg *remotewriteexporter.Config, forwardTo []st
defaultArgs := &prometheus.Arguments{}
defaultArgs.SetToDefault()

return &prometheus.Arguments{
args := &prometheus.Arguments{
IncludeTargetInfo: defaultArgs.IncludeTargetInfo,
IncludeScopeInfo: defaultArgs.IncludeScopeInfo,
IncludeScopeLabels: defaultArgs.IncludeScopeLabels,
GCFrequency: cfg.StaleTime,
GCFrequency: defaultArgs.GCFrequency,
ForwardTo: forwardTo,
AddMetricSuffixes: defaultArgs.AddMetricSuffixes,
ResourceToTelemetryConversion: defaultArgs.ResourceToTelemetryConversion,
}

// Override default only if > 0 because GCFrequency of 0 is not allowed
Contributor: Same here: do we really need to check whether the incoming config is invalid? Also, if we set a different GC frequency than what the user specified, we're technically no longer converting their exact config. Maybe it would make more sense to refuse to convert the config, or to log a warning.

But IMO we could just convert it as-is and let Alloy fail when it loads it.

Author: I dug deeper into this one because I could not understand why it would be set to 0 by default, and I found out that when we moved to Alloy we turned some static components into empty shells: https://github.com/grafana/alloy/pull/55/files#diff-1b127540731f7403cbe5211de129eddfc35579d88d6bcbc371520642a9aa5486. The problem with this component is that the default values were set in the newRemoteWriteExporter() function. The StaleTime default is actually supposed to be 15 minutes. I got rid of the check in the converter and set the correct value in the yaml component. This way, if you convert a config that uses the default yaml value, it is correctly converted to 15m in Alloy.

if cfg.StaleTime > 0 {
args.GCFrequency = cfg.StaleTime
}

return args
}
@@ -6,10 +6,6 @@ discovery.azure "metrics_agent_promobee" {
tenant_id = "tenant"
client_secret = "secret"
}

managed_identity {
client_id = "client"
}
proxy_url = "proxy"
}

@@ -21,10 +17,6 @@ discovery.azure "metrics_agent_promobee_2" {
tenant_id = "tenant"
client_secret = "secret"
}

managed_identity {
client_id = "client"
}
proxy_url = "proxy"
}

@@ -52,10 +52,6 @@ discovery.azure "_0_default_prometheus1" {
tenant_id = "tenant1"
client_secret = "secret1"
}

managed_identity {
client_id = "client1"
}
}

discovery.lightsail "_0_default_prometheus1" {
@@ -118,8 +114,7 @@ prometheus.relabel "_0_default" {
}

otelcol.exporter.prometheus "_0_default" {
gc_frequency = "0s"
forward_to = [prometheus.relabel._0_default.receiver]
forward_to = [prometheus.relabel._0_default.receiver]
}

otelcol.exporter.loadbalancing "_0_default" {
@@ -204,8 +199,7 @@ prometheus.relabel "_1_default" {
}

otelcol.exporter.prometheus "_1_default" {
gc_frequency = "0s"
forward_to = [prometheus.relabel._1_default.receiver]
forward_to = [prometheus.relabel._1_default.receiver]
}

otelcol.exporter.otlp "_1_0" {
33 changes: 32 additions & 1 deletion internal/converter/internal/test_common/testing.go

@@ -3,6 +3,7 @@ package test_common
import (
"bufio"
"bytes"
"context"
"fmt"
"io/fs"
"os"
@@ -12,6 +13,7 @@ import (
"strings"
"testing"

"github.com/grafana/alloy/internal/component"
"github.com/grafana/alloy/internal/converter/diag"
"github.com/grafana/alloy/internal/featuregate"
alloy_runtime "github.com/grafana/alloy/internal/runtime"
@@ -20,6 +22,7 @@ import (
cluster_service "github.com/grafana/alloy/internal/service/cluster"
http_service "github.com/grafana/alloy/internal/service/http"
"github.com/grafana/alloy/internal/service/labelstore"
"github.com/grafana/alloy/internal/service/livedebugging"
remotecfg_service "github.com/grafana/alloy/internal/service/remotecfg"
"github.com/prometheus/client_golang/prometheus"
"github.com/stretchr/testify/require"
@@ -214,14 +217,42 @@ func attemptLoadingAlloyConfig(t *testing.T, bb []byte) {
clusterService,
labelstore.New(nil, prometheus.DefaultRegisterer),
remotecfgService,
livedebugging.New(),
},
EnableCommunityComps: true,
})
err = f.LoadSource(cfg, nil, "")

// This is a bit of an ugly workaround but the spanmetrics connector is starting a go routine in its start function.
Contributor: I guess this refers to the start function of OTel Collector components? Do we call this when we call Alloy's New() function for a component? I think those New() functions aren't meant to have side effects such as opening files or starting goroutines; they should only do this once Run is called.

Author: Yeah, we often have the pattern that the New() function calls the Update function, and this creates some side effects, but only for the tests. Currently we start the OTel components in the Run() function, and that's where the race problems appeared. This change (#2262) makes the OTel components start in the New() function, which solves the problem at runtime but can be a bit more annoying for testing.

Contributor: Thank you for pointing this out. I added a comment to the other PR; New is not supposed to have such side effects.

// If the goroutine is not stopped, it will result in a panic. To stop it we need to run the Alloy controller to run the component and then stop it.
// We cannot run the Alloy controller for all tests because some components such as the statsd_exporter will panic after being stopped and some other components have wrong config.
runAlloyController := false
components, err := f.ListComponents("", component.InfoOptions{})
require.NoError(t, err)
for _, component := range components {
if component.ComponentName == "otelcol.connector.spanmetrics" {
runAlloyController = true
break
}
}

if runAlloyController {
ctx, cancel := context.WithCancel(context.Background())
done := make(chan struct{})
go func() {
f.Run(ctx)
close(done)
}()
defer func() {
cancel()
<-done
}()
}

// Many components will fail to build as e.g. the cert files are missing, so we ignore these errors.
// This is not ideal, but we still validate for other potential issues.
if err != nil && strings.Contains(err.Error(), "Failed to build component") {
// Tests that require the Alloy controller to run should not have build error else the components won't run.
if !runAlloyController && err != nil && strings.Contains(err.Error(), "Failed to build component") {
t.Log("ignoring error: " + err.Error())
return
}