[Bug]: claim request failed with proxy #19096

Open
kn-ulf opened this issue Nov 27, 2024 · 11 comments
Labels
bug, needs triage, priority/high

Comments

@kn-ulf
kn-ulf commented Nov 27, 2024

Bug description

Hi,
I have to use a proxy to reach internet resources, so the proxy environment variables are always set:
[root@localhost ~]# env | grep -i proxy
HTTP_PROXY=http://sanitized.proxy.url:80
FTP_PROXY=http://sanitized.proxy.url:80
https_proxy=http://sanitized.proxy.url:80
http_proxy=http://sanitized.proxy.url:80
no_proxy=127.0.0.1,localhost,.sanitzed.url,10.0.0.0/8
NO_PROXY=127.0.0.1,localhost,.sanitzed.url,10.0.0.0/8
HTTPS_PROXY=http://sanitized.proxy.url:80
ftp_proxy=http://sanitized.proxy.url:80
[root@localhost ~]#

After installing the Netdata agent (I tested stable versions 2.0.1, 2.0.2, and 2.0.3), the log shows:

Nov 27 09:59:28 localhost netdata[1706469]: CLAIM: Request failed with error: Couldn't connect to server
Nov 27 09:59:33 localhost netdata[1706469]: CLAIM: Request failed with error: Couldn't connect to server
Nov 27 09:59:38 localhost netdata[1706469]: CLAIM: Request failed with error: SSL connect error
Nov 27 09:59:38 localhost netdata[1706469]: CLAIM: Request failed with error: SSL connect error
Nov 27 09:59:39 localhost netdata[1706469]: CLAIM: Request failed with error: SSL connect error
Nov 27 09:59:40 localhost netdata[1706469]: CLAIM: Unable to find our claimed_id, setting state to AGENT_UNCLAIMED

The Netdata configuration shows:

[root@localhost ~]# netdatacli dumpconfig | grep -A6 "\[cloud\]"
[cloud]
        # conversation log = no
        # scope = full
        # query thread count = 4
        # proxy = env

[ml]
[root@localhost ~]#

From this setting I gather that if a proxy environment variable is set, claiming should use it?
I also see this line in the logs:

 Nov 27 09:59:40 localhost netdata[1707334]: level=info msg="env HTTP_PROXY '', HTTPS_PROXY ''" plugin=go.d component=agent

I wonder why the proxy is empty here?

If I try to reach Netdata Cloud with curl, it works:

[root@localhost ~]# curl -v  https://app.netdata.cloud
* Rebuilt URL to: https://app.netdata.cloud/
* Uses proxy env variable no_proxy == '127.0.0.1,localhost,.sanitzed.url,10.0.0.0/8'
* Uses proxy env variable https_proxy == 'http://sanitized.proxy.url:80'
...
> CONNECT app.netdata.cloud:443 HTTP/1.1
> Host: app.netdata.cloud:443
> User-Agent: curl/7.61.1
> Proxy-Connection: Keep-Alive
>
...
< HTTP/2 200
< accept-ranges: bytes
< access-control-allow-credentials: true
< cache-control: no-cache
< content-length: 2911
< content-type: text/html
< date: Wed, 27 Nov 2024 10:00:38 GMT
< etag: "6746deb2-b5f"
< expires: Thu, 01 Jan 1970 00:00:01 GMT
< last-modified: Wed, 27 Nov 2024 08:56:18 GMT
< server: nginx
< vary: Accept-Encoding
< x-content-type-options: nosniff
< x-frame-options: SAMEORIGIN
<
<!doctype html><html><head><script>window.envSettings = {
        apiUrl: "https://app.netdata.cloud",
        cloudUrl: "https://app.netdata.cloud",
        demoSlug: "netdata-demo",
...

I have also tried setting the proxy directly in the configuration, like this (and restarted Netdata afterwards):

[root@localhost ~]# netdatacli dumpconfig | grep -A6 "\[cloud\]"
[cloud]
        # conversation log = no
        # scope = full
        # query thread count = 4
        proxy = http://sanitized.proxy.url:80

[ml]
[root@localhost ~]#

But claim fails with the same error message.

Expected behavior

I would expect claiming to register the agent in Netdata Cloud and reach Netdata services using the proxy environment variables, or at least when I configure the proxy in the netdata.conf file.

Steps to reproduce

  1. Install a fresh, current RHEL 8 or Rocky Linux system. I would guess the issue is independent of the OS you choose.
  2. Set the proxy environment variables. From my point of view it does not matter which proxy software you use.
  3. Disable direct internet access, so that the internet is reachable only through the proxy.
  4. Install a stable Netdata agent. I would expect you to see the same error messages.

Installation method

kickstart.sh

System info

[root@localhost ~]# uname -a; grep -HvE "^#|URL" /etc/*release
Linux localhost 4.18.0-553.22.1.el8_10.x86_64 #1 SMP Wed Sep 11 18:02:00 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
/etc/os-release:NAME="Red Hat Enterprise Linux"
/etc/os-release:VERSION="8.10 (Ootpa)"
/etc/os-release:ID="rhel"
/etc/os-release:ID_LIKE="fedora"
/etc/os-release:VERSION_ID="8.10"
/etc/os-release:PLATFORM_ID="platform:el8"
/etc/os-release:PRETTY_NAME="Red Hat Enterprise Linux 8.10 (Ootpa)"
/etc/os-release:ANSI_COLOR="0;31"
/etc/os-release:CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
/etc/os-release:
/etc/os-release:REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
/etc/os-release:REDHAT_BUGZILLA_PRODUCT_VERSION=8.10
/etc/os-release:REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
/etc/os-release:REDHAT_SUPPORT_PRODUCT_VERSION="8.10"
/etc/redhat-release:Red Hat Enterprise Linux release 8.10 (Ootpa)
/etc/system-release:Red Hat Enterprise Linux release 8.10 (Ootpa)
[root@localhost ~]#

Netdata build info

[root@localhost ~]# netdata -W buildinfo
Packaging:
    Netdata Version ____________________________________________ : v2.0.3
    Installation Type __________________________________________ : binpkg-rpm
    Package Architecture _______________________________________ : x86_64
    Package Distro _____________________________________________ :
    Configure Options __________________________________________ : dummy-configure-command
Default Directories:
    User Configurations ________________________________________ : /etc/netdata
    Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
    Permanent Databases ________________________________________ : /var/lib/netdata
    Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /usr/share/netdata/web
    Log Files __________________________________________________ : /var/log/netdata
    Lock Files _________________________________________________ : /var/lib/netdata/lock
    Home _______________________________________________________ : /var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 4.18.0-553.22.1.el8_10.x86_64
    Operating System ___________________________________________ : Red Hat Enterprise Linux
    Operating System ID ________________________________________ : rhel
    Operating System ID Like ___________________________________ : fedora
    Operating System Version ___________________________________ : 8.10 (Ootpa)
    Operating System Version ID ________________________________ : none
    Detection __________________________________________________ : /etc/os-release
Hardware:
    CPU Cores __________________________________________________ : 8
    CPU Frequency ______________________________________________ : 2893000000
    RAM Bytes __________________________________________________ : 50316021760
    Disk Capacity ______________________________________________ : 1661722846208
    CPU Architecture ___________________________________________ : x86_64
    Virtualization Technology __________________________________ : vmware
    Virtualization Detection ___________________________________ : systemd-detect-virt
Container:
    Container __________________________________________________ : none
    Container Detection ________________________________________ : systemd-detect-virt
    Container Orchestrator _____________________________________ : none
    Container Operating System _________________________________ : none
    Container Operating System ID ______________________________ : none
    Container Operating System ID Like _________________________ : none
    Container Operating System Version _________________________ : none
    Container Operating System Version ID ______________________ : none
    Container Operating System Detection _______________________ : none
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : YES
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Back-filling (of higher database tiers) ____________________ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (zstd gzip)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : YES
Database Engines:
    dbengine (compression) _____________________________________ : YES (zstd)
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : YES
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : NO
    ZSTD (fast, lossless compression algorithm) ________________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    Brotli (generic-purpose lossless compression algorithm) ____ : NO
    protobuf (platform-neutral data serialization protocol) ____ : YES (system)
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : NO
    libcrypto (cryptographic functions) ________________________ : YES
    libyaml (library for parsing and emitting YAML) ____________ : YES
    libmnl (library for working with netfilter) ________________ : YES
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : YES
    ebpf (monitor system calls) ________________________________ : YES
    freeipmi (monitor enterprise server H/W) ___________________ : YES
    nfacct (gather netfilter accounting) _______________________ : NO
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : NO
    Xen VBD Error Tracking _____________________________________ : NO
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : YES
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : YES
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : NO
[root@localhost ~]#

Additional info

No response

@kn-ulf kn-ulf added the bug and needs triage labels Nov 27, 2024
@ilyam8
Member

ilyam8 commented Nov 27, 2024

[root@localhost ~]# env | grep -i proxy

Perhaps these environment variables are set only for root?

Can you check

## Change PID to Netdata's PID
sudo cat /proc/PID/environ | tr '\0' '\n'
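
To narrow that output down to the proxy variables only, a small helper along these lines could be used. It is a minimal sketch: the function name `proxy_vars_from_environ` is made up for illustration, and the PID file path in the usage comment is the one shown later in this thread.

```shell
#!/bin/sh
# Print only the proxy-related variables from a NUL-separated environ file
# (the format of /proc/<PID>/environ). Matching is case-insensitive, so both
# HTTP_PROXY and http_proxy are reported. Prints nothing if none are set.
proxy_vars_from_environ() {
    tr '\0' '\n' < "$1" | grep -iE '^(https?|ftp|no|all)_proxy=' || true
}

# Example (root is needed to read another user's process environment):
#   proxy_vars_from_environ "/proc/$(cat /run/netdata/netdata.pid)/environ"
```

Empty output means the running process was started without any proxy variables, regardless of what an interactive root shell shows.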

@kn-ulf
Author

kn-ulf commented Nov 27, 2024

Thank you for looking into this.

[root@localhost ~]# cat /run/netdata/netdata.pid
1706469
[root@localhost ~]#
[root@localhost ~]#  cat /proc/1706469/environ | tr '\0' '\n'
LANG=en_US.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
HOME=/root
LOGNAME=root
USER=root
SHELL=/bin/sh
INVOCATION_ID=29efe723bc404724b5a574f7e6b0d19a
JOURNAL_STREAM=9:12365470
RUNTIME_DIRECTORY=/run/netdata
[root@localhost ~]# ps aux | grep 1706469
netdata  1706469  5.9  0.5 1171992 255764 ?      SNsl 09:59   9:38 /usr/sbin/netdata -P /run/netdata/netdata.pid -D
[root@localhost ~]#

Interesting, it looks like the netdata agent process is also using the root user?
I don't see the proxy variables here, and I think that is because the netdata user is set to nologin.
The proxy variables are set via profile.d, which as far as I know is not sourced when the user has no shell?
But if I hard-code the proxy in the configuration, as described at the end of the issue description, I get the same error message and claiming also fails.
Is there a way to set the environment variables for the netdata user without a shell (nologin)?

@ilyam8 ilyam8 closed this as not planned Nov 27, 2024
@ilyam8
Member

ilyam8 commented Nov 27, 2024

Netdata runs as the netdata user.

@kn-ulf
Author

kn-ulf commented Nov 27, 2024

Here I show that claiming also fails if I specify the proxy directly in the configuration.

[root@localhost ~]# netdatacli dumpconfig | grep -A5 "\[cloud"
[cloud]
        # conversation log = no
        # scope = full
        # query thread count = 4
        # proxy = env

[root@localhost ~]# systemctl restart netdata
[root@localhost ~]# netdatacli dumpconfig | grep -A5 "\[cloud"
[cloud]
        #| >>> [cloud].proxy <<<
        #| datatype: text, default value: env
        proxy = http://sanitized.proxy.url:80

        # conversation log = no
        # scope = full
[root@localhost ~]#
Nov 27 12:53:36 localhost netdata[2095499]: CLAIM: Request failed with error: SSL connect error
Nov 27 12:53:36 localhost netdata[2095499]: CLAIM: Request failed with error: SSL connect error
Nov 27 12:53:37 localhost netdata[2095499]: CLAIM: Request failed with error: SSL connect error
Nov 27 12:53:38 localhost netdata[2095499]: CLAIM: Request failed with error: SSL connect error
Nov 27 12:53:39 localhost netdata[2095499]: CLAIM: Request failed with error: SSL connect error
Nov 27 12:53:39 localhost netdata[2095499]: CLAIM: Unable to find our claimed_id, setting state to AGENT_UNCLAIMED

@ilyam8
Member

ilyam8 commented Nov 27, 2024

@kn-ulf @stelfrag will try to reproduce it. I will reopen the issue if needed.

@sashwathn sashwathn reopened this Nov 27, 2024
@kn-ulf
Author

kn-ulf commented Nov 27, 2024

I was able to set the env variables also for the netdata user:

[root@localhost ~]# cat /etc/systemd/system/netdata.service.d/override.conf
[Service]
Environment="HTTP_PROXY=http://sanitized.proxy.url:80"
Environment="HTTPS_PROXY=http://sanitized.proxy.url:80"
[root@localhost ~]#
[root@localhost ~]# systemctl restart netdata
[root@localhost ~]# cat /proc/`cat /run/netdata/netdata.pid`/environ | tr '\0' '\n'
LANG=en_US.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
HOME=/root
LOGNAME=root
USER=root
SHELL=/bin/sh
INVOCATION_ID=fb596538ceef4267983cf25624c75a84
JOURNAL_STREAM=9:16373782
RUNTIME_DIRECTORY=/run/netdata
HTTP_PROXY=http://sanitized.proxy.url:80
HTTPS_PROXY=http://sanitized.proxy.url:80

Nov 27 14:30:19 localhost netdata[2326084]: level=info msg="env HTTP_PROXY 'http://sanitized.proxy.url:80', HTTPS_PROXY 'http://sanitized.proxy.url:80'" plugin=go.d component=agent

But claim still fails:

Nov 27 14:30:15 localhost netdata[2325653]: CLAIM: Request failed with error: SSL connect error
Nov 27 14:30:16 localhost netdata[2325653]: CLAIM: Request failed with error: SSL connect error
Nov 27 14:30:16 localhost netdata[2325653]: CLAIM: Request failed with error: SSL connect error
Nov 27 14:30:17 localhost netdata[2325653]: CLAIM: Request failed with error: SSL connect error
Nov 27 14:30:18 localhost netdata[2325653]: CLAIM: Request failed with error: SSL connect error
Nov 27 14:30:19 localhost netdata[2325653]: CLAIM: Unable to find our claimed_id, setting state to AGENT_UNCLAIMED

@shyamvalsan shyamvalsan added the priority/high label Nov 27, 2024
@ktsaou
Member

ktsaou commented Nov 27, 2024

Hi @kn-ulf,

This should not be happening. I am sorry for the frustration this may have caused.

Let me review the code and I will write my findings here.

First, let me assure you that we will make it work. Netdata uses libcurl for claiming, so as long as curl works, Netdata will eventually work too.

There are 4 different methods for claiming:

1. Via the UI

In this case the proxy configuration comes from [global].proxy in /var/lib/netdata/cloud.d/cloud.conf, and the default value is env, which means "let libcurl decide".

However, I see in the code this:

    // backwards compatibility, from when proxy was in netdata.conf
    // netdata.conf has bigger priority

So, in this case, if you have set [cloud].proxy in netdata.conf, it uses this value and ignores /var/lib/netdata/cloud.d/cloud.conf.

2. Via /etc/netdata/claim.conf

This file is usually generated by our installer. The proxy configuration comes from [global].proxy and (oops!) the default value is empty, which means "proxy disabled".

This is for sure a bug. The default should be env.

3. Via environment variables

This is usually the preferred method when Netdata runs in a container (k8s, docker, etc), because it is easier for CI/CD. Proxy comes from the environment variable NETDATA_CLAIM_PROXY and the default is empty, which means "proxy disabled".

For my taste, the default should have been env, so that users can set the libcurl variables and ignore NETDATA_CLAIM_PROXY. But anyway, this is less of a bug, because the environment inside the container can only be altered in the same way.

4. A backwards compatible way from split files in /var/lib/netdata.

This is not used any more for new users, but we have to keep backwards compatibility for old users, so it still exists.

In this case, proxy is retrieved the same way Option 1 above does.


Option 1 is manual. Users need to trigger it by hand.

The other options are tried in the sequence I gave them above, every time Netdata starts. So, first it tries claim.conf, then environment variables, then split files.


Since curl works for you, I guess that if you set the libcurl variables for the netdata user and set [cloud].proxy = env in netdata.conf, or [global].proxy = env in /etc/netdata/claim.conf, it should work too.

If it still does not work and you can install a nightly build of Netdata somewhere, we can add some more debugging information to the logs to find out exactly what is used as a proxy.

@ktsaou
Member

ktsaou commented Nov 27, 2024

I merged a change to log the proxy Netdata uses and also to change the default in all cases to env.
I think tomorrow's nightly will have these changes...

@ilyam8
Member

ilyam8 commented Nov 28, 2024

Tested Netdata v2.0.0-126-nightly with Tinyproxy: claiming and connection successful.

Setup:

  • Node 1 (Netdata): No direct internet access.
  • Node 2 (Tinyproxy): Acts as a proxy for Node 1's internet traffic.

For claiming I used claim.conf:

$ cat -pp /opt/netdata/etc/netdata/claim.conf

[global]
	# url = https://app.netdata.cloud
	proxy = http://10.10.10.21:8888
	token = TOKEN
	rooms = ROOM_ID

@kn-ulf
Author

kn-ulf commented Nov 28, 2024

Hi,
thank you for your fast response and for the effort you put into that topic.

Let me try to sort the topics a little bit and please correct me if I got it wrong.

1. Manual configuration in config file(s)

I understood that there is a way to set proxy manually in the configuration file(s).
One file is /etc/netdata/netdata.conf and the option is:

[Cloud]
     proxy = http://your.proxy.url:8080

and the second file, to make claim work, is /etc/netdata/claim.conf and the option is:

[Global]
    proxy = http://your.proxy.url:8080

Only if I set both options in both files, as shown above, will claiming and Netdata work through the proxy.
And this should also work with older versions; you only need to know that these options exist and that you need to set both, correct? And this is what @ilyam8 successfully tested with v2.0.0-126-nightly above, right?

2. Set proxy as environment variable

I understood that I have to make sure that the environment of the netdata user, and only that environment, has the proxy variables set.
This can be done by creating a file called /etc/systemd/system/netdata.service.d/override.conf (manually or by using systemctl edit netdata) and adding:

[Service]
Environment="HTTP_PROXY=http://sanitized.proxy.url:80"
Environment="HTTPS_PROXY=http://sanitized.proxy.url:80"
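
For anyone who wants to script this step, here is a minimal sketch. The function name `write_proxy_dropin` and the optional `NO_PROXY` argument are my own additions for illustration; only the two `Environment=` lines above come from my actual setup.

```shell
#!/bin/sh
# Write a systemd drop-in that sets proxy variables for a service.
# Arguments: <drop-in directory> <proxy URL> [NO_PROXY list]
write_proxy_dropin() {
    dir=$1
    proxy=$2
    no_proxy_list=${3:-}
    mkdir -p "$dir" || return 1
    {
        printf '[Service]\n'
        printf 'Environment="HTTP_PROXY=%s"\n' "$proxy"
        printf 'Environment="HTTPS_PROXY=%s"\n' "$proxy"
        # Only emitted when a third argument is given.
        if [ -n "$no_proxy_list" ]; then
            printf 'Environment="NO_PROXY=%s"\n' "$no_proxy_list"
        fi
    } > "$dir/override.conf"
}

# Example (as root). A drop-in created by hand needs a daemon-reload before
# the restart; `systemctl edit netdata` does the reload for you.
#   write_proxy_dropin /etc/systemd/system/netdata.service.d http://sanitized.proxy.url:80
#   systemctl daemon-reload
#   systemctl restart netdata
```
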

As far as I can tell, this is not documented anywhere at Netdata, because how to set environment variables for users without shell access (nologin) is not Netdata-specific.
I understood that this way should have worked, but there was a bug in claiming which you, @ktsaou, addressed in the current nightly by setting the proxy default to env, so this should work now. But it has not been tested yet, correct?

Again sorry if I mixed or misunderstood things here, and thank you for your help.

@ktsaou
Member

ktsaou commented Nov 28, 2024

@kn-ulf please don't use capital letters for section names in .conf files: cloud, not Cloud, and global, not Global.

On a vanilla Netdata installation, both of the ways you describe should work. The default setting is proxy = env for the first 2 methods, even on the stable release; on current nightly releases it is env for all methods.

Also, setting proxy = YOUR_PROXY in either of the following should work:

In /etc/netdata/netdata.conf:

[cloud]
   proxy = YOUR_PROXY

In /etc/netdata/claim.conf:

[global]
   proxy = YOUR_PROXY

If you set both of them, note that netdata.conf will be preferred.
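
To check what is explicitly set in the two files, something like the sketch below could help. It only mirrors the precedence described in this comment (netdata.conf first, then claim.conf, otherwise `env`); it is not Netdata's actual code, and the helper names `ini_get` and `preferred_proxy` are hypothetical.

```shell
#!/bin/sh
# Print the value of KEY inside [SECTION] of an INI-style file, if set.
# Lines starting with '#' are ignored. Section names are matched exactly,
# so a [Cloud] section does not count as "cloud".
# Arguments: <file> <section> <key>
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*#/ { next }
        /^[ \t]*\[.*\][ \t]*$/ {
            name = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
            in_section = (name == section)
            next
        }
        in_section {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Print the explicit proxy setting that would be preferred:
# [cloud].proxy in netdata.conf, else [global].proxy in claim.conf, else "env".
# Arguments: <path to netdata.conf> <path to claim.conf>
preferred_proxy() {
    value=
    if [ -r "$1" ]; then
        value=$(ini_get "$1" cloud proxy)
    fi
    if [ -z "$value" ] && [ -r "$2" ]; then
        value=$(ini_get "$2" global proxy)
    fi
    printf '%s\n' "${value:-env}"
}

# Example:
#   preferred_proxy /etc/netdata/netdata.conf /etc/netdata/claim.conf
```
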

> As far as I can say that's not documented anywhere at netdata because it's not netdata specific how to set environment variables for users without shell access (nologin).

You mean about setting the proxy for systemd services? Yes, it is not documented because this is supposed to be systemd (initrc, openrc, etc) documentation. Each system has its own unique ways.

> I understood that this way should have worked but there was a bug for claim which you @ktsaou addressed in the current nightly to set proxy default to env, so this should work now. But it was not yet tested correct?

I only changed the default to env for methods 3 and 4. I am not sure they influence your setup.

The key change I made for you was to log the proxy used when libcurl fails. So, this log:

Nov 27 14:30:15 localhost netdata[2325653]: CLAIM: Request failed with error: SSL connect error

Should now have information about the proxy used (env, none, or a proxy url).
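
To pull the claim messages out of the journal and see which errors dominate, a filter like the one below could be used. It is a rough sketch: the function name `claim_summary` is made up, and it makes no assumption about the exact wording of the new proxy information, it simply groups whatever follows `CLAIM:`.

```shell
#!/bin/sh
# Read log lines on stdin and count each distinct CLAIM message,
# most frequent first. Everything before "CLAIM:" (timestamp, host,
# process name) is dropped; lines without "CLAIM:" are ignored.
claim_summary() {
    sed -n 's/^.*\(CLAIM: .*\)$/\1/p' | sort | uniq -c | sort -rn
}

# Example:
#   journalctl -u netdata --since today | claim_summary
```
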

@netdata netdata deleted a comment from yuvashrikarunakaran Dec 5, 2024

5 participants