Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dashboard] add RAY_PROMETHEUS_HEADERS env for carrying additional headers to Prometheus #49353

Merged
merged 7 commits into from
Jan 5, 2025

Conversation

rueian
Copy link
Contributor

@rueian rueian commented Dec 19, 2024

Why are these changes needed?

Some users from Slack channels have an authentication proxy in front of their Prometheus server and that proxy requires requests to have specific HTTP headers to pass. Currently, we don't have a way for them to customize headers for those requests.

This PR introduces a new RAY_PROMETHEUS_HEADERS environment variable, next to the existing RAY_PROMETHEUS_HOST, to allow users to add custom headers for requests to Prometheus from Ray Dashboard and Grafana (ref)

The format of RAY_PROMETHEUS_HEADERS should be a JSON string in one of the following:

  1. {"H1": "V1", "H2": "V2"}
  2. [["H1", "V1"], ["H2", "V2"], ["H2", "V3"]]

The second format supports multiple headers with the same name (which is valid by HTTP spec) while the first format doesn't due to the aiohttp limitation. i.e. the format like{"H2": ["V2", "V3"]} is not supported.

A Grafana datasource.yml example of the first format

Screenshot 2024-12-21 at 08 20 18

A Grafana datasource.yml example of the second format

Screenshot 2024-12-21 at 08 18 44

A request dump example from Ray Dashboard to Prometheus

Screenshot 2024-12-21 at 10 19 12

A request dump example from Grafana to Prometheus

Screenshot 2024-12-21 at 10 58 17

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@rueian rueian force-pushed the dashboard-prom-headers branch 2 times, most recently from b667fec to 4b9357a Compare December 19, 2024 13:52
@kevin85421 kevin85421 self-assigned this Dec 19, 2024
@rueian rueian force-pushed the dashboard-prom-headers branch 2 times, most recently from a1b293a to 8e42e75 Compare December 20, 2024 06:20
@rueian rueian changed the title [Dashboard] add RAY_PROMETHEUS_HEADERS env for carrying additional headers to Prometheus (#2669) [Dashboard] add RAY_PROMETHEUS_HEADERS env for carrying additional headers to Prometheus Dec 21, 2024
@rueian rueian force-pushed the dashboard-prom-headers branch from 8e42e75 to 2b39527 Compare December 21, 2024 02:25
Comment on lines -25 to +46
GRAFANA_DATASOURCE_TEMPLATE = """apiVersion: 1

datasources:
- name: {prometheus_name}
url: {prometheus_host}
type: prometheus
isDefault: true
access: proxy
"""
def GRAFANA_DATASOURCE_TEMPLATE(
prometheus_name, prometheus_host, jsonData, secureJsonData
):
return yaml.safe_dump(
{
"apiVersion": 1,
"datasources": [
{
"name": prometheus_name,
"url": prometheus_host,
"type": "prometheus",
"isDefault": True,
"access": "proxy",
"jsonData": jsonData,
"secureJsonData": secureJsonData,
}
],
}
)
Copy link
Contributor Author

@rueian rueian Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using yaml.safe_dump is better because a simple string template is now not enough for complex values in jsonData and secureJsonData and there might also be some values that require escaping to fit into a yaml file.

@rueian rueian marked this pull request as ready for review December 21, 2024 03:25
@rueian rueian requested a review from a team as a code owner December 21, 2024 03:25
@jcotant1 jcotant1 added observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling dashboard Issues specific to the Ray Dashboard labels Dec 23, 2024
Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with our Prometheus setup, but the Ray head node has several metric endpoints, including 8080, Autoscaler metrics, and Dashboard metrics. Are they in separate codepaths?

https://github.com/ray-project/kuberay/blob/a788963f36bee9e0c5b7b58a20be0c9ae10f3b04/config/prometheus/podMonitor.yaml#L46-L63

@@ -13,6 +13,8 @@

DEFAULT_PROMETHEUS_HOST = "http://localhost:9090"
PROMETHEUS_HOST_ENV_VAR = "RAY_PROMETHEUS_HOST"
DEFAULT_PROMETHEUS_HEADERS = "{}"
PROMETHEUS_HEADERS_ENV_VAR = "RAY_PROMETHEUS_HEADERS"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add the comments about the format here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the comment below and next to the parse_prom_headers function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with our Prometheus setup, but the Ray head node has several metric endpoints, including 8080, Autoscaler metrics, and Dashboard metrics. Are they in separate codepaths?

Yes, they are in the different code paths. Prometheus will periodically scrape metrics from these metrics endpoints, while this PR targets the endpoints of the Prometheus server.

@@ -73,6 +76,10 @@ def __init__(self, dashboard_head):
self.prometheus_host = os.environ.get(
PROMETHEUS_HOST_ENV_VAR, DEFAULT_PROMETHEUS_HOST
)
self.prometheus_headers = os.environ.get(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do some basic validation of the format before using the env var?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new parse_prom_headers function is added for both validation and parsing. This is what it looks like when the validation fails:

export RAY_PROMETHEUS_HEADERS='[["XAuth"],["XAuth2"]]'
▶ ray start --head --include-dashboard=true --metrics-export-port=8080

Local node IP: 127.0.0.1
2024-12-28 13:32:56,919	ERROR services.py:1353 -- Failed to start the dashboard , return code 1
2024-12-28 13:32:56,920	ERROR services.py:1378 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#logging-directory-structure' to find where the log file is.
Traceback (most recent call last):
  ...
  File "/Users/ruian/Code/python/ray/python/ray/dashboard/modules/metrics/metrics_head.py", line 97, in __init__
    self.prometheus_headers = parse_prom_headers(
                              ^^^^^^^^^^^^^^^^^^^
  File "/Users/ruian/Code/python/ray/python/ray/dashboard/modules/metrics/metrics_head.py", line 74, in parse_prom_headers
    raise ValueError(
ValueError: RAY_PROMETHEUS_HEADERS should be a JSON string in one of the formats:
1) An object with string keys and string values.
2) an array of string arrays with 2 string elements each.

@rueian rueian force-pushed the dashboard-prom-headers branch from 2b39527 to 7a654e6 Compare December 28, 2024 05:35
@rueian rueian requested a review from alanwguo December 28, 2024 13:44
Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test?

return parsed
raise ValueError(
f"{PROMETHEUS_HEADERS_ENV_VAR} should be a JSON string in one of the formats:\n"
+ "1) An object with string keys and string values.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the example to the ValueError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

# parse_prom_headers will make sure the input is in one of the following formats:
# 1. {"H1": "V1", "H2": "V2"}
# 2. [["H1", "V1"], ["H2", "V2"], ["H2", "V3"]]
def parse_prom_headers(prometheus_headers):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file seems to be highly similar to metrics_head.py. What are the differences between them? Perhaps @alanwguo knows the difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this file is used at:

subprocess.run(
[
"python",
"prometheus_metrics.py",
str(time_taken),
"--path",
os.environ["METRICS_OUTPUT_JSON"],
],
timeout=metrics_timeout,
check=True,
)
and
def save_metrics(self, start_time: float, timeout: float = 900):
self.run_prepare_command(
f"python prometheus_metrics.py {start_time}", timeout=timeout
)

According to their commit messages, these usages are related to CI procedures.

It also looks like the script intentionally does not import Ray, therefore, I copied the parse_prom_headers function from metrics_head.py to here instead of importing it. But if we don't need this in the CI procedures, I will just remove the change. Do we need this in the CI procedures?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the change from the _prometheus_metrics.py. It seems we will never need this feature in the release procedures.

@rueian
Copy link
Contributor Author

rueian commented Jan 3, 2025

Should we add a test?

Sure! How about adding the test to python/ray/dashboard/tests/test_dashboard.py? I have added one there and used the DashboardHead._load_modules(), which is responsible for loading the MetricsHead and the DataHead, to test the validation.

@rueian rueian force-pushed the dashboard-prom-headers branch from b630a53 to d8d1e53 Compare January 3, 2025 04:59
@rueian rueian force-pushed the dashboard-prom-headers branch from d8d1e53 to be3b9e5 Compare January 3, 2025 05:27

# Test the unsupported case.
with pytest.raises(ValueError):
os.environ[PROMETHEUS_HEADERS_ENV_VAR] = '{"H1": "V1", "H2": ["V1", "V2"]}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use monkeypatch.setenv to avoid env var leak

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint. Updated.

@rueian rueian force-pushed the dashboard-prom-headers branch from 1c19a57 to 626e098 Compare January 4, 2025 02:56
@kevin85421 kevin85421 added the go add ONLY when ready to merge, run all tests label Jan 4, 2025
@kevin85421
Copy link
Member

Trigger the whole CI tests.

@rueian
Copy link
Contributor Author

rueian commented Jan 4, 2025

Trigger the whole CI tests.

Hi @kevin85421, the previous premerge check failed on minimal installation tests and I pushed a fix for that. Could you help trigger the whole CI test again? Thank you.

@kevin85421
Copy link
Member

@rueian With the go label on the PR, all tests will be executed.

@jjyao jjyao merged commit d37edf9 into ray-project:master Jan 5, 2025
5 checks passed
roshankathawate pushed a commit to roshankathawate/ray that referenced this pull request Jan 7, 2025
roshankathawate pushed a commit to roshankathawate/ray that referenced this pull request Jan 9, 2025
…aders to Prometheus (ray-project#49353)

Signed-off-by: Rueian <[email protected]>
Signed-off-by: Roshan Kathawate <[email protected]>
anyadontfly pushed a commit to anyadontfly/ray that referenced this pull request Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dashboard Issues specific to the Ray Dashboard go add ONLY when ready to merge, run all tests observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants