Add hostname as default labels for promtail #634
Comments
This should already be possible using command line flags with PR #510
@slim-bean this doesn't seem to be working. I've tried
looks like
I would like to create a new issue.
For people trying to add hostname as a label, you can use the following in the scrape config `labels`:

```yaml
labels:
  host: ${HOSTNAME}
```
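For `${HOSTNAME}` to be expanded, promtail has to be started with environment variable expansion enabled. A minimal sketch, assuming the config lives at `/etc/promtail/config.yml` (the path is illustrative):

```bash
# -config.expand-env=true makes promtail substitute ${VAR} references in the config file
promtail -config.file=/etc/promtail/config.yml -config.expand-env=true
```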
To clarify what was written above, you need to use the following as a command line parameter:
As is documented here:
As far as I tested, this doesn't work when promtail is started from a systemd unit definition, because the HOSTNAME environment variable is empty (at least on my servers). I solved it by using %H (the systemd hostname specifier) to set the HOSTNAME variable, which makes it possible to expand the variable in the promtail configuration. Hope this helps someone.
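A minimal sketch of that approach as a systemd drop-in, assuming promtail is installed at `/usr/local/bin/promtail` and reads `/etc/promtail/config.yml` (both paths are illustrative, not taken from the comment):

```ini
# Hypothetical drop-in, e.g. /etc/systemd/system/promtail.service.d/override.conf
[Service]
# %H is the systemd specifier for the machine's hostname
Environment="HOSTNAME=%H"
# Clear and redefine ExecStart so the -config.expand-env flag is passed
ExecStart=
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yml -config.expand-env=true
```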
For people trying to only add hostname as a label, you can just add this as a command line parameter: `-client.external-labels=hostname=$(hostname)`
If you're starting promtail from a systemd unit file, the hostname environment variable is not available by default, so I added it there.
Thanks everyone!
If using the Promtail helm chart you can add the following to values.yaml:
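The snippet itself isn't reproduced in this thread; a hypothetical `values.yaml` fragment using the chart's `extraArgs` field could look like the following (field names and the exact expansion behaviour may differ between chart and Kubernetes versions):

```yaml
# Hypothetical fragment for the promtail helm chart's values.yaml
extraArgs:
  - -client.external-labels=hostname=$(HOSTNAME)
```

Note that inside a Kubernetes pod `HOSTNAME` usually resolves to the pod name rather than the node name.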
...which should add the hostname label.
I've tried all the above to no avail:
My promtail version is 2.9.1.
I have this in my service file and it works:
Aleph now exposes prometheus metrics on port 9100

feat(bash_snippets#Do relative import of a bash library): Do relative import of a bash library

If you want to import a file `lib.sh` that lives in the same directory as the file that is importing it, you can use the following snippet:

```bash
source "$(dirname "$(realpath "$0")")/lib.sh"
```

If you use `source ./lib.sh` you will get an import error whenever you run the script from any place other than the directory where `lib.sh` lives.

feat(bash_snippets#Check the battery status): Check the battery status

This [article gives many ways to check the status of a battery](https://www.howtogeek.com/810971/how-to-check-a-linux-laptops-battery-from-the-command-line/); for my purposes the next one is enough:

```bash
cat /sys/class/power_supply/BAT0/capacity
```

feat(bash_snippets#Check if file is being sourced): Check if file is being sourced

Assuming that you are running bash, put the following code near the start of the script that you want to be sourced but not executed:

```bash
if [ "${BASH_SOURCE[0]}" -ef "$0" ]
then
    echo "Hey, you should source this script, not execute it!"
    exit 1
fi
```

Under bash, `${BASH_SOURCE[0]}` will contain the name of the current file that the shell is reading regardless of whether it is being sourced or executed. By contrast, `$0` is the name of the current file being executed. `-ef` tests if these two files are the same file. If they are, we alert the user and exit.

Neither `-ef` nor `BASH_SOURCE` are POSIX. While `-ef` is supported by ksh, yash, zsh and Dash, `BASH_SOURCE` requires bash. In zsh, however, `${BASH_SOURCE[0]}` could be replaced by `${(%):-%N}`.

feat(bash_snippets#Parsing bash arguments): Parsing bash arguments

Long story short, it's nasty; think of using a python script with [typer](typer.md) instead.

There are some possibilities to do this:

- [The old getopts](https://www.baeldung.com/linux/bash-parse-command-line-arguments)
- The [argbash](https://github.com/matejak/argbash) library
- [Build your own parser](https://medium.com/@Drew_Stokes/bash-argument-parsing-54f3b81a6a8f)

ci: also commit the "not by AI" badge in the CI

fix(alertmanager): Add another source on how to silence alerts

If the previous guidelines don't work for you, you can use the [sleep peacefully guidelines](https://samber.github.io/awesome-prometheus-alerts/sleep-peacefully) to tackle it at the query level.

feat(documentation#references): Add Diátaxis as documentation writing guideline

[Diátaxis: A systematic approach to technical documentation authoring](https://diataxis.fr/)

feat(ecc): Check if system is actually using ECC

Another way is to run `dmidecode`.
For ECC support you'll see:

```bash
$: dmidecode -t memory | grep ECC
    Error Correction Type: Single-bit ECC
    # or
    Error Correction Type: Multi-bit ECC
```

No ECC:

```bash
$: dmidecode -t memory | grep ECC
    Error Correction Type: None
```

You can also test it with [`rasdaemon`](rasdaemon.md)

feat(faster#Prometheus metrics): Prometheus metrics

Use [`prometheus-fastapi-instrumentator`](https://github.com/trallnag/prometheus-fastapi-instrumentator)

feat(privileges#Videos): Add nice video on male privileges

[La intuición femenina, gracias al lenguaje](https://twitter.com/almuariza/status/1772889815131807765?t=HH1W17VGbQ7K-_XmoCy_SQ&s=19)

feat(ffmpeg#Reduce the video size): Reduce the video size

If you don't mind using `H.265`, replace the libx264 codec with libx265 and push the compression lever further by increasing the CRF value — add, say, 4 or 6, since a reasonable range for H.265 may be 24 to 30. Note that lower CRF values correspond to higher bitrates, and hence produce higher quality videos.

```bash
ffmpeg -i input.mp4 -vcodec libx265 -crf 28 output.mp4
```

If you want to stick to H.264, reduce the bitrate. You can check the current one with `ffprobe input.mkv`. Once you've chosen the new rate, change it with:

```bash
ffmpeg -i input.mp4 -b 3000k output.mp4
```

An additional option worth considering is setting the Constant Rate Factor, which lowers the average bit rate but retains better quality. Vary the CRF between around 18 and 24 — the lower the value, the higher the bitrate.

```bash
ffmpeg -i input.mp4 -vcodec libx264 -crf 20 output.mp4
```

feat(icsx5): Introduce ICSx5

[ICSx5](https://f-droid.org/packages/at.bitfire.icsdroid/) is an Android app to sync calendars.

**References**

- [Source](https://github.com/bitfireAT/icsx5)
- [F-droid](https://f-droid.org/packages/at.bitfire.icsdroid/)

feat(haproxy#Automatically ban offending traffic): Automatically ban offending traffic

Check these two posts:

- https://serverfault.com/questions/853806/blocking-ips-in-haproxy
- https://www.loadbalancer.org/blog/simple-denial-of-service-dos-attack-mitigation-using-haproxy-2/

feat(haproxy#Configure haproxy logs to be sent to loki): Configure haproxy logs to be sent to loki

In the `frontend` config add the following line:

```
# For more options look at https://www.chrisk.de/blog/2023/06/haproxy-syslog-promtail-loki-grafana-logfmt/
log-format 'client_ip=%ci client_port=%cp frontend_name=%f backend_name=%b server_name=%s performance_metrics=%TR/%Tw/%Tc/%Tr/%Ta status_code=%ST bytes_read=%B termination_state=%tsc haproxy_metrics=%ac/%fc/%bc/%sc/%rc srv_queue=%sq backend_queue=%bq user_agent=%{+Q}[capture.req.hdr(0)] http_hostname=%{+Q}[capture.req.hdr(1)] http_version=%HV http_method=%HM http_request_uri="%HU"'
```

At the bottom of the [chrisk post](https://www.chrisk.de/blog/2023/06/haproxy-syslog-promtail-loki-grafana-logfmt/) there is a table with all the available fields. [Programming VIP also has an interesting post](https://programming.vip/docs/loki-configures-the-collection-of-haproxy-logs.html).

feat(haproxy#Reload haproxy): Reload haproxy

- Check the config is alright

  ```bash
  service haproxy configtest
  # Or
  /usr/sbin/haproxy -c -V -f /etc/haproxy/haproxy.cfg
  ```

- Reload the service

  ```bash
  service haproxy reload
  ```

If you want to do a better reload you can [drop the SYN before a restart](https://serverfault.com/questions/580595/haproxy-graceful-reload-with-zero-packet-loss), so that clients will resend this SYN until it reaches the new process.
```bash
# Temporarily drop new connections so clients retry the SYN until the new process is up
iptables -I INPUT -p tcp -m multiport --dports 80,443 --syn -j DROP
sleep 1
service haproxy reload
iptables -D INPUT -p tcp -m multiport --dports 80,443 --syn -j DROP
```

feat(linux_snippets#Get info of a mkv file): Get info of a mkv file

```bash
ffprobe file.mkv
```

feat(loki#Alert when query returns no data): Alert when query returns no data

Sometimes the queries you want to alert on return NaN or No Data instead of a number. For example, you may want to monitor the happy path by setting an alert if a string is not found in some logs over a period of time:

```logql
count_over_time({filename="/var/log/mail.log"} |= `Mail is sent` [24h]) < 1
```

This won't trigger the alert because `count_over_time` doesn't return `0` but `NaN`. One way to solve it is to use [the `vector(0)`](grafana/loki#7023) operator with [the operation `or on() vector(0)`](https://stackoverflow.com/questions/76489956/how-to-return-a-zero-vector-in-loki-logql-metric-query-when-grouping-is-used-and):

```logql
(count_over_time({filename="/var/log/mail.log"} |= `Mail is sent` [24h]) or on() vector(0)) < 1
```

feat(loki#Monitor loki metrics): Monitor loki metrics

Since Loki reuses the Prometheus code for recording rules and WALs, it also gains all of Prometheus’ observability.

To scrape loki metrics with prometheus, add the following snippet to the prometheus configuration:

```yaml
- job_name: loki
  metrics_path: /metrics
  static_configs:
    - targets:
        - loki:3100
```

This assumes that `loki` is a Docker container in the same network as `prometheus`.

There are some rules in the [awesome prometheus alerts repo](https://samber.github.io/awesome-prometheus-alerts/rules#loki):

```yaml
---
groups:
  - name: Awesome Prometheus loki alert rules
    # https://samber.github.io/awesome-prometheus-alerts/rules#loki
    rules:
      - alert: LokiProcessTooManyRestarts
        expr: changes(process_start_time_seconds{job=~".*loki.*"}[15m]) > 2
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Loki process too many restarts (instance {{ $labels.instance }})
          description: "A loki process had too many restarts (target {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: LokiRequestErrors
        expr: 100 * sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route) > 10
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: Loki request errors (instance {{ $labels.instance }})
          description: "The {{ $labels.job }} and {{ $labels.route }} are experiencing errors\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: LokiRequestPanic
        expr: sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Loki request panic (instance {{ $labels.instance }})
          description: "The {{ $labels.job }} is experiencing {{ printf \"%.2f\" $value }}% increase of panics\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: LokiRequestLatency
        expr: (histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route!~"(?i).*tail.*"}[5m])) by (le))) > 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Loki request latency (instance {{ $labels.instance }})
          description: "The {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

And there are some guidelines on the rest of the metrics in [the grafana documentation](https://grafana.com/docs/loki/latest/operations/observability/).

**[Monitor the ruler](https://grafana.com/docs/loki/latest/operations/recording-rules/)**

Prometheus exposes a number of metrics for its WAL implementation, and these have all been prefixed with `loki_ruler_wal_`.

For example: `prometheus_remote_storage_bytes_total` → `loki_ruler_wal_prometheus_remote_storage_bytes_total`

Additional metrics are exposed, also with the prefix `loki_ruler_wal_`. All per-tenant metrics contain a tenant label, so be aware that cardinality could begin to be a concern if the number of tenants grows sufficiently large.

Some key metrics to note are:

- `loki_ruler_wal_appender_ready`: whether a WAL appender is ready to accept samples (1) or not (0)
- `loki_ruler_wal_prometheus_remote_storage_samples_total`: number of samples sent per tenant to remote storage
- `loki_ruler_wal_prometheus_remote_storage_samples_pending_total`: samples buffered in memory, waiting to be sent to remote storage
- `loki_ruler_wal_prometheus_remote_storage_samples_failed_total`: samples that failed when sent to remote storage
- `loki_ruler_wal_prometheus_remote_storage_samples_dropped_total`: samples dropped by relabel configurations
- `loki_ruler_wal_prometheus_remote_storage_samples_retried_total`: samples re-sent to remote storage
- `loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds`: highest timestamp of sample appended to WAL
- `loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds`: highest timestamp of sample sent to remote storage

feat(loki#Get a useful Source link in the alertmanager): Get a useful Source link in the alertmanager

[This still doesn't work](grafana/loki#4722). Currently for the ruler `external_url`, if you use the URL of your Grafana installation (e.g. `external_url: "https://grafana.example.com"`) it creates a Source link in alertmanager similar to https://grafana.example.com/graph?g0.expr=%28sum+by%28thing%29%28count_over_time%28%7Bnamespace%3D%22foo%22%7D+%7C+json+%7C+bar%3D%22maxRetries%22%5B5m%5D%29%29+%3E+0%29&g0.tab=1, which isn't valid. This URL templating (via `/graph?g0.expr=%s&g0.tab=1`) appears to be coming from prometheus. There is no workaround yet.

feat(orgmode#How to deal with recurring tasks that are not yet ready to be acted upon): How to deal with recurring tasks that are not yet ready to be acted upon

By default, when you mark a recurrent task as `DONE` it will transition the date (either appointment, `SCHEDULED` or `DEADLINE`) to the next date and change the state back to `TODO`. I found this confusing because for me `TODO` actions are the ones that can be acted upon right now. That's why I'm using the following states instead:

- `INACTIVE`: Recurrent task whose date is not yet close, so you should not take care of it.
- `READY`: Recurrent task whose date [is overdue](#how-to-deal-with-overdue-SCHEDULED-and-DEADLINE-tasks); we acknowledge the fact and mark the date as inactive (so that it doesn't clobber the agenda).

The idea is that once an `INACTIVE` task reaches your agenda, either because the warning days of the `DEADLINE` make it show up or because it's the `SCHEDULED` date, you need to decide whether to change it to `TODO` if it's to be acted upon immediately, or to `READY` and deactivate the date.

`INACTIVE` should then be the default state transition for recurring tasks once you mark them as `DONE`.
To do this, set in your config:

```lua
org_todo_repeat_to_state = "INACTIVE",
```

If a project gathers a list of recurrent subprojects or subactions, it can have the following states:

- `READY`: There is at least one subelement in state `READY` and the rest are `INACTIVE`.
- `TODO`: There is at least one subelement in state `TODO` and the rest may be `READY` or `INACTIVE`.
- `INACTIVE`: The project is not planned to be acted upon soon.
- `WAITING`: The project is planned to be acted upon but all its subelements are in `INACTIVE` state.

feat(promtail#Set the hostname label on all logs): Set the hostname label on all logs

There are many ways to do it:

- [Setting the label in the promtail launch command](https://community.grafana.com/t/how-to-add-variable-hostname-label-to-static-config-in-promtail/68352/11)

  ```bash
  sudo ./promtail-linux-amd64 --client.url=http://xxxx:3100/loki/api/v1/push --client.external-labels=hostname=$(hostname) --config.file=./config.yaml
  ```

  This won't work if you're running promtail within a docker-compose, because you can't use bash expansion in the `docker-compose.yaml` file.

- [Allowing env expansion and setting it in the promtail conf](grafana/loki#634). You can launch the promtail command with `-config.expand-env` and then set in each scrape job:

  ```yaml
  labels:
    host: ${HOSTNAME}
  ```

  This won't work either if you're running `promtail` within a docker, as it will give you the ID of the docker.

- Set it in the `promtail_config_clients` field as `external_labels` of each promtail config:

  ```yaml
  promtail_config_clients:
    - url: "http://{{ loki_url }}:3100/loki/api/v1/push"
      external_labels:
        hostname: "{{ ansible_hostname }}"
  ```

- Hardcode it in each promtail scrape config as static labels. If you're using ansible or any deployment method that supports jinja expansion, set it that way:

  ```yaml
  labels:
    host: {{ ansible_hostname }}
  ```

fix(roadmap_adjustment): Change the concept of `Task` to `Action`

To remove the capitalist productive mindset from the concept.

fix(roadmap_adjustment#Action cleaning): Action cleaning

Marking steps as done helps you get an idea of the evolution of the action. It can also be useful if you want to do some kind of reporting. On the other hand, having a long list of done steps (especially if you have many levels of step indentation) may make it difficult to find the next actionable step. It's a good idea then to often clean up all done items.

- For non-recurring actions use the `LOGBOOK` to move the done steps. For example:

  ```orgmode
  ** DOING Do X
  :LOGBOOK:
  - [x] Done step 1
  - [-] Doing step 2
    - [x] Done substep 1
  :END:
  - [-] Doing step 2
    - [ ] substep 2
  ```

  This way the `LOGBOOK` will be automatically folded, so you won't see the progress but it's at hand in case you need it.

- For recurring actions:

  - Mark the steps as done
  - Archive the todo element.
  - Undo the archive.
  - Clean up the done items.

  This way you have a snapshot of the state of the action in your archive.

feat(roadmap_adjustment#Project cleaning): Project cleaning

Similar to [action cleaning](#action-cleaning), we want to keep the state clean. If there are not that many actions under the project we can leave the done elements as `DONE`; once they start to clutter things up we can create a `Closed` section.

For recurring projects:

- Mark the actions as done
- Archive the project element.
- Undo the archive.
- Clean up the done items.
feat(vim_autosave): Manually toggle the autosave function

Besides running auto-save at startup (if you have `enabled = true` in your config), you can also use:

- `ASToggle`: toggle auto-save (see the sketch below)
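A minimal sketch of a mapping for that command, assuming the auto-save.nvim plugin (the keybinding itself is illustrative):

```lua
-- Hypothetical mapping; auto-save.nvim exposes the :ASToggle command
vim.keymap.set("n", "<leader>ta", "<cmd>ASToggle<CR>", { desc = "Toggle auto-save" })
```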
Is your feature request related to a problem? Please describe.
I was trying to use promtail to collect application logs from some hosts. I find that `filename` is one of the default labels, which is useful; however, it's still hard to distinguish between hosts.
One solution is to add this label in the promtail config as shown below. However, it's not efficient to add a different hostname for tons of hosts, so I was thinking we could add hostname as a default label like `filename`, since it's a really common need. Or at least we could have a config switch to enable this, which would be more convenient for this (non-Kubernetes) use case.
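The snippet referenced above isn't included in the issue body; a typical static scrape config with a hardcoded host label looks roughly like this (the label value and path are illustrative):

```yaml
scrape_configs:
  - job_name: app
    static_configs:
      - targets:
          - localhost
        labels:
          host: my-host-01          # has to be hand-edited on every host
          __path__: /var/log/app/*.log
```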
Describe the solution you'd like
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.