Add hostname as default labels for promtail #634

Closed · mizeng opened this issue May 30, 2019 · 13 comments

mizeng (Contributor) commented May 30, 2019

Is your feature request related to a problem? Please describe.
I was trying to use promtail to collect application logs from some hosts. I find that `filename` is a useful default label; however, it's still hard to distinguish logs coming from different hosts.

One solution is to add the label in the promtail config, as below. However, it's not practical to add a different hostname by hand for tons of hosts, so I was thinking we could add hostname as a default label like `filename`, since it's a really common need. Or at least we could have a config switch to enable it, which would be more convenient for this use case (the non-Kubernetes use case):

scrape_configs:
- job_name: system
  entry_parser: raw
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      hostname: host1
      __path__: /var/log/*log


slim-bean (Collaborator) commented:

This should already be possible using command-line flags, added in PR #510.

mizeng commented May 31, 2019

@slim-bean this doesn't seem to work. I've tried `./promtail --config.file=./docker-config.yaml --client.external-labels=host=${HOSTNAME}`, but no `host` label showed up in Loki.
Even `./promtail --config.file=./docker-config.yaml --client.external-labels=hostname=host1` does not work.

mizeng commented May 31, 2019

It looks like `ClientConfig` and `ClientConfigs` are misused in the promtail `Config`. The command-line options, including `ExternalLabels`, live in `ClientConfig`, but promtail only reads `ClientConfigs`.

mizeng commented May 31, 2019

I'll create a new issue for this.

Mike7518 commented:

For people trying to add the hostname as a label: you can use the flag -config.expand-env on the promtail command line, then use environment variables in the promtail config with the following syntax:

labels:
  host: ${HOSTNAME}

fanuch commented Dec 7, 2022

To clarify what was written above, you need to use the following as a command line parameter:

-config.expand-env=true

As is documented here:
https://grafana.com/docs/loki/v2.7.x/configuration/#use-environment-variables-in-the-configuration

frjaraur commented:

As far as I've tested, this doesn't work when promtail is started from a systemd unit definition, because the HOSTNAME environment variable is empty (at least on my servers). I solved it by setting HOSTNAME from %H (the systemd hostname specifier), which makes it possible to expand the variable in the promtail configuration. Hope this helps someone.

MrZXR commented Dec 27, 2022

For people trying to add only the hostname as a label, you can just add this command-line parameter:

-client.external-labels=hostname=$(HOSTNAME)

northben commented:

If you're starting promtail from a systemd unit file, the HOSTNAME environment variable is not available by default.

I added this Environment line and it's working:

[Service]
Environment="HOSTNAME=%H"
ExecStart=/opt/promtail/promtail -config.file=/opt/promtail/promtail-host-config.yaml -config.expand-env=true

Thanks everyone!

slyt commented Jun 15, 2023

If using the Promtail helm chart you can add the following to values.yaml:

extraArgs:
  - -client.external-labels=hostname=$(HOSTNAME)

...which should add the hostname label to all logs

dreyTee commented Sep 25, 2023

I've tried all of the above to no avail:

  • -client.external-labels=hostname=$(hostname)
  • -config.expand-env
  • -config.expand-env=true
  • the Environment line in the unit file

All resulting labels were empty.

My promtail version: 2.9.1
OS: Linux
Any clues?

MrAkaki commented Nov 11, 2023

> I've tried all the above to no avail: -client.external-labels=hostname=$(hostname), -config.expand-env, -config.expand-env=true, the Environment line in the unit file. All resulting labels were empty. My promtail version = 2.9.1, OS: linux. Any clues?

I have this in my service file and it works:

[Unit]
Description=Promtail service
After=network.target

[Service]
Type=simple
User=promtail
Environment="HOSTNAME=%H"
ExecStart=/usr/local/bin/promtail-linux-amd64 -config.expand-env=true -config.file /usr/local/bin/config-promtail.yml

[Install]
WantedBy=multi-user.target

lyz-code added a commit to lyz-code/blue-book that referenced this issue Apr 4, 2024
Aleph now exposes Prometheus metrics on port 9100

feat(bash_snippets#Do relative import of a bash library): Do relative import of a bash library

If you want to import a file `lib.sh` that lives in the same directory as the file importing it, you can use the following snippet:

```bash
source "$(dirname "$(realpath "$0")")/lib.sh"
```

If you use `source ./lib.sh` instead, you will get an import error whenever the script is run from any directory other than the one where `lib.sh` lives.

feat(bash_snippets#Check the battery status): Check the battery status

This [article gives many ways to check the status of a battery](https://www.howtogeek.com/810971/how-to-check-a-linux-laptops-battery-from-the-command-line/); for my purposes the following is enough:

```bash
cat /sys/class/power_supply/BAT0/capacity
```
feat(bash_snippets#Check if file is being sourced): Check if file is being sourced

Assuming that you are running bash, put the following code near the start of the script that you want to be sourced but not executed:

```bash
if [ "${BASH_SOURCE[0]}" -ef "$0" ]
then
    echo "Hey, you should source this script, not execute it!"
    exit 1
fi
```

Under bash, `${BASH_SOURCE[0]}` will contain the name of the current file that the shell is reading regardless of whether it is being sourced or executed.

By contrast, `$0` is the name of the current file being executed.

`-ef` tests if these two files are the same file. If they are, we alert the user and exit.

Neither `-ef` nor `BASH_SOURCE` is POSIX. While `-ef` is supported by ksh, yash, zsh and dash, `BASH_SOURCE` requires bash. In zsh, however, `${BASH_SOURCE[0]}` can be replaced by `${(%):-%N}`.

feat(bash_snippets#Parsing bash arguments): Parsing bash arguments

Long story short, it's nasty; consider using a Python script with [typer](typer.md) instead.

There are some possibilities to do this (a minimal `getopts` sketch follows the list):

- [The old getopts](https://www.baeldung.com/linux/bash-parse-command-line-arguments)
- The [argbash](https://github.com/matejak/argbash) library
- [Build your own parser](https://medium.com/@Drew_Stokes/bash-argument-parsing-54f3b81a6a8f)
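
As a reference, here is a minimal `getopts` sketch; the option names (`-v`, `-o`) are made up for illustration:

```bash
#!/usr/bin/env bash
# Parse -v (boolean flag) and -o FILE (option that takes an argument).
verbose=0
output=""

while getopts ":vo:" opt; do
  case "$opt" in
    v) verbose=1 ;;
    o) output="$OPTARG" ;;
    \?) echo "Unknown option: -$OPTARG" >&2; exit 1 ;;
    :)  echo "Option -$OPTARG requires an argument" >&2; exit 1 ;;
  esac
done
shift $((OPTIND - 1))  # whatever remains in "$@" are positional arguments

echo "verbose=$verbose output=$output positional=$*"
```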

ci: also commit the "not by AI" badge in the CI

fix(alertmanager): Add another source on how to silence alerts

If the previous guidelines don't work for you, you can use the [sleep peacefully guidelines](https://samber.github.io/awesome-prometheus-alerts/sleep-peacefully) to tackle it at the query level.

feat(documentation#references): Add diátaxis as documentation writing guideline

[Diátaxis: A systematic approach to technical documentation authoring](https://diataxis.fr/)

feat(ecc): Check if system is actually using ECC

Another way is to run `dmidecode`. For ECC support you'll see:
```bash
$: dmidecode -t memory | grep ECC
  Error Correction Type: Single-bit ECC
  # or
  Error Correction Type: Multi-bit ECC
```

No ECC:

```bash
$: dmidecode -t memory | grep ECC
  Error Correction Type: None
```

You can also test it with [`rasdaemon`](rasdaemon.md)

feat(faster#Prometheus metrics): Prometheus metrics

Use [`prometheus-fastapi-instrumentator`](https://github.com/trallnag/prometheus-fastapi-instrumentator)
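
A minimal sketch of wiring it into a FastAPI app, following the library's README (the `/health` route is just for illustration):

```python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Instrument all routes and expose a /metrics endpoint for Prometheus to scrape.
Instrumentator().instrument(app).expose(app)

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}
```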

feat(privileges#Videos): Add nice video on male privileges

[La intuición femenina, gracias al lenguaje](https://twitter.com/almuariza/status/1772889815131807765?t=HH1W17VGbQ7K-_XmoCy_SQ&s=19)

feat(ffmpeg#Reduce the video size): Reduce the video size

If you don't mind using `H.265`, replace the libx264 codec with libx265 and push the compression lever further by increasing the CRF value: add, say, 4 or 6, since a reasonable CRF range for H.265 is 24 to 30. Note that lower CRF values correspond to higher bitrates, and hence produce higher quality videos.

```bash
ffmpeg -i input.mp4 -vcodec libx265 -crf 28 output.mp4
```

If you want to stick to H.264, reduce the bitrate. You can check the current one with `ffprobe input.mkv`. Once you've chosen the new rate, change it with:

```bash
ffmpeg -i input.mp4 -b:v 3000k output.mp4
```

Another option worth considering is setting the Constant Rate Factor, which lowers the average bitrate but retains better quality. Vary the CRF between around 18 and 24; the lower the value, the higher the bitrate (and quality).

```bash
ffmpeg -i input.mp4 -vcodec libx264 -crf 20 output.mp4
```

feat(icsx5): Introduce ICSx5

[ICSx5](https://f-droid.org/packages/at.bitfire.icsdroid/) is an Android app to sync calendars.

**References**

- [Source](https://github.com/bitfireAT/icsx5)
- [F-droid](https://f-droid.org/packages/at.bitfire.icsdroid/)

feat(haproxy#Automatically ban offending traffic): Automatically ban offending traffic

Check these two posts (a minimal sketch of the stick-table approach follows the list):

- https://serverfault.com/questions/853806/blocking-ips-in-haproxy
- https://www.loadbalancer.org/blog/simple-denial-of-service-dos-attack-mitigation-using-haproxy-2/
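
As a reference, the loadbalancer.org post relies on stick tables to track and deny abusive clients; a minimal sketch (the frontend name, bind address and thresholds are assumptions):

```
frontend web
    bind *:443
    # Track each client IP's HTTP request rate over the last 10 seconds
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    # Deny clients doing more than 100 requests per 10 seconds
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
```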

feat(haproxy#Configure haproxy logs to be sent to loki): Configure haproxy logs to be sent to loki

In the `frontend` config add the following line:

```
  # For more options look at https://www.chrisk.de/blog/2023/06/haproxy-syslog-promtail-loki-grafana-logfmt/
  log-format 'client_ip=%ci client_port=%cp frontend_name=%f backend_name=%b server_name=%s performance_metrics=%TR/%Tw/%Tc/%Tr/%Ta status_code=%ST bytes_read=%B termination_state=%tsc haproxy_metrics=%ac/%fc/%bc/%sc/%rc srv_queue=%sq  backend_queue=%bq user_agent=%{+Q}[capture.req.hdr(0)] http_hostname=%{+Q}[capture.req.hdr(1)] http_version=%HV http_method=%HM http_request_uri="%HU"'
```

At the bottom of [chrisk post](https://www.chrisk.de/blog/2023/06/haproxy-syslog-promtail-loki-grafana-logfmt/) is a table with all the available fields.

[Programming VIP also has an interesting post](https://programming.vip/docs/loki-configures-the-collection-of-haproxy-logs.html).

feat(haproxy#Reload haproxy): Reload haproxy

- Check the config is alright:
  ```bash
  service haproxy configtest
  # Or
  /usr/sbin/haproxy -c -V -f /etc/haproxy/haproxy.cfg
  ```
- Reload the service:
  ```bash
  service haproxy reload
  ```

If you want a cleaner reload you can [drop the SYN packets before the restart](https://serverfault.com/questions/580595/haproxy-graceful-reload-with-zero-packet-loss), so that clients resend the SYN until it reaches the new process.

```bash
# Note: plain --dport doesn't accept a port list; the multiport match is needed for 80,443
iptables -I INPUT -p tcp -m multiport --dports 80,443 --syn -j DROP
sleep 1
service haproxy reload
iptables -D INPUT -p tcp -m multiport --dports 80,443 --syn -j DROP
```

feat(linux_snippets#Get info of a mkv file): Get info of a mkv file

```bash
ffprobe file.mkv
```

feat(loki#Alert when query returns no data): Alert when query returns no data

Sometimes the queries you want to alert on return `NaN` or no data, for example when you want to monitor the happy path by alerting if a string is not found in some logs within a period of time.

```logql
count_over_time({filename="/var/log/mail.log"} |= `Mail is sent` [24h]) < 1
```

This won't trigger the alert because `count_over_time` doesn't return `0` but `NaN`. One way to solve it is to use [the `vector(0)` operator](grafana/loki#7023) with [the operation `or on() vector(0)`](https://stackoverflow.com/questions/76489956/how-to-return-a-zero-vector-in-loki-logql-metric-query-when-grouping-is-used-and):

```logql
(count_over_time({filename="/var/log/mail.log"} |= `Mail is sent` [24h]) or on() vector(0)) < 1
```

feat(loki#Monitor loki metrics): Monitor loki metrics

Since Loki reuses the Prometheus code for recording rules and WALs, it also gains all of Prometheus’ observability.

To scrape Loki metrics with Prometheus add the following snippet to the Prometheus configuration:

```yaml
  - job_name: loki
    metrics_path: /metrics
    static_configs:
    - targets:
      - loki:3100
```

This assumes that `loki` is a Docker container on the same network as `prometheus`.

There are some rules in the [awesome prometheus alerts repo](https://samber.github.io/awesome-prometheus-alerts/rules#loki):

```yaml
---
groups:
- name: Awesome Prometheus loki alert rules
  # https://samber.github.io/awesome-prometheus-alerts/rules#loki
  rules:
  - alert: LokiProcessTooManyRestarts
    expr: changes(process_start_time_seconds{job=~".*loki.*"}[15m]) > 2
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Loki process too many restarts (instance {{ $labels.instance }})
      description: "A loki process had too many restarts (target {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: LokiRequestErrors
    expr: 100 * sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route) > 10
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: Loki request errors (instance {{ $labels.instance }})
      description: "The {{ $labels.job }} and {{ $labels.route }} are experiencing errors\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: LokiRequestPanic
    expr: sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: Loki request panic (instance {{ $labels.instance }})
      description: "The {{ $labels.job }} is experiencing {{ printf \"%.2f\" $value }}% increase of panics\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: LokiRequestLatency
    expr: (histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route!~"(?i).*tail.*"}[5m])) by (le)))  > 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: Loki request latency (instance {{ $labels.instance }})
      description: "The {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

And there are some guidelines on the rest of the metrics in [the grafana documentation](https://grafana.com/docs/loki/latest/operations/observability/)

**[Monitor the ruler](https://grafana.com/docs/loki/latest/operations/recording-rules/)**

Prometheus exposes a number of metrics for its WAL implementation, and these have all been prefixed with `loki_ruler_wal_`.

For example: `prometheus_remote_storage_bytes_total` → `loki_ruler_wal_prometheus_remote_storage_bytes_total`

Additional metrics are exposed, also with the prefix `loki_ruler_wal_`. All per-tenant metrics contain a tenant label, so be aware that cardinality could begin to be a concern if the number of tenants grows sufficiently large.

Some key metrics to note (an example alert rule using them follows the list):

- `loki_ruler_wal_appender_ready`: whether a WAL appender is ready to accept samples (1) or not (0)
- `loki_ruler_wal_prometheus_remote_storage_samples_total`: number of samples sent per tenant to remote storage
- `loki_ruler_wal_prometheus_remote_storage_samples_pending_total`: samples buffered in memory, waiting to be sent to remote storage
- `loki_ruler_wal_prometheus_remote_storage_samples_failed_total`: samples that failed when sent to remote storage
- `loki_ruler_wal_prometheus_remote_storage_samples_dropped_total`: samples dropped by relabel configurations
- `loki_ruler_wal_prometheus_remote_storage_samples_retried_total`: samples re-resent to remote storage
- `loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds`: highest timestamp of sample appended to WAL
- `loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds`: highest timestamp of sample sent to remote storage.
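
As an illustration, these metrics can feed the same kind of alert rules as above; a sketch (the alert name and threshold are made up, and it assumes `loki_ruler_wal_appender_ready` carries the per-tenant label):

```yaml
  - alert: LokiRulerWALAppenderNotReady
    expr: min by (tenant) (loki_ruler_wal_appender_ready) < 1
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: Loki ruler WAL appender not ready (tenant {{ $labels.tenant }})
```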

feat(loki#Get a useful Source link in the alertmanager): Get a useful Source link in the alertmanager

[This still doesn't work](grafana/loki#4722). Currently, if you use the URL of your Grafana installation for the ruler `external_url` (e.g. `external_url: "https://grafana.example.com"`), it creates a Source link in alertmanager similar to https://grafana.example.com/graph?g0.expr=%28sum+by%28thing%29%28count_over_time%28%7Bnamespace%3D%22foo%22%7D+%7C+json+%7C+bar%3D%22maxRetries%22%5B5m%5D%29%29+%3E+0%29&g0.tab=1, which isn't valid.

This URL templating (via `/graph?g0.expr=%s&g0.tab=1`) appears to come from Prometheus. There is no workaround yet.

feat(orgmode#How to deal with recurring tasks that are not yet ready to be acted upon): How to deal with recurring tasks that are not yet ready to be acted upon

By default, when you mark a recurrent task as `DONE`, it transitions the date (either an appointment, `SCHEDULED` or `DEADLINE`) to the next occurrence and changes the state back to `TODO`. I found this confusing because, for me, `TODO` actions are the ones that can be acted upon right now. That's why I'm using the following states instead:

- `INACTIVE`: a recurrent task whose date is not yet close, so you should not take care of it.
- `READY`: a recurrent task whose date [is overdue](#how-to-deal-with-overdue-SCHEDULED-and-DEADLINE-tasks); we acknowledge the fact and mark the date as inactive (so that it doesn't clobber the agenda).

The idea is that once an `INACTIVE` task reaches your agenda, either because the warning days of the `DEADLINE` make it show up or because its `SCHEDULED` date arrives, you decide whether to change it to `TODO` (if it's to be acted upon immediately) or to `READY`, deactivating the date.

`INACTIVE` should then be the default state transition for recurring tasks once you mark them as `DONE`. To do this, set in your config:

```lua
org_todo_repeat_to_state = "INACTIVE",
```

If a project gathers a list of recurrent subprojects or subactions, it can have the following states (see the example after the list):

- `READY`: there is at least one subelement in state `READY` and the rest are `INACTIVE`.
- `TODO`: there is at least one subelement in state `TODO`; the rest may be `READY` or `INACTIVE`.
- `INACTIVE`: the project is not planned to be acted upon soon.
- `WAITING`: the project is planned to be acted upon, but all its subelements are in `INACTIVE` state.
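
For instance, under these rules the following project (a made-up sketch) would be in state `TODO`, because one subelement is `TODO` and the rest are `READY` or `INACTIVE`:

```orgmode
* TODO Maintain the garden
** TODO Water the indoor plants
** READY Prune the trees
** INACTIVE Harvest the tomatoes
```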

feat(promtail#Set the hostname label on all logs): Set the hostname label on all logs

There are many ways to do it:

- [Setting the label in the promtail launch command](https://community.grafana.com/t/how-to-add-variable-hostname-label-to-static-config-in-promtail/68352/11)
  ```bash
  sudo ./promtail-linux-amd64 --client.url=http://xxxx:3100/loki/api/v1/push --client.external-labels=hostname=$(hostname) --config.file=./config.yaml
  ```

  This won't work if you're using promtail within a docker-compose, because you can't use bash expansion in the `docker-compose.yaml` file.
- [Allowing env expansion and setting it in the promtail conf](grafana/loki#634). You can launch the promtail command with `-config.expand-env` and then set, in each scrape job:
  ```yaml
  labels:
    host: ${HOSTNAME}
  ```
  This won't work either if you're running `promtail` inside Docker, as it will give you the container ID instead of the host name.
- Set it in the `promtail_config_clients` field as `external_labels` of each promtail config:
  ```yaml
  promtail_config_clients:
    - url: "http://{{ loki_url }}:3100/loki/api/v1/push"
      external_labels:
        hostname: "{{ ansible_hostname }}"
  ```
- Hardcode it in each promtail scrape config as a static label. If you're using Ansible or any deployment method that supports Jinja expansion, set it that way:
  ```yaml
  labels:
    host: {{ ansible_hostname }}
  ```

fix(roadmap_adjustment): Change the concept of `Task` for `Action`

To remove the capitalist productive mindset from the concept

fix(roadmap_adjustment#Action cleaning): Action cleaning

Marking steps as done can help you get an idea of the evolution of the action. It can also be useful if you want to do some kind of reporting. On the other hand, having a long list of done steps (especially if you have many levels of step indentation) may make it hard to find the next actionable step. It's a good idea, then, to clean up all done items often.

- For non-recurring actions, move the done steps into the `LOGBOOK`. For example:
  ```orgmode
  ** DOING Do X
     :LOGBOOK:
     - [x] Done step 1
     - [-] Doing step 2
       - [x] Done substep 1
     :END:
     - [-] Doing step 2
       - [ ] substep 2
  ```

  This way the `LOGBOOK` will be folded automatically, so you won't see the progress, but it's at hand in case you need it.

- For recurring actions:
  - Mark the steps as done
  - Archive the todo element.
  - Undo the archive.
  - Clean up the done items.

This way you have a snapshot of the state of the action in your archive.

feat(roadmap_adjustment#Project cleaning): Project cleaning

Similar to [action cleaning](#action-cleaning), we want to keep the state clean. If there are not that many actions under the project, we can leave the done elements as `DONE`; once they start to pile up, we can create a `Closed` section.

For recurring projects:

  - Mark the actions as done
  - Archive the project element.
  - Undo the archive.
  - Clean up the done items.

feat(vim_autosave): Manually toggle the autosave function

Besides running auto-save at startup (if you have `enabled = true` in your config), you can also toggle it manually (a mapping example follows):

- `ASToggle`: toggle auto-save
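
For convenience you can bind it to a key; a minimal sketch (the `<leader>as` mapping is an arbitrary choice):

```lua
-- Hypothetical mapping: run ASToggle from normal mode with <leader>as
vim.keymap.set("n", "<leader>as", "<cmd>ASToggle<CR>", { desc = "Toggle auto-save" })
```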