adding Nginx Integration #1493

Merged 1 commit on Apr 5, 2023
113 changes: 113 additions & 0 deletions integrations/README.md
@@ -0,0 +1,113 @@
# Definitions

## Bundle

An OpenSearch Integration Bundle may contain the following:
- dashboards
- visualisations
- configurations

These bundle assets are designed to assist in monitoring logs and metrics for a particular resource (device, network element, service) or a group of related resources, such as “Nginx” or “System”.

---

The Bundle consists of:

* Version
* Metadata configuration file
* Dashboards, visualisations, and Notebooks
* Data-stream index templates used for ingesting the signals
* Documentation & information


## Integration

An integration is a type of _bundle_ that defines data-streams for ingesting a resource's observed signals as logs, metrics, and traces.

### Structure
As mentioned above, an integration is a collection of elements that describe how to observe a specific data-emitting resource - in our case, a telemetry data producer.

A typical Observability Integration consists of the following parts:

***Metadata***

* Observability data producer resource
* Supplement Indices (mapping & naming)
* Collection Agent Version
* Transformation schema
* Optional test harnesses repository
* Verified version and documentation
* Category & classification (logs/traces/alerts/metrics)

***Display components***

* Dashboards
* Maps
* Applications
* Notebooks
* Operations Panels
* Saved PPL/SQL/DQL Queries
* Alerts

Since structured data contributes enormously to the understanding of system behaviour, each resource defines a well-structured mapping it conforms with.

Once the input content has form and shape, it can be used to calculate and correlate different pieces of data.

The next parts of this document present **Integrations For Observability**, whose key concept is the Observability schema.

They give an overview of the concepts of observability, describe the current issues customers face with observability, and elaborate on how to mitigate them using Integrations and structured schemas.

---

### Creating An Integration

```yaml

integration-template-name
config.json
display
Application.json
Maps.json
Dashboard.json
stored-queries
Query.json
transformation-schemas
transformation.json
samples
resource.access logs
resource.error logs
resource.stats metrics
expected_results
info
documentation
images
```

**Definitions**

- `config.json` - defines the general configuration for the entire integration component.
- `display` - the folder in which the actual visualization components are stored.
- `queries` - the folder in which the actual PPL queries are stored.
- `schemas` - the folder in which the schemas are stored - schemas for mapping translations or index mappings.
- `samples` - the folder containing sample logs and their translated counterparts.
- `metadata` - the folder containing additional metadata definitions such as security and policies.
- `info` - the folder containing documentation, licences, and external references.

---

#### Config

The `config.json` file includes the Integration configuration; see the [Nginx config](nginx/config.json) for an example.

For additional information on the config structure, see [Structure](docs/Integration-structure.md).

#### Display

The `display` folder contains the relevant visual components associated with this integration.

The visual display components will need to be validated against the schema they are expected to work on - this may become part of the Integration validation flow.

#### Queries

This folder contains specific PPL queries that demonstrate common and useful use-cases.
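
For example, a hypothetical stored query counting Nginx responses by status code could be run against the PPL REST endpoint (a sketch, assuming the SQL/PPL plugin is installed and the `sso_logs-nginx-prod` data-stream from the samples section exists):

```text
curl -XPOST "localhost:9200/_plugins/_ppl" -H "Content-Type: application/json" \
  -d '{"query": "source=sso_logs-nginx-prod | stats count() by http.response.status_code"}'
```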


266 changes: 266 additions & 0 deletions integrations/nginx/assets/display/sso-logs-dashboard-new.ndjson

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions integrations/nginx/config.json
@@ -0,0 +1,42 @@
{
"name": "nginx",
"version": {
"integ": "0.1.0",
"schema": "1.0.0",
"resource": "^1.23.0"
},
"description": "Nginx HTTP server collector",
"identification": "instrumentationScope.attributes.identification",
"catalog": "observability",
"components": [
"communication","http"
],
"collection":[
{
"logs": [{
"info": "access logs",
"input_type":"logfile",
"dataset":"nginx.access",
"labels" :["nginx","access"]
},
{
"info": "error logs",
"input_type":"logfile",
"labels" :["nginx","error"],
"dataset":"nginx.error"
}]
},
{
"metrics": [{
"info": "status metrics",
"input_type":"metrics",
"dataset":"nginx.status",
"labels" :["nginx","status"]
}]
}
],
"repo": {
"github": "https://github.com/opensearch-project/observability/tree/main/integrarions/nginx"
}
}

28 changes: 28 additions & 0 deletions integrations/nginx/info/README.md
@@ -0,0 +1,28 @@
![](nginx.png)

# Nginx Integrations

## What is Nginx?
Nginx is a popular open-source web server software used by millions of websites worldwide. It was developed to address the limitations of Apache, which is another popular web server software. Nginx is known for its high performance, scalability, and reliability, and is widely used as a reverse proxy server, load balancer, and HTTP cache.

One of the primary advantages of Nginx is its ability to handle large numbers of concurrent connections and requests. It uses an event-driven architecture that allows it to handle multiple connections with minimal resources, making it an ideal choice for high-traffic websites. In addition, Nginx can also serve static content very efficiently, which further improves its performance.

Another important feature of Nginx is its ability to act as a reverse proxy server. This means that it can sit in front of web servers and route incoming requests to the appropriate server based on various criteria, such as the URL or the type of request. Reverse proxying can help improve website performance and security by caching static content, load balancing incoming traffic, and providing an additional layer of protection against attacks.

Nginx is also widely used as a load balancer. In this role, it distributes incoming traffic across multiple web servers to improve performance and ensure high availability. Nginx can balance traffic using a variety of algorithms, such as round-robin or least connections, and can also perform health checks to ensure that requests are only sent to healthy servers.

Finally, Nginx is also an effective HTTP cache. By caching frequently accessed content, Nginx can reduce the load on backend servers and improve website performance. Nginx can cache content based on a variety of criteria, such as the URL, response headers, or response body.

## What is an Nginx Integration?
As described in the [documentation](../../README.md), an Nginx integration is a bundle of resources, assets, and documentation.

An integration may support multiple ways of ingesting Observability signals; for example, Nginx logs may arrive via the Fluent-bit agent or the OTEL logs collector.

## What are the Nginx Observability providers?
Observability providers are agents that collect Nginx logs, metrics, and trace information, convert them to the `sso` observability schema, and send them to OpenSearch observability data-streams.

### Fluent-Bit
Fluent-bit provides a `tail` input plugin which can be used to tail the Nginx access log and send each line to a destination of your choice.
The `tail` plugin reads log files line by line and passes them to the Fluent-bit engine for processing.
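
As a quick sanity check that the access log can be tailed (a sketch, assuming Fluent-bit is installed and the default `/var/log/nginx/access.log` path), the `tail` input can be wired straight to stdout from the command line:

```text
fluent-bit -i tail -p path=/var/log/nginx/access.log -o stdout -f 1
```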

See additional details [here](fluet-bit/README.md).
65 changes: 65 additions & 0 deletions integrations/nginx/info/fluet-bit/README.md
@@ -0,0 +1,65 @@
![](fluentbit.png)

## Fluent-bit

Fluent-bit is a lightweight and flexible data collector and forwarder, designed to handle a large volume of log data in real-time.
It is an open-source project that is part of the Cloud Native Computing Foundation (CNCF) and has gained popularity among developers for its simplicity and ease of use.

Fluent-bit is designed to be lightweight, which means that it has a small footprint and can be installed on resource-constrained environments like embedded systems or containers. It is written in C language, making it fast and efficient, and it has a low memory footprint, which allows it to consume minimal system resources.

Fluent-bit is a versatile tool that can collect data from various sources, including files, standard input, syslog, and TCP/UDP sockets. It also supports parsing different log formats like JSON, Apache, and Syslog. Fluent-bit provides a flexible configuration system that allows users to tailor their log collection needs, which makes it easy to adapt to different use cases.

One of the main advantages of Fluent-bit is its ability to forward log data to various destinations, including Opensearch, InfluxDB, and Kafka. Fluent-bit provides multiple output plugins that allow users to route their log data to different destinations based on their requirements. This feature makes Fluent-bit ideal for distributed systems where log data needs to be collected and centralized in a central repository.

Fluent-bit also provides a powerful filtering mechanism that allows users to manipulate log data in real-time. It supports various filter plugins, including record modifiers, parsers, and field extraction. With these filters, users can parse and enrich log data, extract fields, and modify records before sending them to their destination.

## Setting Up Fluent-bit agent

To set up a Fluent-bit agent for Nginx, follow the instructions below:

- Install Fluent-bit on the Nginx server. You can download the latest package from the official Fluent-bit website or use your package manager to install it.

- Once Fluent-bit is installed, create a configuration file named `fluent-bit.conf` in the `/etc/fluent-bit/` directory. Add the following configuration to the file:

```text
[SERVICE]
Flush 1
Log_Level info
Parsers_File parsers.conf

[Filter]
Name lua
Match *
code function cb_filter(a,b,c)local d={}local e=os.date("!%Y-%m-%dT%H:%M:%S.000Z")d["observerTime"]=e;d["body"]=c.remote.." "..c.host.." "..c.user.." ["..os.date("%d/%b/%Y:%H:%M:%S %z").."] \""..c.method.." "..c.path.." HTTP/1.1\" "..c.code.." "..c.size.." \""..c.referer.."\" \""..c.agent.."\""d["trace_id"]="102981ABCD2901"d["span_id"]="abcdef1010"d["attributes"]={}d["attributes"]["data_stream"]={}d["attributes"]["data_stream"]["dataset"]="nginx.access"d["attributes"]["data_stream"]["namespace"]="production"d["attributes"]["data_stream"]["type"]="logs"d["event"]={}d["event"]["category"]={"web"}d["event"]["name"]="access"d["event"]["domain"]="nginx.access"d["event"]["kind"]="event"d["event"]["result"]="success"d["event"]["type"]={"access"}d["http"]={}d["http"]["request"]={}d["http"]["request"]["method"]=c.method;d["http"]["response"]={}d["http"]["response"]["bytes"]=tonumber(c.size)d["http"]["response"]["status_code"]=c.code;d["http"]["flavor"]="1.1"d["http"]["url"]=c.path;d["communication"]={}d["communication"]["source"]={}d["communication"]["source"]["address"]="127.0.0.1"d["communication"]["source"]["ip"]=c.remote;return 1,b,d end
call cb_filter

[INPUT]
Name tail
Path /var/log/nginx/access.log
Tag nginx.access
DB /var/log/flb_input.access.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On

[OUTPUT]
Name opensearch
Match nginx.*
Host <OSS_HOST>
Port <OSS_PORT>
Index sso_nginx-access-%Y.%m.%d
```
Here, we specify the input plugin as `tail`, set the path to the Nginx access log file, and specify a tag to identify the logs in Fluent-bit. We also set additional parameters such as the memory buffer limit and skipping long lines. The `lua` filter transforms each parsed access-log record into the `sso` simple-schema structure before it is forwarded.

For the output, we use the `opensearch` plugin to send the logs to Opensearch. We specify the Opensearch host, port, and index name.
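
If the OpenSearch cluster has security enabled (not shown in the sample above), the output section will also need credentials and TLS settings; a sketch using the plugin's standard options:

```text
[OUTPUT]
    Name        opensearch
    Match       nginx.*
    Host        <OSS_HOST>
    Port        <OSS_PORT>
    Index       sso_nginx-access-%Y.%m.%d
    HTTP_User   <username>
    HTTP_Passwd <password>
    tls         On
    tls.verify  On
```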

- Modify the Opensearch host and port in the configuration file to match your Opensearch installation.
- Start the Fluent-bit service; on systemd-based systems, run:

```text
sudo systemctl start fluent-bit
```
- Verify that Fluent-bit is running by checking its status:
```text
sudo systemctl status fluent-bit
```
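
Optionally, the configuration can be validated by running Fluent-bit in the foreground before enabling the service, and once logs are shipping, the arrival of data can be confirmed on the OpenSearch side (a sketch, assuming the host and port placeholders above have been filled in):

```text
fluent-bit -c /etc/fluent-bit/fluent-bit.conf

curl -XGET "http://<OSS_HOST>:<OSS_PORT>/_cat/indices/sso_nginx-access-*?v"
```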
20 changes: 20 additions & 0 deletions integrations/nginx/info/fluet-bit/fluent-bit.conf
@@ -0,0 +1,20 @@
[INPUT]
Name tail
Path /var/log/nginx/access.log
Tag nginx.access
DB /var/log/flb_input.access.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On

[Filter]
Name lua
Match *
code function cb_filter(a,b,c)local d={}local e=os.date("!%Y-%m-%dT%H:%M:%S.000Z")d["observerTime"]=e;d["body"]=c.remote.." "..c.host.." "..c.user.." ["..os.date("%d/%b/%Y:%H:%M:%S %z").."] \""..c.method.." "..c.path.." HTTP/1.1\" "..c.code.." "..c.size.." \""..c.referer.."\" \""..c.agent.."\""d["trace_id"]="102981ABCD2901"d["span_id"]="abcdef1010"d["attributes"]={}d["attributes"]["data_stream"]={}d["attributes"]["data_stream"]["dataset"]="nginx.access"d["attributes"]["data_stream"]["namespace"]="production"d["attributes"]["data_stream"]["type"]="logs"d["event"]={}d["event"]["category"]={"web"}d["event"]["name"]="access"d["event"]["domain"]="nginx.access"d["event"]["kind"]="event"d["event"]["result"]="success"d["event"]["type"]={"access"}d["http"]={}d["http"]["request"]={}d["http"]["request"]["method"]=c.method;d["http"]["response"]={}d["http"]["response"]["bytes"]=tonumber(c.size)d["http"]["response"]["status_code"]=c.code;d["http"]["flavor"]="1.1"d["http"]["url"]=c.path;d["communication"]={}d["communication"]["source"]={}d["communication"]["source"]["address"]="127.0.0.1"d["communication"]["source"]["ip"]=c.remote;return 1,b,d end
call cb_filter

[OUTPUT]
Name opensearch
Match nginx.*
Host <OSS_HOST>
Port <OSS_PORT>
Index sso_nginx-access-%Y.%m.%d
Binary file added integrations/nginx/info/fluet-bit/fluentbit.png
Binary file added integrations/nginx/info/nginx.png
8 changes: 8 additions & 0 deletions integrations/nginx/samples/README.md
@@ -0,0 +1,8 @@
# Samples
The samples folder contains sampled data that explains and demonstrates the expected input signals.

Specifically, this folder contains two inner folders:
- **preloaded** - ready-made Nginx access logs with detailed instructions on how to load them into the appropriate OpenSearch data-stream.
- **results** - the expected JSON structure that conforms to the `sso` simple schema for logs in OpenSearch.

Additional internal folders can be added to represent other aspects of the content this integration is expected to ingest.
36 changes: 36 additions & 0 deletions integrations/nginx/samples/preloaded/README.md
@@ -0,0 +1,36 @@
# Nginx Dashboard Playground
To experiment with and review the Nginx dashboard, this tutorial uses the preloaded Nginx access-log data. The sample data was generated with the Nginx Fluent-bit data generator and translated using the Fluent-bit Nginx Lua parser that appears in the tests mentioned below.
- [Fluent-bit](https://github.com/fluent/fluent-bit)
- [Services Playground](../../test/README.md)

The [sample logs](bulk_logs.json) are added here under the preloaded data folder and are ready to be ingested into OpenSearch.

## Demo Instructions

1. Start the Docker Compose environment; this will load both the OpenSearch server and OpenSearch Dashboards
- `$ docker compose up --build`
- Ensure `vm.max_map_count` has been set to 262144 or higher (`sudo sysctl -w vm.max_map_count=262144`).

2. Load the Simple Schema Logs index templates [Loading Logs](../../../../schema/observability/logs/Usage.md)

- `curl -XPUT localhost:9200/_component_template/http_template -H "Content-Type: application/json" --data-binary @http.mapping`

- `curl -XPUT localhost:9200/_component_template/communication_template -H "Content-Type: application/json" --data-binary @communication.mapping`

- `curl -XPUT localhost:9200/_index_template/logs -H "Content-Type: application/json" --data-binary @logs.mapping`
3. Bulk-load the preloaded Nginx access logs into the `sso_logs-nginx-prod` data-stream
- `curl -XPOST "localhost:9200/sso_logs-nginx-prod/_bulk?pretty&refresh" -H "Content-Type: application/json" --data-binary @bulk_logs.json`

4. We can now load the Nginx [dashboards](../../assets/display/sso-logs-dashboard-new.ndjson) to display the preloaded Nginx access logs
- First add an index pattern `sso_logs-*-*`
- `curl -X POST localhost:5601/api/saved_objects/index-pattern/sso_logs -H 'osd-xsrf: true' -H 'Content-Type: application/json' -d '{ "attributes": { "title": "sso_logs-*-*", "timeFieldName": "@timestamp" } }'`

- Load the [dashboards](../../assets/display/sso-logs-dashboard-new.ndjson)
- `curl -X POST "localhost:5601/api/saved_objects/_import?overwrite=true" -H "osd-xsrf: true" --form [email protected]`
5. Open the dashboard and view the preloaded access logs
- Go to [Dashboards](http://localhost:5601/app/dashboards#/list?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'2023-02-24T17:10:34.442Z',to:'2023-02-24T17:46:44.056Z')))
- Data-stream name: `sso_logs-nginx-prod`
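
After completing the steps above, an optional check that the bulk load succeeded is to query the document count of the data-stream directly (a sketch, assuming the defaults used in this demo):

```text
curl -XGET "localhost:9200/sso_logs-nginx-prod/_count?pretty"
```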

![](img/nginx-dashboard.png)