adding Nginx Integration #1493

Merged 1 commit on Apr 5, 2023
113 changes: 113 additions & 0 deletions integrations/README.md
@@ -0,0 +1,113 @@
# Definitions

## Bundle

An OpenSearch Integration Bundle may contain the following:
- dashboards
- visualisations
- configurations

These bundle assets are designed to assist in monitoring logs and metrics for a particular resource (device, network element, service) or a group of related resources, such as “Nginx” or “System”.

---

The Bundle consists of:

* Version
* Metadata configuration file
* Dashboards, visualisations, and Notebooks
* Data-stream index templates used for ingesting the signals
* Documentation & information


## Integration

An integration is a type of _bundle_ that defines data-streams for ingesting a resource's observed signals as logs, metrics, and traces.

### Structure
As mentioned above, an integration is a collection of elements that describe how to observe a specific data-emitting resource - in our case, a telemetry data producer.

A typical Observability Integration consists of the following parts:

***Metadata***

* Observability data producer resource
* Supplement Indices (mapping & naming)
* Collection Agent Version
* Transformation schema
* Optional test harnesses repository
* Verified version and documentation
* Category & classification (logs/traces/alerts/metrics)

***Display components***

* Dashboards
* Maps
* Applications
* Notebooks
* Operations Panels
* Saved PPL/SQL/DQL Queries
* Alerts

Since structured data contributes enormously to the understanding of system behaviour, each resource defines a well-structured mapping it conforms with.

Once the input content has form and shape, it can be used to calculate and correlate different pieces of data.

The next parts of this document present **Integrations For Observability**, whose key concept is the Observability schema.

They give an overview of the concepts of observability, describe the current issues customers face with observability, and elaborate on how to mitigate them using Integrations and structured schemas.

---

### Creating An Integration

```yaml

integration-template-name
config.json
display
Application.json
Maps.json
Dashboard.json
stored-queries
Query.json
transformation-schemas
transformation.json
samples
resource.access logs
resource.error logs
resource.stats metrics
expected_results
info
documentation
images
```

**Definitions**

- `config.json` - defines the general configuration for the entire integration component.
- `display` - the folder in which the actual visualization components are stored.
- `queries` - the folder in which the actual PPL queries are stored.
- `schemas` - the folder in which the schemas are stored - schemas for mapping translations or index mappings.
- `samples` - the folder containing sample logs and their translated counterparts.
- `metadata` - the folder containing additional metadata definitions such as security and policies.
- `info` - the folder containing documentation, licences, and external references.

---

#### Config

The `config.json` file includes the Integration configuration; see the [Nginx config](nginx/config.json) for an example.

For additional information on the config structure, see [Structure](docs/Integration-structure.md).

#### Display

The `display` folder contains the relevant visual components associated with this integration.

The visual display components will need to be validated against the schema they are expected to work on - this may become part of the Integration validation flow.

#### Queries

This folder contains specific PPL queries that demonstrate common and useful use-cases.
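
For example, a hypothetical stored query counting Nginx responses by status code could be run against the PPL REST endpoint (a sketch, assuming the SQL/PPL plugin is installed and the `sso_logs-nginx-prod` data-stream from the samples section exists):

```text
curl -XPOST "localhost:9200/_plugins/_ppl" -H "Content-Type: application/json" \
  -d '{"query": "source=sso_logs-nginx-prod | stats count() by http.response.status_code"}'
```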


266 changes: 266 additions & 0 deletions integrations/nginx/assets/display/sso-logs-dashboard-new.ndjson

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions integrations/nginx/config.json
@@ -0,0 +1,42 @@
{
"name": "nginx",
"version": {
"integ": "0.1.0",
"schema": "1.0.0",
"resource": "^1.23.0"
},
"description": "Nginx HTTP server collector",
"identification": "instrumentationScope.attributes.identification",
"catalog": "observability",
"components": [
"communication","http"
],
"collection":[
{
"logs": [{
"info": "access logs",
"input_type":"logfile",
"dataset":"nginx.access",
"labels" :["nginx","access"]
},
{
"info": "error logs",
"input_type":"logfile",
"labels" :["nginx","error"],
"dataset":"nginx.error"
}]
},
{
"metrics": [{
"info": "status metrics",
"input_type":"metrics",
"dataset":"nginx.status",
"labels" :["nginx","status"]
}]
}
],
"repo": {
"github": "https://github.com/opensearch-project/observability/tree/main/integrarions/nginx"
}
}

28 changes: 28 additions & 0 deletions integrations/nginx/info/README.md
@@ -0,0 +1,28 @@
![](nginx.png)

# Nginx Integrations

## What is Nginx?
Nginx is a popular open-source web server software used by millions of websites worldwide. It was developed to address the limitations of Apache, which is another popular web server software. Nginx is known for its high performance, scalability, and reliability, and is widely used as a reverse proxy server, load balancer, and HTTP cache.

One of the primary advantages of Nginx is its ability to handle large numbers of concurrent connections and requests. It uses an event-driven architecture that allows it to handle multiple connections with minimal resources, making it an ideal choice for high-traffic websites. In addition, Nginx can also serve static content very efficiently, which further improves its performance.

Another important feature of Nginx is its ability to act as a reverse proxy server. This means that it can sit in front of web servers and route incoming requests to the appropriate server based on various criteria, such as the URL or the type of request. Reverse proxying can help improve website performance and security by caching static content, load balancing incoming traffic, and providing an additional layer of protection against attacks.

Nginx is also widely used as a load balancer. In this role, it distributes incoming traffic across multiple web servers to improve performance and ensure high availability. Nginx can balance traffic using a variety of algorithms, such as round-robin or least connections, and can also perform health checks to ensure that requests are only sent to healthy servers.

Finally, Nginx is also an effective HTTP cache. By caching frequently accessed content, Nginx can reduce the load on backend servers and improve website performance. Nginx can cache content based on a variety of criteria, such as the URL, response headers, or response body.

## What is an Nginx Integration?
As described in the [documentation](../../README.md), an Nginx integration is a bundle of resources, assets, and documentation.

An integration may support multiple ways of ingesting Observability signals; for example, Nginx logs may arrive via the Fluent-bit agent or the OTEL logs collector.

## What are the Nginx Observability providers?
Observability providers are agents that collect Nginx logs, metrics, and trace information, convert them to the `sso` observability schema, and send them to OpenSearch observability data-streams.

### Fluent-Bit
Fluent-bit provides a `tail` input plugin which can be used to tail the Nginx access log and send each line to a destination of your choice.
The `tail` plugin reads log files line by line and passes them to the Fluent-bit engine for processing.
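
As a quick sanity check that the access log can be tailed (a sketch, assuming Fluent-bit is installed and the default `/var/log/nginx/access.log` path), the `tail` input can be wired straight to stdout from the command line:

```text
fluent-bit -i tail -p path=/var/log/nginx/access.log -o stdout -f 1
```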

See additional details [here](fluet-bit/README.md).
65 changes: 65 additions & 0 deletions integrations/nginx/info/fluet-bit/README.md
@@ -0,0 +1,65 @@
![](fluentbit.png)

## Fluent-bit

Fluent-bit is a lightweight and flexible data collector and forwarder, designed to handle a large volume of log data in real-time.
It is an open-source project that is part of the Cloud Native Computing Foundation (CNCF) and has gained popularity among developers for its simplicity and ease of use.

Fluent-bit is designed to be lightweight, which means that it has a small footprint and can be installed on resource-constrained environments like embedded systems or containers. It is written in C language, making it fast and efficient, and it has a low memory footprint, which allows it to consume minimal system resources.

Fluent-bit is a versatile tool that can collect data from various sources, including files, standard input, syslog, and TCP/UDP sockets. It also supports parsing different log formats like JSON, Apache, and Syslog. Fluent-bit provides a flexible configuration system that allows users to tailor their log collection needs, which makes it easy to adapt to different use cases.

One of the main advantages of Fluent-bit is its ability to forward log data to various destinations, including Opensearch, InfluxDB, and Kafka. Fluent-bit provides multiple output plugins that allow users to route their log data to different destinations based on their requirements. This feature makes Fluent-bit ideal for distributed systems where log data needs to be collected and centralized in a central repository.

Fluent-bit also provides a powerful filtering mechanism that allows users to manipulate log data in real-time. It supports various filter plugins, including record modifiers, parsers, and field extraction. With these filters, users can parse and enrich log data, extract fields, and modify records before sending them to their destination.

## Setting Up Fluent-bit agent

To set up a Fluent-bit agent for Nginx, follow the instructions below:

- Install Fluent-bit on the Nginx server. You can download the latest package from the official Fluent-bit website or use your package manager to install it.

- Once Fluent-bit is installed, create a configuration file named `fluent-bit.conf` in the `/etc/fluent-bit/` directory. Add the following configuration to the file:

```text
[SERVICE]
Flush 1
Log_Level info
Parsers_File parsers.conf

[Filter]
Name lua
Match *
code function cb_filter(a,b,c)local d={}local e=os.date("!%Y-%m-%dT%H:%M:%S.000Z")d["observerTime"]=e;d["body"]=c.remote.." "..c.host.." "..c.user.." ["..os.date("%d/%b/%Y:%H:%M:%S %z").."] \""..c.method.." "..c.path.." HTTP/1.1\" "..c.code.." "..c.size.." \""..c.referer.."\" \""..c.agent.."\""d["trace_id"]="102981ABCD2901"d["span_id"]="abcdef1010"d["attributes"]={}d["attributes"]["data_stream"]={}d["attributes"]["data_stream"]["dataset"]="nginx.access"d["attributes"]["data_stream"]["namespace"]="production"d["attributes"]["data_stream"]["type"]="logs"d["event"]={}d["event"]["category"]={"web"}d["event"]["name"]="access"d["event"]["domain"]="nginx.access"d["event"]["kind"]="event"d["event"]["result"]="success"d["event"]["type"]={"access"}d["http"]={}d["http"]["request"]={}d["http"]["request"]["method"]=c.method;d["http"]["response"]={}d["http"]["response"]["bytes"]=tonumber(c.size)d["http"]["response"]["status_code"]=c.code;d["http"]["flavor"]="1.1"d["http"]["url"]=c.path;d["communication"]={}d["communication"]["source"]={}d["communication"]["source"]["address"]="127.0.0.1"d["communication"]["source"]["ip"]=c.remote;return 1,b,d end
call cb_filter

[INPUT]
Name tail
Path /var/log/nginx/access.log
Tag nginx.access
DB /var/log/flb_input.access.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On

[OUTPUT]
Name opensearch
Match nginx.*
Host <OSS_HOST>
Port <OSS_PORT>
Index sso_nginx-access-%Y.%m.%d
```
Here, we specify the input plugin as `tail`, set the path to the Nginx access log file, and specify a tag to identify the logs in Fluent-bit. We also set additional parameters such as the memory buffer limit and skipping long lines. The `lua` filter transforms each parsed access-log record into the `sso` simple-schema structure before it is forwarded.

For the output, we use the `opensearch` plugin to send the logs to Opensearch. We specify the Opensearch host, port, and index name.
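
If the OpenSearch cluster has security enabled (not shown in the sample above), the output section will also need credentials and TLS settings; a sketch using the plugin's standard options:

```text
[OUTPUT]
    Name        opensearch
    Match       nginx.*
    Host        <OSS_HOST>
    Port        <OSS_PORT>
    Index       sso_nginx-access-%Y.%m.%d
    HTTP_User   <username>
    HTTP_Passwd <password>
    tls         On
    tls.verify  On
```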

- Modify the Opensearch host and port in the configuration file to match your Opensearch installation.
- Start the Fluent-bit service; on systemd-based systems, run:

```text
sudo systemctl start fluent-bit
```
- Verify that Fluent-bit is running by checking its status:
```text
sudo systemctl status fluent-bit
```
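
Optionally, the configuration can be validated by running Fluent-bit in the foreground before enabling the service, and once logs are shipping, the arrival of data can be confirmed on the OpenSearch side (a sketch, assuming the host and port placeholders above have been filled in):

```text
fluent-bit -c /etc/fluent-bit/fluent-bit.conf

curl -XGET "http://<OSS_HOST>:<OSS_PORT>/_cat/indices/sso_nginx-access-*?v"
```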
20 changes: 20 additions & 0 deletions integrations/nginx/info/fluet-bit/fluent-bit.conf
@@ -0,0 +1,20 @@
[INPUT]
Name tail
Path /var/log/nginx/access.log
Tag nginx.access
DB /var/log/flb_input.access.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On

[Filter]
Name lua
Match *
code function cb_filter(a,b,c)local d={}local e=os.date("!%Y-%m-%dT%H:%M:%S.000Z")d["observerTime"]=e;d["body"]=c.remote.." "..c.host.." "..c.user.." ["..os.date("%d/%b/%Y:%H:%M:%S %z").."] \""..c.method.." "..c.path.." HTTP/1.1\" "..c.code.." "..c.size.." \""..c.referer.."\" \""..c.agent.."\""d["trace_id"]="102981ABCD2901"d["span_id"]="abcdef1010"d["attributes"]={}d["attributes"]["data_stream"]={}d["attributes"]["data_stream"]["dataset"]="nginx.access"d["attributes"]["data_stream"]["namespace"]="production"d["attributes"]["data_stream"]["type"]="logs"d["event"]={}d["event"]["category"]={"web"}d["event"]["name"]="access"d["event"]["domain"]="nginx.access"d["event"]["kind"]="event"d["event"]["result"]="success"d["event"]["type"]={"access"}d["http"]={}d["http"]["request"]={}d["http"]["request"]["method"]=c.method;d["http"]["response"]={}d["http"]["response"]["bytes"]=tonumber(c.size)d["http"]["response"]["status_code"]=c.code;d["http"]["flavor"]="1.1"d["http"]["url"]=c.path;d["communication"]={}d["communication"]["source"]={}d["communication"]["source"]["address"]="127.0.0.1"d["communication"]["source"]["ip"]=c.remote;return 1,b,d end
call cb_filter

[OUTPUT]
Name opensearch
Match nginx.*
Host <OSS_HOST>
Port <OSS_PORT>
Index sso_nginx-access-%Y.%m.%d
Binary file added integrations/nginx/info/fluet-bit/fluentbit.png
Binary file added integrations/nginx/info/nginx.png
8 changes: 8 additions & 0 deletions integrations/nginx/samples/README.md
@@ -0,0 +1,8 @@
# Samples
The samples folder contains sampled data that explains and demonstrates the expected input signals.

Specifically, this folder contains two inner folders:
- **preloaded** - ready-made Nginx access logs with detailed instructions on how to load them into the appropriate OpenSearch data-stream.
- **results** - the expected JSON structure that conforms to the `sso` simple schema for logs in OpenSearch.

Additional internal folders can be added to represent other aspects of the content this integration is expected to ingest.
36 changes: 36 additions & 0 deletions integrations/nginx/samples/preloaded/README.md
@@ -0,0 +1,36 @@
# Nginx Dashboard Playground
To experiment with and review the Nginx dashboard, this tutorial uses the preloaded Nginx access-log data. The sample data was generated with the Nginx Fluent-bit data generator and translated using the Fluent-bit Nginx Lua parser that appears in the tests mentioned below.
- [Fluent-bit](https://github.com/fluent/fluent-bit)
- [Services Playground](../../test/README.md)

The [sample logs](bulk_logs.json) are added here under the preloaded data folder and are ready to be ingested into OpenSearch.

## Demo Instructions

1. Start the Docker Compose environment; this will load both the OpenSearch server and OpenSearch Dashboards
- `$ docker compose up --build`
- Ensure `vm.max_map_count` has been set to 262144 or higher (`sudo sysctl -w vm.max_map_count=262144`).

2. Load the Simple Schema Logs index templates [Loading Logs](../../../../schema/observability/logs/Usage.md)

- `curl -XPUT localhost:9200/_component_template/http_template -H "Content-Type: application/json" --data-binary @http.mapping`

- `curl -XPUT localhost:9200/_component_template/communication_template -H "Content-Type: application/json" --data-binary @communication.mapping`

- `curl -XPUT localhost:9200/_index_template/logs -H "Content-Type: application/json" --data-binary @logs.mapping`
3. Bulk-load the preloaded Nginx access logs into the `sso_logs-nginx-prod` data-stream
- `curl -XPOST "localhost:9200/sso_logs-nginx-prod/_bulk?pretty&refresh" -H "Content-Type: application/json" --data-binary @bulk_logs.json`

4. We can now load the Nginx [dashboards](../../assets/display/sso-logs-dashboard-new.ndjson) to display the preloaded Nginx access logs
- First add an index pattern `sso_logs-*-*`
- `curl -X POST localhost:5601/api/saved_objects/index-pattern/sso_logs -H 'osd-xsrf: true' -H 'Content-Type: application/json' -d '{ "attributes": { "title": "sso_logs-*-*", "timeFieldName": "@timestamp" } }'`

- Load the [dashboards](../../assets/display/sso-logs-dashboard-new.ndjson)
- `curl -X POST "localhost:5601/api/saved_objects/_import?overwrite=true" -H "osd-xsrf: true" --form [email protected]`
5. Open the dashboard and view the preloaded access logs
- Go to [Dashboards](http://localhost:5601/app/dashboards#/list?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'2023-02-24T17:10:34.442Z',to:'2023-02-24T17:46:44.056Z')))
- Data-stream name: `sso_logs-nginx-prod`
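
After completing the steps above, an optional check that the bulk load succeeded is to query the document count of the data-stream directly (a sketch, assuming the defaults used in this demo):

```text
curl -XGET "localhost:9200/sso_logs-nginx-prod/_count?pretty"
```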

![](img/nginx-dashboard.png)