Filebeat: add TLS + pipeline options to checkpoint module #18116

Closed · wants to merge 7 commits · showing changes from 1 commit
44,938 changes: 33,205 additions & 11,733 deletions filebeat/docs/fields.asciidoc

Large diffs are not rendered by default.

152 changes: 152 additions & 0 deletions filebeat/docs/inputs/input-aws-s3.asciidoc
@@ -0,0 +1,152 @@
[role="xpack"]

:libbeat-xpack-dir: ../../../../x-pack/libbeat

:type: s3

[id="{beatname_lc}-input-{type}"]
=== s3 input

++++
<titleabbrev>s3</titleabbrev>
++++

beta[]

Use the `s3` input to retrieve logs from S3 objects that are pointed to by messages
from specific SQS queues. This input can, for example, be used to receive S3
server access logs to monitor detailed records for the requests that are made to
a bucket.

When processing an S3 object that is pointed to by an SQS message, if half of the
configured visibility timeout has passed and processing is still ongoing, the
visibility timeout of that SQS message is reset to make sure the message
does not go back to the queue in the middle of processing. If an error occurs
while processing the S3 object, processing stops and the SQS message is
returned to the queue.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: s3
  queue_url: https://sqs.ap-southeast-1.amazonaws.com/1234/test-s3-queue
  credential_profile_name: elastic-beats
  expand_event_list_from_field: Records
----

The `s3` input supports the following configuration options plus the
<<{beatname_lc}-input-{type}-common-options>> described later.

[float]
==== `queue_url`

URL of the AWS SQS queue that messages will be received from. Required.

[float]
==== `visibility_timeout`

The duration that the received messages are hidden from subsequent
retrieve requests after being retrieved by a ReceiveMessage request.
This value needs to be much longer than the {beatname_uc} collection frequency so
that, if reading the S3 log takes a long time, the SQS message is not reprocessed.
The default visibility timeout for a message is 300 seconds. The minimum
is 0 seconds. The maximum is 12 hours.
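
For example, a minimal sketch of raising the visibility timeout for S3 logs that
take a long time to process (the queue URL and the 600-second value are
illustrative, not defaults):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: s3
  queue_url: https://sqs.us-east-1.amazonaws.com/1234/test-s3-queue
  visibility_timeout: 600s
----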

[float]
==== `expand_event_list_from_field`

If the fileset using this input expects to receive multiple messages bundled
under a specific field, then the config option `expand_event_list_from_field` can
be set to the name of that field. This setting splits the messages under the
group value into separate events. For example, CloudTrail logs are in JSON format
and events are found under the JSON object "Records".

Note: When the `expand_event_list_from_field` parameter is set in the config, the s3
input assumes the logs are in JSON format and decodes them as JSON. The content
type is not checked.
If a file has the "application/json" content type, `expand_event_list_from_field`
is required to read the JSON file.

[float]
==== `api_timeout`

The maximum duration that an AWS API call can take. If the call exceeds the
timeout, it is interrupted.
The default AWS API timeout for a message is 120 seconds. The minimum
is 0 seconds. The maximum is half of the visibility timeout value.
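
A minimal sketch of setting `api_timeout` together with the visibility timeout
(the values are illustrative only):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: s3
  queue_url: https://sqs.us-east-1.amazonaws.com/1234/test-s3-queue
  api_timeout: 120s
  visibility_timeout: 300s
----

The JSON document below shows the kind of event list, bundled under a `Records`
field, that `expand_event_list_from_field` splits into separate events: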

["source","json"]
----
{
"Records": [
{
"eventVersion": "1.07",
"eventTime": "2019-11-14T00:51:00Z",
"awsRegion": "us-east-1",
"eventID": "EXAMPLE8-9621-4d00-b913-beca2EXAMPLE",
...
},
{
"eventVersion": "1.07",
"eventTime": "2019-11-14T00:52:00Z",
"awsRegion": "us-east-1",
"eventID": "EXAMPLEc-28be-486c-8928-49ce6EXAMPLE",
...
}
]
}
----

[float]
==== `aws credentials`

In order to make AWS API calls, the `s3` input requires AWS credentials. Please see
<<aws-credentials-config,AWS credentials options>> for more details.
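
For example, a sketch of supplying credentials directly in the input configuration
(the `access_key_id` and `secret_access_key` options are described in the linked
AWS credentials documentation; the values here are placeholders resolved from
environment variables):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: s3
  queue_url: https://sqs.us-east-1.amazonaws.com/1234/test-s3-queue
  access_key_id: '${AWS_ACCESS_KEY_ID}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY}'
----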

[float]
=== AWS Permissions
Specific AWS permissions are required for the IAM user to access SQS and S3:
----
s3:GetObject
sqs:ReceiveMessage
sqs:ChangeMessageVisibility
sqs:DeleteMessage
----

[float]
=== S3 and SQS setup
Enable bucket notification: any new object creation in the S3 bucket will also
create a notification through SQS. Please see
https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html#step1-create-sqs-queue-for-notification[create-sqs-queue-for-notification]
for more details.

[float]
=== Parallel Processing
To horizontally scale processing when there are large amounts of log data
flowing into an S3 bucket, you can run multiple {beatname_uc} instances that
read from the same SQS queues at the same time. No additional configuration is
required.

Using SQS ensures that each message in the queue is processed only once
even when multiple {beatname_uc} instances are running in parallel. To prevent
{beatname_uc} from receiving and processing the message more than once, set the
visibility timeout.

The visibility timeout begins when SQS returns a message to {beatname_uc}.
During this time, {beatname_uc} processes and deletes the message. However, if
{beatname_uc} fails before deleting the message and your system doesn't call the
DeleteMessage action for that message before the visibility timeout expires, the
message becomes visible to other {beatname_uc} instances, and the message is
received again. By default, the visibility timeout is set to 5 minutes for the s3
input in {beatname_uc}. Five minutes is sufficient time for {beatname_uc} to read
SQS messages and process the related S3 log files.

[id="{beatname_lc}-input-{type}-common-options"]
include::../../../../filebeat/docs/inputs/input-common-options.asciidoc[]

[id="aws-credentials-config"]
include::{libbeat-xpack-dir}/docs/aws-credentials-config.asciidoc[]

:type!:
74 changes: 74 additions & 0 deletions filebeat/docs/inputs/input-azure-eventhub.asciidoc
@@ -0,0 +1,74 @@
[role="xpack"]

:type: azure-eventhub

[id="{beatname_lc}-input-{type}"]
=== Azure eventhub input

++++
<titleabbrev>Azure eventhub</titleabbrev>
++++

Use the `azure-eventhub` input to read messages from an Azure Event Hub.
The `azure-eventhub` input implementation is based on the event processor host (EPH). EPH is intended to be run across multiple processes and machines while load balancing message consumers; more on this here: https://github.com/Azure/azure-event-hubs-go#event-processor-host and https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-event-processor-host.
State such as leases on partitions and checkpoints in the event stream is shared between receivers using an Azure Storage container. For this reason, as a prerequisite to using this input, users will have to create or use an existing storage account.

Example configuration:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: azure-eventhub
  eventhub: "insights-operational-logs"
  consumer_group: "test"
  connection_string: "Endpoint=sb://....."
  storage_account: "azureeph"
  storage_account_key: "....."
  storage_account_container: ""
  resource_manager_endpoint: ""

----

==== Configuration options

The `azure-eventhub` input supports the following configuration:

==== `eventhub`

The name of the eventhub to read from. Required.

==== `consumer_group`

Optional. We recommend using a dedicated consumer group for the azure input. Reusing consumer groups among non-related consumers can cause unexpected behavior and possibly lost events.

==== `connection_string`

The connection string required to communicate with Event Hubs; the steps to retrieve it are described here: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-get-connection-string.

A Blob Storage account is required in order to store/retrieve/update the offset or state of the eventhub messages. This means that after stopping, {beatname_uc} can start back up at the spot where it stopped processing messages.

==== `storage_account`

The name of the storage account. Required.

==== `storage_account_key`

The storage account key. This key will be used to authorize access to data in your storage account. Required.

==== `storage_account_container`

Optional. The name of the storage account container in which to store the offset information.

==== `resource_manager_endpoint`

Optional. By default, the Azure public environment is used. To override this, users can provide a specific resource manager endpoint in order to use a different Azure environment. Examples:

* https://management.chinacloudapi.cn/ for Azure ChinaCloud
* https://management.microsoftazure.de/ for Azure GermanCloud
* https://management.azure.com/ for Azure PublicCloud
* https://management.usgovcloudapi.net/ for Azure USGovernmentCloud

This option can also be used in a Hybrid Cloud model, where users may define their own endpoints.
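
For example, a sketch of pointing the input at Azure ChinaCloud (all other values
are placeholders taken from the example configuration above):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: azure-eventhub
  eventhub: "insights-operational-logs"
  connection_string: "Endpoint=sb://....."
  storage_account: "azureeph"
  storage_account_key: "....."
  resource_manager_endpoint: "https://management.chinacloudapi.cn/"
----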

93 changes: 93 additions & 0 deletions filebeat/docs/inputs/input-cloudfoundry.asciidoc
@@ -0,0 +1,93 @@
[role="xpack"]

:type: cloudfoundry

[id="{beatname_lc}-input-{type}"]
=== Cloud Foundry input

++++
<titleabbrev>Cloud Foundry</titleabbrev>
++++

beta[]

Use the `cloudfoundry` input to get http access logs, container logs and error logs from Cloud Foundry. It connects to
the Cloud Foundry loggregator to receive events.

Example configurations:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: cloudfoundry
  api_address: https://api.dev.cfdev.sh
  client_id: uaa-filebeat
  client_secret: verysecret
  ssl:
    verification_mode: none
----

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: cloudfoundry
  api_address: https://api.dev.cfdev.sh
  client_id: uaa-filebeat
  client_secret: verysecret
  ssl.certificate_authorities: ["/etc/pki/cf/ca.pem"]
  ssl.certificate: "/etc/pki/cf/cert.pem"
  ssl.key: "/etc/pki/cf/cert.key"

----


==== Configuration options

The `cloudfoundry` input supports the following configuration options plus the
<<{beatname_lc}-input-{type}-common-options>> described later.

[float]
==== `api_address`

The URL of the Cloud Foundry API. Optional. Default: "http://api.bosh-lite.com".

[float]
==== `doppler_address`

The URL of the Cloud Foundry Doppler Websocket. Optional. Default: "(value from ${api_address}/v2/info)".

[float]
==== `uaa_address`

The URL of the Cloud Foundry UAA API. Optional. Default: "(value from ${api_address}/v2/info)".

[float]
==== `rlp_address`

The URL of the Cloud Foundry RLP Gateway. Optional. Default: "(value from ${api_address}/v2/info)".

[float]
==== `client_id`

Client ID to authenticate with Cloud Foundry. Default: "".

[float]
==== `client_secret`

Client Secret to authenticate with Cloud Foundry. Default: "".

[float]
==== `shard_id`

Shard ID for the connection to the RLP Gateway. Use the same ID across multiple {beatname_lc} instances to shard the load
of events from the RLP Gateway. Default: "(generated UUID)".
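
A minimal sketch of sharing the load across instances by giving each the same
shard ID (the `shard_id` value is only an example; the other values are taken
from the example configurations above):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: cloudfoundry
  api_address: https://api.dev.cfdev.sh
  client_id: uaa-filebeat
  client_secret: verysecret
  shard_id: filebeat-cf-firehose
----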

[float]
==== `ssl`

This specifies SSL/TLS common config. Default: not used.

[id="{beatname_lc}-input-{type}-common-options"]
include::../../../../filebeat/docs/inputs/input-common-options.asciidoc[]

:type!: