AWS Cloudwatch is indexing data in the wrong data stream #5467
Assigning to @elastic/obs-cloud-monitoring
@lucabelluccini I'm not sure how much of an issue this is: CloudWatch Logs is just a logs "container" where different kinds, sources and formats of logs can be collected. I would expect users to ingest their own application logs from CloudWatch (from a Lambda or anything else) and set the data stream to something specific, or in any case add their own ingest pipeline to deal with the specific format of their custom logs. Moreover, you could have multiple CloudWatch log inputs, collecting multiple log types, and you don't want them to end up in the same data stream. What has to be fixed here, in my opinion, is the following:
I will check what the "ingest pipelines, mappings or settings" are for.
Hello, I've never developed a package, but when using a package (except the "custom" family) I've always observed the events going to the data stream named after the integration's dataset. If this is the way we expect customers to use the integration, and this behavior is intended, then I would enhance the documentation to explain how the integration should be used, and investigate why we have assets for this integration.
Most of the assets are relevant to the integration package on its own, not related to a template or an ingest pipeline. But regarding https://github.com/elastic/integrations/blob/b72fe2e619dc1fa8f5a6c0731e3661c940c00b1f/packages/aws/data_stream/cloudwatch_logs/elasticsearch/ingest_pipeline/default.yml:
it was introduced because you could have multiple policies with the AWS CloudWatch Logs integration, each of them ingesting logs of a different "generic" application that you want to split into different data streams. Imagine ingesting logs from two different applications, one in JSON format and one not: you would have to apply different processors in the ingest pipeline, and users should not be forced to write both types of logs into the same data stream.
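For instance, a sketch of the split (hypothetical pipelines, not part of the package): each application's dataset gets its own custom pipeline, and only the JSON one parses the message as JSON.

```yaml
# Hypothetical custom pipeline for the dataset receiving JSON app logs.
processors:
  - json:
      field: message
      target_field: app1
```

```yaml
# Hypothetical custom pipeline for the dataset receiving plain-text logs;
# a dissect processor is enough here, no JSON parsing needed.
processors:
  - dissect:
      field: message
      pattern: "%{app2.level} %{app2.message}"
```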
yes, it is intended. I hope it is clearer now.
@aspacca I think the issue is a bit around consistency. We have quite a few packages which focus on raw data rather than an out-of-the-box integration (httpjson, tcp, udp, log, s3, gcs, abs and so on), and they all have a few things in common:
There are a few issues with the input packages for cloudwatch (and eventhub): they apply some basic pipelines which in most cases would not be used, and users get confused when they don't work (and the dataset has been changed). However, the biggest issue right now is two things:
I get your point, and it does not invalidate what you mentioned before:

Introducing the s3 input in the package was an erroneous decision, taken when the package was created in the first place. We realised that, and if I remember correctly that input is now deprecated: we plan to remove it entirely.

Could you clarify exactly which assets you are referring to (there are different ones)? Sharing a link to a sample assets folder is enough for me to do the comparison :)
If we look at another input package (for example TCP), https://github.com/elastic/integrations/tree/main/packages/tcp/data_stream/generic, we do not have any ingest pipelines at all; instead we allow the user to configure one. The result of that is:
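For comparison, a minimal sketch (illustrative, not the actual tcp manifest) of how a generic data stream can expose the dataset to the user instead of shipping a default pipeline:

```yaml
# Hypothetical excerpt of a generic input data stream manifest:
# the user picks the dataset, and no default ingest pipeline is installed.
streams:
  - input: tcp
    title: Custom TCP logs
    vars:
      - name: data_stream.dataset
        type: text
        title: Dataset name
        required: true   # no "generic" default baked in
```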
In terms of the field mappings, we usually want the user to define them, and unfortunately there is no workaround for that at the moment. However, more and more packages (especially input-related ones) are starting to use the dynamic ECS template (#5055), so that at least ECS fields should not need to be mapped manually.
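For reference, a minimal sketch of how a package opts into the dynamic ECS mappings via its build manifest (the pinned ECS version is an assumption):

```yaml
# _dev/build/build.yml: import ECS mappings dynamically instead of
# declaring every ECS field by hand (version pin is illustrative).
dependencies:
  ecs:
    reference: git@v8.6.0
    import_mappings: true
```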
Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as
This issue concerns https://docs.elastic.co/integrations/aws/cloudwatch, in particular the `cloudwatch_logs` data stream.

The problem is that the CloudWatch logs are being indexed in the wrong data stream because of the presence of a wrong `dataset`. See:

integrations/packages/aws/data_stream/cloudwatch_logs/manifest.yml, line 181 in b72fe2e

The consequence of `data_stream.dataset` being set to `generic` is that, while the assets are installed to work with the `logs-aws.cloudwatch_logs-default` data stream, the logs will actually end up in `logs-generic-default`. So no ingest pipelines, mappings or settings of the expected data stream will be used.

This problem has been investigated with the help of @P1llus.
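For context, a sketch of the shape of that variable in the manifest (only the `generic` default is taken from the linked line; the other keys are assumptions):

```yaml
# packages/aws/data_stream/cloudwatch_logs/manifest.yml (shape assumed)
- name: data_stream.dataset
  type: text
  title: Dataset name
  default: generic   # routes events to logs-generic-default
```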
As it is a widely used integration, we should warn users if we revert this.
A possible approach is:

- adjust the `dataset` setting and set it to the correct value

Example rendered policy, just saving with default values:
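A sketch under assumptions: the input id, log group ARN and `start_position` values are illustrative; the point is the `generic` default for the dataset.

```yaml
# Sketch of the rendered agent policy when saving with default values.
inputs:
  - id: aws-cloudwatch-cloudwatch-logs
    type: aws-cloudwatch
    data_stream:
      namespace: default
    streams:
      - data_stream:
          dataset: generic   # should be aws.cloudwatch_logs
          type: logs
        log_group_arn: arn:aws:logs:us-east-1:123456789012:log-group:test:*
        start_position: beginning
```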