Add an S3 exporter #2835
@rakyll What format are you thinking of for storage? There's some discussion in open-telemetry/opentelemetry-specification#1443, but I don't think we have a good format defined for stored traces yet.
This could also be a plugin for the fileexporter, correct? You would just write into a remote "file".
Once open-telemetry/opentelemetry-specification#1443 is addressed, we should follow the specification there. Otherwise, the format needs discussion/proposal. @bogdandrutu It's a good idea. Given it's vendor-specific, my initial suggestion was a standalone exporter. But we can ask other vendors to implement support for their services and provide the functionality as part of the fileexporter.
Should it be a design goal to pick a format (Parquet, ORC, etc.), or have a pluggable format, that allows the data to be queried "in-place" by a system like AWS Athena?
Given the various data formats we have, can we first start by building a few popular formats like Parquet/ORC/JSON?
Initial proposal for the S3 exporter design requirements:
- Exporter feature:
- File output format related feature:
- S3 uploader related feature:
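For concreteness, a collector configuration for such an exporter might look something like the following. This is a sketch of the proposal only; the field names and structure are illustrative, not a finalized schema:

```yaml
exporters:
  awss3:
    s3uploader:
      region: "us-east-1"          # AWS region of the bucket
      s3_bucket: "my-otel-bucket"  # destination bucket
      s3_prefix: "traces"          # key prefix per telemetry type
      s3_partition: "minute"       # time-window granularity for sharding
    marshaler: otlp_json           # file output format

service:
  pipelines:
    traces:
      exporters: [awss3]
```

The three requirement areas above map onto the config: exporter wiring (`exporters`/`pipelines`), output format (`marshaler`), and upload behavior (`s3uploader`).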
Could this be a generic object storage exporter, using a library like https://github.com/chartmuseum/storage (not advocating this particular one, just an example)? Authentication methods vary across cloud providers, so some config options may still need to be cloud-specific, but using a library that already abstracts across the providers could make it easier to extend to other providers. It even has a local backend, so it would make sense to extend the fileexporter as @bogdandrutu suggested: https://github.com/chartmuseum/storage/blob/main/local.go
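The abstraction being suggested could look roughly like this: a minimal backend interface with a local-filesystem implementation, in the spirit of chartmuseum/storage. This is an illustrative sketch only (names are made up, and the real collector code is Go):

```python
# Sketch of the "generic object storage" idea: a small backend interface
# with a local-disk implementation, analogous to chartmuseum/storage's
# local.go. All names here are illustrative, not from any real library.
from abc import ABC, abstractmethod
from pathlib import Path


class ObjectBackend(ABC):
    """Anything that can store a named blob: local disk, S3, GCS, ..."""

    @abstractmethod
    def put_object(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get_object(self, key: str) -> bytes: ...


class LocalBackend(ObjectBackend):
    """Local-filesystem backend; keys become relative paths under root."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put_object(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get_object(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def export_batch(backend: ObjectBackend, key: str, payload: bytes) -> None:
    # The exporter only talks to the interface; swapping in an S3 or GCS
    # backend would not change this code.
    backend.put_object(key, payload)
```

The point of the design is the last function: export logic written against `ObjectBackend` stays unchanged when a cloud backend replaces the local one.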
But I think this exporter is specific to S3.
Initial code to kick off the discussion
@emeraldbay why must it be S3-specific?
This feature would be very nice: we could archive large amounts of instrumentation data while keeping it queryable.
What is the progress on this feature?
Hey folks, I added a file exporter here: #6712
Because that format is not well suited for S3, I have started working on Parquet support for OpenTelemetry. Once OpenTelemetry data can be serialized to Parquet, we can create a receiver and exporter for Parquet files.
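Parquet is a columnar format, so the core of any such mapping is flattening row-oriented span records into columns. A toy, stdlib-only sketch of that flattening step (the real proposal in opentelemetry-proto#346 defines a much richer schema; the field names here are illustrative):

```python
# Toy illustration of flattening span records into columns, the shape a
# Parquet writer (e.g. pyarrow) would consume. Field names are simplified
# stand-ins, not the schema proposed in opentelemetry-proto#346.
from collections import defaultdict


def spans_to_columns(spans: list) -> dict:
    """Turn row-oriented span dicts into a column-oriented table."""
    columns = defaultdict(list)
    fields = ("trace_id", "span_id", "name", "start_unix_nano", "end_unix_nano")
    for span in spans:
        for field in fields:
            columns[field].append(span.get(field))
    return dict(columns)


spans = [
    {"trace_id": "aa01", "span_id": "01", "name": "GET /",
     "start_unix_nano": 1, "end_unix_nano": 5},
    {"trace_id": "aa01", "span_id": "02", "name": "SELECT",
     "start_unix_nano": 2, "end_unix_nano": 4},
]
table = spans_to_columns(spans)
# Each column is now a contiguous list, ready for a columnar writer:
# table["name"] == ["GET /", "SELECT"]
```

Storing each column contiguously is what makes "in-place" querying with engines like Athena efficient: a query touching only `name` never reads the timestamp columns.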
I'm open to being a sponsor for the Parquet components on contrib. I think it would be a great addition.
That is great to hear! I'll try to stick to the approach of creating the component structure first. @jpkrohling please see here: #6903
Folks, open-telemetry/opentelemetry-proto#346 is ready for review. It is coming together. It is not a final Parquet mapping by any means, but it gets us to a state suitable for experimentation.
Is the idea that the stub Parquet exporter in #6903 would eventually learn more formats and destinations, as in #6903 (comment)? Or would it be simpler to have one exporter to export text files to disk, another to export Parquet files to disk, another to export Parquet/JSON files to S3, etc.? I would be interested in exporting to both S3 and GCS, and happy to write a patch if it would be helpful.
Recently, I started working on an S3 exporter using this work as a baseline. Is anyone actively working on that?
I don't think anyone is working on this. If you decide to work on this, make sure you have a sponsor first, in order for this component to be accepted as part of the contrib repository.
👋 I have a personal interest in seeing OTC be able to write telemetry to S3. I feel that not many vendors have this interest, but for a user who wants to store years' worth of telemetry and cares about costs, S3 looks like a good path forward.
I was working on a Parquet exporter, but haven't had time recently.
I volunteer to sponsor the initiative. I think a Parquet exporter would be very useful and it's great to see the work started, though it's probably a somewhat separate item (essentially a serialization mechanism). So it might live in some common package and then be referenced by the fileexporter and the S3 exporter.
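The "common serialization package" idea could be sketched as a marshaler abstraction that both a file-based exporter and an S3-style exporter depend on. This is a hypothetical sketch only (all names invented here; the real collector components are Go):

```python
# Sketch of a shared serialization package: both a file exporter and an
# S3-style exporter reuse the same marshaler. Names are illustrative only;
# the actual collector components are written in Go.
import json
from typing import Callable, Dict, List

# A "marshaler" is just: list of records -> bytes.
Marshaler = Callable[[List[dict]], bytes]


def json_lines_marshaler(records: List[dict]) -> bytes:
    """One JSON object per line; a Parquet marshaler could slot in here."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records).encode()


class FileExporter:
    def __init__(self, path: str, marshal: Marshaler) -> None:
        self.path, self.marshal = path, marshal

    def export(self, records: List[dict]) -> None:
        with open(self.path, "ab") as f:
            f.write(self.marshal(records) + b"\n")


class S3Exporter:
    """Same marshaler, different destination (upload stubbed out here)."""

    def __init__(self, bucket: str, marshal: Marshaler) -> None:
        self.bucket, self.marshal = bucket, marshal
        self.uploaded: Dict[str, bytes] = {}  # stands in for real S3 calls

    def export(self, key: str, records: List[dict]) -> None:
        self.uploaded[key] = self.marshal(records)
```

With this split, adding Parquet support means writing one new marshaler, not touching either exporter.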
I know this component is still marked in-development and not officially in contrib, but to what extent can it already be used? Would using it risk crashing the entire collector, or would it at worst just fail to do its job while leaving everything else functioning?
The component is in alpha status, but it should be pretty stable. To continue the work we need a sponsor, as mentioned above.
Hi all, first of all, thank you to everyone interested in seeing this implemented. The latter approach has the advantage that it can be implemented sooner and be rather opinionated about how to handle S3 issues and format. I am happy to sponsor either; I would prefer the extension approach, but it depends on what everyone involved prefers.
It looks like an earlier discussion leaned towards an abstraction for the fileexporter.
@MovieStoreGuy Just to clarify: in the case of the fileexporter, S3 storage would be treated as a kind of remote file system, right? Does the fileexporter have similar extensions already?
Hi @MovieStoreGuy! We also have an interest in this exporter. Is there any active work ongoing on this?
@guyfrid Work stopped due to lack of a sponsor.
I'm also looking for an S3 exporter. I'm having difficulties using OTLP with the http+json protocol as a metrics exporter, so I thought using the S3 REST API directly with http+json could be a quick alternative. All I can send is http+protobuf or gRPC (which isn't supported by S3). Has someone implemented it and can send a config.yaml file, or could investigate it as well? Thanks guys for this useful thread.
I'd like to see this as well. We have a lot of data we ship to S3. I think it would be useful. Thanks
OK, I can sponsor this.
If anyone wants to work on this, they should feel free to take it. Unfortunately, I'm working on something else nowadays.
Thank you for your help! We will get it done.
Hi @atoulme, I would like to know if you need any help with this. I'm interested in contributing and sponsoring (if required).
No need for a sponsor; this PR is the latest: Once it's in, we have to get the implementation right.
I was unable to use the awss3 exporter with the latest release because of the error:
The S3 exporter is not code-complete yet, from what I understand. We have just merged the skeleton of the exporter.
That's true. There is a second PR with the implementation, #10000, but it requires some work, as interfaces have changed during the last few months.
Any idea when this will make its way in? We are also seeing the same unknown error.
Pinging code owners for exporter/awss3: @atoulme @pdelewski.
#20912 is open to finish the job, please review.
I'm unable to pull the latest release, i.e.,
Take a look at https://github.com/open-telemetry/opentelemetry-collector-releases/blob/main/distributions/otelcol-contrib/manifest.yaml#LL39C79-L39C92; it should be released as of
Various OTel users want to export raw telemetry data for long-term storage and analysis. We should add an S3 exporter that exports the incoming OTLP data points to sharded S3 objects in a bucket. Sharding can be done per telemetry type (metrics, traces, logs) and per time window (configurable by the user). Optionally, we could also consider serializing into Ion at the collector.
(You can assign this issue to me.)
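The per-type, per-time-window sharding described in the issue can be sketched as an S3 object key scheme. The key layout below is illustrative only, not a settled convention:

```python
# Sketch of the per-telemetry-type, per-time-window S3 object key scheme
# described in the issue. The key layout is illustrative, not a spec.
from datetime import datetime, timezone


def s3_key(telemetry_type: str, ts: datetime, window_minutes: int = 5) -> str:
    """Shard by telemetry type, then by a fixed-size time window."""
    if telemetry_type not in ("metrics", "traces", "logs"):
        raise ValueError(f"unknown telemetry type: {telemetry_type}")
    # Snap the timestamp down to the start of its window.
    minute = (ts.minute // window_minutes) * window_minutes
    return (
        f"{telemetry_type}/year={ts.year}/month={ts.month:02d}/"
        f"day={ts.day:02d}/hour={ts.hour:02d}/minute={minute:02d}/data.json"
    )


ts = datetime(2021, 3, 26, 10, 7, 13, tzinfo=timezone.utc)
print(s3_key("traces", ts))
# traces/year=2021/month=03/day=26/hour=10/minute=05/data.json
```

A Hive-style `key=value` path layout like this is also what would let engines such as Athena prune partitions when querying the bucket "in-place".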