-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec format for spans in build logs and bash script output #2773
Comments
Hi @abitrolly! |
@arminru you are the expert in this repo, so I don't know where it belong. I just came here with the problem that I need to extract telemetry from build logs. This belongs to specification, because without the spec it is hard to persuade people to format the messages in a suitable way for a wider range of tools, rather than my personal script. |
@abitrolly So you're not looking for a particular format to be used by the logging/console/stdout exporters (already) built into most OpenTelemetry SDK (e.g. the |
@arminru if there is a particular standardized format that I can reuse, I am all interested in that. But I could not find anything here. |
I see that it is a class https://open-telemetry.github.io/opentelemetry-js/classes/_opentelemetry_sdk_trace_base.ConsoleSpanExporter.html and not a format, so no examples, "prints spans", but no stated goal that the spans are actually machine readable. So I don't see it is a suitable lead. |
@abitrolly Indeed. I think all of the logging/console/stdout exporters in OTel are intended to be human readable for development and debugging and not intended for machine-readability since there are plenty of machine-readable export formats available (first and foremost OTLP). There is a file exporter in the OTel collector which uses the OTLP/JSON format but this does not guarantee stable field names either. |
@arminru from the link it looks like OTLP is a protocol (as described in its title), and while it describes that it uses "protobuf payloads in JSON format", I don't see examples to figure out how it looks like and if it is suitable to produces from build logs. You say there a plenty of machine-readable formats available. I would prefer the one standardized, but if there are links to a better examples for that specific use case, I am interested to study them.
I can not see that format. It assumes that the reader can read protobuf protocol, but I can't, so diagrams and example would be beneficial to be on the same wavelength.
I am open to proposals. My use case if explained in the issue description. I've got a plenty of build logs, and spending time instrumenting every tools in pipeline to produce protobuf output to a specified server it not effective. All information about timings is already present in build logs. It is hard to collect, because there is no standard. |
The protocol is defined in https://github.com/open-telemetry/opentelemetry-proto with the definition for traces being in https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto. It would mean that your app would need a protobuf client for it, which can be generated for a wide number of languages based on the
So the tools in your pipeline are already instrumented with OTel and have the respective languages logging exporter in their pipeline to print to those build logs? If that's the case, then there should be ways to configure them to export OTLP to a tracing backend or - if you decide to do so - your custom app that takes those span timings to produce an analysis for you. This might not even require you to change any code but based on how the respective SDKs are set up, it might be sufficient to just set the respective env vars (first and foremost |
@arminru build logs are produced by build scripts, which are often written in bash or in .yaml that calls bash of powerscript, or Python. How do you propose to generate
No. If they were instrumented, I wouldn't have to spend the time to instrument them. Even if I had the time, I don't think that developers of tools that are used in build pipelines, such as C compilers or Python linters agree that they need to support OLTP introducing additional externals dependencies and adding all the complexity and overhead of compiling protobufs in their development and release processes. In my practice releasing Python lib with protobuf was one hell of experience. |
@abitrolly So the pipelines and build tools you're trying to analyze are not instrumented with OpenTelemetry at all and you're also not looking to instrument them with OpenTelemetry? How is your question related to OpenTelemetry and the OTel spec then? |
Tigran will pursue further to see if this is a duplicate. |
It seems to me that the use case you are looking for would traditionally be solved with an OpenTelemetry client implementation for bash, aka a CLI interface for OpenTelemetry. |
You don't need a Protobuf library to produce OTLP/JSON. JSON is just a text that you can output from bash as well. Yes, it may be cumbersome, so that's why the question: is OTLP/JSON standard good enough for this use case? In your build tool you can structure the output like this:
This is valid OTLP/JSON. Otherwise, generally speaking I don't see how the standardization of this particular use case belongs to OpenTelemetry. I have not seen in the past the need to express build-time process durations as spans. It may be a totally valid use case for you, but I do not see the widespread need for it that can warrant coming up with a new standardized text format for spans that becomes part of the OpenTelemetry spec. I am going to mark this as "won't do" for now, however please feel free to comment. If you have additional arguments in favour of your proposal, we will reconsider and if there is enough interest in the community we can work on it. |
I am going to keep this open for now to allow time to respond. If we don't hear new supporting arguments we will close the issue. |
There are even some experimental OSS CLI tools for creating OTel traces: |
Thanks. That's the example that I could not find in entire spec. If the spec could contain a TL;DR chapter with the example of how spans desciption looks like and how it is finally rendered, it would save a lot of hours. I remember Zipkin had a very inspring web site for newcomers. I had to insert a couple of {
"resource_spans": [
{
"scope_spans": [
{
"spans": [
{
"trace_id": "1",
"span_id": "1",
"name": "minifying",
"start_time_unix_nano": 1544712660300000000,
"end_time_unix_nano": 1544712660600000000
}
]
}
]
}
]
} So only the full span is sent? I need to trace a CI/CD process that can never complete, and I'd like trace it in real time, so that span start is rendered even if the end is not received. Is that possible with OTEL/JSON?
|
That's a good point. I created an issue to add some examples.
No, this is not possible in OTLP/JSON. Generally speaking, this is not possible in OpenTelemetry. Only complete spans can be exported from OpenTelemetry SDK, so there is no support for incomplete spans in OTLP either. |
Thanks. I can't imagine how much I would need to read to get that answer. That makes it hard to debug long running jobs. I would use this Jsonl as a minimal format to track duration and dependency of various jobs in real-time.
This may become a replacement or an addition to formats that CI/CD systems currently use to make build logs more readable. https://docs.gitlab.com/ee/ci/jobs/index.html#custom-collapsible-sections With uniform format for embedding span info into web page content, it will even be possible to view the trace from F12 browser development tools. Which means avoid installing additional Jaeger server, authenticating, and searching for the trace that is produced by a current page in a separate tab. |
Small correction on this. In theory I think OpenTelemetry API could support this if the destination format allowed this (since Span creation API only requires the start timestamp), but in practice I don't think we have any implementations like this. And it is definitely not possible in OTLP.
Why not record this data as Otel LogRecords? Why does it have to be Spans? |
To me this sounds like a topic of semantic conventions for instrumenting build pipelines. See the following sample payload (the timestamps are not adjusted) - we can model the build process using a combination of Spans and standalone Events. The build job would be the parent span and each step in the build would be a child span. And while a span is in progress, you can emit standalone Events (with an association to the span) to indicate progress (this is not in the sample below). It is possible even today that parent spans don't get reported, in which case you will have a partial trace. You can use the partial trace for visualization purpose. We can address the gap of a parent span not getting reported, via a standalone Event. That is, you can emit an event every time you start a span. We intend to follow this approach in browser instrumentation for the page views. The PageView span takes some time to build and so there is a risk of losing it if the user navigates away, so we intend to emit an Event at the earliest opportunity and send the Span later. Here's the visualization and sample payload.
{
"resourceSpans": [
{
"resource": {
"attributes": [
],
"droppedAttributesCount": 0
},
"scopeSpans": [
{
"spans": [
{
"traceId": "b5dd69fce623bfcecfa6747ac6c7a247",
"spanId": "a06a1e805c399a37",
"parentSpanId": "2a167ab1c1b04756",
"name": "Foo",
"startTimeUnixNano": 1659114423246200000,
"endTimeUnixNano": 1659114423262200000,
"attributes": [
{
"key": "build.step.id",
"value": {
"stringValue": "build"
}
}
],
"droppedAttributesCount": 0
},
{
"traceId": "b5dd69fce623bfcecfa6747ac6c7a247",
"spanId": "a06a1e805c399a37",
"parentSpanId": "2a167ab1c1b04756",
"name": "Foo",
"startTimeUnixNano": 1659114423246200000,
"endTimeUnixNano": 1659114423262200000,
"attributes": [
{
"key": "build.step.id",
"value": {
"stringValue": "test"
}
}
],
"droppedAttributesCount": 0
},
{
"traceId": "b5dd69fce623bfcecfa6747ac6c7a247",
"spanId": "a06a1e805c399a37",
"parentSpanId": "2a167ab1c1b04756",
"name": "Foo",
"startTimeUnixNano": 1659114423246200000,
"endTimeUnixNano": 1659114423262200000,
"attributes": [
{
"key": "build.step.id",
"value": {
"stringValue": "package"
}
}
],
"droppedAttributesCount": 0
},
{
"traceId": "b5dd69fce623bfcecfa6747ac6c7a247",
"spanId": "2a167ab1c1b04756",
"name": "Foo",
"startTimeUnixNano": 1659114423246200000,
"endTimeUnixNano": 1659114423262200000,
"attributes": [
{
"key": "build.job.name",
"value": {
"stringValue": "job-a"
}
},
{
"key": "build.job.id",
"value": {
"stringValue": "123"
}
}
],
"droppedAttributesCount": 0
},
],
"scope": {
"name": "@opentelemetry/instrumentation-build",
"version": "0.0.1"
}
}
]
}
]
} |
@tigrannajaryan build log already contains log records. There is no need to add another interface to read them. But identifying bottlenecks where the processing stalls and takes a lot of time is the problem that is not solved programmatically, and for that real-time spans are needed. |
@scheler yes. Maybe there is no point in separate span payload then. If everything can be encoded as an event. |
@abitrolly the reason for using spans would be to leverage the tooling built for spans by the telemetry backends (visualization, analytics, correlation, etc). Otherwise, I agree that everything can be encoded in the LogRecord data model (which is the basis for Events too). |
Prioritizing representation over real world use cases seems like an omission when designing telemetry collection framework and its underlying data formats. Is there a list of use cases that had been considered when designing the spec? To see if my story of recording incomplete spans matches any of them? Regardless of historic decisions, I am still interested to see some solution I can adapt. Buildkite seems to use timestamp parsing to detect operation durations, and while start of operation is marked by some sequence like +++, there are no signals when operations end. So, for example, if the build process is wrapped into https://buildkite.com/sorbet/sorbet/builds/24799#01837d5f-70eb-43e9-8100-11c5fb92cb16 |
We do have the Z-Pages feature: https://github.com/open-telemetry/opentelemetry-specification/blob/main/experimental/trace/zpages.md which is implemented in Java: https://github.com/open-telemetry/opentelemetry-java/tree/main/sdk-extensions/incubator/src/main/java/io/opentelemetry/sdk/extension/incubator/trace/zpages We also export spans before they are ended in Dynatrace. But this discussion maybe belongs to #373 |
Closing since this item has been inactive for a couple weeks. Please re-open if there is further interest. |
I do have interest in retracing operations from debug logs, using incomplete spans. However, I can not reopen this issue. zPages from #373 and https://github.com/open-telemetry/opentelemetry-specification/blob/v1.16.0/experimental/trace/zpages.md propose to expose current incomplete/inprocess span metrics, which is a great idea, but it doesn't really solve my problem of recording and telemetry about build process in human readable form in build logs. Like installing GitLab Development Kit takes from 10 to 30 minutes. Until the installation is finished, no services (including Jaeger) is available. So far I have these requirements for recording span data in build logs.
Variant 1. JSON Lines. Start/duration.
Pros - easy to parse. Variant 2. Star Wars. Duration.
Pros - readable, easy to search for, parser is Parsing logic
I think that should cover it. |
What are you trying to achieve?
Given a build log https://gitlab.com/gitlab-org/gitlab-docs/-/jobs/2983094851 I want to quickly see what took the most time in build process. I can modify GitLab code to render spans, but I need a uniform way to report span timings in output stream so that tools can parse this in a universal way.
What did you expect to see?
A human readable and machine readable one liners to output span info.
One possibility is breadcrumb syntax.
And an example.
Additional context.
Add any other context about the problem here. If you followed an existing documentation, please share the link to it.
Could not find any diagrams, and/or examples of spans in any representation format.
The text was updated successfully, but these errors were encountered: