-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send pipeline logs to OpenTelemetry endpoint #304
Conversation
cyrille-leclerc
commented
Jan 25, 2022
•
edited
Loading
edited
- Support sending the Pipeline logs to an OpenTelemetry endpoint.
- When pipeline logs are sent to Elastic through the OpenTelemetry instrumentation, support visualization of logs through Jenkins pipeline logs screens in addition to supporting visualization of logs through Elastic / Kibana
- Add instrumentation of Jenkins Stapler
- Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
- Ensure that the pull request title represents the desired changelog entry
- Please describe what you did
- Link to relevant issues in GitHub or Jira
- Collect pipelines and build logs #245
- Link to relevant pull requests, esp. upstream and downstream changes
- Ensure you have provided tests - that demonstrates feature works or fixes the issue
5da6599
to
c4c0ea9
Compare
# Conflicts: # pom.xml
* Move Elasticsearch code under backend package * Move Elasticsearch code under backend package
…nt (#315) * Implement logs retriever pagination using new Elasticsearch Java client * WIP * WIP * extract integration tests configuration from code * WIP * WIP * WIP * WIP * fix findbugs * better docs * WIP
…stead of the new elasticsearch-java waiting for the fix elastic/elasticsearch-java#163
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any general feedback on how well JEP-210 worked for your use case?
if (rootPath.equals("job")) { | ||
// e.g /job/my-war/job/master/lastBuild/console | ||
// e.g /job/my-war/job/master/2/console | ||
ParsedJobUrl parsedJobUrl = parseJobUrl(pathInfoTokens); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all seems like a maintenance burden. Did you consider using a supported API like https://javadoc.jenkins.io/component/stapler/org/kohsuke/stapler/StaplerRequest.html#getAncestors()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks I got trapped. I started adding HTTP request tracing to understand how the progressive rendering of logs worked in Jenkins and ended up this insane if then else.
I'll add a comment in the code to use this Stapler API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding HTTP request tracing to understand how the progressive rendering of logs worked in Jenkins
BTW -Dstapler.trace
, automatic with mvn hpi:run
.
|
||
/** | ||
* Use the old `org.elasticsearch.client:elasticsearch-rest-high-level-client` waiting for | ||
* `co.elastic.clients:elasticsearch-java` to fix https://github.com/elastic/elasticsearch-java/issues/163 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code has been refactored and cleaned up since then.
* Retrieve the logs from Elasticsearch. | ||
* FIXME graceful shutdown | ||
*/ | ||
public class ElasticsearchLogStorageRetriever implements LogStorageRetriever<ElasticsearchLogsQueryContext>, Closeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deliberately committed as a *.txt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was struggling understanding the Jenkins progressive logs rendering APIs and kept temporarily some code as .txt, it has been removed since then.
|
||
import java.util.Objects; | ||
|
||
public class ElasticsearchLogsQueryContext implements LogsQueryContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, removed since then
|
||
/** | ||
* Utilities for extracting and reinserting {@link ConsoleNote}s. | ||
* copied from https://github.com/jenkinsci/pipeline-cloudwatch-logs-plugin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…and then modified from there. Candidate for a convenience API in workflow-api
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be an idea. The helper in the Jenkins Otel Plugin changed since then but it can probably be useful for other. implementations of JEP 210
ConsoleNotes.TextAndAnnotations textAndAnnotations = ConsoleNotes.parse(bytes, len); | ||
String plainLogLine = textAndAnnotations.text; | ||
if (plainLogLine == null || plainLogLine.isEmpty()) { | ||
LOGGER.log(Level.FINER, () -> buildInfo + " - skip empty log line"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be confusing if you actually had some tool run as part of a sh
step which deliberately emitted blank lines as a visual separator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid point, I should have added a comment to explain why I ended removing these line breaks, it was probably caused by challenges I faced understanding Jenkins progressive logs rendering APIs
this.clock = Clock.getDefault(); | ||
} else { | ||
long instantInNanosOnJenkinsAgentAtDeserialization = Clock.getDefault().now(); | ||
long offsetInNanosOnJenkinsAgent = instantInNanosOnJenkinsControllerBeforeSerialization - instantInNanosOnJenkinsAgentAtDeserialization; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this assume that the serialization, communication over Remoting, and deserialization are nearly instantaneous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not instantaneous, I wanted monotonicity of the timestamps even when the clocks of the controller and of the build agents are different.
Don't you do something similar with the CloudWatch storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted monotonicity of the timestamps
My point: let us say the two hardware clocks are actually perfectly synchronized (up to relativistic constraints, whatever). But the Remoting channel is extremely sluggish today for whatever reason, and it takes ten seconds to send a Command
. Then offsetInNanosOnJenkinsAgent
will be 10_000_000_000 or so. But in fact the correct behavior in this case would be to use zero offset, if you are trying to ensure that messages delivered to the external storage are accurately threaded.
In other words, if you want to calculate clock skew, you need to send a round-trip packet and adjust by half the time required to do so, or something a bit more complex like that. I think https://github.com/jenkinsci/jenkins/blob/38d4e86a80d123d50c966f4584a7e795b6bcb227/core/src/main/java/hudson/model/Slave.java#L732 does this.
Don't you do something similar with the CloudWatch storage?
No, io.jenkins.plugins.pipeline_cloudwatch_logs.TimestampTracker
solves different problems:
- ensuring that we do not mark the build log as complete until all messages known to have been sent from agents have actually arrived in the cloud and are available for reading (otherwise you would often lose the last few lines of a build, which tend to be the most critical if there was a failure)
- making sure output from multiple steps running on a given agent is logically ordered (no attempt is made to order messages between the controller and an agent, or between two agents, in the face of clock skew)
import java.util.Properties; | ||
import java.util.Random; | ||
|
||
public class OpenTelemetryLogToElasticsearchIT { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*.txt
rather than just @Ignore
like the rest of the ITs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to cleanup all the test container based ITs.
|
||
import static org.junit.Assume.assumeTrue; | ||
|
||
@Ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be the only IT of the JEP-210 implementation, if enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, all tests are manual for the moment. It's not great but I didn't have the bandwidth to fix the integrated tests.
private Object writeReplace() throws IOException { | ||
logger.log(Level.FINEST, () -> "writeReplace()"); | ||
JenkinsJVM.checkJenkinsJVM(); | ||
return new OtelLogSenderBuildListenerOnAgent(buildInfo, otelConfigProperties, otelResourceAttributes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any need to pass minimal credentials to the agent replacement as the CloudWatch Logs implementation does, or does the telemetry endpoint somewhat automatically verify the origin of a message so that a malicious build would not be able to spoof output from unrelated jobs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otelConfigProperties
contain the credentials of the OTLP endpoint. Does it answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It answers my question by affirming that this implementation is indeed insecure for some configurations. https://github.com/jenkinsci/pipeline-cloudwatch-logs-plugin#security
THANK YOU very much for your code review @jglick .
JEP-210 was quite hard to implement but to our knowledge, it works well. The most difficult things were to understand:
|
Rendering of logs for running builds is not complete in JEP-210. Currently Stapler (and by extension Jenkins core, and also Blue Ocean) use |