[apache_spark] Add Apache Spark package #2811
Conversation
💔 Build Failed
Could you populate "README.md" with all details needed? It does not have any details of the integration currently.
Did you take a look at the Spark project documentation? Metrics are grouped based on instances, there isn't one big bucket for all metrics.
```yaml
policy_templates:
  - name: apache_spark
    title: Apache Spark metrics
    description: Collect Apache Spark metrics
```
Same problem as with Spring Boot, you need to divide metrics into logical groups.
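For illustration, a split into per-instance data streams could look like the layout below; the directory names are assumptions based on the components discussed in this thread, not the final structure:

```
packages/apache_spark/data_stream/
├── application/   # ApplicationSource metrics
├── driver/
├── executor/
└── nodes/         # master ("main") and worker metrics
```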
I hope we have addressed this in our previous comment
```yaml
    size: 600x600
    type: image/png
icons:
  - src: /img/apache_spark-logo.svg
```
I guess you can use the right logo?
What do you think about using this one?
https://upload.wikimedia.org/wikipedia/commons/f/f3/Apache_Spark_logo.svg
packages/apache_spark/manifest.yml (Outdated)
```yaml
  - monitoring
release: beta
conditions:
  kibana.version: ^7.16.2 || ^8.0.0
```
Hm... I think you can go with ^8.0.0 only. We don't need to publish this package for old releases.
The execution/sprint plan that we sent out mentioned ^7.16.0 and hence we kept this. But if you believe we should just keep ^8.0.0, we will update it.
CC: @akshay-saraswat
8.0 sounds fine to me too. Let's reduce the supported versions and hence the need for backward compatibility as much as we can. :)
Understood, thank you for the clarification. We will go with 8.0.
Made a similar comment on Nagios integration just now. Let's make it 8.2.0 instead. 8.0 is already released and we will not be able to publish this integration before 8.2.0 I am guessing.
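With that change, the manifest condition would presumably read:

```yaml
conditions:
  kibana.version: "^8.2.0"
```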
++ on releasing new integrations only for the most recent version. I assume we don't test all the previous releases.
Sure, we'll update it once it has been verified and tested on our end. We'll follow the same for the future integrations as well.
packages/apache_spark/changelog.yml (Outdated)

```yaml
- version: "0.1.0"
  changes:
    - description: Initial draft of the package
```
If you plan to merge this package with only this PR, it won't be a draft anymore :)
Yes, that's in the plan for future commits.
Does this look good?
Apache Spark integration package with Driver, Executor, Master, Worker and ApplicationSource metrics
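For reference, a full changelog entry using that description might look like this sketch; the `type` and `link` values are illustrative assumptions:

```yaml
- version: "0.1.0"
  changes:
    - description: Apache Spark integration package with Driver, Executor, Master, Worker and ApplicationSource metrics
      type: enhancement  # assumed change type
      link: https://github.com/elastic/integrations/pull/2811
```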
packages/apache_spark/docs/README.md (Outdated)

```
This is a new integration created using the [elastic-package](https://github.com/elastic/elastic-package) tool.

Consider using the README template file `_dev/build/docs/README.md` to generate a list of exported fields or include a sample event.
```
What is required from the end-user to make it integrated? What kind of changes should be introduced to Spark?
Yes, that's something we are still working on. We'll update the README with detailed steps of the changes required.
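For instance, one change Spark deployments typically need before JMX-based collection works is enabling the JMX metrics sink; a minimal sketch of `conf/metrics.properties`, assuming a default Spark layout:

```properties
# Expose Spark's internal metrics registry over JMX so the Jolokia
# agent can serve them via HTTP/JSON.
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
```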
@lalit-satapathy, yes that's something we are still working on. This PR is still in progress, so expect system tests, dashboards, and README in the later commits.
@mtojek Yes, we did check out the documentation. However, we referred to the Hadoop integration, where we have clubbed multiple metrics in a single data stream keeping future extensibility in mind, and thought of using the same approach here for Apache Spark.
Pinging @elastic/integrations (Team:Integrations)
/test
> @mtojek Yes, we did check out the documentation. However, we referred to the Hadoop integration where we have clubbed multiple metrics in a single data stream keeping in mind the future extensibility and thought of using the same approach here for Apache Spark.
I'm afraid that it isn't the way we'd like to follow. We'd rather see metrics split into multiple areas and add a new area if needed. We have to iterate on this. I will check the Hadoop integration and if it applies also there, I guess we have to refactor that one as well.
/test
We haven't actually analyzed the …
We are facing an issue with the system tests for the ApplicationSource, Driver, and Executor data streams, as we need to keep an application running in the container to collect the metrics. We are actively working on this and will update the PR as soon as it's done. Meanwhile, please feel free to review the other changes that are complete.
Issue with system tests: if we run a sample application (for example, WordCount) as part of the Dockerfile, it is treated as a build step and completes once the application has finished running, so the metrics are not found during the system tests. We also tried running the application in the background (once, and in an infinite loop) using an entrypoint script, but were unsuccessful and could not reach Jolokia even after that. Please let us know if you have any ideas. We can push the code to the PR for you to take a look if you think that's a good idea.
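For concreteness, the entrypoint approach we attempted looks roughly like the sketch below; the paths and the SparkPi example class are assumptions, not the exact test setup:

```sh
#!/bin/bash
# Start the Spark master (the Jolokia agent is attached via SPARK_MASTER_OPTS).
/usr/local/spark/sbin/start-master.sh

# Keep re-submitting a sample job in the background so driver/executor
# metrics stay available while the system tests run.
while true; do
  /usr/local/spark/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    /usr/local/spark/examples/jars/spark-examples_*.jar 100 || true
  sleep 5
done &

# Keep the container alive in the foreground.
tail -f /dev/null
```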
Yug, I did the review partially, as some comments apply to many places. Please revisit the entire PR based on those.
```sh
export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/usr/local/spark/conf/jolokia-master.properties"
```

Now, create the `/usr/local/spark/conf/jolokia-master.properties` file with the following content:
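The hunk ends before the properties content; as a sketch, a minimal Jolokia agent config might be (all values are assumptions):

```properties
# Bind the Jolokia HTTP endpoint on all interfaces.
host=0.0.0.0
port=7777
agentContext=/jolokia
# Restrict access using the policy file discussed later in this thread.
policyLocation=file:///usr/local/spark/conf/jolokia.policy
```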
Please use words: `main`/`worker` instead of `master`/`worker` everywhere.
Are you suggesting to use `main`/`worker` everywhere in the README or everywhere else (like the field names)?
Everywhere where it can be applied.
Got it. Thanks for passing links, I'll make the changes.
```sh
wget -O jolokia-agent.jar http://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/1.3.6/jolokia-jvm-1.3.6-agent.jar
```

As far, as Jolokia JVM Agent is downloaded, we should configure Apache Spark, to use it as JavaAgent and expose metrics via HTTP/Json. Edit spark-env.sh. It should be in `/usr/local/spark/conf` and add following parameters (Assuming that spark install folder is /usr/local/spark, if not change the path to one on which Spark is installed):
nit: HTTP/JSON
/usr/local/spark
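Put together, the `spark-env.sh` additions under discussion would look roughly like this; the worker line is an assumption mirroring the master one:

```sh
# Attach the Jolokia JVM agent to the Spark master and worker daemons.
export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/usr/local/spark/conf/jolokia-master.properties"
export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/usr/local/spark/conf/jolokia-worker.properties"
```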
Now we need to create `/usr/local/spark/conf/jolokia.policy` with following content:
I'm wondering if you can use ```xml to enable code coloring. Could you please check if it works?
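For reference, a Jolokia policy file is XML, so an ```xml fence should enable highlighting. A minimal read-only policy sketch follows; the actual file content is not shown in this excerpt, so this is an assumption:

```xml
<?xml version="1.0" encoding="utf-8"?>
<restrict>
  <http>
    <!-- Allow only GET and POST requests against the agent. -->
    <method>get</method>
    <method>post</method>
  </http>
  <commands>
    <!-- Permit read-only access to MBean attributes. -->
    <command>read</command>
  </commands>
</restrict>
```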
```dockerfile
USER root
RUN \
  apt-get update && apt-get install -y \
  wget
```
Is `curl` present in the image? Maybe we don't need to install `wget` :)
Good idea, thank you!
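A possible curl-based replacement for the download step, assuming `curl` ships in the base image:

```sh
# Hypothetical replacement for the wget-based download.
curl -L -o jolokia-agent.jar "http://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/1.3.6/jolokia-jvm-1.3.6-agent.jar"
```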
"version": "8.0.0" | ||
}, | ||
"apache_spark": { | ||
"metrics": { |
Not updated event?
Yes, this sample event is old because the system tests are only completed for the `nodes` data stream as of now. The sample event would be updated along with the same.
```yaml
  type: group
  release: beta
  fields:
    - name: executor
```
Same, data stream name is inconsistent with this field
```yaml
    type: long
  - name: cpu_time
    type: long
  - name: deserialize
```
Do you think we should skip these metrics?
I agree, but we followed the PRD where it has been mentioned to collect gauge and counter metrics.
> Spark provides the following types of metrics - Gauge, Counter, Histogram, Meter, Timer. The most common types of metrics used in Spark instrumentation are gauges and counters. Hence, we will support only Gauge and Counter metrics from the following providers
How should we take it forward?
CC: @akshay-saraswat
Ok, I will pass the final decision to @akshay-saraswat.
BTW what you quoted here is the overall idea to collect different TYPES of metrics, not ALL metrics.
@yug-elastic I believe both `deserialize` and `cpu_time` are timer metrics. They are neither counter nor gauge. Please correct me if I am wrong. If not, discard them, please.
Oh, got it. Maybe this line in the Apache doc confused you - "namespace=executor (metrics are of type counter or gauge)". They are exposing it as a count but I don't think this is a count. It's time in seconds in my opinion.
Right, thanks for the clarification!
```yaml
    type: long
  - name: generated_method_size
    type: long
  - name: hive_client_calls
```
Do we need this metric?
Same answer as this comment.
Yes, our competitors collect it I believe. Let's keep it.
"version": "8.0.0" | ||
}, | ||
"apache_spark": { | ||
"metrics": { |
Same, outdated event
Yes, this sample event is old because the system tests are only completed for the `nodes` data stream as of now. The sample event would be updated along with the same.
I think Cluster Manager is just a component in the architecture, and you get the associated metrics via Master or ApplicationMaster if I am not mistaken. Feel free to correct my understanding if you find anything different.
This PR would strongly benefit from being split up into multiple PRs: one for the foundation of Spark plus one data stream, and then an additional PR for each data stream. As the comments in this PR are important and should not be lost, I recommend switching this PR to draft, opening a fresh new PR, and copying over the parts needed for the first foundation PR, referencing this PR there. Then go on and copy over each data stream directory to a PR per data stream as soon as the first PR is merged. This should make it possible to move forward much more quickly and with more focus.
Thanks @ruflin, it's a good idea. Here are the links for the PRs that this PR is split into: @mtojek JFYI, the change suggested in #2811 (comment) in this PR regarding dropping/not collecting some fields because they are too detailed is left to be covered in the above PRs. We'll update the same soon.
Thanks for the split up @yug-elastic. Which of the PRs should we have a detailed look at first? We need one PR to get merged first and then follow up with the other 3; this ensures we don't review some of the content twice. Update: Looks like #2939 is the one?
```md
# Apache Spark

The Apache Spark integration collects and parses data using the Jolokia Metricbeat Module.
```
nit: Would stating 'metricbeat module' confuse users into thinking that a separate Metricbeat module is required for this integration? Should we instead say "Jolokia Input"? Although it's implicit from the requirements that the Metricbeat module is not required, it would be nice to make it explicit if you receive more comments to address and update this PR. Otherwise, this PR looks good to me. Don't update just for this nitpick.
Thanks for the approval, @akshay-saraswat! This PR is the old PR which was split into data stream specific PRs as discussed. The other PRs have been approved and some of them are already merged. We do have one open PR (#3070) which is approved by Jaime Soriano Pastor and waiting for the approval from CODEOWNERS. If "Jolokia Input" sounds more intuitive to you, we'll update that PR with the same change, it's not a problem.
Updated the same.
Quick Reference: https://github.com/elastic/integrations/pull/3070/files#diff-05a30b6a3e489638b54c77068eb780e019e4dcf90827f391e5693aa96aefb4f2R3
Closing this PR as it was split up into multiple PRs as discussed in the comment #2811 (comment). All the parts are now merged and the linked issue (#493) has been closed. Thanks a lot @mtojek, @ruflin, @jsoriano, @akshay-saraswat and @lalit-satapathy for taking the time to review the PRs and providing valuable feedback!
What does this PR do?

Checklist
- `changelog.yml` file updated.
- `manifest.yml` file updated to point to the latest Elastic stack release (e.g. `^7.13.0`).

How to test this PR locally

Screenshots