[apache_spark] Add Apache Spark package #2811
@@ -0,0 +1,3 @@
dependencies:
  ecs:
    reference: [email protected]
@@ -0,0 +1,96 @@
# Apache Spark

The Apache Spark integration collects and parses data using the Jolokia Metricbeat module.

## Compatibility

This integration has been tested against `Apache Spark version 3.2.0`.

## Requirements

In order to ingest data from Apache Spark, you must know the full hosts for the Master and Worker nodes.
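
The exact host values depend on your deployment. As a purely illustrative example (the hostnames below are placeholders; the ports match the Jolokia agent configuration described later in this guide):

```
http://your-spark-master-host:7777
http://your-spark-worker-host:7778
```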

In order to gather Spark statistics, we need to download and enable the Jolokia JVM Agent:

```
cd /usr/share/java/
wget -O jolokia-agent.jar http://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/1.3.6/jolokia-jvm-1.3.6-agent.jar
```

Once the Jolokia JVM Agent is downloaded, configure Apache Spark to use it as a Java agent and expose metrics via HTTP/JSON. Edit `spark-env.sh`, which should be in `/usr/local/spark/conf`, and add the following parameters (assuming the Spark install folder is `/usr/local/spark`; if not, change the path to the one where Spark is installed):

```
export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/usr/local/spark/conf/jolokia-master.properties"
```

Now, create the `/usr/local/spark/conf/jolokia-master.properties` file with the following content:

```
host=0.0.0.0
port=7777
agentContext=/jolokia
backlog=100

policyLocation=file:///usr/local/spark/conf/jolokia.policy
historyMaxEntries=10
debug=false
debugMaxEntries=100
maxDepth=15
maxCollectionSize=1000
maxObjects=0
```

Now, create the `/usr/local/spark/conf/jolokia.policy` file with the following content:

```xml
<?xml version="1.0" encoding="utf-8"?>
<restrict>
  <http>
    <method>get</method>
    <method>post</method>
  </http>
  <commands>
    <command>read</command>
  </commands>
</restrict>
```

Configure the Agent with the following in the `conf/bigdata.ini` file:

```
[Spark-Master]
stats: http://127.0.0.1:7777/jolokia/read
```

Restart the Spark master.

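To sanity-check the setup, you can query the Jolokia agent directly; this assumes you are on the master host and kept the default port from `jolokia-master.properties`:

```
curl http://localhost:7777/jolokia/version
```

A JSON response containing the agent and protocol version indicates the agent is up and serving metrics.
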
Follow the same set of steps for the Spark Worker, Driver, and Executor.

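As a minimal sketch for the Worker (assuming the same agent jar and a worker-specific properties file; the file name and port below mirror this package's Docker test setup), add the following to `spark-env.sh`:

```
export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/usr/local/spark/conf/jolokia-worker.properties"
```

In `/usr/local/spark/conf/jolokia-worker.properties`, reuse the master settings but change `port=7777` to `port=7778` so the two agents do not conflict, then restart the Spark worker.
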
## Metrics

### Driver

This is the `driver` dataset.

{{event "driver"}}

{{fields "driver"}}

### Executors

This is the `executors` dataset.

{{event "executors"}}

{{fields "executors"}}

### Applications

This is the `applications` dataset.

{{event "applications"}}

{{fields "applications"}}

### Nodes

This is the `nodes` dataset.

{{event "nodes"}}

{{fields "nodes"}}
@@ -0,0 +1,16 @@
ARG SERVICE_VERSION=${SERVICE_VERSION:-3.2.0}
FROM docker.io/bitnami/spark:${SERVICE_VERSION}

USER root
RUN \
    apt-get update && apt-get install -y \
    wget

ENV JOLOKIA_VERSION=1.6.0
RUN mkdir /usr/share/java && \
    wget "http://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/${JOLOKIA_VERSION}/jolokia-jvm-${JOLOKIA_VERSION}-agent.jar" -O "/usr/share/java/jolokia-agent.jar" && \
    echo 'export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/spark/conf/jolokia-master.properties"' >> "/opt/bitnami/spark/conf/spark-env.sh"

RUN echo '*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink' >> "/opt/bitnami/spark/conf/metrics.properties" && \
    echo '*.source.jvm.class=org.apache.spark.metrics.source.JvmSource' >> "/opt/bitnami/spark/conf/metrics.properties"
HEALTHCHECK --interval=1s --retries=90 CMD curl -f http://localhost:7777/jolokia/version
@@ -0,0 +1,16 @@

> Review comment: This file and the one for the master look nearly the same. Is there an option to use a common Dockerfile and ENV vars to override the differences?

ARG SERVICE_VERSION=${SERVICE_VERSION:-3.2.0}
FROM docker.io/bitnami/spark:${SERVICE_VERSION}

USER root
RUN \
    apt-get update && apt-get install -y \
    wget

ENV JOLOKIA_VERSION=1.6.0
RUN mkdir /usr/share/java && \
    wget "http://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/${JOLOKIA_VERSION}/jolokia-jvm-${JOLOKIA_VERSION}-agent.jar" -O "/usr/share/java/jolokia-agent.jar" && \
    echo 'export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/spark/conf/jolokia-master.properties"' >> "/opt/bitnami/spark/conf/spark-env.sh"

RUN echo '*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink' >> "/opt/bitnami/spark/conf/metrics.properties" && \
    echo '*.source.jvm.class=org.apache.spark.metrics.source.JvmSource' >> "/opt/bitnami/spark/conf/metrics.properties"
HEALTHCHECK --interval=1s --retries=90 CMD curl -f http://localhost:7778/jolokia/version
@@ -0,0 +1,20 @@
version: '2'
services:
  apache_spark:
    hostname: apachesparkmaster

> Review comment (nit): apache-spark-main — it's easier to read with dashes :)

    build:
      context: ./Dockerfiles
      dockerfile: Dockerfile-master
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_MASTER_OPTS="-javaagent:/usr/share/java/jolokia-agent.jar=config=/spark/conf/jolokia-master.properties"

> Review comment: Can we move all/some of them to the Dockerfiles instead?

    ports:
      - 7777

> Review comment: Please sort.

      - 8088
      - 7077
    volumes:
      - ./jolokia-configs/master:/spark/conf/:rw
@@ -0,0 +1,2 @@
[Spark-Master]
stats: http://127.0.0.1:7777/jolokia/read
@@ -0,0 +1,12 @@
host=0.0.0.0
port=7777
agentContext=/jolokia
backlog=100

policyLocation=file:///spark/conf/jolokia.policy
historyMaxEntries=10
debug=false
debugMaxEntries=100
maxDepth=15
maxCollectionSize=1000
maxObjects=0
@@ -0,0 +1,13 @@
<?xml version="1.0" encoding="utf-8"?>
<restrict>
  <http>
    <method>get</method>
    <method>post</method>
  </http>
  <commands>
    <command>read</command>
    <command>list</command>
    <command>search</command>
    <command>version</command>
  </commands>
</restrict>
@@ -0,0 +1,2 @@
[Spark-Worker]
stats: http://127.0.0.1:7778/jolokia/read
@@ -0,0 +1,12 @@
host=0.0.0.0
port=7778
agentContext=/jolokia
backlog=100

policyLocation=file:///spark/conf/jolokia.policy
historyMaxEntries=10
debug=false
debugMaxEntries=100
maxDepth=15
maxCollectionSize=1000
maxObjects=0
@@ -0,0 +1,13 @@
<?xml version="1.0" encoding="utf-8"?>
<restrict>
  <http>
    <method>get</method>
    <method>post</method>
  </http>
  <commands>
    <command>read</command>
    <command>list</command>
    <command>search</command>
    <command>version</command>
  </commands>
</restrict>
@@ -0,0 +1,4 @@
variants:
  v3.2.0:
    SERVICE_VERSION: 3.2.0
default: v3.2.0
@@ -0,0 +1,7 @@
# newer versions go on top

- version: "0.1.0"
  changes:
    - description: Apache Spark integration package with Driver, Executor, Master, Worker and ApplicationSource metrics
      type: enhancement
      link: https://github.com/elastic/integrations/pull/2811
@@ -0,0 +1,21 @@
metricsets: ["jmx"]
namespace: "metrics"
hosts:
{{#each hosts}}
  - {{this}}
{{/each}}
path: {{path}}
period: {{period}}
jmx.mappings:
  - mbean: 'metrics:name=application.*.runtime_ms,type=gauges'
    attributes:
      - attr: Value
        field: application_source.runtime.ms
  - mbean: 'metrics:name=application.*.cores,type=gauges'
    attributes:
      - attr: Value
        field: application_source.cores
  - mbean: 'metrics:name=application.*.status,type=gauges'
    attributes:
      - attr: Value
        field: application_source.status
@@ -0,0 +1,37 @@
---
description: Pipeline for parsing Apache Spark application metrics.
processors:
  - set:
      field: ecs.version
      value: '8.0.0'
  - rename:
      field: jolokia
      target_field: apache_spark
      ignore_missing: true
  - set:
      field: event.type
      value: info
  - set:
      field: event.kind
      value: metric
  - set:
      field: event.module
      value: apache_spark
  - script:
      lang: painless
      description: This script adds the name of the application under the key 'application_source.name'.
      if: ctx?.apache_spark?.metrics?.mbean?.contains("name=application") == true
      source: >-
        def bean_name = ctx.apache_spark.metrics.mbean.toString().splitOnToken(".");
        def app_name = "";
        if (bean_name[0].contains("name=application") == true) {
          app_name = bean_name[1] + "." + bean_name[2];
        }
        ctx.apache_spark.metrics.application_source.name = app_name;
  - remove:
      field: apache_spark.metrics.mbean
      ignore_failure: true
on_failure:
  - set:
      field: error.message
      value: '{{ _ingest.on_failure_message }}'
@@ -0,0 +1,12 @@
- name: data_stream.dataset
  type: constant_keyword
  description: Data stream dataset.
- name: data_stream.namespace
  type: constant_keyword
  description: Data stream namespace.
- name: data_stream.type
  type: constant_keyword
  description: Data stream type.
- name: '@timestamp'
  type: date
  description: Event timestamp.
@@ -0,0 +1,12 @@
- external: ecs
  name: event.kind
- external: ecs
  name: event.type
- external: ecs
  name: ecs.version
- external: ecs
  name: tags
- external: ecs
  name: service.address
- external: ecs
  name: service.type
@@ -0,0 +1,23 @@
- name: apache_spark.metrics
  type: group
  release: beta
  fields:
    - name: application_source
      type: group
      fields:
        - name: cores
          type: long
          description: |
            Number of cores.
        - name: name
          type: keyword
          description: |
            Name of the application.
        - name: runtime_ms
          type: long
          description: |
            Time taken to run the application (ms).
        - name: status
          type: keyword
          description: |
            Current status of the application.
@@ -0,0 +1,31 @@
title: Apache Spark application metrics
type: metrics
release: beta
streams:
  - input: jolokia/metrics
    title: Apache Spark application metrics
    description: Collect Apache Spark application metrics using Jolokia agent.
    vars:
      - name: hosts
        type: text
        title: Hosts
        multi: true
        description: |
          Full hosts for the Jolokia for Apache Spark (https://spark_master:jolokia_port).
        required: true
        show_user: true
      - name: path
        type: text
        title: Path
        multi: false
        required: true
        show_user: false
        default: /jolokia/?ignoreErrors=true&canonicalNaming=false
      - name: period
        type: text
        title: Period
        multi: false
        required: true
        show_user: true
        default: 60s
    template_path: "stream.yml.hbs"
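
For illustration only, a policy using this data stream might fill in the variables above roughly as follows (the host value is a placeholder and should point at the Jolokia agent configured on the Spark master):

```
hosts:
  - http://spark-master.example.com:7777
path: /jolokia/?ignoreErrors=true&canonicalNaming=false
period: 60s
```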
> Review comment (nit): Would stating 'Metricbeat module' confuse users into thinking that a separate Metricbeat module is required for this integration? Should we instead say "Jolokia Input"? Although it's implicit from the requirements that the Metricbeat module is not required, it would be nice to make that explicit if you receive more comments to address and update this PR. Otherwise, this PR looks good to me. Don't update just for this nitpick.

> Reply: Thanks for the approval, @akshay-saraswat! This PR is the old PR which was split into data-stream-specific PRs as discussed. The other PRs have been approved and some of them are already merged. We do have one open PR (#3070) which is approved by Jaime Soriano Pastor and is waiting for approval from CODEOWNERS. If "Jolokia Input" sounds more intuitive to you, we'll update that PR with the same change; it's not a problem.

> Reply: Updated the same. Quick reference: https://github.com/elastic/integrations/pull/3070/files#diff-05a30b6a3e489638b54c77068eb780e019e4dcf90827f391e5693aa96aefb4f2R3