[apache_spark][driver] Add Apache Spark package with Driver data stream #2945

Merged
merged 10 commits into from
Apr 22, 2022
1 change: 1 addition & 0 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -5,6 +5,7 @@
/packages/activemq @elastic/integrations
/packages/akamai @elastic/security-external-integrations
/packages/apache @elastic/integrations
/packages/apache_spark @elastic/integrations
/packages/atlassian_bitbucket @elastic/security-external-integrations
/packages/atlassian_confluence @elastic/security-external-integrations
/packages/atlassian_jira @elastic/security-external-integrations
3 changes: 3 additions & 0 deletions packages/apache_spark/_dev/build/build.yml
@@ -0,0 +1,3 @@
dependencies:
  ecs:
    reference: [email protected]
72 changes: 72 additions & 0 deletions packages/apache_spark/_dev/build/docs/README.md
@@ -0,0 +1,72 @@
# Apache Spark

The Apache Spark integration collects and parses data using the Jolokia Metricbeat Module.

## Compatibility

This module has been tested against Apache Spark version `3.2.0`.

## Requirements

To ingest data from Apache Spark, you must know the full host addresses (for example `http://127.0.0.1:7777`) of the Main and Worker nodes.

To gather Spark statistics, download and enable the Jolokia JVM Agent.

```
cd /usr/share/java/
wget -O jolokia-agent.jar "http://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/1.3.6/jolokia-jvm-1.3.6-agent.jar"
```

Once the Jolokia JVM Agent is downloaded, configure Apache Spark to use it as a Java agent and expose metrics via HTTP/JSON. Edit `spark-env.sh`, which should be in `/usr/local/spark/conf`, and add the following parameters (this assumes that Spark is installed in `/usr/local/spark`; if not, change the path to the one where Spark is installed):
```
export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/usr/local/spark/conf/jolokia-master.properties"
```

Now create the file `/usr/local/spark/conf/jolokia-master.properties` with the following content:
```
host=0.0.0.0
port=7777
agentContext=/jolokia
backlog=100

policyLocation=file:///usr/local/spark/conf/jolokia.policy
historyMaxEntries=10
debug=false
debugMaxEntries=100
maxDepth=15
maxCollectionSize=1000
maxObjects=0
```
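
As a quick sanity check, the endpoint that these properties expose can be derived from the file itself. The following is a minimal Python sketch, not part of the integration; the helper names `parse_properties` and `jolokia_base_url` are hypothetical, and it assumes the agent is reached over plain HTTP from the same machine.

```python
# Sketch only: derive the Jolokia endpoint from the properties shown above.
# Helper names are illustrative and not part of Spark, Jolokia, or the package.

SAMPLE_PROPERTIES = """\
host=0.0.0.0
port=7777
agentContext=/jolokia
backlog=100

policyLocation=file:///usr/local/spark/conf/jolokia.policy
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as file:// URLs survive.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def jolokia_base_url(props):
    """Build the base URL from the host, port and agentContext entries."""
    host = props["host"]
    # 0.0.0.0 means "listen on all interfaces"; a local client connects
    # through the loopback address instead (assumption: same machine).
    if host == "0.0.0.0":
        host = "127.0.0.1"
    context = "/" + props["agentContext"].strip("/")
    return f"http://{host}:{props['port']}{context}"


print(jolokia_base_url(parse_properties(SAMPLE_PROPERTIES)))
```

With the values above, the derived URL matches the `stats` address used later in `conf/bigdata.ini`, minus the trailing `/read`.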

Now create the file `/usr/local/spark/conf/jolokia.policy` with the following content:
```xml
<?xml version="1.0" encoding="utf-8"?>
<restrict>
  <http>
    <method>get</method>
    <method>post</method>
  </http>
  <commands>
    <command>read</command>
  </commands>
</restrict>
```
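
The policy above limits the agent to GET and POST requests and to the `read` command only. A small Python sketch, using only the standard library, can confirm what a policy file permits before Spark is restarted. The function name `allowed_by_policy` is hypothetical, and the sketch only inspects the two elements used in this README.

```python
# Sketch only: list the HTTP methods and Jolokia commands a policy allows.
import xml.etree.ElementTree as ET

POLICY = b"""<?xml version="1.0" encoding="utf-8"?>
<restrict>
  <http>
    <method>get</method>
    <method>post</method>
  </http>
  <commands>
    <command>read</command>
  </commands>
</restrict>
"""


def allowed_by_policy(policy_xml):
    """Return (methods, commands) declared in the <restrict> document."""
    root = ET.fromstring(policy_xml)
    methods = [m.text.strip().lower() for m in root.findall("./http/method")]
    commands = [c.text.strip().lower() for c in root.findall("./commands/command")]
    return methods, commands


methods, commands = allowed_by_policy(POLICY)
print("HTTP methods:", methods)
print("Commands:", commands)
```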

Configure the Agent with the following in the `conf/bigdata.ini` file:
```
[Spark-Master]
stats: http://127.0.0.1:7777/jolokia/read
```
Restart the Spark master.

Follow the same set of steps for Spark Worker, Driver and Executor.
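
After the restart, it can help to confirm that the agent answers `read` requests before configuring the integration. The sketch below is a hedged Python example, assuming the master's agent listens on `http://127.0.0.1:7777/jolokia` as configured above. The function names are hypothetical, and `java.lang:type=Memory` is used only because it is a standard JVM MBean, not because the integration necessarily collects it. MBean names containing `/` would need Jolokia's own escaping, which this sketch does not handle.

```python
# Sketch only: build a Jolokia GET "read" URL and unwrap the JSON response.
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_read_url(base_url, mbean, attribute=None):
    """Return <base>/read/<mbean>[/<attribute>] for a Jolokia GET request."""
    url = f"{base_url.rstrip('/')}/read/{quote(mbean, safe=':=,*')}"
    if attribute:
        url += "/" + quote(attribute, safe="")
    return url


def extract_value(response_text):
    """Return the 'value' field, raising if Jolokia reported a non-200 status."""
    payload = json.loads(response_text)
    if payload.get("status") != 200:
        raise RuntimeError(
            f"Jolokia returned status {payload.get('status')}: {payload.get('error')}"
        )
    return payload["value"]


def read_attribute(base_url, mbean, attribute=None, timeout=5):
    """Perform the HTTP request against a running agent and return the value."""
    with urlopen(build_read_url(base_url, mbean, attribute), timeout=timeout) as resp:
        return extract_value(resp.read().decode("utf-8"))
```

For example, `read_attribute("http://127.0.0.1:7777/jolokia", "java.lang:type=Memory", "HeapMemoryUsage")` should return heap usage figures when the agent is running. For the Worker, Driver and Executor, substitute the host and port configured for each of them.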

## Metrics

### Driver

This is the `driver` dataset.

{{event "driver"}}

{{fields "driver"}}
7 changes: 7 additions & 0 deletions packages/apache_spark/changelog.yml
@@ -0,0 +1,7 @@
# newer versions go on top

- version: "0.1.0"
  changes:
    - description: Apache Spark integration package with metrics
      type: enhancement
      link: https://github.com/elastic/integrations/pull/2811