[apache_spark][driver] Add Apache Spark package with Driver data stream #2945

Merged · 10 commits · Apr 22, 2022
12 changes: 10 additions & 2 deletions packages/apache_spark/_dev/build/docs/README.md
@@ -1,10 +1,10 @@
# Apache Spark
# Apache Spark Integration

The Apache Spark integration collects and parses data using the Jolokia Metricbeat Module.

## Compatibility

This integration has been tested against `Apache Spark version 3.2.0`
This module has been tested against `Apache Spark version 3.2.0`

## Requirements

@@ -70,3 +70,11 @@ This is the `nodes` data stream.
{{event "nodes"}}

{{fields "nodes"}}

### Driver

This is the `driver` data stream.

{{event "driver"}}

{{fields "driver"}}
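A quick way to sanity-check any of the Jolokia agents this PR wires up (a sketch, not part of the diff — the port and mbean are illustrative; the ports and allowed commands come from the jolokia-*.properties and jolokia.policy files below):

```python
import json
import urllib.request

# Master agent per jolokia-master.properties; the worker listens on 7778
# and the driver on 7779.
JOLOKIA_URL = "http://localhost:7777/jolokia/"

# A standard Jolokia "read" request; the jolokia.policy in this PR permits
# read/list/search/version over GET and POST.
payload = json.dumps({
    "type": "read",
    "mbean": "java.lang:type=Memory",
    "attribute": "HeapMemoryUsage",
}).encode("utf-8")

req = urllib.request.Request(
    JOLOKIA_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=10) as resp:
    # The agent wraps the result in {"request": ..., "value": ..., "status": ...}.
    print(json.load(resp)["value"])
```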
38 changes: 23 additions & 15 deletions packages/apache_spark/_dev/deploy/docker/Dockerfile
@@ -1,15 +1,23 @@
ARG SERVICE_VERSION=${SERVICE_VERSION:-3.2.0}
FROM docker.io/bitnami/spark:${SERVICE_VERSION}

ENV JOLOKIA_VERSION=1.6.0
USER root

COPY jolokia-configs /spark/conf/
RUN mkdir /usr/share/java && \
    curl -o /usr/share/java/jolokia-agent.jar https://repo1.maven.org/maven2/org/jolokia/jolokia-jvm/$JOLOKIA_VERSION/jolokia-jvm-$JOLOKIA_VERSION-agent.jar && \
    echo 'export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/spark/conf/jolokia-master.properties"' >> "/opt/bitnami/spark/conf/spark-env.sh"

RUN echo '*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink' >> "/opt/bitnami/spark/conf/metrics.properties" && \
    echo '*.source.jvm.class=org.apache.spark.metrics.source.JvmSource' >> "/opt/bitnami/spark/conf/metrics.properties"

HEALTHCHECK --interval=1s --retries=90 CMD curl -f -s http://localhost:7777/jolokia/version
ARG SERVICE_VERSION=${SERVICE_VERSION:-3.2.0}
FROM docker.io/bitnami/spark:${SERVICE_VERSION}

ENV JOLOKIA_VERSION=1.6.0 \
    SPARK_MAIN_URL=spark://apache-spark-main:7077 \
    SPARK_RPC_AUTHENTICATION_ENABLED=no \
    SPARK_RPC_ENCRYPTION_ENABLED=no \
    SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no \
    SPARK_SSL_ENABLED=no \
    SPARK_WORKER_MEMORY=1024G \
    SPARK_WORKER_CORES=8
USER root

COPY jolokia-configs /spark/conf/

# Attach the Jolokia agent to the master and worker JVMs via spark-env.sh.
RUN mkdir /usr/share/java && \
    curl -o /usr/share/java/jolokia-agent.jar https://repo1.maven.org/maven2/org/jolokia/jolokia-jvm/$JOLOKIA_VERSION/jolokia-jvm-$JOLOKIA_VERSION-agent.jar && \
    echo 'export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/spark/conf/jolokia-master.properties"' >> "/opt/bitnami/spark/conf/spark-env.sh" && \
    echo 'export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/spark/conf/jolokia-worker.properties"' >> "/opt/bitnami/spark/conf/spark-env.sh"

RUN echo '*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink' >> "/opt/bitnami/spark/conf/metrics.properties" && \
    echo '*.source.jvm.class=org.apache.spark.metrics.source.JvmSource' >> "/opt/bitnami/spark/conf/metrics.properties"

# Attach the agent to driver JVMs too, so the new `driver` data stream has an endpoint to poll.
RUN echo 'spark.driver.extraJavaOptions -javaagent:/usr/share/java/jolokia-agent.jar=config=/spark/conf/jolokia-driver.properties' >> "/opt/bitnami/spark/conf/spark-defaults.conf"

WORKDIR /opt/bitnami/spark/examples/src/main/python/
COPY wordcount.py /opt/bitnami/spark/examples/src/main/python
COPY docker-entrypoint /docker-entrypoint/
RUN chmod 755 /docker-entrypoint/docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint/docker-entrypoint.sh","/opt/bitnami/scripts/spark/run.sh"]
HEALTHCHECK --interval=1s --retries=90 CMD curl -f -s http://localhost:7777/jolokia/version
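Net effect of this Dockerfile, read together with the jolokia-configs below: the master JVM serves Jolokia on 7777, the worker on 7778, and driver JVMs on 7779; the docker-compose change that follows exposes 7777 and 7779 to the test harness.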
packages/apache_spark/_dev/deploy/docker/docker-compose.yml
@@ -7,3 +7,4 @@ services:
      dockerfile: Dockerfile
    ports:
      - 7777
      - 7779
packages/apache_spark/_dev/deploy/docker/docker-entrypoint/docker-entrypoint.sh (new file)
@@ -0,0 +1,38 @@
#!/bin/bash

# shellcheck disable=SC1091

set -o errexit
set -o nounset
set -o pipefail
#set -o xtrace

# Load libraries
. /opt/bitnami/scripts/libbitnami.sh
. /opt/bitnami/scripts/libspark.sh

# Load Spark environment variables
eval "$(spark_env)"

print_welcome_page

if [ ! $EUID -eq 0 ] && [ -e "$LIBNSS_WRAPPER_PATH" ]; then
    echo "spark:x:$(id -u):$(id -g):Spark:$SPARK_HOME:/bin/false" > "$NSS_WRAPPER_PASSWD"
    echo "spark:x:$(id -g):" > "$NSS_WRAPPER_GROUP"
    echo "LD_PRELOAD=$LIBNSS_WRAPPER_PATH" >> "$SPARK_CONFDIR/spark-env.sh"
fi

if [[ "$1" = "/opt/bitnami/scripts/spark/run.sh" ]]; then
    info "** Starting Spark setup **"
    /opt/bitnami/scripts/spark/setup.sh
    info "** Spark setup finished! **"
fi

eval "$(spark_env)"
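# Start a worker against the main node, then submit the sample wordcount job
# so a driver JVM (with the Jolokia agent from spark-defaults.conf) is running.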
cd /opt/bitnami/spark/sbin
./start-worker.sh $SPARK_MAIN_URL --cores $SPARK_WORKER_CORES --memory $SPARK_WORKER_MEMORY &
cd /opt/bitnami/spark/examples/src/main/python/
/opt/bitnami/spark/bin/spark-submit wordcount.py status_api_demo.py $SPARK_MAIN_URL &

echo ""
exec "$@"
@@ -1,2 +1,4 @@
[Spark-Master]
stats: http://127.0.0.1:7777/jolokia/read
[Spark-Master]
stats: http://127.0.0.1:7777/jolokia/read
[Spark-Worker]
stats: http://127.0.0.1:7778/jolokia/read
packages/apache_spark/_dev/deploy/docker/jolokia-configs/jolokia-driver.properties (new file)
@@ -0,0 +1,12 @@
host=0.0.0.0
port=7779
agentContext=/jolokia
backlog=100

policyLocation=file:///spark/conf/jolokia.policy
historyMaxEntries=10
debug=false
debugMaxEntries=100
maxDepth=15
maxCollectionSize=1000
maxObjects=0
packages/apache_spark/_dev/deploy/docker/jolokia-configs/jolokia-master.properties
@@ -1,12 +1,12 @@
host=0.0.0.0
port=7777
agentContext=/jolokia
backlog=100
policyLocation=file:///spark/conf/jolokia.policy
historyMaxEntries=10
debug=false
debugMaxEntries=100
maxDepth=15
maxCollectionSize=1000
maxObjects=0
host=0.0.0.0
port=7777
agentContext=/jolokia
backlog=100

policyLocation=file:///spark/conf/jolokia.policy
historyMaxEntries=10
debug=false
debugMaxEntries=100
maxDepth=15
maxCollectionSize=1000
maxObjects=0
packages/apache_spark/_dev/deploy/docker/jolokia-configs/jolokia-worker.properties (new file)
@@ -0,0 +1,12 @@
host=0.0.0.0
port=7778
agentContext=/jolokia
backlog=100

policyLocation=file:///spark/conf/jolokia.policy
historyMaxEntries=10
debug=false
debugMaxEntries=100
maxDepth=15
maxCollectionSize=1000
maxObjects=0
packages/apache_spark/_dev/deploy/docker/jolokia-configs/jolokia.policy
@@ -1,13 +1,13 @@
<?xml version="1.0" encoding="utf-8"?>
<restrict>
  <http>
    <method>get</method>
    <method>post</method>
  </http>
  <commands>
    <command>read</command>
    <command>list</command>
    <command>search</command>
    <command>version</command>
  </commands>
</restrict>
50 changes: 50 additions & 0 deletions packages/apache_spark/_dev/deploy/docker/wordcount.py
@@ -0,0 +1,50 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import sys
import signal
import time

from operator import add
from datetime import datetime

from pyspark.sql import SparkSession

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: wordcount <file> <master>", file=sys.stderr)
        sys.exit(-1)

    spark = SparkSession\
        .builder\
        .master(sys.argv[2])\
        .appName("PythonWordCount")\
        .getOrCreate()

    t_end = time.time() + 60 * 15

    # Re-run the count for 15 minutes so the driver JVM (and its Jolokia
    # agent) stays alive long enough for metrics collection.
    while time.time() < t_end:
        lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
        counts = lines.flatMap(lambda x: x.split(' ')) \
            .map(lambda x: (x, 1)) \
            .reduceByKey(add)
        output = counts.collect()
        for (word, count) in output:
            print("%s: %i" % (word, count))

    spark.stop()
3 changes: 3 additions & 0 deletions packages/apache_spark/changelog.yml
@@ -2,6 +2,9 @@

- version: "0.1.0"
  changes:
    - description: Implement "driver" data stream
      type: enhancement
      link: https://github.com/elastic/integrations/pull/2945
    - description: Implement "nodes" data stream
      type: enhancement
      link: https://github.com/elastic/integrations/pull/2939
packages/apache_spark/data_stream/driver/_dev/test/system/test-default-config.yml (new file)
@@ -0,0 +1,7 @@
vars: ~
data_stream:
  vars:
    hosts:
      - http://apache-spark-main:{{Ports.[1]}}
    path:
      - /jolokia/?ignoreErrors=true&canonicalNaming=false
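Here `{{Ports.[1]}}` should resolve to the second port exposed in the docker-compose change above — 7779, the driver's Jolokia agent — assuming the usual zero-indexed `Ports` list in elastic-package system-test templates.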