  1. Hadoop YARN setup must be complete, and HADOOP_HOME must be set as an environment variable.
  2. Make sure all nodes are configured for password-less SSH in both directions (master -> slaves and slaves -> master).
  3. Our scripts rely heavily on environment variables, so comment out the portion of your ~/.bashrc that follows the line "If not running interactively, don't do anything"; otherwise non-interactive SSH sessions return before those variables are set.
  4. Kindly refer to the setups & scripts provided in before proceeding further as the utility scripts provided in the repository are needed here.
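The prerequisites above can be checked from the master with a short script. This is a minimal sketch, not one of the repository's utilities: it assumes the slave host names are passed as arguments and that `HADOOP_HOME` is exported. Reading `HADOOP_HOME` through a non-interactive SSH session also exercises item 3: when the remote login shell is bash, it reads `~/.bashrc` for such sessions, and the early `return` under "If not running interactively" hides any variables set after it.

```shell
#!/bin/sh
# Minimal prerequisite check (a sketch, not one of the repository's scripts).
# Slave host names are passed as arguments; they are hypothetical here.

# require_env NAME: succeed if NAME is exported with a non-empty value
require_env() {
  if [ -z "$(printenv "$1")" ]; then
    echo "not set or not exported: $1" >&2
    return 1
  fi
}

# remote_env HOST NAME: print NAME as a non-interactive SSH session on HOST sees it.
# BatchMode=yes makes ssh fail instead of prompting, so a password prompt counts as a failure.
remote_env() {
  ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" "printenv $2"
}

# check_cluster SLAVE...: verify HADOOP_HOME locally and on each slave,
# plus password-less SSH master -> slave and slave -> master
check_cluster() {
  require_env HADOOP_HOME || return 1
  master=$(uname -n)   # assumes the slaves can resolve this name
  for h in "$@"; do
    if ! remote_env "$h" HADOOP_HOME >/dev/null; then
      echo "$h: SSH failed or HADOOP_HOME not visible" >&2
      return 1
    fi
    if ! ssh -n -o BatchMode=yes "$h" "ssh -n -o BatchMode=yes $master true"; then
      echo "$h: cannot SSH back to $master" >&2
      return 1
    fi
  done
  echo "prerequisites OK"
}
```

For example, `check_cluster slave1 slave2` (host names hypothetical) stops at the first node that fails a check and names the failing direction.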


  • The monitoring utilities nmon and operf are required to collect performance data. To install them, follow these steps:

    git clone
    cd perftools-setup
    . ~/.bashrc

Note: The setup script performs the following:

  • Installs nmon and operf on all nodes (master + slaves).
  • Configures pid_monitor to run on the master node.
  • Adds the scripts in this repository to the PATH.
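After the setup script finishes, one quick way to confirm that nmon and operf are on the PATH of every node is to look them up over SSH. This is a hypothetical helper, not part of perftools-setup; the host names passed to `check_nodes` are assumptions. Because the lookup runs in a non-interactive SSH session, it also reflects whether the PATH changes made through ~/.bashrc are visible to the scripts.

```shell
#!/bin/sh
# Sketch: confirm nmon and operf are on the PATH (not part of perftools-setup).

TOOLS="nmon operf"

# missing_tools TOOL...: print each TOOL that `command -v` cannot find, one per line
missing_tools() {
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || echo "$t"
  done
}

# check_nodes HOST...: run the same lookup on each node over a non-interactive SSH session
check_nodes() {
  rc=0
  for h in "$@"; do
    if ! missing=$(ssh -n -o BatchMode=yes "$h" \
        "for t in $TOOLS; do command -v \"\$t\" >/dev/null 2>&1 || echo \"\$t\"; done"); then
      echo "$h: SSH failed" >&2
      rc=1
    elif [ -n "$missing" ]; then
      echo "$h: missing" $missing   # unquoted on purpose: joins the names on one line
      rc=1
    fi
  done
  return "$rc"
}
```

Running `missing_tools nmon operf` on the master and `check_nodes slave1 slave2` (hypothetical hosts) should both print nothing on a correctly set up cluster.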
# CAUTION: oprofile must be rebuilt and reinstalled every time the system moves to a different JDK level.
# The old oprofile must be uninstalled before the new version is installed.

# To uninstall the old oprofile and reinstall it against the current JDK, run:

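  cd ${WORKDIR}/oprofile/oprofile-1.1.0
  make uninstall     # remove the currently installed oprofile
  make clean         # discard objects built for the old JDK
  ./configure --prefix=${WORKDIR}/oprofile/oprofile_install --with-java=${JAVA_HOME}
  make install       # build and install against the current JAVA_HOME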
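Because the rebuild is only needed when the JDK changes, the steps above can be wrapped so they run only in that case. A sketch, assuming the same `WORKDIR` layout and oprofile 1.1.0 source tree; the stamp file `.built_with_java` is a hypothetical addition that records which `JAVA_HOME` the last build used.

```shell
#!/bin/sh
# Sketch: rebuild oprofile only when JAVA_HOME differs from the JDK used for the
# previous build. The stamp file (.built_with_java) is a hypothetical addition,
# not something the repository's scripts create.

# jdk_changed STAMP: succeed if STAMP is missing or records a different JAVA_HOME
jdk_changed() {
  [ ! -f "$1" ] || [ "$(cat "$1")" != "$JAVA_HOME" ]
}

# rebuild_oprofile: uninstall the old build, then configure and install for $JAVA_HOME
rebuild_oprofile() {
  src="${WORKDIR}/oprofile/oprofile-1.1.0"
  stamp="${src}/.built_with_java"
  if ! jdk_changed "$stamp"; then
    echo "oprofile already built for $JAVA_HOME"
    return 0
  fi
  (
    cd "$src" || exit 1
    make uninstall || true   # the old oprofile must go first; fails harmlessly if absent
    make clean || true
    ./configure --prefix="${WORKDIR}/oprofile/oprofile_install" \
                --with-java="${JAVA_HOME}" || exit 1
    make install || exit 1
  ) || return 1
  printf '%s\n' "$JAVA_HOME" > "$stamp"   # record the JDK this build targets
}
```

The build runs in a subshell so the `cd` does not leak into the caller, and the stamp is written only after `make install` succeeds.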

nmon recording

  • Two scripts are provided for nmon recording; invoke them as follows:
# To start nmon recording on all the nodes of the cluster. <dir_name>

# To stop nmon recording on all the nodes of the cluster.
# This script also collects all the nmon recordings,
# places them in the directory provided by the user, and archives them. <dir_name>
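The start/stop/collect cycle described above can be sketched as follows. This is not the repository's pair of scripts. It assumes the Hadoop 2.x `slaves` file lists the slave hosts, that `<dir_name>` is an absolute path without spaces, and that a 5-second interval with 17280 snapshots (24 hours) suits the run. Sending SIGUSR2 is nmon's usual way to end a file recording early.

```shell
#!/bin/sh
# Sketch of cluster-wide nmon start/stop (not the repository's scripts).
# Assumptions: the Hadoop 2.x slaves file lists the slave hosts, DIR is an
# absolute path without spaces, and 5 s x 17280 snapshots (24 h) suits the run.

HOSTS_FILE="${HOSTS_FILE:-${HADOOP_HOME}/etc/hadoop/slaves}"

# cluster_hosts: this (master) host, then every non-blank, non-comment line of HOSTS_FILE
cluster_hosts() {
  uname -n
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$HOSTS_FILE"
}

# nmon_start_cmd DIR: remote command that records into DIR
#   -f write a .nmon file   -s seconds between snapshots
#   -c number of snapshots  -m directory to save the file in
nmon_start_cmd() {
  printf 'mkdir -p %s && nmon -f -s 5 -c 17280 -m %s\n' "$1" "$1"
}

# start_nmon_all DIR: begin recording on every node
start_nmon_all() {
  for h in $(cluster_hosts); do
    ssh -n "$h" "$(nmon_start_cmd "$1")"
  done
}

# stop_nmon_all DIR: stop nmon on every node (SIGUSR2 makes it write out and exit),
# copy each node's files into DIR/<host>/, then archive DIR as DIR.tar.gz
stop_nmon_all() {
  for h in $(cluster_hosts); do
    ssh -n "$h" 'pkill -USR2 -x nmon; sleep 2'   # give nmon a moment to flush
    mkdir -p "$1/$h"
    scp -q "$h:$1/*.nmon" "$1/$h/"
  done
  tar -czf "$1.tar.gz" -C "$(dirname "$1")" "$(basename "$1")"
}
```

A run might look like `start_nmon_all /tmp/nmon_q73`, then the workload, then `stop_nmon_all /tmp/nmon_q73` (directory name hypothetical).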


Before profiling a Spark application, add the following configuration parameters to spark-submit, spark-sql, or spark-shell:

Set OPERFLIB to ${WORKDIR}/oprofile/oprofile_install/lib

--conf spark.executor.extraJavaOptions="-agentpath:${OPERFLIB}/oprofile/" 
--conf spark.executor.extraLibraryPath=${OPERFLIB}                                
--driver-library-path ${OPERFLIB}                                                
--driver-java-options "-agentpath:${OPERFLIB}/oprofile/"
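Rather than typing the four options above on every invocation, they can be kept in a small wrapper. This is a sketch under stated assumptions: it takes the agent to be `libjvmti_oprofile.so`, the JVMTI library oprofile installs under `${OPERFLIB}/oprofile`, so check that file name in your own install. `spark_submit_profiled` and the `SPARK_SUBMIT` override are hypothetical names.

```shell
#!/bin/sh
# Sketch: launch Spark with oprofile's JVMTI agent on the driver and executors.
# Assumes the agent library is ${OPERFLIB}/oprofile/libjvmti_oprofile.so.
OPERFLIB="${OPERFLIB:-${WORKDIR}/oprofile/oprofile_install/lib}"
SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"   # hypothetical override: spark-sql or spark-shell

# spark_submit_profiled ARGS...: add the profiling options, then pass ARGS through unchanged
spark_submit_profiled() {
  agent="-agentpath:${OPERFLIB}/oprofile/libjvmti_oprofile.so"
  "$SPARK_SUBMIT" \
    --conf "spark.executor.extraJavaOptions=${agent}" \
    --conf "spark.executor.extraLibraryPath=${OPERFLIB}" \
    --driver-library-path "${OPERFLIB}" \
    --driver-java-options "${agent}" \
    "$@"
}
```

Quoting each option keeps paths with unusual characters intact, and `"$@"` forwards the application's own arguments untouched.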

  • To enable oprofile for profiling Spark applications:
  • To disable oprofile for Spark applications:
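If profiling should be switched on for every job rather than per command line, one possible approach is to toggle the equivalent properties in `spark-defaults.conf`. This is a hypothetical sketch and may differ from what the repository's enable/disable scripts do. The marker comments, the agent file name `libjvmti_oprofile.so`, and the assumption that these four properties are not already set elsewhere in the file are all illustrative choices.

```shell
#!/bin/sh
# Sketch: toggle oprofile for every Spark job via spark-defaults.conf.
# Assumptions: the marker comments, the agent file name, and that these four
# properties are not already set elsewhere in the file.
CONF="${CONF:-${SPARK_HOME}/conf/spark-defaults.conf}"
BEGIN_MARK='# >>> oprofile'
END_MARK='# <<< oprofile'

# enable_oprofile: append a marked block with the agent and library settings
enable_oprofile() {
  disable_oprofile || return 1    # never leave two blocks behind
  agent="-agentpath:${OPERFLIB}/oprofile/libjvmti_oprofile.so"
  {
    printf '%s\n' "$BEGIN_MARK"
    printf 'spark.executor.extraJavaOptions %s\n' "$agent"
    printf 'spark.executor.extraLibraryPath %s\n' "$OPERFLIB"
    printf 'spark.driver.extraJavaOptions %s\n' "$agent"
    printf 'spark.driver.extraLibraryPath %s\n' "$OPERFLIB"
    printf '%s\n' "$END_MARK"
  } >> "$CONF"
}

# disable_oprofile: delete everything from BEGIN_MARK through END_MARK
disable_oprofile() {
  [ -f "$CONF" ] || return 0
  tmp="$CONF.tmp.$$"
  awk -v b="$BEGIN_MARK" -v e="$END_MARK" \
    '$0 == b { skip = 1 } !skip { print } $0 == e { skip = 0 }' "$CONF" > "$tmp" &&
    mv "$tmp" "$CONF"
}
```

Setting the driver properties in the defaults file, rather than in application code, matters because the driver JVM has already started by the time an application's own SparkConf is read in client mode.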


To run pid_monitor across your cluster, use the following command:

pmon <command/script with arguments>

e.g.: pmon q73 6 18 23g tpcds1t


  • pmon starts the necessary data-collection agents on all nodes (master + slaves).
  • On completion, it collects the performance data from every node and makes it available on the master.
  • All the performance data can be viewed in a web browser at http://masternode/rundir
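The behaviour described above (start agents everywhere, run the command, gather the data on the master) can be sketched as a wrapper. This is not the repository's pmon. `RUNDIR_ROOT`, `HOSTS`, the staging directory and the `start_agent`/`stop_agent`/`collect` hooks are hypothetical placeholders, and the hooks deliberately do no real profiling. For results to appear under http://masternode/rundir, `RUNDIR_ROOT` would have to sit below the web server's document root.

```shell
#!/bin/sh
# Sketch of a pmon-style wrapper (NOT the repository's pmon).
RUNDIR_ROOT="${RUNDIR_ROOT:-$HOME/rundir}"   # hypothetical results location
HOSTS="${HOSTS-$(uname -n)}"                 # hypothetical: master followed by slaves

# Placeholder hooks: replace with the real collectors (e.g. nmon, operf).
start_agent() { ssh -n "$1" "mkdir -p '$2'"; }   # should also launch collectors writing into $2
stop_agent()  { ssh -n "$1" 'true'; }            # should stop them and flush their output
collect()     { scp -q -r "$1:$2" "$3/"; }       # copy the node's staging dir to the master

# pmon_sketch CMD ARGS...: profile one command and print the run directory
pmon_sketch() {
  run="$RUNDIR_ROOT/$(date +%Y%m%d_%H%M%S)"
  stage="/tmp/pmon_stage_$$"                 # per-node staging directory
  mkdir -p "$run" || return 1
  for h in $HOSTS; do start_agent "$h" "$stage"; done
  # run the profiled command, e.g. pmon_sketch q73 6 18 23g tpcds1t
  if "$@" > "$run/command.log" 2>&1; then status=0; else status=$?; fi
  for h in $HOSTS; do
    stop_agent "$h"
    mkdir -p "$run/$h"
    collect "$h" "$stage" "$run/$h"
  done
  echo "$status" > "$run/exit_status"
  echo "$run"                                 # print where the results were stored
  return "$status"
}
```

The wrapper returns the workload's own exit status, so a failed query is still visible to whatever script called it.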