Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add profiling tool #2402

Merged
merged 13 commits into from
May 19, 2021
Merged

Add profiling tool #2402

merged 13 commits into from
May 19, 2021

Conversation

nartal1
Copy link
Collaborator

@nartal1 nartal1 commented May 12, 2021

This is the initial PR to add profiling tool.
In this PR, we process event logs to print executor information, properties, Spark version and application information.

Signed-off-by: Niranjan Artal [email protected]

Signed-off-by: Niranjan Artal <[email protected]>
@nartal1 nartal1 added the feature request New feature or request label May 12, 2021
@nartal1 nartal1 added this to the May 10 - May 21 milestone May 12, 2021
@nartal1 nartal1 self-assigned this May 12, 2021
@nartal1 nartal1 requested a review from GaryShen2008 as a code owner May 12, 2021 17:40
@nartal1 nartal1 marked this pull request as draft May 12, 2021 17:40
@nartal1
Copy link
Collaborator Author

nartal1 commented May 12, 2021

This is still WIP. I am not able to convert this to draft PR.
Getting this error "People who are already subscribed will not be unsubscribed." .
Will remove WIP from the title once it is ready for review.

Update: Converted to draft PR.

Signed-off-by: Niranjan Artal <[email protected]>
@@ -0,0 +1,139 @@
# Spark event log profiling tool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering about directory name here, this is event_log_profiling, but really this tool should do both profiling and qualification. I realize the first part is pulling in profiling tool, but perhaps we should name this directory something generic like 'workload_profiling'

@@ -0,0 +1,139 @@
# Spark event log profiling tool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say lets remove "event log" from title

@@ -0,0 +1,139 @@
# Spark event log profiling tool

This is a profiling tool to parse and analyze Spark event logs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to provide more information here. In the very least we should say generates information used for debugging and profiling. information such as X, Y, Z. See more information below for specifics.

Does this work with both cpu and gpu generated event logs?


## How to compile and use with Spark
1. `mvn clean package`
2. Copy event_log_profiling_2.12-0.6.0-SNAPSHOT.jar to $SPARK_HOME/jars/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly here, I think we want different name for the jar, I wonder if we use something like rapids-4-spark-workload-profiling

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe something even more generic such as rapids-4-spark-tools because who knows what else we will come up with in the future

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I want to think about this, I didn't realize we are basically running local mode for all the queries here so we can do an uber jar but its essentially all of spark.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the jar name.


`cp target/event_log_profiling_2.12-0.6.0-SNAPSHOT.jar $SPARK_HOME/jars/`

3. Add below entries in $SPARK_HOME/conf/log4j.properties
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wanting to get rid of this before it went into github. is it only the driver that prints? If so lets just make it print to a file specified by user

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed log4j and using PrintWriter instead.


<!--<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-parent</artifactId>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets figure out parent

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

event_log_profiling/src/assembly/assembly.xml Outdated Show resolved Hide resolved
* limitations under the License.
*/

package org.apache.spark.sql.rapids.tool.profiling
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seem like it needs to be in org.apache.spark, anything that doesn't need to be, lets put in com.nvidia.spark.rapids.XX

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Moved 2 files to com.nvidia.spark.rapids.tool.profiling package.

pom.xml Outdated
@@ -72,7 +72,8 @@

<modules>
<module>api_validation</module>
<module>dist</module>
<module>dist</module>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing is off


// Create the same logger and sparkSession used for ALL Applications.
val logger = ProfileUtils.createLogger(outputDirectory, logFileName)
val sparkSession = ProfileUtils.createSparkSession
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize we are creating a spark session for this, I thought we were just using the classes, this essentially requires spark then to run, I need to think about that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were creating uber jar for this reason and then run using java command to run. Let us know your inputs on handling this in a better way.

Signed-off-by: Niranjan Artal <[email protected]>
@nartal1
Copy link
Collaborator Author

nartal1 commented May 13, 2021

Yet to do: Getting rid of log4j and moving some files to com.nvidia.spark.rapids.XX

@nartal1 nartal1 marked this pull request as ready for review May 14, 2021 07:18
@nartal1 nartal1 changed the title [WIP] Add profiling tool Add profiling tool May 14, 2021
nartal1 added 2 commits May 14, 2021 11:11
Signed-off-by: Niranjan Artal <[email protected]>
Signed-off-by: Niranjan Artal <[email protected]>
pom.xml Outdated Show resolved Hide resolved
@tgravescs
Copy link
Collaborator

build

Signed-off-by: Niranjan Artal <[email protected]>
@nartal1
Copy link
Collaborator Author

nartal1 commented May 18, 2021

build

andygrove
andygrove previously approved these changes May 18, 2021
Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is at a good point to merge now.

@pxLi pxLi changed the base branch from branch-0.6 to branch-21.06 May 19, 2021 01:10
@pxLi pxLi dismissed andygrove’s stale review May 19, 2021 01:10

The base branch was changed.

@nartal1
Copy link
Collaborator Author

nartal1 commented May 19, 2021

@tgravescs @andygrove Could you please approve again. Thanks for reviewing.

@tgravescs tgravescs merged commit 4bbe192 into NVIDIA:branch-21.06 May 19, 2021
nartal1 added a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add profiling tool

Signed-off-by: Niranjan Artal <[email protected]>

* update readme

Signed-off-by: Niranjan Artal <[email protected]>

* change package name

Signed-off-by: Niranjan Artal <[email protected]>

* Remove log4j and add PrintWriter to save report

Signed-off-by: Niranjan Artal <[email protected]>

* move some files to different package

Signed-off-by: Niranjan Artal <[email protected]>

* update pom to remove dependency on reduced pom file

Signed-off-by: Niranjan Artal <[email protected]>

* addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>

* update tests

Signed-off-by: Niranjan Artal <[email protected]>

* addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>

* Addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>

* Create sparkSession in unit tests

Signed-off-by: Niranjan Artal <[email protected]>

* addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>
nartal1 added a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add profiling tool

Signed-off-by: Niranjan Artal <[email protected]>

* update readme

Signed-off-by: Niranjan Artal <[email protected]>

* change package name

Signed-off-by: Niranjan Artal <[email protected]>

* Remove log4j and add PrintWriter to save report

Signed-off-by: Niranjan Artal <[email protected]>

* move some files to different package

Signed-off-by: Niranjan Artal <[email protected]>

* update pom to remove dependency on reduced pom file

Signed-off-by: Niranjan Artal <[email protected]>

* addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>

* update tests

Signed-off-by: Niranjan Artal <[email protected]>

* addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>

* Addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>

* Create sparkSession in unit tests

Signed-off-by: Niranjan Artal <[email protected]>

* addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants