Tuesday, September 6, 2016

Debugging a Spark Application Running on the Cloud

Most of the time we develop Spark applications locally and, once we are done, run them in the cloud (Cloudera, Hortonworks, AWS). We cannot instantiate a SparkContext or HiveContext on our local machine because Spark and Hive are not installed there. However, it would be very helpful if we could debug a Spark application in Eclipse the way we debug any other simple Java application.

This step-by-step guide will help you debug your Spark application while it runs on the cloud.

  1. First, export the application as a jar file and copy it to the cloud/cluster node where you want to run it.
  2. Then run the following command on the cluster:

export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777

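Now, whenever you run a spark-submit command in this shell, the JVM it starts will listen on port 7777 and, because of suspend=y, wait until a debugger attaches before it runs the application.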
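The option string packs four JDWP settings into one value. As a minimal sketch (the helper name `jdwp_opts` is hypothetical and not part of Spark), the snippet below builds the same string from a port number, annotates what each setting does, and shows how to clear the variable again once you are done:

```shell
# Hypothetical helper: build the JDWP agent option for a given port.
#   transport=dt_socket : the debugger connects over a TCP socket
#   server=y            : the JVM listens for the debugger instead of dialing out
#   suspend=y           : the JVM pauses at startup until a debugger attaches
#   address=<port>      : the port the JVM listens on
jdwp_opts() {
  port="${1:-7777}"
  printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${port}"
}

# Same value as the export command above, built from the port number.
export SPARK_SUBMIT_OPTS="$(jdwp_opts 7777)"
echo "$SPARK_SUBMIT_OPTS"

# When debugging is finished, clear the variable so later spark-submit
# runs in this shell do not sit waiting for a debugger:
# unset SPARK_SUBMIT_OPTS
```

One thing to watch for, depending on your Java version: on JDK 9 and later a bare port number is bound to localhost only, so attaching from another machine may require `address=*:7777` instead.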

Here is the main method I am testing:

  3. Now run the spark-submit command:
spark-submit --master yarn --deploy-mode client --class com.ujwal.SparkPhoenix.MergingServices /home/DEVAPP/usapkota/jars/SparkPhoenix.jar

  4. In Eclipse, right-click the main class, choose Debug As, then Debug Configurations, and select Remote Java Application.

  5. No remote configuration exists yet, so click the New icon in the top-left corner.

Enter the host of the cluster node and the port (7777 in this example), then click Debug.
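If Eclipse cannot reach port 7777 on the cluster directly (cloud firewalls often block it), one possible workaround is to forward the port over SSH. This is only a sketch under assumptions: the user name and host below are hypothetical placeholders, and it presumes you already have SSH access to the node where spark-submit is running.

```shell
# Hypothetical values: replace with your own cluster login and node host.
CLUSTER_USER="your_user"
CLUSTER_HOST="edge-node.example.com"
DEBUG_PORT=7777

# Forward local port 7777 to port 7777 on the cluster node.
#   -N : do not run a remote command, only forward the port
#   -L : local_port:target_host:target_port, resolved on the remote side
TUNNEL_CMD="ssh -N -L ${DEBUG_PORT}:localhost:${DEBUG_PORT} ${CLUSTER_USER}@${CLUSTER_HOST}"

# Print the command; run it in a separate terminal before you click Debug.
echo "$TUNNEL_CMD"
```

With the tunnel open, the Eclipse debug configuration would use host localhost and port 7777 rather than the cluster host name.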

This opens the Debug perspective, and on the cluster you can see that your application is now running.