Tuesday, September 6, 2016

Debug a Spark Application Running on the Cloud

Most of the time we develop Spark applications locally, and once we are done we run them in the cloud (Cloudera/Hortonworks/AWS). We cannot instantiate a Spark context or Hive context on our local machine because we don't have them installed. However, it would be very helpful if we could debug our Spark application in Eclipse like we do any other simple Java application.

This step-by-step guide will help you debug your Spark application while it runs on the cloud.

  1. First we will export the jar file and copy it to the cloud/cluster where we want to run the application.
  2. Then run the following command on the cluster

export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777

Now any spark-submit command you run will suspend at startup and listen on port 7777 for a debugger to attach.
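The export line above can be unpacked flag by flag. Here is the same setting as a sketch, with each JDWP agent option annotated:

```shell
# Same setting as above, with each JDWP agent option annotated:
#   transport=dt_socket  - the debugger connects over a TCP socket
#   server=y             - the JVM acts as the server and waits for the debugger
#   suspend=y            - the JVM pauses at startup until a debugger attaches
#   address=7777         - the port the JVM listens on
export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777"
echo "$SPARK_SUBMIT_OPTS"
```

Because suspend=y is set, the application will not start executing until Eclipse attaches, which is what lets us hit breakpoints in the driver from the very first line.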

Here is the main method I am testing:

  3. Now run the spark-submit command:
spark-submit --master yarn --deploy-mode client --class com.ujwal.SparkPhoenix.MergingServices /home/DEVAPP/usapkota/jars/SparkPhoenix.jar

  4. Now go to Eclipse, right-click the main class, choose Debug As, click Debug Configurations, and select Remote Java Application.

  5. Notice that we do not have any remote application yet, so click the New launch configuration icon in the top-left corner.

Enter your host and port information and click Debug.

This will open the Debug perspective, and you can also see on your cluster that your application is running.

Wednesday, April 20, 2016

Updating Docker Version

1. Check your current version:
  # docker -v

Right now we are at version 1.6.2
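If you want a script to act on that version number rather than eyeball it, the bare version can be extracted from the `docker -v` output line. This is a sketch; the sample output string below is an assumption based on a 1.6.2 install, not captured from this machine:

```shell
# Sketch: pull the bare version number out of `docker -v` style output
# (the sample string below is an assumed example, not real captured output)
sample="Docker version 1.6.2, build 7c8fca2"
version="$(echo "$sample" | awk '{print $3}' | tr -d ,)"
echo "$version"
```

The third whitespace-separated field holds the version, with a trailing comma that `tr -d ,` strips off.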

2. Update package information, and ensure that APT works with the https method
# apt-get update

Tuesday, April 19, 2016

Docker Optional Configurations

Adding user to docker group

1. If you have not configured anything extra, Docker needs root to work. By default, the Docker daemon listens on a local Unix socket called 'docker.sock' in /var/run, which is owned by the user root.

If you do 
# ls /run -l

Spark Application in Eclipse

I have prepared a step-by-step guide to building a Spark project using Scala in Eclipse. We will run this application locally through Eclipse and also run it on HDFS. You might have already guessed it. Yes, you are correct: we will work on WordCount as an example. After all, we do not want to be outliers by working on something else.

So, let's get started. I am assuming you already have Spark set up and running. If not, you can download the Cloudera QuickStart VM from http://www.cloudera.com/downloads.html. I would recommend this if you do not have anything set up and running, because it is easy and quick.
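Before diving into the Scala project, it helps to see the computation WordCount performs, sketched here as a plain shell pipeline over a tiny sample file (the file path and contents are made up for illustration):

```shell
# The same computation WordCount performs, as a shell pipeline:
# split lines into words, sort, count duplicates, sort by frequency
printf 'hello world\nhello spark\n' > /tmp/words.txt
tr -s ' ' '\n' < /tmp/words.txt | sort | uniq -c | sort -rn
```

The Spark version does the same split-group-count, but distributed across the cluster instead of in a single pipeline.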

Sunday, April 17, 2016

Installation of Docker

In this guide we will install Docker on Ubuntu Trusty 14.04 (LTS). If you do not have Ubuntu, check out my step-by-step guide to installing Ubuntu on VMware.

1. Open a terminal on Ubuntu and switch to the root user, so we don't have to enter the password again and again.

2. Docker requires your kernel to be 3.10 at minimum.
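You can verify that requirement before installing. A sketch using `uname -r` and GNU version sort; stripping everything after the first dash is an assumption about the usual Ubuntu kernel version format (e.g. 3.13.0-generic):

```shell
# Sketch: check the running kernel against Docker's 3.10 minimum
required="3.10"
current="$(uname -r | cut -d- -f1)"   # drop the "-generic" style suffix
oldest="$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)"
if [ "$oldest" = "$required" ]; then
  echo "kernel $current meets the minimum"
else
  echo "kernel $current is too old for Docker"
fi
```

`sort -V` orders the two version strings numerically; if the required version sorts first (or they are equal), the running kernel is new enough.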

Sunday, April 10, 2016

Download Ubuntu Desktop in VMware

  1. Go to http://www.ubuntu.com/download/desktop and download Ubuntu 14.04.4 LTS. You want to download the 64-bit version unless you are using something from your grandfather's age. It will take a while to download.