Spark is commonly run on a remote cluster such as AWS EMR. While the cluster is up, you can monitor applications through the Spark UI, but that information is lost once the cluster shuts down. To keep examining a Spark application afterwards, you need to run the Spark UI locally.
First, download the event logs from the History Server while the cluster is still running: go to http://<driver-node-ip>:18080 and click Download on the appropriate application.
Unzip and store these in a separate directory e.g.
To run the UI locally, you need a local copy of Spark (it can be downloaded from the Apache Spark downloads page) and the $SPARK_HOME environment variable pointing to it. For me this variable is
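For example (the install path below is a placeholder; substitute wherever you unpacked Spark):

```shell
# Placeholder path -- point this at your own Spark installation directory.
export SPARK_HOME="$HOME/spark-3.5.0-bin-hadoop3"
echo "$SPARK_HOME"
```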
Create a new file $SPARK_HOME/conf/spark-defaults.conf and add the following line, pointing it at the directory you created above:
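The relevant setting is spark.history.fs.logDirectory; the path below is a placeholder for the directory holding your unzipped event logs:

```
spark.history.fs.logDirectory file:/path/to/your/event-logs
```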
Run $SPARK_HOME/sbin/start-history-server.sh to start the History Server.
Now go to http://localhost:18080 in a browser and you should see the History Server with the application you downloaded.
When you are finished, run $SPARK_HOME/sbin/stop-history-server.sh to stop the History Server.