Using MsPASS with Docker

Gary Pavlis edited this page Dec 17, 2019 · 26 revisions

Overview

Before following this procedure, make sure you understand a few key points.

  • docker is a piece of software to create a lightweight virtual machine that will run on your machine. It is best suited for a single host with multiple processors that can be exploited for parallel processing. See the related section on singularity for clusters.
  • A container in docker is a lightweight instance of a virtual machine. Containers share a common configuration. It might be helpful to think of each container as a child of the root docker virtual machine. The containers run largely in isolation from each other but share a common virtual operating system.
  • docker-compose is a related tool for working with multiple docker containers that mspass uses for parallel operations. docker-compose is configured with a YAML script. For mspass an example configuration is stored in the top of the github tree as the file docker-compose.yml.
  • A key feature of docker is that it provides a way to standardize the setup of MongoDB, which would otherwise be a burden on most users. That step is described below in the section titled Getting MongoDB Running with Docker.
  • It can be confusing to understand where data is stored in a virtual machine environment. In the discussion below files or data we reference that reside on a virtual machine will be set in italics. Local files/data will be referred to with normal font text.

Setting up your system

Overview

There are two distinctly different steps needed to set up your system for mspass with docker: (1) installing docker and docker-compose, and (2) configuration of the virtual machine environment for running mspass. The next two sections discuss details concerning these two steps.

To install Docker on machines where you have root access, please refer to the guide here. For HPC systems, please refer to the following section and use Singularity instead.

For linux systems we note two issues you may encounter; being aware of them will speed this process:

  1. By default docker can only be run with sudo. That means each "docker" call below would need to be changed to "sudo docker". You can do that, but it gets annoying. To avoid it, add your user name to the docker group. Unix variants differ in how groups are handled. Follow this link for instructions on Ubuntu. You may also need to restart your machine for the revised groups to be recognized.
  2. You will need both docker and docker-compose. Unix package managers may split them into separate packages, e.g. on Ubuntu you need to use apt-get to install both the core docker package and docker-compose.

To proceed from here we assume docker has been installed and the docker daemon is running in the background.
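
One quick way to check that assumption is to confirm that both commands are on your PATH. The following is a minimal sketch, not part of MsPASS; the function name check_tools is made up for illustration.

```shell
# Report which of the named commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (not run here):
#   check_tools docker docker-compose && echo "both tools found"
```

If both tools are found, running docker info is one way to confirm that the daemon is reachable from your account without sudo.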

Once you have docker set up properly, use the following command in a terminal to pull the docker image to your local machine:

docker pull wangyinz/mspass

Be patient as this can take a few minutes.

Getting MongoDB Running with Docker

After pulling the docker image, cd to the directory where you want the database related files to be stored, and create a data directory with mkdir data if it does not already exist. Use this command to start the MongoDB server:

docker run --name MsPASS -d --mount src=`pwd`,target=/home,type=bind wangyinz/mspass
  • The --name option gives the launched container instance the name MsPASS.
  • The -d option runs the container as a daemon so that the process is kept in the background.
  • The --mount option binds the current directory to /home within the container, which is the default directory for database files and logs. This option keeps the files outside of the container, so they remain accessible after the container is removed.

To be able to access the MongoDB server from outside of the container, use the following command instead. Container names must be unique, so remove any existing container named MsPASS first:

docker run --name MsPASS -d -p 27017:27017 --mount src=`pwd`,target=/home,type=bind wangyinz/mspass
  • The -p is used to map the host port to the container port. 27017 is the default for MongoDB. It is not necessary if all MongoDB communications will be within the container.
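
The two variants above differ only in the -p option, so they can be wrapped in one small helper. This is a sketch under stated assumptions: the function name mspass_start, its "publish" argument, and the DOCKER variable (used here only to allow a dry run) are all hypothetical and not part of MsPASS.

```shell
# Start the MsPASS container from the current directory.
# Pass "publish" as the first argument to also map port 27017 to the host.
# Set DOCKER=echo to print the docker arguments instead of running them.
mspass_start() {
    mode="${1:-}"
    mkdir -p data                      # data directory described in the text
    set -- run --name MsPASS -d
    if [ "$mode" = "publish" ]; then
        set -- "$@" -p 27017:27017     # host port : container port
    fi
    "${DOCKER:-docker}" "$@" --mount "src=$(pwd),target=/home,type=bind" wangyinz/mspass
}

# Example (not run here):
#   DOCKER=echo mspass_start publish   # dry run, prints the arguments
#   mspass_start publish               # actually start the container
```

Quoting the --mount argument keeps the command working when the current directory path contains spaces.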

You may have to wait for a couple seconds for the MongoDB server to initialize. Then, you can launch the MongoDB client with:

docker exec -it MsPASS mongo

This launches the mongo shell within the MsPASS container created by the previous command. The -i and -t options together specify an interactive pseudo-TTY session.
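
Instead of guessing how long the server needs to initialize, you can poll until it responds. The sketch below is a generic retry helper; the name retry is made up for illustration, and the example in the comments assumes the mongo shell in the image accepts the --quiet and --eval options.

```shell
# Run a command until it succeeds, up to <attempts> times,
# sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1                   # give up after the last attempt
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example (not run here): wait for the server to answer a ping, then open the shell.
#   retry 10 1 docker exec MsPASS mongo --quiet --eval 'db.runCommand({ ping: 1 })' >/dev/null \
#     && docker exec -it MsPASS mongo
```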

To stop the MongoDB server, type the following commands in the mongo shell:

use admin
db.shutdownServer()

and then remove the container with:

docker rm MsPASS
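
The same shutdown can be scripted without opening an interactive shell. This is a sketch under assumptions: the function name mspass_stop and the DOCKER dry-run variable are hypothetical, and it assumes the container exits once the MongoDB server stops, which is what allows docker rm to succeed in the steps above.

```shell
# Shut down MongoDB inside the MsPASS container, then remove the container.
# Set DOCKER=echo to print the docker arguments instead of running them.
mspass_stop() {
    # The shutdown drops the client connection, so the mongo shell may
    # exit with a non-zero status; that is ignored here.
    "${DOCKER:-docker}" exec MsPASS mongo admin --quiet --eval 'db.shutdownServer()' || true
    # Block until the container has stopped (assumes it exits with the server).
    "${DOCKER:-docker}" wait MsPASS
    "${DOCKER:-docker}" rm MsPASS
}

# Example (not run here):
#   mspass_stop
```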

Getting Spark and MongoDB Running with Docker

We will use the docker-compose command to launch two container instances that compose a Spark standalone cluster. One, called mspass-master, runs the MongoDB server and the Spark master. The other, called mspass-worker, runs a Spark worker. Both containers will be running on the same machine in this setup.

First, pull the docker image. Then, create a data directory to hold the MongoDB database files if it does not already exist. Assuming you are working in the root directory of this repository, run the following command to bring up the two container instances:

docker-compose up -d
  • The -d will let the containers run as daemons so that the processes will be kept in the background.

To launch the containers in a different directory, cd to that directory and create a data directory there. Then, you need to explicitly point the command to the docker-compose.yml file:

docker-compose -f path_to_MsPASS/docker-compose.yml up -d

Once the containers are running, you will see several log files from MongoDB and Spark created in the current directory. Since we have the port mapping feature of Docker enabled, you can also open localhost:8080 in your browser to check the status of Spark through the master’s web UI, where you should see the worker listed as ALIVE. Note that the links to the worker will not work due to the container's network setup.

First, we want to make sure the Spark cluster is set up and running correctly. This can be done by running the pi calculation example included in the Spark distribution. To submit the example from mspass-master, use:

docker exec mspass-master /usr/local/spark/bin/run-example --master spark://mspass-master:7077 SparkPi 10

To submit it from mspass-worker, use:

docker exec mspass-worker /usr/local/spark/bin/run-example --master spark://mspass-master:7077 SparkPi 10
  • The docker exec will run the command within the mspass-master or mspass-worker container.
  • The --master option specifies the Spark master, which is mspass-master in our case. The 7077 is the default port of Spark master.

The output of this example is very verbose, but you should see a line reading Pi is roughly 3.141... near the end of the stdout, which is the result of the calculation. You should also see the jobs in the Running Applications or Completed Applications section at localhost:8080.
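
Because the output is so verbose, it can help to filter it down to the result line. The sketch below uses a made-up helper name, extract_pi. The 2>/dev/null in the example assumes Spark writes its log messages to stderr, as its default logging configuration does; drop it if your image is configured differently.

```shell
# Keep only the result line from the SparkPi output read on stdin.
# Exits non-zero if no result line is found, which makes it usable
# as a simple health check.
extract_pi() {
    grep 'Pi is roughly'
}

# Example (not run here):
#   docker exec mspass-master /usr/local/spark/bin/run-example \
#     --master spark://mspass-master:7077 SparkPi 10 2>/dev/null | extract_pi
```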

To launch an interactive mongo shell within mspass-master, use:

docker exec -it mspass-master mongo

To access the MongoDB server from mspass-worker, use:

docker exec -it mspass-worker mongo --host mspass-master
  • The -it option opens an interactive pseudo-TTY session
  • The --host option will direct the client to the server running on mspass-master.

To launch an interactive Python session to run Spark jobs, use the pyspark command through mspass-master:

docker exec -it mspass-master pyspark \
  --conf "spark.mongodb.input.uri=mongodb://mspass-master/test.myCollection?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://mspass-master/test.myCollection" \
  --conf "spark.master=spark://mspass-master:7077" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1

or through mspass-worker:

docker exec -it mspass-worker pyspark \
  --conf "spark.mongodb.input.uri=mongodb://mspass-master/test.myCollection?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://mspass-master/test.myCollection" \
  --conf "spark.master=spark://mspass-master:7077" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1
  • The three --conf options specify the input and output database collections and the Spark master. The Spark master and the MongoDB server are both running on mspass-master, so the URLs should point to it in both cases. Please substitute test and myCollection with the desired database name and collection name.
  • The --packages option will set up the MongoDB Spark connector environment in this Python session.
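
Since the two commands above differ only in the container name, and the database and collection names are meant to be substituted, the invocation can be parameterized. This is a minimal sketch: the function name mspass_pyspark, the DOCKER dry-run variable, and the names seis and wf in the example are hypothetical.

```shell
# Launch pyspark in the given container against <database>.<collection>.
# Usage: mspass_pyspark <container> <database> <collection>
# Set DOCKER=echo to print the docker arguments instead of running them.
mspass_pyspark() {
    container="$1"
    db="$2"
    coll="$3"
    # Both URIs point at mspass-master, where the MongoDB server runs.
    "${DOCKER:-docker}" exec -it "$container" pyspark \
        --conf "spark.mongodb.input.uri=mongodb://mspass-master/${db}.${coll}?readPreference=primaryPreferred" \
        --conf "spark.mongodb.output.uri=mongodb://mspass-master/${db}.${coll}" \
        --conf "spark.master=spark://mspass-master:7077" \
        --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1
}

# Example (not run here; "seis" and "wf" are placeholder names):
#   mspass_pyspark mspass-worker seis wf
```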

Please refer to this documentation for more details about the MongoDB Spark connector.

To bring down the containers, run:

docker-compose down

or

docker-compose -f path_to_MsPASS/docker-compose.yml down