Using MsPASS with Docker
Before following this procedure, make sure you understand a few key points:
- Docker is software that creates a lightweight virtual machine that runs on your machine. It is best suited to a single host with multiple processors that can be exploited for parallel processing. For clusters, see the related section on Singularity.
- A container in docker is a lightweight instance of a virtual machine; containers share a common configuration. It may help to think of each container as a child of the root docker virtual machine. The containers run largely in isolation from each other but share a common virtual operating system.
- docker-compose is a related tool for working with multiple docker containers, which MsPASS uses for parallel operations. docker-compose is configured with a YAML file. For MsPASS, an example configuration is stored at the top level of the GitHub tree as the file docker-compose.yml.
- A key feature of docker is that it provides a way to standardize the setup of MongoDB, which would otherwise be a burden on most users. That step is described below in the section titled starting and stopping MongoDB.
- It can be confusing to keep track of where data are stored in a virtual machine environment. In the discussion below, files or data that reside on a virtual machine are set in italics; local files and data are set in normal font.
There are two distinctly different steps needed to set up your system for mspass with docker: (1) installing docker and docker-compose, and (2) configuration of the virtual machine environment for running mspass. The next two sections discuss details concerning these two steps.
To install Docker on machines where you have root access, please refer to the guide here. For HPC systems, please refer to the following section and use Singularity instead.
For Linux systems, we note two issues you may encounter; knowing about them will speed up this process:
- Without some extra setup, docker can only be run with sudo. That means each "docker" call below would need to be changed to "sudo docker". You can do that, but it gets annoying. To avoid it, add your user name to the docker group. Unix variants differ in how groups are handled; follow this link for instructions on Ubuntu. You may need to log out and back in, or restart your machine, before the revised groups are recognized.
- You will need both docker and docker-compose. Linux package managers may package them separately; e.g., on Ubuntu you need to install both docker and docker-compose with apt-get.
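To check whether the group change has taken effect before dropping sudo, a small helper can inspect the groups of the current process. This is a minimal sketch, assuming a Linux host where the Docker packages created a group literally named "docker" (the usual default); the function name user_in_docker_group is hypothetical.

```python
import grp
import os


def user_in_docker_group(group_name: str = "docker") -> bool:
    """Return True if the current process already holds the given group.

    This checks the groups of the running process, so a user who was only
    just added to the group will still get False until they start a new
    login session, which matches the note about logging out or restarting.
    """
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist, e.g. docker has not been installed yet.
        return False
    return gid == os.getegid() or gid in os.getgroups()


if __name__ == "__main__":
    if user_in_docker_group():
        print("docker group is active: docker should work without sudo")
    else:
        print("docker group is not active: use sudo docker, or fix group membership")
```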
To proceed from here we assume docker has been installed and the docker daemon is running in the background.
Once you have docker set up properly, use the following command in a terminal to pull the docker image to your local machine:
docker pull wangyinz/mspass
Be patient as this can take a few minutes.
After pulling the docker image, cd to the directory where you want the database-related files stored, and create a data directory with mkdir data if it does not already exist. Use this command to start the MongoDB server:
docker run --name MsPASS -d --mount src=`pwd`,target=/home,type=bind wangyinz/mspass
- The --name option gives the launched container instance the name MsPASS.
- The -d option runs the container as a daemon so that the process is kept in the background.
- The --mount option binds the current directory to /home within the container, which is the default directory for database files and logs. This keeps the files outside of the container, so they remain accessible after the container is removed.
To access the MongoDB server from outside the container, use the following command instead:
docker run --name MsPASS -d -p 27017:27017 --mount src=`pwd`,target=/home,type=bind wangyinz/mspass
- The -p option maps a host port to a container port; 27017 is the default port for MongoDB. This option is not necessary if all MongoDB communication happens within the container.
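If you script this setup, it can help to build the docker run argument list in one place so both variants above come from the same code. A minimal sketch, assuming Python 3.8+ (for shlex.join); the function name mongo_run_command is hypothetical, and the container name, image, and mount target are copied from the commands above.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def mongo_run_command(
    data_dir: Path,
    name: str = "MsPASS",
    image: str = "wangyinz/mspass",
    host_port: Optional[int] = None,
) -> List[str]:
    """Build the 'docker run' argument list for the single-container setup.

    Mirrors the two commands above: pass host_port=27017 for the variant
    that makes MongoDB reachable from outside the container.
    """
    cmd = ["docker", "run", "--name", name, "-d"]
    if host_port is not None:
        # Map the host port to MongoDB's default port inside the container.
        cmd += ["-p", f"{host_port}:27017"]
    # Same as --mount src=`pwd`,target=/home,type=bind when data_dir is the cwd.
    cmd += ["--mount", f"src={data_dir.resolve()},target=/home,type=bind"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Print the command only; pass the list to subprocess.run(..., check=True)
    # to actually start the container.
    print(shlex.join(mongo_run_command(Path.cwd(), host_port=27017)))
```

Returning a list rather than a string avoids shell quoting problems when the list is later handed to subprocess.run.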
You may have to wait a couple of seconds for the MongoDB server to initialize. Then you can launch the MongoDB client with:
docker exec -it MsPASS mongo
This launches the mongo shell within the MsPASS container created by the previous command. The -i option keeps standard input open and -t allocates a pseudo-TTY; together they give an interactive session.
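Instead of guessing how long initialization takes, a script can poll the MongoDB port until it accepts connections. This is a sketch that assumes the container was started with -p 27017:27017 (the second docker run command above) so the port is reachable from the host; the function name wait_for_port is hypothetical. An open port only shows that something is listening, not that every startup step has finished.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 27017,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is accepting connections on the port
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```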
To stop the MongoDB server, type the following commands in the mongo shell:
use admin
db.shutdownServer()
and then remove the container with:
docker rm MsPASS
We will use the docker-compose command to launch two container instances that compose a Spark standalone cluster. One, called mspass-master, runs the MongoDB server and the Spark master; the other, called mspass-worker, runs a Spark worker. Both containers run on the same machine in this setup.
First, pull the docker image. Then create a data directory to hold the MongoDB database files if it does not already exist. Assuming you are working in the root directory of this repository, run the following command to bring up the two container instances:
docker-compose up -d
- The -d option runs the containers as daemons so that the processes are kept in the background.
To launch the containers from a different directory, cd to that directory and create a data directory there. Then explicitly point the command to the docker-compose.yml file:
docker-compose -f path_to_MsPASS/docker-compose.yml up -d
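The up and down variants of docker-compose differ only in the action and the optional -f flag, so a script can generate them from one function. A minimal sketch; compose_command is a hypothetical name, and the flags are exactly those used on this page.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def compose_command(action: str, compose_file: Optional[Path] = None) -> List[str]:
    """Build the docker-compose command used to start or stop the cluster."""
    if action not in ("up", "down"):
        raise ValueError(f"unsupported action: {action!r}")
    cmd = ["docker-compose"]
    if compose_file is not None:
        # Required when running from a directory other than the repository root.
        cmd += ["-f", str(compose_file)]
    cmd.append(action)
    if action == "up":
        cmd.append("-d")  # keep the containers running in the background
    return cmd


if __name__ == "__main__":
    print(shlex.join(compose_command("up")))
```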
Once the containers are running, you will see several MongoDB and Spark log files created in the current directory. Because Docker's port mapping is enabled, you can also open localhost:8080 in your browser to check the status of Spark through the master's web UI, where you should see the worker listed as ALIVE. Note that the links to the worker will not work because of the containers' network setup.
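The same worker check can be scripted. This sketch assumes the Spark standalone master also serves a JSON summary at http://localhost:8080/json/ containing a "workers" list whose entries have "id" and "state" fields; that layout is an assumption to verify against your Spark version. Both function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_master_state(url: str = "http://localhost:8080/json/") -> Dict[str, Any]:
    """Fetch the master's status summary (assumed JSON endpoint)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def alive_workers(state: Dict[str, Any]) -> List[str]:
    """Return the ids of workers reported with state ALIVE."""
    return [w.get("id", "?") for w in state.get("workers", [])
            if w.get("state") == "ALIVE"]
```

With the cluster running, alive_workers(fetch_master_state()) should contain one entry for mspass-worker if the assumed layout holds.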
First, make sure the Spark cluster is set up and running correctly. You can do this by running the pi calculation example included in the Spark distribution. To submit the example from mspass-master, use:
docker exec mspass-master /usr/local/spark/bin/run-example --master spark://mspass-master:7077 SparkPi 10
To submit it from mspass-worker, use:
docker exec mspass-worker /usr/local/spark/bin/run-example --master spark://mspass-master:7077 SparkPi 10
- The docker exec command runs the given command within the mspass-master or mspass-worker container.
- The --master option specifies the Spark master, which is mspass-master in our case; 7077 is the default port of the Spark master.
The output of this example is very verbose, but near the end of stdout you should see a line reading Pi is roughly 3.141..., which is the result of the calculation. You should also see the job listed in the Running Applications or Completed Applications section at localhost:8080.
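Because the output is so verbose, a smoke-test script can pull out just the result line. A minimal sketch, assuming the result appears on stdout as described above; parse_pi and run_spark_pi are hypothetical names, and the command list is copied from the docker exec examples on this page.

```python
import re
import subprocess
from typing import Optional

PI_LINE = re.compile(r"Pi is roughly\s+(\d+\.\d+)")


def parse_pi(output: str) -> Optional[float]:
    """Extract the estimate from the 'Pi is roughly ...' line, if present."""
    match = PI_LINE.search(output)
    return float(match.group(1)) if match else None


def run_spark_pi(container: str = "mspass-master", partitions: int = 10) -> Optional[float]:
    """Run the SparkPi example inside a container and return the estimate."""
    cmd = ["docker", "exec", container,
           "/usr/local/spark/bin/run-example",
           "--master", "spark://mspass-master:7077",
           "SparkPi", str(partitions)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # The result line is printed on stdout (see the note above).
    return parse_pi(result.stdout)
```

A result within a few hundredths of 3.14 from either container indicates the cluster is working.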
To launch an interactive mongo shell within mspass-master, use:
docker exec -it mspass-master mongo
To access the MongoDB server from mspass-worker, use:
docker exec -it mspass-worker mongo --host mspass-master
- The -it option opens an interactive pseudo-TTY session.
- The --host option directs the client to the server running on mspass-master.
To launch an interactive Python session to run Spark jobs, use the pyspark command through mspass-master:
docker exec -it mspass-master pyspark \
--conf "spark.mongodb.input.uri=mongodb://mspass-master/test.myCollection?readPreference=primaryPreferred" \
--conf "spark.mongodb.output.uri=mongodb://mspass-master/test.myCollection" \
--conf "spark.master=spark://mspass-master:7077" \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1
or through mspass-worker:
docker exec -it mspass-worker pyspark \
--conf "spark.mongodb.input.uri=mongodb://mspass-master/test.myCollection?readPreference=primaryPreferred" \
--conf "spark.mongodb.output.uri=mongodb://mspass-master/test.myCollection" \
--conf "spark.master=spark://mspass-master:7077" \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1
- The three --conf options specify the input database collection, the output database collection, and the Spark master. Because the Spark master and the MongoDB server both run on mspass-master, all three URLs point to it. Substitute test and myCollection with the desired database and collection names.
- The --packages option sets up the MongoDB Spark connector environment in this Python session.
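Substituting the database and collection names by hand is error-prone because they appear in two URIs. A minimal sketch that generates the options shown above; pyspark_args and pyspark_command are hypothetical names, the connector coordinates are copied from the commands on this page, and the database and collection names in the example are placeholders.

```python
import shlex
from typing import List

# Connector coordinates copied from the commands above.
CONNECTOR_PACKAGE = "org.mongodb.spark:mongo-spark-connector_2.11:2.4.1"


def pyspark_args(database: str, collection: str,
                 host: str = "mspass-master", spark_port: int = 7077) -> List[str]:
    """Build the --conf/--packages options for a chosen database and collection."""
    base_uri = f"mongodb://{host}/{database}.{collection}"
    return [
        "--conf", f"spark.mongodb.input.uri={base_uri}?readPreference=primaryPreferred",
        "--conf", f"spark.mongodb.output.uri={base_uri}",
        # MongoDB and the Spark master both run on the same host here.
        "--conf", f"spark.master=spark://{host}:{spark_port}",
        "--packages", CONNECTOR_PACKAGE,
    ]


def pyspark_command(container: str, database: str, collection: str) -> List[str]:
    """Full 'docker exec -it <container> pyspark ...' argument list."""
    return ["docker", "exec", "-it", container, "pyspark",
            *pyspark_args(database, collection)]


if __name__ == "__main__":
    # Placeholder names; replace with your own database and collection.
    print(shlex.join(pyspark_command("mspass-master", "mydb", "waveforms")))
```

Since -it needs a terminal, paste the printed command into a shell rather than running it from a non-interactive subprocess.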
Please refer to this documentation for more details about the MongoDB Spark connector.
To bring down the containers, run:
docker-compose down
or, if you started the containers from a different directory with -f:
docker-compose -f path_to_MsPASS/docker-compose.yml down