This repository has been archived by the owner on Feb 11, 2020. It is now read-only.

Commit: 1.0 Release (#2)
* working copy of lcmap-spark.
* wip
* Update README.md
* readme wip
* wip. update readme for running notebooks
* updated readme
* wip doc updates
* Update README.rst
* wip updates
* Update RUNNING.rst
* updates
* updated Dockerfile to reduce disk space and optimize caching.
* update wip docs
* update readme
* Update configuration.rst
* doc updates
* wip doc update
* wip update docs
* adding application.rst
* update
* update docs
* update configuration.rst
* remove tmp file
* update running
* examples work now
* added imgs
* updated image
* updated configuration
* renamed examples to jobs to match docs
* install Merlin 2 and increment version.
* formatting on running.rst
* update examples
* travis-ci set up, build modified to push based on branch and commit id
* update .travis.yml
* update travis
* update dockerfile for merlin 2rc2.  updated travis.yml.
* removing bin/build.  Updated version.txt to 1.0-SNAPSHOT.  Update tags in Makefile.
* updated for merlin 2 rc2.  removed .travis foo
* updated docs
* update makefile with conditional logic for branch tagging
* update makefile.  need to update docs and add travis.yml in.
* added travis.yml, updated version
* change notebook dir from copy to create
davidvhill authored Feb 7, 2018
1 parent 5eecb99 commit d2e6ef5
Showing 17 changed files with 851 additions and 114 deletions.
19 changes: 19 additions & 0 deletions .travis.yml
@@ -0,0 +1,19 @@
language: c

sudo: required

services:
  - docker

script: make debug && make build && make tag

deploy:
  - provider: script
    script: make debug && make push
    on:
      all_branches: true

notifications:
  slack:
    rooms:
      - lcmap:UTqlh9PfPVomfpli10WKyZoh#cicd
82 changes: 66 additions & 16 deletions Dockerfile
@@ -1,23 +1,73 @@
FROM mesosphere/mesos:1.1.1
FROM centos:7.3.1611

MAINTAINER USGS LCMAP http://eros.usgs.gov
LABEL maintainer="USGS EROS LCMAP http://eros.usgs.gov http://github.com/usgs-eros/lcmap-spark" \
      description="CentOS based Spark image for LCMAP" \
      org.apache.mesos.version=1.4.0 \
      org.apache.spark.version=2.2.0 \
      net.java.openjdk.version=1.8.0 \
      org.python.version=3.6 \
      org.centos=7.3.1611

RUN apt-get update
EXPOSE 8081 4040 8888

WORKDIR /opt/spark/dist
ENV HOME=/home/lcmap \
    USER=lcmap \
    SPARK_HOME=/opt/spark \
    SPARK_NO_DAEMONIZE=true \
    PYSPARK_PYTHON=python3 \
    MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos.so \
    TINI_SUBREAPER=true \
    LIBPROCESS_SSL_ENABLED=1 \
    LIBPROCESS_SSL_SUPPORT_DOWNGRADE=1 \
    LIBPROCESS_SSL_VERIFY_CERT=0 \
    LIBPROCESS_SSL_ENABLE_SSL_V3=0 \
    LIBPROCESS_SSL_ENABLE_TLS_V1_0=0 \
    LIBPROCESS_SSL_ENABLE_TLS_V1_1=0 \
    LIBPROCESS_SSL_ENABLE_TLS_V1_2=1 \
    LIBPROCESS_SSL_CIPHERS=ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384:AES256-SHA256:AES128-SHA256 \
    LIBPROCESS_SSL_CERT_FILE=/certs/mesos.crt \
    LIBPROCESS_SSL_KEY_FILE=/certs/mesos.key \
    LIBPROCESS_SSL_CA_FILE=/certs/trustedroot.crt \
    LIBPROCESS_SSL_CA_DIR=/certs \
    LIBPROCESS_SSL_ECDH_CURVE=auto

COPY tmp/spark-2.1.0-bin-hadoop2.7/ .
COPY files/ /
ENV PATH=$SPARK_HOME/bin:${PATH} \
    PYTHONPATH=$PYTHONPATH:$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$SPARK_HOME/python/lib/pyspark.zip

# This lets docker manage the execution
ENV SPARK_NO_DAEMONIZE "true"
ENV MESOS_NATIVE_JAVA_LIBRARY /usr/lib/libmesos.so
ENV SPARK_HOME /opt/spark/dist
ENV PATH $SPARK_HOME/bin:$PATH
ENV PYTHONPATH $SPARK_HOME/python/:$PYTHONPATH
ENV PYTHONPATH $SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
# Add a user to run as inside the container to prevent accidental foo while mounting volumes.
# Use "docker run -u `id -u`" at runtime to assign proper UIDs for file permissions.
# Mesos username must match this username (and be assigned permissions by Mesos admin.)

EXPOSE 7077
EXPOSE 8081
RUN yum install -y sudo && \
    adduser -ms /bin/bash $USER && \
    echo "$USER ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USER && \
    echo "alias sudo='sudo env PATH=$PATH'" > /etc/profile.d/sudo.sh && \
    chmod 0440 /etc/sudoers.d/$USER

COPY pom.xml /root
RUN mkdir -p $HOME/notebook

RUN yum update -y
RUN yum install -y java-1.8.0-openjdk-devel.x86_64 \
                   http://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-3.noarch.rpm \
                   mesos \
                   bzip2 \
                   gcc \
                   maven
RUN yum -y downgrade mesos-1.4.0
RUN curl https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz -o /opt/spark.tgz
RUN cd /opt && tar -zxf spark.tgz && rm -f spark.tgz && ln -s spark-* spark && cd -
RUN curl https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /root/mc.sh
RUN bash /root/mc.sh -u -b -p /usr/local
RUN conda install python=3.6 pip jupyter numpy --yes
RUN pip install lcmap-merlin==2.0rc2
RUN mvn -f /root/pom.xml dependency:copy-dependencies -DoutputDirectory=$SPARK_HOME/jars
RUN yum erase -y maven gcc bzip2
RUN yum clean all
RUN rm -rf /var/cache/yum /root/.cache /root/.m2 /root/pom.xml /root/mc.sh
RUN conda clean --all -y

USER $USER
WORKDIR $HOME
RUN sudo chown -R $USER:$USER .

#ENTRYPOINT ["sbin/dispatcher-entry-point.sh"]
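The Dockerfile comments above recommend running the container with your host UID so files written to mounted volumes stay owned by you. A minimal sketch of that pattern (the mount path and image tag are illustrative, not from this repo):

```shell
# Capture the host UID; passing it via `docker run -u` keeps files
# written to mounted volumes owned by you rather than root.
HOST_UID="$(id -u)"
echo "$HOST_UID"

# Illustrative invocation (requires the image to be available locally):
#   docker run -it --rm -u "$HOST_UID" \
#       -v "$HOME/notebooks:/home/lcmap/notebook" \
#       usgseros/lcmap-spark:1.0 pyspark
```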
37 changes: 25 additions & 12 deletions Makefile
@@ -1,19 +1,32 @@
.DEFAULT_GOAL := build
VERSION := `cat version.txt`
IMAGE := usgseros/lcmap-spark
BRANCH := $(or $(TRAVIS_BRANCH),`git rev-parse --abbrev-ref HEAD | tr / -`)
BUILD_TAG := $(IMAGE):build
TAG := $(shell if [ "$(BRANCH)" = "master" ];\
         then echo "$(IMAGE):$(VERSION)";\
         else echo "$(IMAGE):$(VERSION)-$(BRANCH)";\
       fi)

download-spark:
	mkdir tmp; wget -O tmp/spark-2.1.0-bin-hadoop2.7.tgz http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz

unpack-spark: download-spark
	cd tmp; gunzip *gz; tar -xvf *tar;
build:
	@docker build -t $(BUILD_TAG) --rm=true --compress $(PWD)

init: download-spark unpack-spark
tag:
	@docker tag $(BUILD_TAG) $(TAG)

build:
	docker build -t usgseros/mesos-spark --rm=true --compress .
	docker tag usgseros/mesos-spark usgseros/mesos-spark:latest
	docker tag usgseros/mesos-spark usgseros/mesos-spark:1.1.1-2.1.0
login:
	@$(if $(and $(DOCKER_USER), $(DOCKER_PASS)), docker login -u $(DOCKER_USER) -p $(DOCKER_PASS), docker login)

push: login
	docker push $(TAG)

debug:
	@echo "VERSION: $(VERSION)"
	@echo "IMAGE: $(IMAGE)"
	@echo "BRANCH: $(BRANCH)"
	@echo "BUILD_TAG: $(BUILD_TAG)"
	@echo "TAG: $(TAG)"

push:
	docker login; docker push usgseros/mesos-spark
all: debug build tag push

all: init build push
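The branch-conditional ``TAG`` assignment in the updated Makefile can be traced in plain shell; the values below are examples:

```shell
IMAGE=usgseros/lcmap-spark
VERSION=1.0

# On master, the image gets the bare version tag...
BRANCH=master
if [ "$BRANCH" = "master" ]; then TAG="$IMAGE:$VERSION"; else TAG="$IMAGE:$VERSION-$BRANCH"; fi
echo "$TAG"    # usgseros/lcmap-spark:1.0

# ...on any other branch, the branch name is appended.
BRANCH=develop
if [ "$BRANCH" = "master" ]; then TAG="$IMAGE:$VERSION"; else TAG="$IMAGE:$VERSION-$BRANCH"; fi
echo "$TAG"    # usgseros/lcmap-spark:1.0-develop
```

This keeps non-master builds from clobbering released tags on Dockerhub.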
63 changes: 0 additions & 63 deletions README.md

This file was deleted.

84 changes: 84 additions & 0 deletions README.rst
@@ -0,0 +1,84 @@
.. image:: https://travis-ci.org/USGS-EROS/lcmap-spark.svg?branch=develop
   :target: https://travis-ci.org/USGS-EROS/lcmap-spark

============
lcmap-spark
============
LCMAP SEE Spark base image.

On DockerHub
------------

https://hub.docker.com/r/usgseros/lcmap-spark/


Features
--------
* Run `Spark <https://spark.apache.org/docs/latest/>`_ locally or on `Mesos <https://mesos.apache.org/>`_
* Interactive development and analysis via `Jupyter Notebooks <https://jupyter.org/>`_
* Connect to `Apache Cassandra <https://cassandra.apache.org/>`_ with the `Spark-Cassandra Connector <https://github.com/datastax/spark-cassandra-connector/>`_ and `DataFrames <https://spark.apache.org/docs/latest/sql-programming-guide.html>`_
* Includes Spark 2.2, JDK 1.8, Python 3.6 and MKL-enabled Numpy

Example
-------

.. code-block:: bash

    docker run -it \
               --rm \
               --user=`id -u` \
               --net=host \
               --pid=host \
               usgseros/lcmap-spark:1.0 \
               pyspark

Documentation
-------------

* `Overview <docs/overview.rst/>`_
* `Running lcmap-spark <docs/running.rst/>`_
* `Configuration <docs/configuration.rst/>`_
* `Applications <docs/applications.rst/>`_
* `Developing lcmap-spark <docs/developing.rst/>`_

Requirements
------------

* Docker
* Network access to Mesos Master (optional)
* Mesos username (optional)
* Mesos role (optional)
* Mesos password (optional)
* Mesos certificates (optional)
* Make (optional)

Versioning
----------
lcmap-spark follows semantic versioning: http://semver.org/

License
-------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to http://unlicense.org.
81 changes: 81 additions & 0 deletions docs/applications.rst
@@ -0,0 +1,81 @@
Developing a SEE Application
============================
SEE applications are created by extending the ``lcmap-spark`` Docker image. Additional dependencies may be added to the derivative Docker images, and code may be developed using ``pyspark`` and the Jupyter Notebook server.

Once the new application is ready to run on the SEE, the derivative Docker image must be published to https://hub.docker.com. A user account is required.

``make`` is a good choice for building and pushing your image. For example:

.. code-block:: make

    .DEFAULT_GOAL := build
    VERSION := `cat version.txt`
    IMAGE := <YOURACCOUNT/YOUR-IMAGE-NAME>
    BRANCH := $(or $(TRAVIS_BRANCH),`git rev-parse --abbrev-ref HEAD | tr / -`)
    BUILD_TAG := $(IMAGE):build
    TAG := $(shell if [ "$(BRANCH)" = "master" ];\
             then echo "$(IMAGE):$(VERSION)";\
             else echo "$(IMAGE):$(VERSION)-$(BRANCH)";\
           fi)

    build:
    	@docker build -t $(BUILD_TAG) --rm=true --compress $(PWD)

    tag:
    	@docker tag $(BUILD_TAG) $(TAG)

    login:
    	@$(if $(and $(DOCKER_USER), $(DOCKER_PASS)), docker login -u $(DOCKER_USER) -p $(DOCKER_PASS), docker login)

    push: login
    	docker push $(TAG)

    debug:
    	@echo "VERSION: $(VERSION)"
    	@echo "IMAGE: $(IMAGE)"
    	@echo "BRANCH: $(BRANCH)"
    	@echo "BUILD_TAG: $(BUILD_TAG)"
    	@echo "TAG: $(TAG)"

    all: debug build tag push

Keep in mind that Dockerhub is a public resource and all images published there are public by default.

Do not include any sensitive information in your image, such as usernames, passwords, URLs, machine names, IP addresses, or SSH keys: this is a security violation.

If your application requires sensitive data, it can be supplied at runtime through Docker environment variables using ``-e`` or ``--env``. An ``--env-file`` may also be used locally.
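As a sketch (the variable names here are hypothetical examples, not settings lcmap-spark requires), secrets can live in a local env file that is passed at runtime and never copied into the image:

```shell
# Write secrets to a local file that stays out of the image and out of
# version control (add it to .gitignore / .dockerignore).
printf 'MESOS_PRINCIPAL=%s\nMESOS_SECRET=%s\n' "lcmap" "example-secret" > secrets.env

# Pass them to the container at runtime instead of baking them in:
#   docker run -it --rm --env-file secrets.env usgseros/lcmap-spark:1.0 pyspark
#   docker run -it --rm -e MESOS_SECRET="$MESOS_SECRET" usgseros/lcmap-spark:1.0 pyspark
cat secrets.env
```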


What's already installed?
-------------------------
* Python3
* Pyspark
* Conda
* Jupyter
* numpy
* cytoolz
* lcmap-merlin

For a full view of what's available in the lcmap-spark base image, see the `Dockerfile <../Dockerfile>`_.

Installing Additional System Dependencies
-----------------------------------------
* ``sudo conda install X``
* ``sudo yum install X``

Installing Additional Python Dependencies
-----------------------------------------
* ``sudo conda install X``
* ``sudo pip install X``

Derivative Docker Image
-----------------------
All SEE application Dockerfiles should begin with: ``FROM lcmap-spark:<version>``, such as ``FROM lcmap-spark:1.0``.

For a list of available lcmap-spark images, see https://hub.docker.com/r/usgseros/lcmap-spark/tags/.
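A minimal derivative image can be sketched like this (the ``1.0`` tag and example packages are illustrative):

```shell
# Write a minimal derivative Dockerfile; the base image's passwordless
# sudo lets you add yum/conda/pip dependencies at build time.
cat > Dockerfile <<'EOF'
FROM usgseros/lcmap-spark:1.0

# Example dependencies only; replace with your application's needs.
RUN sudo yum install -y gcc && \
    sudo pip install requests
EOF

# Build and publish with the Makefile shown earlier, or directly:
#   docker build -t youraccount/your-image:1.0 .
#   docker push youraccount/your-image:1.0
head -n 1 Dockerfile    # FROM usgseros/lcmap-spark:1.0
```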

References
----------
* `Running lcmap-spark <running.rst>`_
* `Official Dockerfile reference <https://docs.docker.com/engine/reference/builder/#usage>`_