This repository has been archived by the owner on Dec 17, 2024. It is now read-only.

[Remote-Shuffle-2]Add mkdocs.yml and update docs #3

Merged 1 commit on Jan 13, 2021
14 changes: 9 additions & 5 deletions README.md
@@ -1,5 +1,9 @@
# Remote Shuffle

## Online Documentation

You can find all the Remote Shuffle documents on the [project web page](https://oap-project.github.io/remote-shuffle/).

## Contents
- [Introduction](#introduction)
- [User Guide](#userguide)
@@ -11,13 +15,13 @@ This is an essential part of enabling Spark on disaggregated compute and storage


### Installation
-We have provided a Conda package which will automatically install dependencies needed by OAP, you can refer to [OAP-Installation-Guide](../../docs/OAP-Installation-Guide.md) for more information. If you have finished [OAP-Installation-Guide](../../docs/OAP-Installation-Guide.md), you can find compiled OAP jars in `$HOME/miniconda2/envs/oapenv/oap_jars/`.
+We have provided a Conda package which will automatically install dependencies needed by OAP, you can refer to [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md) for more information. If you have finished [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md), you can find compiled OAP jars in `$HOME/miniconda2/envs/oapenv/oap_jars/`.

## Developer Guide
### Build and Deploy

-We have provided a Conda package which will automatically install dependencies needed by OAP, you can refer to [OAP-Installation-Guide](../../docs/OAP-Installation-Guide.md) for more information. If you have finished [OAP-Installation-Guide](../../docs/OAP-Installation-Guide.md), you can find compiled remote shuffle jars under `$HOME/miniconda2/envs/oapenv/oap_jars`.
-Then just skip this section and jump to [User Guide](#User-Guide).
+We have provided a Conda package which will automatically install dependencies needed by OAP, you can refer to [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md) for more information. If you have finished [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md), you can find compiled remote shuffle jars under `$HOME/miniconda2/envs/oapenv/oap_jars`.
+Then just skip this section and jump to [User Guide](#user-guide).

Build this module using the following command in the `OAP/oap-shuffle/remote-shuffle` folder. The resulting jar needs to be deployed on every compute node that runs Spark. Manually place it on all nodes, or let the resource manager do the work.

@@ -33,8 +37,8 @@ following configurations in spark-defaults.conf or Spark submit command line arg
Note: For DAOS users, the DAOS Hadoop/Java API jars should also be included in the classpath, as we leverage the DAOS Hadoop filesystem.

```
-spark.executor.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/oap-remote-shuffle-<version>.jar
-spark.driver.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/oap-remote-shuffle-<version>.jar
+spark.executor.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/remote-shuffle-<version>.jar
+spark.driver.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/remote-shuffle-<version>.jar
```

Enable the remote shuffle manager and specify the Hadoop storage system URI holding shuffle data.
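For reference, a minimal sketch of that configuration (the property names and the `daos://default:1` URI below are assumptions drawn from the Remote Shuffle project, not part of this diff; check the full README for the exact keys in your version):

```
spark.shuffle.manager                 org.apache.spark.shuffle.remote.RemoteShuffleManager
spark.shuffle.remote.storageMasterUri daos://default:1
```

The URI can equally point at HDFS (for example an `hdfs://` address), since the shuffle data can live on any supported Hadoop storage system.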
135 changes: 0 additions & 135 deletions docs/Developer-Guide.md

This file was deleted.

109 changes: 109 additions & 0 deletions docs/OAP-Developer-Guide.md
@@ -0,0 +1,109 @@
# OAP Developer Guide

This document contains the instructions and scripts for installing the necessary dependencies and building OAP.
You can get more detailed information from each OAP module below.

* [SQL Index and Data Source Cache](https://github.com/oap-project/sql-ds-cache/blob/master/docs/Developer-Guide.md)
* [PMem Common](https://github.com/oap-project/pmem-common)
* [PMem Shuffle](https://github.com/oap-project/pmem-shuffle#5-install-dependencies-for-shuffle-remote-pmem-extension)
* [Remote Shuffle](https://github.com/oap-project/remote-shuffle)
* [OAP MLlib](https://github.com/oap-project/oap-mllib)
* [Arrow Data Source](https://github.com/oap-project/arrow-data-source)
* [Native SQL Engine](https://github.com/oap-project/native-sql-engine)

## Building OAP

### Prerequisites for Building

OAP is built with [Apache Maven](http://maven.apache.org/) and Oracle Java 8. The main tools required on your cluster are listed below.

- [Cmake](https://help.directadmin.com/item.php?id=494)
- [GCC > 7](https://gcc.gnu.org/wiki/InstallingGCC)
- [Memkind](https://github.com/memkind/memkind/tree/v1.10.1-rc2)
- [Vmemcache](https://github.com/pmem/vmemcache)
- [HPNL](https://github.com/Intel-bigdata/HPNL)
- [PMDK](https://github.com/pmem/pmdk)
- [OneAPI](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html)
- [Arrow](https://github.com/Intel-bigdata/arrow)

- **Requirements for Shuffle Remote PMem Extension**
If you enable the Shuffle Remote PMem extension with RDMA, you can refer to [PMem Shuffle](https://github.com/oap-project/pmem-shuffle) to configure and validate RDMA in advance.

We provide the scripts below to automatically install the dependencies above **except RDMA**. Change to the **root** account, then run:

```
# git clone -b <tag-version> https://github.com/Intel-bigdata/OAP.git
# cd OAP
# sh $OAP_HOME/dev/install-compile-time-dependencies.sh
```
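Since the list above requires GCC > 7, here is a small sketch for checking a version string against that requirement (hedged: the hard-coded `9.3.0` is a stand-in for the output of `gcc -dumpversion` on your machine):

```shell
# Check a GCC version string against the "GCC > 7" prerequisite.
# The hard-coded version is a placeholder for: gcc -dumpversion
required_major=7
version="9.3.0"
major=${version%%.*}   # keep only the major version number
if [ "$major" -gt "$required_major" ]; then
  echo "GCC $version satisfies the GCC > $required_major requirement"
else
  echo "GCC $version is too old; install a newer GCC first"
fi
```

If the check fails, the GCC install link in the prerequisites list above covers building a newer toolchain.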

Run the following command to learn more.

```
# sh $OAP_HOME/dev/scripts/prepare_oap_env.sh --help
```

Run the following command to automatically install specific dependency such as Maven.

```
# sh $OAP_HOME/dev/scripts/prepare_oap_env.sh --prepare_maven
```


### Building

To build the OAP package, run the command below. You can then find a tarball named `oap-$VERSION-bin-spark-$VERSION.tar.gz` under the directory `$OAP_HOME/dev/release-package`.
```
$ sh $OAP_HOME/dev/compile-oap.sh
```

To build a specific OAP module, such as `oap-cache`, run:
```
$ sh $OAP_HOME/dev/compile-oap.sh --oap-cache
```


### Running OAP Unit Tests

Set up the build environment manually for Intel MLlib. If your default GCC version is earlier than 7.0, you also need to export `CC` and `CXX` before using `mvn`:

```
$ export CXX=$OAP_HOME/dev/thirdparty/gcc7/bin/g++
$ export CC=$OAP_HOME/dev/thirdparty/gcc7/bin/gcc
$ export ONEAPI_ROOT=/opt/intel/inteloneapi
$ source /opt/intel/inteloneapi/daal/2021.1-beta07/env/vars.sh
$ source /opt/intel/inteloneapi/tbb/2021.1-beta07/env/vars.sh
$ source /tmp/oneCCL/build/_install/env/setvars.sh
```

Run all the tests:

```
$ mvn clean test
```

To run the unit tests for a specific OAP module, such as `oap-cache`:

```
$ mvn clean -pl com.intel.oap:oap-cache -am test
```

### Building SQL Index and Data Source Cache with PMem

#### Prerequisites for building with PMem support

When using SQL Index and Data Source Cache with PMem, first finish the steps in [Prerequisites for Building](#prerequisites-for-building) to ensure the needed dependencies have been installed.

#### Building package

You can build OAP with PMem support using the command below:

```
$ sh $OAP_HOME/dev/compile-oap.sh
```
Or run:

```
$ mvn clean -q -Ppersistent-memory -Pvmemcache -DskipTests package
```
25 changes: 11 additions & 14 deletions docs/OAP-Installation-Guide.md
@@ -1,5 +1,5 @@
# OAP Installation Guide
-This document introduces how to install OAP and its dependencies on your cluster nodes by **Conda**.
+This document introduces how to install OAP and its dependencies on your cluster nodes by ***Conda***.
Follow steps below on ***every node*** of your cluster to set right environment for each machine.

## Contents
@@ -25,35 +25,32 @@ For changes to take effect, close and re-open your current shell. To test your i

The dependencies below are required by OAP. All of them are included in the OAP Conda package and will be installed automatically in your cluster when you Conda install OAP. Ensure you have activated the environment you created in the previous steps.

- [Arrow](https://github.com/Intel-bigdata/arrow)
- [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/)
- [Memkind](https://anaconda.org/intel/memkind)
- [Vmemcache](https://anaconda.org/intel/vmemcache)
- [HPNL](https://anaconda.org/intel/hpnl)
- [PMDK](https://github.com/pmem/pmdk)
- [OneAPI](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html)


Create a conda environment and install OAP Conda package.
```bash
$ conda create -n oapenv -y python=3.7
$ conda activate oapenv
-$ conda install -c conda-forge -c intel -y oap=0.9.0
+$ conda install -c conda-forge -c intel -y oap=1.0.0
```

Once you have finished the steps above, you have completed the OAP dependency installation and build, and you will find the built OAP jars under `$HOME/miniconda2/envs/oapenv/oap_jars`.
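As a quick sanity check (a sketch; the path assumes the default `miniconda2` layout used throughout this guide):

```shell
# List the built OAP jars if the expected directory exists,
# otherwise report which directory was expected.
OAP_JARS="$HOME/miniconda2/envs/oapenv/oap_jars"
if [ -d "$OAP_JARS" ]; then
  ls "$OAP_JARS"
else
  echo "OAP jar directory not found: $OAP_JARS"
fi
```

If the directory is missing, re-check that the `oapenv` environment was created and that the `oap` package installed without errors.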

#### Extra Steps for Shuffle Remote PMem Extension

If you use one of OAP features -- [Shuffle Remote PMem Extension](../oap-shuffle/RPMem-shuffle/README.md), there are 2 points to note.

1. Shuffle Remote PMem Extension needs to install library [PMDK](https://github.com/pmem/pmdk) which we haven't provided in OAP Conda package, so you can run commands below to enable PMDK (Certain libraries need to be compiled and installed on your system using ***root*** account, so you need change to `root` account to run the following commands).

```
# git clone -b <tag-version> https://github.com/Intel-bigdata/OAP.git
# cd OAP/
# sh dev/install-runtime-dependencies.sh
```
2. If you also want to use Shuffle Remote PMem Extension with **RDMA**, you need to configure and validate RDMA, please refer to [Shuffle Remote PMem Extension Guide](../oap-shuffle/RPMem-shuffle/README.md#4-configure-and-validate-rdma) for the details.
If you use one of the OAP features -- [PMem Shuffle](https://github.com/oap-project/pmem-shuffle) with **RDMA**, you need to configure and validate RDMA; please refer to [PMem Shuffle](https://github.com/oap-project/pmem-shuffle#4-configure-and-validate-rdma) for the details.


## Configuration
-Once finished steps above, make sure libraries installed by Conda can be linked by Spark, please add the following configuration settings to `$SPARK_HOME/conf/spark-defaults` on the working node.

+Once finished steps above, make sure libraries installed by Conda can be linked by Spark, please add the following configuration settings to `$SPARK_HOME/conf/spark-defaults.conf`.

```
spark.executorEnv.LD_LIBRARY_PATH $HOME/miniconda2/envs/oapenv/lib
@@ -65,7 +62,7 @@ spark.driver.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/$OAP_F

And then you can follow the corresponding feature documents for more details to use them.

* [OAP User Guide](../README.md#user-guide)



