Rudra is a distributed framework for large-scale machine learning, which accepts training data and model configuration as inputs from the user and outputs the parameters of the trained model.
Detailed documentation on input formats, invocation, sample datasets and planned features can be found on the project wiki.
Dependencies:
- X10 (version 2.5.4 or higher)
- g++ (4.4.7 or higher) / xlC
- (optional - for cuDNN learner) rudra-cudnnlearner package
- (optional - for Theano learner) Theano plus Theano prerequisites
The default version of Rudra uses a proprietary IBM learner implementation with cuDNN.
There is also an example Theano learner, the source code for which is included in this package.
A mock learner is also included for unit testing purposes.
Other learners are supported by implementing the learner API in include/NativeLearner.h.
The make variable RUDRA_LEARNER chooses between different learner implementations, e.g. basic, theano, mock.
Setting RUDRA_LEARNER=xxx requires the build to link against a learner implementation at lib/librudralearner-xxx.so.
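The naming convention above can be sketched as a small shell helper that reports whether the library for a given learner is present before a build is attempted. The function names learner_lib_path and check_learner are hypothetical and not part of Rudra; the sketch assumes it is run from the top of the source tree.

```shell
# Hypothetical helper: map a learner name to the library path
# the build expects (lib/librudralearner-<name>.so).
learner_lib_path() {
    printf 'lib/librudralearner-%s.so\n' "$1"
}

# Hypothetical helper: succeed only if that library file exists.
check_learner() {
    lib=$(learner_lib_path "$1")
    if [ -f "$lib" ]; then
        echo "found $lib"
    else
        echo "missing $lib" >&2
        return 1
    fi
}
```

For example, check_learner theano looks for lib/librudralearner-theano.so relative to the current directory.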
To build the default (cuDNN) version of Rudra, simply run:
$ source rudra.profile
$ make
To build Rudra with a mock learner (for testing purposes):
$ make rudra-mock
The make variable X10RTIMPL chooses the implementation of X10RT.
You can use whichever versions of X10RT are supported on your platform, e.g. sockets, pami, mpi.
The default is MPI.
(Note: mpi does not currently work on the IBM-internal DCC system for POWER nodes.)
For example, to build the default version of Rudra with X10RT for PAMI, run:
$ make rudra-cudnn X10RTIMPL=pami
To build Rudra with the Theano learner and MPI, run:
$ make rudra-theano X10RTIMPL=mpi
To build librudra:
$ cd cpp && make
To build the Theano learner:
$ cd theano && make
Note: rudra.profile sets the environment variables needed for building and running Rudra. Among other things, it sets the $RUDRA_HOME environment variable. In some cases, you may need to modify rudra.profile to correctly point to your local Python installation.
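A quick way to confirm that the profile has been sourced in the current shell is to test for RUDRA_HOME. This is a minimal sketch; the function name check_rudra_home is hypothetical, and it checks only this one variable, not any of the other settings the profile may make.

```shell
# Hypothetical helper: report whether RUDRA_HOME is set in this shell.
check_rudra_home() {
    if [ -z "${RUDRA_HOME:-}" ]; then
        echo "RUDRA_HOME is not set; run: source rudra.profile" >&2
        return 1
    fi
    echo "RUDRA_HOME=$RUDRA_HOME"
}
```

Calling check_rudra_home after source rudra.profile should print the path that the profile assigned.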
One process is reserved for testing, and the remainder are used as learners.
(To turn off testing and use all processes for learning, pass the command line argument -noTest.)
With MPI or PAMI, the number of places equals the number of processes.
For sockets, set the number of places with the environment variable X10_NPLACES.
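The split between tester and learners described above works out as simple arithmetic. The sketch below assumes one process per place; the function name learner_count is hypothetical and not part of Rudra.

```shell
# Hypothetical helper: number of learner processes for a given place count.
# $1 = total number of places; $2 = optional "-noTest" to disable the tester.
learner_count() {
    places=$1
    if [ "${2:-}" = "-noTest" ]; then
        # No process is reserved for testing.
        echo "$places"
    else
        # One process is reserved for testing.
        echo $((places - 1))
    fi
}

learner_count 2          # prints 1
learner_count 4 -noTest  # prints 4
```

So with X10_NPLACES=2 and testing enabled, a single process acts as a learner.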
Try running with mlp.py:
$ make rudra-theano X10RTIMPL=sockets
$ export X10_NPLACES=2
$ ./rudra-theano -f examples/theano-mnist.cfg -ll 0 -lr 0 -lt 0 -lu 0
Log level 0 (TRACING) prints the maximum amount of information. If you don't want it, skip the -l* flags.
To generate the source documentation with Doxygen (v1.8.0 or higher):
$ cd doc && doxygen