Paranode is a project developed as part of POSTECH's CSED434 curriculum. It implements a distributed and parallel sorting system for key-value records in the Scala programming language. Its primary objective is a robust sorting system for datasets that exceed the RAM and storage capacity available on a single machine. Paranode coordinates parallel data processing across multiple machines and sorts records by key while keeping each record intact.
To build the project, follow these instructions:
- Clone the repository.
- Run `sbt clean`.
- Run `sbt assembly`.
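Because the launch steps expect `master` and `worker` executables in the `build` directory, a quick check after `sbt assembly` can catch a failed build early. The following is a minimal sketch, not part of the project: the helper name `missing_executables` is hypothetical, and it assumes the script is run from the repository root.

```shell
#!/bin/sh
# Sketch: print each expected executable that is absent or not executable.
# The function name is hypothetical; only the `build`, `master`, and `worker`
# names come from the instructions in this README.
# Usage: missing_executables <dir> <name>...
missing_executables() {
    dir=$1
    shift
    for name in "$@"; do
        # -x is true only for an existing file with execute permission.
        [ -x "$dir/$name" ] || echo "$name"
    done
}

# Run after `sbt clean` and `sbt assembly`; no output means both are present.
missing_executables build master worker
```

If the function prints a name, that executable was not produced and the build output should be inspected before launching any node.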
To launch a master node and worker nodes, follow these steps:
- Build the project using the instructions above.
- Ensure that `master` and `worker` executables are present in the `build` directory.
- Run `master` with parameters.
  - For example, `./master <NUMBER_OF_WORKERS>`.
- Run `worker` with parameters.
  - For example, `./worker <MASTER_HOST>:<MASTER_PORT> -I <INPUT_DIRECTORY> <INPUT_DIRECTORY> <...> -O <OUTPUT_DIRECTORY>`.
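The worker argument layout shown above can be assembled programmatically when several input directories are involved. This is a sketch under stated assumptions: the helper `worker_command` is hypothetical, the host, port, and paths are made-up placeholder values, and the check for at least one input directory is an assumption based on the example rather than documented behavior. The function only prints the command; it does not start a worker.

```shell
#!/bin/sh
# Sketch: build a worker command line in the documented layout
#   ./worker <MASTER_HOST>:<MASTER_PORT> -I <INPUT_DIRECTORY>... -O <OUTPUT_DIRECTORY>
# Usage: worker_command <master_host> <master_port> <output_dir> <input_dir>...
worker_command() {
    host=$1
    port=$2
    output=$3
    shift 3
    # Assumption: at least one input directory must be given.
    if [ "$#" -eq 0 ]; then
        echo "worker_command: at least one input directory is required" >&2
        return 1
    fi
    printf './worker %s:%s -I' "$host" "$port"
    # The remaining arguments are the input directories listed after -I.
    for dir in "$@"; do
        printf ' %s' "$dir"
    done
    printf ' -O %s\n' "$output"
}

# Hypothetical values, for illustration only.
worker_command 192.0.2.10 5000 /data/output /data/input1 /data/input2
```

With the placeholder values above, the sketch prints `./worker 192.0.2.10:5000 -I /data/input1 /data/input2 -O /data/output`, which can be reviewed before being run from the `build` directory.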
Note that the built executables consist of shell scripts and JAR files, so they require a system that can run shell scripts and has Java installed.
- Scala 2.13.12
- SBT 1.9.6
- Java 20
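Before building, it may help to confirm that the tools listed above are available on `PATH`. The following is a minimal sketch using the POSIX `command -v` builtin; the helper name `missing_tools` is hypothetical, and the script checks only for presence, not for the specific versions listed.

```shell
#!/bin/sh
# Sketch: print each named tool that cannot be found on PATH.
# The function name is hypothetical and not part of the project.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the tool cannot be found.
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools java sbt)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so multiple names print on one line.
    echo "Missing tools:" $missing
else
    echo "java and sbt found; compare their versions against the list above."
fi
```

Version checks are left manual here; `java -version` and `sbt --version` report the installed versions for comparison.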
To run unit tests, follow the instructions below:
- Clone the repository.
- Run `sbt test`.
To run e2e tests, follow the instructions below:
- Clone the repository.
- Run `sbt e2e/test`.
- Install the `pre-commit` package by following its installation instructions.
- Run `pre-commit install`.
Note that `pre-commit install` should be re-run if `.pre-commit-config.yaml` has changed.
This project is developed as part of the CSED434 course at POSTECH, and is not intended for production or commercial use. Any usage is at your own risk, and the contributors are not responsible for any potential issues.