Distributed Learning Cluster
Authors:
- Girija Manoj Kumar Reddy Kalakoti
- Santosh Kumar Chejarla
Programming Language: Python
We built Distributed Learning Cluster, a system that fairly schedules jobs across different preloaded models. The project was built from scratch and includes the following foundational modules:
- Distributed Log Grepper - This module is used to debug the entire application.
- Distributed Failure Detector - This module is used to find the failed processes.
- Distributed File System - This module handles the input and output files of the ML jobs with consistency and availability in mind.
- Distributed Learning Cluster - This module is responsible for allocating a fair share of resources to each task.
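
As a rough illustration of what "fair" allocation could mean, the sketch below splits a fixed pool of workers across jobs so that each job ends up with a similar query rate. This is only one possible policy and is not taken from the project's code: the function name `allocate_workers`, the per-job average query time input, and the job names are all hypothetical.

```python
def allocate_workers(avg_query_seconds, total_workers):
    """Split workers across jobs so per-job query rates are roughly equal.

    avg_query_seconds: dict mapping a job name to the average seconds one
    worker needs per query (a hypothetical measurement, assumed positive).
    total_workers: number of workers available in the cluster.
    """
    if total_workers < len(avg_query_seconds):
        raise ValueError("need at least one worker per job")
    if any(t <= 0 for t in avg_query_seconds.values()):
        raise ValueError("average query times must be positive")

    total_time = sum(avg_query_seconds.values())
    # Slower jobs get proportionally more workers; every job gets at least one.
    alloc = {
        job: max(1, int(total_workers * t / total_time))
        for job, t in avg_query_seconds.items()
    }

    def rate(job):
        # Queries per second the job achieves with its current workers.
        return alloc[job] / avg_query_seconds[job]

    # Hand out leftover workers to whichever job is currently slowest.
    while sum(alloc.values()) < total_workers:
        alloc[min(alloc, key=rate)] += 1
    # If the one-worker minimum overshot the pool, take from the fastest job.
    while sum(alloc.values()) > total_workers:
        shrinkable = [job for job in alloc if alloc[job] > 1]
        alloc[max(shrinkable, key=rate)] -= 1
    return alloc


if __name__ == "__main__":
    # Hypothetical jobs: model_b is three times slower per query than model_a.
    print(allocate_workers({"model_a": 2.0, "model_b": 6.0}, 8))
```

With these example inputs the slower job receives three times as many workers, so both jobs process queries at about the same rate. A real scheduler would also need to rebalance when workers fail or jobs arrive and finish, which this sketch does not cover.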