datawhale

Datawhale is an open-source alternative to Datadog and Weights & Biases that combines system-level (GPU, CPU, I/O, networking) and experiment-level (loss, learning rate, norms, gradients, optimizer state, activations, attention maps) observability, via metrics and logs, for AI research.

Datawhale builds on an established open-source software stack (Grafana, Loki, Prometheus, node_exporter, dcgm_exporter) and can be seen as a configuration template for this stack.

Datawhale is designed to accommodate workflows from individual researchers working locally or on academic clusters (often without sudo rights) to entire research orgs with clusters in the cloud.

Installation

Datawhale supports single-server as well as multi-server configurations. Remote clusters usually require the multi-server installation path, so that jobs running on multiple nodes can be monitored from a central location. Datawhale distinguishes between one server and multiple clients. A familiar analogy is the split between login (server) and compute (client) nodes on an HPC cluster.
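In the usual Prometheus pull model, the server's Prometheus scrapes exporters running on each client. The sketch below renders a minimal scrape configuration for a list of client hosts. It assumes the exporters listen on their upstream default ports (node_exporter on 9100, dcgm-exporter on 9400); the function name and the hostnames `node01`/`node02` are hypothetical, and Datawhale's shipped Prometheus configuration may use different ports, job names, or intervals.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal Prometheus scrape_configs section
# covering every client host passed as an argument.
# Ports default to the upstream exporter defaults (assumption); override via env.
set -eu

NODE_EXPORTER_PORT=${NODE_EXPORTER_PORT:-9100}
DCGM_EXPORTER_PORT=${DCGM_EXPORTER_PORT:-9400}

render_scrape_config() {
  local host
  printf 'scrape_configs:\n'
  # System metrics (CPU, memory, disk, network) from node_exporter
  printf '  - job_name: node\n    static_configs:\n      - targets:\n'
  for host in "$@"; do
    printf '          - "%s:%s"\n' "$host" "$NODE_EXPORTER_PORT"
  done
  # GPU metrics from dcgm-exporter
  printf '  - job_name: dcgm\n    static_configs:\n      - targets:\n'
  for host in "$@"; do
    printf '          - "%s:%s"\n' "$host" "$DCGM_EXPORTER_PORT"
  done
}
```

For example, `render_scrape_config node01 node02 > prometheus.yml` on the server would emit one `node` and one `dcgm` target per client. Listing targets statically keeps the sketch dependency-free; on clusters where nodes come and go, Prometheus's file- or service-discovery mechanisms are the more common choice.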

Installing datawhale is as simple as cloning the repository and running setup.sh:

git clone https://github.com/p-doom/datawhale.git
cd datawhale
# server installation
bash scripts/setup.sh ROLE=server MODE=standalone
# client installation
bash scripts/setup.sh ROLE=client MODE=standalone
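On a cluster, the client installation has to run on every compute node that should be monitored. A minimal sketch of fanning it out over SSH, assuming passwordless SSH to each node and a checkout of the repository at the same path on every node (for example on a shared home directory). The function `install_clients` and the `DRY_RUN` variable are hypothetical and not part of Datawhale; by default the sketch only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the Datawhale client setup on several nodes via SSH.
# Usage: install_clients <repo_dir> <host>...
# DRY_RUN=1 (default) prints the ssh commands; DRY_RUN=0 executes them.
# Note: repo_dir is wrapped in single quotes, so it must not contain any.
set -eu

install_clients() {
  local repo_dir host cmd
  repo_dir=$1
  shift
  for host in "$@"; do
    cmd="cd '$repo_dir' && bash scripts/setup.sh ROLE=client MODE=standalone"
    if [ "${DRY_RUN:-1}" = 1 ]; then
      printf 'ssh %s "%s"\n' "$host" "$cmd"
    else
      ssh "$host" "$cmd"
    fi
  done
}
```

Running `install_clients "$HOME/datawhale" node01 node02` first shows the planned commands; re-running with `DRY_RUN=0` executes them sequentially, one node at a time.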

We currently only support the standalone installation path. The server installation downloads prebuilt binaries of Grafana, Loki and Prometheus, while the client installation downloads node_exporter and, depending on availability, dcgm_exporter or nvml_exporter. Notably, this does not require package manager access, sudo rights, or cluster-side Docker support, and can thus be run on any academic, on-prem, or cloud cluster. Docker-based installation support is on the roadmap. Only linux-amd64 is currently supported, though the repository should be straightforward to extend to other architectures and operating systems.
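Because only Linux on amd64 is supported, a quick preflight check before running setup.sh can save a failed download. The following is a hypothetical helper, not part of Datawhale's scripts; it assumes `uname -m` reports `x86_64` (occasionally `amd64`) on supported machines.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check mirroring the stated support matrix (linux-amd64).
set -eu

is_supported_platform() {
  # $1: kernel name as printed by `uname -s`
  # $2: machine hardware name as printed by `uname -m`
  case "$1/$2" in
    Linux/x86_64|Linux/amd64) return 0 ;;
    *) return 1 ;;
  esac
}

if is_supported_platform "$(uname -s)" "$(uname -m)"; then
  echo "platform supported: $(uname -s)/$(uname -m)"
else
  echo "unsupported platform: $(uname -s)/$(uname -m) (Datawhale targets linux-amd64)" >&2
fi
```

The check only reports; it never aborts, so it can be pasted in front of an installation script without changing that script's behavior.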

Once installed, start Datawhale with deploy.sh, passing the same role:

bash scripts/deploy.sh ROLE=<server|client> MODE=standalone
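After deployment, each component can be probed over HTTP to confirm it is up. A minimal sketch, assuming the upstream default ports and health endpoints (Prometheus `/-/ready` on 9090, Loki `/ready` on 3100, Grafana `/api/health` on 3000, node_exporter and dcgm-exporter `/metrics` on 9100 and 9400); Datawhale's configuration may override these ports. The functions `health_url` and `check_components` are hypothetical and require `curl`.

```shell
#!/usr/bin/env bash
# Hypothetical post-deploy health check using upstream default ports (assumption).
set -eu

health_url() {
  # $1: component name; $2: host (defaults to localhost)
  local host=${2:-localhost}
  case "$1" in
    prometheus)    echo "http://$host:9090/-/ready" ;;
    loki)          echo "http://$host:3100/ready" ;;
    grafana)       echo "http://$host:3000/api/health" ;;
    node_exporter) echo "http://$host:9100/metrics" ;;
    dcgm_exporter) echo "http://$host:9400/metrics" ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

check_components() {
  # $1: host; remaining args: component names. Prints OK or FAIL per component.
  local host c
  host=$1
  shift
  for c in "$@"; do
    if curl -fsS -o /dev/null --max-time 5 "$(health_url "$c" "$host")"; then
      echo "OK   $c"
    else
      echo "FAIL $c"
    fi
  done
}
```

On the server, `check_components localhost prometheus loki grafana` checks the central services; on a client, `check_components localhost node_exporter dcgm_exporter` checks the exporters.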
