Kronara

Kronara is a production-ready machine learning training pipeline. Originally demonstrated with synthetic data, it now adds real-world data validation, calibration, interpretability, and structured logging. It is built on PyTorch Lightning, Hydra, and MLflow, plus a variety of supporting data science libraries.

Key Features

Name & Identity: Project is now named "Kronara," a unique, primordial-sounding name representing foundational strength.
Data Handling: Supports both synthetic and real-world (e.g., Breast Cancer Wisconsin) datasets.
Robust Model: Large MLP with advanced regularization, early stopping, and OneCycleLR scheduling.
Calibration & Threshold Optimization: Automated threshold selection based on F1. Reliability diagrams for calibration checks.
Interpretability: SHAP-based feature importance analysis.
Cross-Validation & Ensembling: K-fold splits and ensemble predictions for improved generalization.
Artifacts Directory: All .pt prediction files, reliability diagrams, and other outputs saved to artifacts/ at the project root.
Logging & Monitoring: Structured logging with Loguru, MLflow experiment tracking, and CI-ready testing framework.
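
The F1-based threshold selection listed under Calibration & Threshold Optimization can be sketched as a grid search over candidate cutoffs. This is a minimal illustration in plain Python, not Kronara's actual implementation: the function names `f1_at_threshold` and `best_f1_threshold`, the grid of 99 cutoffs, and the toy probabilities are all assumptions made for the example.

```python
def f1_at_threshold(probs, labels, threshold):
    """F1 score when every sample with prob >= threshold is predicted positive."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(probs, labels, num_steps=99):
    """Scan an evenly spaced grid inside (0, 1) and return (threshold, f1).

    Ties are resolved in favor of the lowest threshold on the grid.
    """
    candidates = [(i + 1) / (num_steps + 1) for i in range(num_steps)]
    scored = ((t, f1_at_threshold(probs, labels, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


# Toy validation predictions (hypothetical values, for illustration only).
probs = [0.05, 0.20, 0.35, 0.40, 0.62, 0.70, 0.81, 0.93]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

threshold, f1 = best_f1_threshold(probs, labels)
print(f"best threshold={threshold:.2f}, F1={f1:.3f}")
```

In practice the threshold would be chosen on a validation split and then applied unchanged to the test split, so the reported F1 is not tuned on the data it is measured on.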

Setup Instructions

Create Virtual Environment & Install:

```
python3.12 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
```

Create Artifacts Directory:
```
mkdir artifacts
```
Run Synthetic Data (Default):
```
python scripts/train.py
```
This trains using synthetic data and saves results to artifacts/.

Real-World Data (Breast Cancer):

```
python tutorials/breast_cancer_data_demo.py
python scripts/train.py data.path=./tutorials/breast_cancer_data.csv data.fallback_to_synthetic=false
```
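
The `data.path=...` and `data.fallback_to_synthetic=false` arguments are dotted-key overrides in the style Hydra accepts on the command line. The sketch below mimics how such overrides land in a nested configuration. It is a simplified stand-in written in plain Python, not Hydra's own parser, and the default values in `defaults` are hypothetical; only the two key names come from the command above.

```python
import copy


def parse_value(raw):
    """Convert an override string to bool, int, or float when possible."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw  # leave anything else (such as a file path) as a string


def apply_overrides(config, overrides):
    """Apply 'dotted.key=value' strings to a copy of a nested dict."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical defaults; the real values live in the project's configs/ folder.
defaults = {"data": {"path": None, "fallback_to_synthetic": True}}

merged = apply_overrides(
    defaults,
    [
        "data.path=./tutorials/breast_cancer_data.csv",
        "data.fallback_to_synthetic=false",
    ],
)
print(merged)
```

Setting `data.fallback_to_synthetic=false` alongside an explicit `data.path` presumably makes a missing or unreadable CSV fail loudly instead of silently training on synthetic data.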

Cross-Validation & Ensembling:
```
python scripts/benchmark.py
python scripts/ensemble.py
```
Results and .pt files appear in artifacts/.
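
As a rough picture of what K-fold splitting and ensembling involve, the sketch below builds train/validation index splits and averages per-model probabilities. It is an illustration under assumptions, not the code in `scripts/benchmark.py` or `scripts/ensemble.py`: the helper names, the choice of five folds, and simple mean averaging are all hypothetical.

```python
import random


def k_fold_indices(num_samples, k, seed=0):
    """Yield (train_indices, val_indices) for each of k shuffled folds."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Striding assigns every index to exactly one fold.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def ensemble_mean(per_model_probs):
    """Average predicted probabilities across models, sample by sample."""
    return [sum(column) / len(column) for column in zip(*per_model_probs)]


# 1,000 samples split into 5 folds (the fold count is an assumption).
for fold_id, (train, val) in enumerate(k_fold_indices(1000, 5)):
    print(f"fold {fold_id}: {len(train)} train / {len(val)} validation")

# Two hypothetical models scoring the same two samples.
print(ensemble_mean([[0.2, 0.8], [0.4, 0.6]]))
```

Each fold's model sees a different 80% of the data, so averaging their predictions tends to smooth out the variance of any single split.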
Calibration & Evaluation:
```
python -m kronara.evaluate
```
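
A reliability diagram compares predicted confidence with the observed positive rate inside each confidence bin. The sketch below computes the points such a diagram would plot. It is a plain-Python illustration with equal-width bins, not the logic inside `kronara.evaluate`; the function name `reliability_bins` and the toy inputs are assumptions.

```python
def reliability_bins(probs, labels, num_bins=10):
    """Return (mean_confidence, fraction_positive, count) per non-empty bin."""
    bins = [[] for _ in range(num_bins)]
    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * num_bins), num_bins - 1)
        bins[idx].append((p, y))

    rows = []
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        rows.append((mean_conf, frac_pos, len(members)))
    return rows


# Toy predictions (hypothetical values, for illustration only).
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]

for mean_conf, frac_pos, count in reliability_bins(probs, labels, num_bins=2):
    print(f"confidence={mean_conf:.2f} observed={frac_pos:.2f} n={count}")
```

For a well-calibrated model the two columns track each other, so the plotted points lie close to the diagonal.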

Feature Importance:

```
python tutorials/feature_importance_demo.py
```

Hyperparameter Tuning:
```
python scripts/tune_hparams.py
```
Tests:
```
pytest tests
```

Usage Examples

Single Fold Training with Synthetic Data:

```
python scripts/train.py
```

Training on Real-World Data:

```
python tutorials/breast_cancer_data_demo.py
python scripts/train.py data.path=./tutorials/breast_cancer_data.csv data.fallback_to_synthetic=false
```

Contributing

Fork the repository.
Create a new branch for your feature or bugfix.
Make your changes with clear commit messages.
Submit a pull request with a detailed explanation of changes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

📊 Performance Metrics

The Kronara model was trained and evaluated on a synthetic dataset designed to simulate real-world scenarios. Below are the details of the dataset and the resulting performance metrics:

🗃️ Dataset Details

Number of Samples: 1,000
Number of Features: 20
Dataset Type: Synthetic Binary Classification

🧠 Model Architecture

Model Type: Multi-Layer Perceptron (MLP)
Number of Parameters: 51.2 Million
Model Size: Approximately 204.8 MB
Training Framework: PyTorch Lightning
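
The two size figures above are consistent with 32-bit floating-point weights: 51.2 million parameters at 4 bytes each come to 204.8 MB. The sketch below shows that arithmetic, plus how the parameter count of a fully connected network follows from its layer widths. The 4-byte assumption and the example widths `[20, 64, 1]` are illustrative; the README does not state Kronara's actual layer sizes or dtype.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a fully connected network with these widths."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def model_size_mb(num_parameters, bytes_per_param=4):
    """Storage in megabytes (10**6 bytes), assuming float32 by default."""
    return num_parameters * bytes_per_param / 1_000_000


# Reported figure: 51.2 million parameters.
print(model_size_mb(51_200_000))

# Hypothetical tiny MLP on 20 input features, to show how counts add up.
print(mlp_param_count([20, 64, 1]))
```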

📈 Training Results

| Metric | Value |
| --- | --- |
| Accuracy | 91.08% |
| AUC | 0.9714 |
| F1 Score | 0.9094 |
| Precision | 92.34% |
| Recall | 89.58% |
| Loss | 0.2256 |

🔍 Interpretation

Accuracy (91.08%): The model correctly classified approximately 91% of the instances in the synthetic dataset.
AUC (0.9714): An Area Under the Receiver Operating Characteristic Curve (AUC) of 0.9714 indicates excellent discriminative ability, meaning the model effectively distinguishes between the two classes.
F1 Score (0.9094): The F1 score is the harmonic mean of precision and recall, so a value this high means the model keeps both false positives and false negatives low.
Precision (92.34%): This metric shows that when the model predicts a positive class, it is correct 92.34% of the time.
Recall (89.58%): The model identifies 89.58% of all actual positive instances; the remaining 10.42% are missed as false negatives.
Loss (0.2256): The Binary Cross-Entropy loss measures the gap between the predicted probabilities and the actual labels. Lower values mean the predicted probabilities sit closer to the true labels.
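
The reported metrics can be cross-checked against their definitions. The sketch below defines accuracy, precision, recall, and F1 from confusion-matrix counts, then confirms that the reported precision and recall reproduce the reported F1. The helper names and the toy counts are assumptions for illustration; the README does not publish the underlying confusion matrix.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the reported values (precision 92.34%, recall 89.58%).
f1 = f1_from_precision_recall(0.9234, 0.8958)
print(round(f1, 4))

# Toy counts (hypothetical) to show the definitions in action.
print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
```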


jaygwelsh/Kronara
