
Releases: pikarpov-LANL/Sapsan

v0.4.3

28 Aug 04:41
0051eda

Changes

PyPI Release

  • ./examples directory is now included in the release
    • added MANIFEST.in

Command Line Interface (CLI)

  • Fixed: sapsan get_examples command

v0.4.0

27 Aug 23:52
59d8f63

Changes

General

  • Loaders for Train and Evaluate now have the same format
  • The functions above have an identical interface for both PyTorch and Sklearn

Estimators

  • Fixed: Model Saving & Loading

    • loaded models can continue to be trained
    • when loading a model, you can override its saved config (e.g. n_epoch, lr) at initialization
    • optimizer dict state is correctly saved and loaded
    • optimizer state is moved to cpu or gpu depending on the setup (Catalyst doesn't do this on its own, which caused issues when evaluating a loaded model)
    • Added dummy estimators for loading (otherwise all estimators have load and save)
      • load_estimator() for torch
      • load_sklearn_estimator() for sklearn
  • Reworked how the models are initialized

    • models are now initialized when the estimator is created, e.g. estimator = CNN3d(loaders=loaders)
      • before: during training, upon estimator.train
    • model initialization now requires loaders to be provided
  • All self vars in ModelConfig() get recorded in tracking by default

  • Added options in ModelConfig()

    • lr and min_lr - learning rate parameters are no longer hard-coded
    • device - sets a specific device to run the models on, either cpu or cuda
  • Added sklearn_backend and torch_backend to be used by all estimators

    • sklearn-based estimators have a structure close to torch-based
    • pytorch_estimator -> torch_backend
    • cleared up variable name conventions throughout
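
As a rough illustration of the ModelConfig() notes above, the sketch below shows one way every attribute of a config object could be collected for tracking, and how saved values could be overridden when a model is loaded. SketchConfig, tracked_params, and load_config_with_overrides are hypothetical names and are not Sapsan's API; the parameter names (n_epoch, lr, min_lr, device) follow the notes, while the default values and the patience parameter are placeholders.

```python
class SketchConfig:
    """Hypothetical stand-in for a model config; not Sapsan's ModelConfig."""

    def __init__(self, n_epoch=1, lr=1e-3, min_lr=None, device=None, **kwargs):
        self.n_epoch = n_epoch
        self.lr = lr
        # assumption: min_lr falls back to a fraction of lr when not given
        self.min_lr = lr * 0.01 if min_lr is None else min_lr
        self.device = device  # e.g. "cpu" or "cuda"; None leaves it undecided
        # any extra keyword becomes an attribute, so it gets tracked as well
        for key, value in kwargs.items():
            setattr(self, key, value)


def tracked_params(config):
    """Return every instance attribute as a flat dict, ready for logging."""
    return dict(vars(config))


def load_config_with_overrides(saved, **overrides):
    """Rebuild a config from saved values; new values take priority."""
    merged = {**saved, **overrides}
    return SketchConfig(**merged)


# save the parameters of a first run, then resume with a new n_epoch and lr
saved = tracked_params(SketchConfig(n_epoch=5, lr=1e-3, patience=10))
resumed = load_config_with_overrides(saved, n_epoch=20, lr=1e-4)
print(tracked_params(resumed))
```

Because the dict is built from vars(), a parameter added to the config later is picked up without touching the tracking code.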

Evaluation

  • Evaluate and Data loader accept data without target
    • useful when there is no ground truth to compare to
    • will still output pdf, cdf, and spatial, without comparison metrics
    • Evaluate.run() now outputs a dict with "pred_cube" and "target_cube" (the latter only if a target is provided)
  • PDF and CDF plots are now combined under a single figure
    • recorded as 'pdf_cdf.png' in MLflow
  • Fixed: definition of n_output_channel in Evaluate()
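
To make the optional-target behaviour concrete, here is a minimal sketch of code that consumes a results dict shaped like the one described above. run_evaluation and mean_abs_error are hypothetical helpers, not Sapsan functions; only the "pred_cube" and "target_cube" keys come from the notes, and flat lists stand in for the cubes.

```python
def run_evaluation(pred_cube, target_cube=None):
    """Hypothetical evaluation step returning a results dict.

    "target_cube" is only present when a target was supplied.
    """
    results = {"pred_cube": pred_cube}
    if target_cube is not None:
        results["target_cube"] = target_cube
    return results


def mean_abs_error(results):
    """Compute a comparison metric only when ground truth is available."""
    target = results.get("target_cube")
    if target is None:
        return None  # no ground truth, so no comparison metric
    pred = results["pred_cube"]
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


with_target = run_evaluation([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
without_target = run_evaluation([1.0, 2.0, 3.0])

print(mean_abs_error(with_target))     # 0.5
print(mean_abs_error(without_target))  # None
```

Using dict.get() for the target keeps downstream code working in both cases without a try/except.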

Command Line Interface (CLI)

  • Added new option: sapsan create --ddp copies torch_backend.py

    • gives the ability to customize the Catalyst Runner
    • adjust DDP settings based on the linked Catalyst DDP tutorial in the Wiki
    • will be useful when running on HPC
    • refer to Parallel GPU Training on the Wiki for more details
  • Fixed: CLI click initialization

Graphical User Interface (GUI)

  • Up to date with Streamlit 0.87.0
  • PDF and CDF plots are now shown as well
  • Fixed: data loading issue related to train_fraction

MLflow

  • evaluate runs are now nested under the most recent train run
    • significantly aids organization
  • Added estimator.model.forward() to be recorded by MLflow (if torch is used)

Plotting

  • Plotting routines return an Axes object
  • All parameters are now set on the Axes instead of through plt, which allows individual tweaking after return
  • figsize and ax arguments added to most plotting routines
    • useful if you create a figure and subplots outside of the plotting routines
  • Universal plotting params were expanded and made easily accessible through plot_params()
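
The Axes-based pattern described above can be sketched with plain matplotlib. plot_line is a hypothetical routine written for this example and is not one of Sapsan's plotting functions; it only assumes the ax and figsize arguments mentioned in the notes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def plot_line(x, y, ax=None, figsize=(6, 4)):
    """Hypothetical plotting routine: draws on `ax` and returns it."""
    if ax is None:
        # no Axes supplied, so create a figure of the requested size
        _, ax = plt.subplots(figsize=figsize)
    ax.plot(x, y)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return ax


# figure and subplots created outside of the plotting routine
fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
ax1 = plot_line([0, 1, 2], [1, 2, 4], ax=left)
ax2 = plot_line([0, 1, 2], [4, 2, 1], ax=right)

# individual tweaks after return, applied per Axes rather than through plt
ax1.set_title("rising")
ax2.set_yscale("log")
```

Setting properties on the returned Axes, rather than on the global plt state, is what lets each subplot be adjusted independently.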

Other

  • Edited the examples, tests, and estimator template to reflect model initialization changes
  • Requirements Updated:
    • streamlit >= 0.87.0
    • plotly >= 5.2.0
    • tornado >= 6.1.0
    • notebook >= 6.4.3 (fixes security vulnerabilities)
  • Added a few data_loader warnings
  • Cleaned up debug prints throughout the code
  • Expanded code comments

v0.3.0

12 Aug 01:43
801dfa1

Changes

Command Line Interface (CLI)

  • New & Changed Commands
    • sapsan create --name {name} or sapsan create -n {name} - creates a custom project template tree
    • sapsan test - runs pytest to make sure Sapsan is working correctly
    • sapsan get_examples - copies Sapsan's examples into the current working directory for easy access
    • sapsan --version - to check the installed version
  • Updated CLI options and --help

Testing

  • moved tests to Sapsan's root folder, so they are always accessible when installed via pip
  • notebook tests don't create a separate folder, but test on the existing example notebooks
  • sapsan test lets the user run the pytest tests to check that everything was installed correctly

Custom Estimators

  • To get started, run: sapsan create -n {name}
    • where {name} should be replaced with your custom project name
  • Significantly simplified the template structure, making it much easier to navigate and get started
    • cleaned up by removing all unnecessary templates, leaving a few scripts that let you customize the estimator (i.e. ML network), the Jupyter Notebook interface, and a Dockerfile to easily share your project
    • pre-filled all templates with the custom project name

Examples

  • sapsan get_examples: copies examples into ./sapsan_examples
    • makes the example jupyter notebooks with various ML algorithms easily accessible

Other

v0.2.11

28 Jul 13:27
2ab3e70

Changes

Testing

  • notebook testing errors trigger pytest
  • estimator loading testing is the same for both PyTorch and Scikit-learn

MLflow

  • for evaluation, some model training parameters and metrics are also recorded to make it easier to see which model produced the prediction

PICAE Estimator

  • removed redundant torch.device pass for PICAE estimator

Estimators

  • simplified ModelConfig() in both included estimators and custom template
    • ModelConfig now only has __init__
    • moved load() to backend, removed to_dict()
    • added load_config() under the core Estimator class in models.py so that custom estimators must include it (most cases won't be affected, as it is part of the backend pytorch_estimator.py)
    • changes reflected in the estimator template + wiki
    • Save config now saves all input parameters instead of only the ones tracked by MLflow
    • Loading is now consistent between PyTorch and scikit-learn based models

Other

  • minor formatting fixes in examples
  • minor bug fixes

v0.2.10

25 Jul 20:20
044764c

Changes

There are no new features in this update, but it brings numerous quality-of-life improvements and bug fixes.

Data Loader

  • fixed multi-checkpoint loading for hdf5 module
    • they are added as new batches to the loaded data np.array
  • fixed splitting by batches
    • the rounding error when computing the fraction to split into batches has been corrected
    • default batch_num=1, hence batch_size = input_size if it is not specified (no need to specify the batching in the loader)
  • fixed input data size processing, when sampling is not specified
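
The batching defaults above can be sketched as follows. split_into_batches is a hypothetical helper, not Sapsan's split_by_batch; it assumes integer arithmetic is used to avoid the rounding issue, and it drops any samples that do not fill a whole batch, which is a choice made for this example only.

```python
def split_into_batches(data, batch_num=1, batch_size=None):
    """Split a sequence into equal batches using integer arithmetic."""
    input_size = len(data)
    if batch_size is None:
        # integer division avoids the float rounding a fraction would introduce
        batch_size = input_size // batch_num
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # assumption: samples that do not fill a whole batch are dropped
    n_full = input_size // batch_size
    return [data[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]


samples = list(range(10))
print(split_into_batches(samples))                # one batch holding everything
print(split_into_batches(samples, batch_num=3))   # three batches of three
print(split_into_batches(samples, batch_size=5))  # two batches of five
```

With the default batch_num=1, batch_size equals the input size, so the data comes back as a single batch and no batching arguments are needed.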

Backend

  • FakeBackend is now the default if backend is not specified

Sampling

  • the original input shape no longer needs to be passed
    • it is determined from the data itself
  • added a warning on dealing with irregular shapes
    • the function will try to match the requested shape; if it cannot, a warning is issued
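
A minimal sketch of the two sampling behaviours above, under stated assumptions: the shape is inferred from the data (nested lists stand in for arrays here), and sampling is taken to be a whole-number stride per axis. infer_shape and sampled_shape are hypothetical names, and the stride rule is an assumption about how a requested shape might be matched, not Sapsan's actual algorithm.

```python
import warnings


def infer_shape(data):
    """Determine the shape of nested lists from the data itself."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return tuple(shape)


def sampled_shape(original_shape, requested_shape):
    """Return the shape obtained by striding each axis toward the request."""
    result = []
    for original, requested in zip(original_shape, requested_shape):
        # largest whole stride that still yields at least `requested` points
        stride = max(original // requested, 1)
        result.append(len(range(0, original, stride)))
    result = tuple(result)
    if result != tuple(requested_shape):
        warnings.warn(
            f"requested shape {tuple(requested_shape)} cannot be matched "
            f"exactly; using {result} instead"
        )
    return result


cube = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
print(sampled_shape(infer_shape(cube), (2, 2, 2)))  # (2, 2, 2), no warning
```

An irregular request such as sampling 10 points down to 3 produces 4 points with a stride of 3, which is the kind of mismatch that would trigger the warning.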

Tensor Calculation

  • fixed Tensor calculation indices
  • added "only_x_components" flag
    • at times the full tensor is too taxing on memory

Requirements

  • corrected installation requirements under python 3.8.10+

v0.2.9

16 Jul 04:11
560aba3

Changes

Parallel GPU

  • Parallel GPU training is done via Distributed Data Parallel (DDP), through Catalyst
    • Can run on HPC as well as locally
  • Fixes in train and evaluation methods to account for DDP
  • Added a Parallel GPU tutorial on Wiki

Physics

  • Gradient Model - full 3D tensor is now calculated, fixes to the algorithm
  • Dynamic Smagorinsky Model - fully operational, outputs full 3D tensor
  • Added tensor() function to calculate a stress tensor (see API entry)
  • assertion checks in various functions to make the errors more intelligible

Docker

MLflow

  • Updated compatibility with MLflow 18.0
  • No need to restart kernel when re-running the model in jupyter notebooks
  • Fixed compatibility with Catalyst 21.7 logging
  • Added MLflow tutorial on Wiki

GUI

  • Fixed compatibility with Catalyst 21.7 - correct logging and plotting

Other

v0.2.8

24 Jun 19:47
601483a

Changes

  • requirement updates
  • simplified hierarchy
  • compatibility with Catalyst 21.5+
  • removed automatic PyTorch install: user will need to do it manually

v0.2.7

30 Apr 17:34
0138994

Changes

  • fixed PICAE in pypi

v0.2.6

30 Apr 14:14
03e285c

Changes

  • GUI fixes - brought up to date with the latest backend
  • minor bug fixes

v0.2.5

02 Apr 05:28
29b3ab2

Changes

  • Physics Informed CAE method

    • reworked PICAE now adheres to Sapsan's interface (sapsan/lib/estimators/picae.py)
    • added PICAE example with random data
  • Data loaders

    • loaders are now consistent between pytorch and sklearn
    • train() takes in loaders, instead of inputs & targets separately. Those can be either a PyTorch DataLoader or a list of inputs & targets (i.e. loaders = [x,y]). This is done to accommodate PyTorch-based and sklearn-based models, which require inputs in different formats.
      • data can be loaded as a numpy array by calling load_numpy(), instead of load()
      • loaded numpy data can be converted to torch dataloader via convert_to_torch([x, y])
      • alternatively, both of the steps can be combined by just calling load()
    • cleaned up data_functions; added new methods: flatten, split_by_batch, get_loader_shape
    • corrected split into train and valid datasets + enhanced with train_test_split function from sklearn
    • added new params to data_loader: train_fraction, shuffle
    • support for irregular input data shapes
  • Examples

    • added PICAE example with random data
    • cleaned up examples further
  • Estimators

    • cleaned up CNN3d estimator, deleted legacy functions
    • further generalized the PyTorch estimator to be used as a backend for any PyTorch-based models
  • Templates

    • reflect the changes from above - streamlined
  • Tests

    • added PICAE related tests on push
  • Misc

    • further improvement to backend handling of data transformations
    • minor bug fixes
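
The train_fraction and shuffle parameters listed under Data loaders above can be illustrated with a small standard-library sketch. split_train_valid is a hypothetical stand-in written for this example; the notes say the actual split relies on sklearn's train_test_split, which is not reproduced here. The seed argument and the 0.8 default are assumptions. Each result is returned as an [x, y] pair, matching the loaders = [x, y] form described above.

```python
import random


def split_train_valid(x, y, train_fraction=0.8, shuffle=True, seed=None):
    """Split paired samples into [x, y] loaders for training and validation."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    indices = list(range(len(x)))
    if shuffle:
        # shuffle indices, not the data, so x and y stay aligned
        random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * len(indices)))
    train_idx, valid_idx = indices[:n_train], indices[n_train:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    train_loader = [pick(x, train_idx), pick(y, train_idx)]
    valid_loader = [pick(x, valid_idx), pick(y, valid_idx)]
    return train_loader, valid_loader


x = list(range(10))
y = [value * 10 for value in x]
train, valid = split_train_valid(x, y, train_fraction=0.8, seed=0)
print(len(train[0]), len(valid[0]))  # 8 2
```

Shuffling a shared index list is what keeps each input paired with its target after the split.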