Releases: pikarpov-LANL/Sapsan
v0.4.3
Changes
PyPI Release
- `./examples` directory is now included in the release
  - added MANIFEST.in
Command Line Interface (CLI)
- Fixed: `sapsan get_examples` command
v0.4.0
Changes
General
- Loaders for Train and Evaluate now have the same format
- The functions above have an identical interface for both PyTorch and Sklearn
Estimators
- Fixed: Model Saving & Loading
  - loaded models can continue to be trained
  - when loading a model, you can redefine its previous config upon initialization, such as `n_epoch`, `lr`
  - optimizer dict state is correctly saved and loaded
  - optimizer state is moved to cpu or gpu depending on the setup (Catalyst doesn't do it on its own, which caused issues when evaluating a loaded model)
- Added dummy estimators for loading (otherwise all estimators have `load` and `save`)
  - `load_estimator()` for torch
  - `load_sklearn_estimator()` for sklearn
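A minimal sketch of the traversal behind the optimizer-state fix: every leaf of the saved state dict gets moved to the active device before training resumes. `move_state` and the toy state below are illustrative names, not Sapsan's API; in real torch code, `move_fn` would be something like `lambda t: t.to(device)`.

```python
def move_state(state, move_fn):
    """Recursively apply move_fn to every leaf of a nested optimizer state."""
    if isinstance(state, dict):
        return {k: move_state(v, move_fn) for k, v in state.items()}
    if isinstance(state, (list, tuple)):
        return type(state)(move_state(v, move_fn) for v in state)
    return move_fn(state)

# toy state; here move_fn just tags each leaf with the target device
state = {"step": 10, "exp_avg": [0.1, 0.2], "nested": {"sq": 0.5}}
moved = move_state(state, lambda v: ("cpu", v))
```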
- Reworked how the models are initialized
  - now: upon calling the estimator, e.g. `estimator = CNN3d(loaders=loaders)`
  - before: when training the model, upon `estimator.train()`
  - model initialization now requires `loaders` to be provided
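The new initialization flow can be sketched with a minimal stand-in class (`CNN3dSketch` and its one-line `train()` are illustrative only, not Sapsan's actual classes):

```python
# Hypothetical stand-in illustrating the change described above: loaders
# are supplied when the estimator is constructed, not when train() is called.
class CNN3dSketch:
    def __init__(self, loaders):
        self.loaders = loaders          # required at initialization now

    def train(self):
        # before this release, loaders were passed here instead
        x, y = self.loaders
        return f"trained on {len(x)} samples"

loaders = [[1, 2, 3], [0, 1, 0]]        # [inputs, targets]
estimator = CNN3dSketch(loaders=loaders)
estimator.train()
```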
- All `self` vars in `ModelConfig()` get recorded in tracking by default
- Added options in `ModelConfig()`:
  - `lr` and `min_lr` - learning rate parameters are no longer hard-coded
  - `device` - sets a specific device to run the models on, either cpu or cuda
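The "record every `self` var" behavior can be sketched with a plain class (illustrative only; the real `ModelConfig` lives in Sapsan's estimators, and the defaults below are made up):

```python
class ModelConfigSketch:
    """Toy config: everything assigned to self is recorded by default."""
    def __init__(self, n_epochs=10, lr=1e-3, min_lr=1e-5, device="cpu"):
        self.n_epochs = n_epochs
        self.lr = lr
        self.min_lr = min_lr              # lr/min_lr no longer hard-coded
        self.device = device              # "cpu" or "cuda"

    def to_tracking(self):
        # vars(self) captures every instance attribute for tracking
        return dict(vars(self))

config = ModelConfigSketch(lr=1e-4, device="cuda")
tracked = config.to_tracking()
```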
- Added `sklearn_backend` and `torch_backend` to be used by all estimators
  - sklearn-based estimators now have a structure close to torch-based ones
  - `pytorch_estimator` -> `torch_backend`
  - cleaned up variable name conventions throughout
Evaluation
- Evaluate and Data loader accept data without target
- useful when there is no ground truth to compare to
- will still output pdf, cdf, and spatial, without comparison metrics
- `Evaluate.run()` now outputs a `dict` of `"pred_cube"` and `"target_cube"` (if the latter is provided)
- PDF and CDF plots are now combined under a single figure
- recorded as 'pdf_cdf.png' in MLflow
- Fixed: definition of `n_output_channel` in `Evaluate()`
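Consuming the returned dict might look like this; the two keys come from the notes above, while everything else (the toy values, the messages) is illustrative:

```python
# result stands in for what Evaluate.run() returns; "target_cube" is
# absent here to mimic the no-ground-truth case described above
result = {"pred_cube": [[0.1, 0.2], [0.3, 0.4]]}

pred = result["pred_cube"]
target = result.get("target_cube")      # None when no target was provided

message = ("no ground truth: comparison metrics skipped"
           if target is None else "comparing prediction to target")
```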
Command Line Interface (CLI)
- Added new option: `sapsan create --ddp`
  - the option copies `torch_backend.py`
  - gives the ability to customize the Catalyst Runner
  - adjust DDP settings based on the linked Catalyst DDP tutorial in the Wiki
  - will be useful when running on HPC
  - refer to Parallel GPU Training on the Wiki for more details
- Fixed: CLI click initialization
Graphical User Interface (GUI)
- Up to date with Streamlit 0.87.0
- PDF and CDF plots are now shown as well
- Fixed: data loading issue with regard to `train_fraction`
MLflow
- MLflow: evaluate runs will be nested under the recent train run
- significantly aids organization
- Added `estimator.model.forward()` to be recorded by MLflow (if torch is used)
Plotting
- Plotting routines return an `Axes` object
  - all parameters are changed on the `Axes` instead of `plt`, which allows individual tweaking after `return`
- `figsize` and `ax` arguments added to most plotting routines
  - useful if you create a figure and subplots outside of the plotting routines
- Universal plotting params expanded and made easily accessible through `plot_params()`
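The `figsize`/`ax` pattern described above is standard matplotlib practice; here is a sketch of a routine in that style (`plot_line` is an illustrative stand-in, not one of Sapsan's actual plotting routines):

```python
import matplotlib
matplotlib.use("Agg")                   # headless backend, safe for scripts
import matplotlib.pyplot as plt

def plot_line(y, figsize=(4, 3), ax=None):
    """Illustrative routine: accepts an existing Axes or makes its own."""
    if ax is None:                      # create a figure only if none was given
        _, ax = plt.subplots(figsize=figsize)
    ax.plot(range(len(y)), y)
    return ax                           # caller keeps tweaking after return

ax = plot_line([1, 4, 2])
ax.set_title("tweaked after return")    # individual tweaking on the Axes
```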
Other
- Edited the examples, tests, and estimator template to reflect model initialization changes
- Requirements Updated:
- streamlit >= 0.87.0
- plotly >= 5.2.0
- tornado >= 6.1.0
- notebook >= 6.4.3 (fixes security vulnerabilities)
- Added a few data_loader warnings
- Cleaned up debug prints throughout the code
- Expanded code comments
v0.3.0
Changes
Command Line Interface (CLI)
- New & Changed Commands
  - `sapsan create --name {name}` or `sapsan create -n {name}` - creates a custom project template tree
  - `sapsan test` - runs pytest to make sure Sapsan is working correctly
  - `sapsan get_examples` - copies Sapsan's examples into the current working directory for easy access
  - `sapsan --version` - checks the installed version
- Updated CLI options and `--help`
Testing
- moved tests to Sapsan's root folder, so they are always accessible when installed via pip
- notebook tests don't create a separate folder, but test on the existing example notebooks
- `sapsan test` allows the user to run pytest tests to check if everything was installed correctly
Custom Estimators
- To get started, run: `sapsan create -n {name}`
  - where `{name}` should be replaced with your custom project name
- Significantly simplified the template structure, making it much easier to navigate and get started
  - cleaned up by removing all unnecessary templates, leaving a few scripts that let you customize the estimator (i.e. the ML network), the Jupyter Notebook interface, and the Dockerfile to easily share your project
  - pre-filled all templates with the custom project name
Examples
- `sapsan get_examples`: copies examples into `./sapsan_examples`
  - makes the example jupyter notebooks with various ML algorithms easily accessible
Other
- `sapsan` now has a `__version__` attribute
- Updated sections in the Documentation to reflect the CLI and template changes
- Minor comment updates to example notebooks
v0.2.11
Changes
Testing
- notebook testing errors trigger pytest
- estimator loading testing is the same for both PyTorch and Scikit-learn
MLflow
- for evaluation, some model training params and metrics are also recorded, to make it easier to see which model the prediction was done with
PICAE Estimator
- removed redundant torch.device pass for PICAE estimator
Estimators
- simplified ModelConfig() in both included estimators and custom template
- ModelConfig now only has __init__
- moved load() to backend, removed to_dict()
- added load_config() under the core Estimator class in models.py to enforce its inclusion in custom estimators (most cases won't be affected, as it is part of the backend pytorch_estimator.py)
- changes reflected in the estimator template + wiki
- Save config now saves all input parameters instead of only the ones tracked by MLflow
- Loading is now consistent between PyTorch and scikit-learn based models
Other
- minor formatting fixes in examples
- minor bug fixes
v0.2.10
Changes
There are no new features in this update, but numerous quality of life improvements and bug fixes.
Data Loader
- fixed multi-checkpoint loading for hdf5 module
- they are added as new batches to the loaded data np.array
- fixed splitting by batches
- the rounding error when computing the fraction to split into batches has been corrected
- default batch_num=1, hence batch_size = input_size if it is not specified (no need to specify the batching in the loader)
- fixed input data size processing, when sampling is not specified
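The batching default can be sketched in a few lines (`batch_size_for` is an illustrative stand-in, not Sapsan's actual loader code): with `batch_num=1`, the batch size simply falls back to the input size, and the split uses integer arithmetic rather than a fraction, avoiding the rounding error mentioned above.

```python
def batch_size_for(n_samples, batch_num=1):
    """With the default batch_num=1, batch_size equals the input size."""
    batch_size = n_samples // batch_num   # integer split avoids rounding errors
    assert batch_size * batch_num == n_samples, "batches must divide data evenly"
    return batch_size
```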
Backend
- FakeBackend is now the default if backend is not specified
Sampling
- no longer need to pass original input shape
- it is determined from the data itself
- added a warning on dealing with irregular shapes
- the function will try to match the requested shape, but if it can't - a warning is issued
Tensor Calculation
- fixed Tensor calculation indices
- added "only_x_components" flag
- at times the full tensor is too taxing on memory
Requirements
- corrected installation requirements under python 3.8.10+
v0.2.9
Changes
Parallel GPU
- Parallel GPU training is done via Distributed Data Parallel (DDP), through Catalyst
- Besides running locally, it is capable of running on HPC
- Fixes in train and evaluation methods to account for DDP
- Added a Parallel GPU tutorial on Wiki
Physics
- Gradient Model - full 3D tensor is now calculated, fixes to the algorithm
- Dynamic Smagorinsky Model - fully operational, outputs full 3D tensor
- Added `tensor()` function to calculate a stress tensor (see API entry)
- assertion checks in various functions to make the errors more intelligible
Docker
- Minor quality of life simplifications
- Added a Docker tutorial on Wiki
MLflow
- Updated compatibility with MLflow 1.18.0
- No need to restart kernel when re-running the model in jupyter notebooks
- Fixed compatibility with Catalyst 21.7 logging
- Added MLflow tutorial on Wiki
GUI
- Fixed compatibility with Catalyst 21.7 - correct logging and plotting
Other
- Expanded Custom Estimator tutorial on Wiki
- Added Community Guidelines to add new ML models and contribute to Sapsan
- Set up a jupyter notebook example on Google Colab to play around with Sapsan
- Minor bug fixes
v0.2.8
Changes
- requirement updates
- simplified hierarchy
- compatibility with Catalyst 21.5+
- removed automatic PyTorch install: user will need to do it manually
v0.2.7
Changes
- fixed PICAE in pypi
v0.2.6
Changes
- GUI fixes - brought up to date with the latest backend
- minor bug fixes
v0.2.5
Changes
- Physics Informed CAE method
  - the reworked PICAE now adheres to Sapsan's interface (sapsan/lib/estimators/picae.py)
  - added a PICAE example with random data
- Data loaders
  - loaders are now consistent between pytorch and sklearn
  - `train()` takes in loaders instead of inputs & targets separately; those can be either a `Pytorch.Dataloader` or a list of inputs & targets (i.e. `loaders = [x, y]`). This is done to accommodate PyTorch-based and sklearn-based models, which require inputs of different formats.
  - data can be loaded as a numpy array by calling `load_numpy()`, instead of `load`
  - loaded numpy data can be converted to a torch dataloader via `convert_to_torch([x, y])`
  - alternatively, both of the steps can be combined by just calling `load()`
  - cleaned up data_functions; added new methods: `flatten`, `split_by_batch`, `get_loader_shape`
  - corrected the split into train and valid datasets + enhanced it with the `train_test_split` function from sklearn
  - added new params to data_loader: `train_fraction`, `shuffle`
  - support for irregular input data shapes
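A stdlib-only sketch of the `train_fraction`/`shuffle` split and the list form of loaders from above (Sapsan delegates the real split to sklearn's `train_test_split`; `split` and its signature here are illustrative stand-ins):

```python
import random

def split(x, y, train_fraction=0.75, shuffle=True, seed=0):
    """Toy stand-in for the train/valid split with the new params."""
    idx = list(range(len(x)))
    if shuffle:
        random.Random(seed).shuffle(idx)  # deterministic shuffle for the demo
    cut = int(len(x) * train_fraction)
    train, valid = idx[:cut], idx[cut:]
    return ([x[i] for i in train], [y[i] for i in train],
            [x[i] for i in valid], [y[i] for i in valid])

x, y = list(range(8)), [i * i for i in range(8)]
x_train, y_train, x_valid, y_valid = split(x, y, train_fraction=0.75)
loaders = [x_train, y_train]            # list form accepted alongside a DataLoader
```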
- Examples
  - added a PICAE example with random data
  - cleaned up the examples further
- Estimators
  - cleaned up the CNN3d estimator, deleted legacy functions
  - further generalized the Pytorch estimator to be used as a backend for any PyTorch-based models
- Templates
  - reflect the changes from above
  - streamlined
- Tests
  - added PICAE-related tests on push
- Misc
  - further improvements to backend handling of data transformations
  - minor bug fixes