Add PyTorch linear regression example

This adds a new tutorial example on distributing a linear regression task over an OpenFL cluster. The model is defined in PyTorch and can run on both CPU (the default) and GPU. The dataset is generated by `make_regression` from `sklearn.datasets` with predefined parameters.

Fixes 797

Co-authored-by: Jiang, Jiaqiu <jiaqiu.jiang@intel.com>
Signed-off-by: He, Dan H <dan.h.he@intel.com>
Signed-off-by: Jiang, Jiaqiu <jiaqiu.jiang@intel.com>
Signed-off-by: Li, Qingqing <qingqing.li@intel.com>
Signed-off-by: Wang, Le <le3.wang@intel.com>
Signed-off-by: Wu, Caili <caili.wu@intel.com>
securefederatedai · Apr 26, 2023 · 8097342
1 parent 60911a6
commit 8097342

Showing 9 changed files with 554 additions and 0 deletions.
diff --git a/openfl-tutorials/interactive_api/PyTorch_LinearRegression/README.md b/openfl-tutorials/interactive_api/PyTorch_LinearRegression/README.md
@@ -0,0 +1,56 @@
+# PyTorch-based Linear Regression Tutorial
+
+### 1. About the dataset
+
+The dataset is a random regression problem generated with `make_regression` from `sklearn.datasets` using predefined parameters.
+
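+Set the following parameter in `envoy_config.yaml` to shard the dataset across participants (envoys):
+- `rank_worldsize`: the envoy's 1-based rank and the total number of envoys, e.g. `1, 2`.
+
+
+### 2. About the model
+
+A simple linear regression model implemented in PyTorch.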
+
+
+### 3. How to run this tutorial (without TLS and locally as a simulation):
+
+1. Run the director:
+
+```sh
+cd director
+./start_director.sh
+```
+
+2. Run the envoy:
+
+Step 1: Activate a virtual environment and install the packages:
+```sh
+cd envoy
+pip install -r requirements.txt
+```
+Step 2: Start the envoy:
+```sh
+./start_envoy.sh env_instance_1 envoy_config.yaml
+```
+
+Optional: start a second envoy:
+
+- Copy the `envoy` folder to another location, set `rank_worldsize: 2, 2` in its `envoy_config.yaml`, and repeat the steps above:
+
+```sh
+./start_envoy.sh env_instance_2 envoy_config.yaml
+```
+
+3. Run the `torch_linear_regression.ipynb` Jupyter notebook:
+
+```sh
+cd workspace
+jupyter lab torch_linear_regression.ipynb
+```
+
+4. Visualize the results with TensorBoard:
+
+```sh
+tensorboard --logdir logs/
+```
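The model itself lives in `torch_linear_regression.ipynb`, which this diff does not show. The following is therefore only a hypothetical sketch of a "simple regression model based on PyTorch" as the README describes it: a single `nn.Linear` layer fitted with mean squared error and plain full-batch SGD. The class name `LinearRegressionModel`, the `train` helper, and the default `epochs`/`lr` values are assumptions, not the notebook's code.

```python
import torch
from torch import nn


class LinearRegressionModel(nn.Module):
    """Hypothetical single-layer linear regression model: y = w x + b."""

    def __init__(self, n_features: int = 1) -> None:
        super().__init__()
        # One output unit: a weight per input feature plus a bias.
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


def train(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
          epochs: int = 500, lr: float = 0.5) -> float:
    """Full-batch gradient descent on MSE; returns the loss after training."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    # Evaluate once more without tracking gradients to report the final loss.
    with torch.no_grad():
        return loss_fn(model(x), y).item()


if __name__ == '__main__':
    torch.manual_seed(0)
    x = torch.rand(100, 1)
    y = 3 * x + 2  # noiseless synthetic target for a quick sanity check
    model = LinearRegressionModel(n_features=1)
    train(model, x, y)
    # Should print values close to 3.0 and 2.0.
    print(model.linear.weight.item(), model.linear.bias.item())
```

The model runs on the CPU by default; to use a GPU, move both the model and the tensors with `.to('cuda')` when `torch.cuda.is_available()` returns `True`.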
diff --git a/openfl-tutorials/interactive_api/PyTorch_LinearRegression/director/director_config.yaml b/openfl-tutorials/interactive_api/PyTorch_LinearRegression/director/director_config.yaml
@@ -0,0 +1,6 @@
+settings:
+  listen_host: localhost
+  listen_port: 50050
+  sample_shape: ['1'] # Update this if you change `n_features` in the shard descriptor.
+  target_shape: ['1']
+  envoy_health_check_period: 5  # in seconds
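The director's `sample_shape` and `target_shape` are meant to agree with what each envoy's shard descriptor reports; as far as we can tell, OpenFL's director uses them to check that connecting envoys hold compatible shards. The sketch below mirrors the logic of `RegressionShardDescriptor.sample_shape` and `target_shape` as a standalone function (the name `row_shapes` is hypothetical), so you can see which values to put in `director_config.yaml` for a given `n_features`.

```python
from typing import List, Tuple

import numpy as np


def row_shapes(row: np.ndarray) -> Tuple[List[str], List[str]]:
    """Mirror the shard descriptor's sample/target shape properties for one row.

    The row holds the feature values followed by the target as its last element,
    matching the layout of `self.data` in RegressionShardDescriptor.
    """
    *x, y = row
    sample_shape = [str(i) for i in np.array(x, ndmin=1).shape]
    target_shape = [str(i) for i in np.array(y, ndmin=1).shape]
    return sample_shape, target_shape


if __name__ == '__main__':
    for n_features in (1, 3):
        row = np.zeros(n_features + 1)  # features + target
        print(n_features, row_shapes(row))
    # 1 (['1'], ['1'])
    # 3 (['3'], ['1'])
```

With the PR's `n_features=1`, both shapes are `['1']`, which matches the values in this config.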
diff --git a/openfl-tutorials/interactive_api/PyTorch_LinearRegression/director/start_director.sh b/openfl-tutorials/interactive_api/PyTorch_LinearRegression/director/start_director.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+set -e
+
+fx director start --disable-tls -c director_config.yaml
diff --git a/openfl-tutorials/interactive_api/PyTorch_LinearRegression/envoy/envoy_config.yaml b/openfl-tutorials/interactive_api/PyTorch_LinearRegression/envoy/envoy_config.yaml
@@ -0,0 +1,9 @@
+params:
+  cuda_devices: []
+
+optional_plugin_components: {}
+
+shard_descriptor:
+  template: regression_shard_descriptor.RegressionShardDescriptor
+  params:
+    rank_worldsize: 1, 2
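With `rank_worldsize: 1, 2`, `get_dataset` gives this envoy every second row starting at row 0, and a second envoy configured with `2, 2` gets the remaining rows. Because `train_test_split` holds out 25% by default, the 1000 generated samples split into 750 training and 250 validation rows, so each of two envoys ends up with 375 training and 125 validation rows. The standalone NumPy sketch below reproduces only the parsing and slicing; the helper names are hypothetical, and the range check on `rank` is an addition that the PR's shard descriptor does not perform.

```python
from typing import Tuple

import numpy as np


def parse_rank_worldsize(rank_worldsize: str) -> Tuple[int, int]:
    """Parse a 'rank, worldsize' string such as '1, 2' into two ints."""
    rank, worldsize = (int(num) for num in rank_worldsize.split(','))
    # Extra validation (not in the PR): rank is 1-based and at most worldsize.
    if not 1 <= rank <= worldsize:
        raise ValueError(f'rank must be in [1, {worldsize}], got {rank}')
    return rank, worldsize


def shard_rows(data: np.ndarray, rank_worldsize: str) -> np.ndarray:
    """Return every worldsize-th row starting at row rank - 1."""
    rank, worldsize = parse_rank_worldsize(rank_worldsize)
    return data[rank - 1::worldsize]


if __name__ == '__main__':
    # Stand-in for the 750-row training split: 1 feature column + 1 target column.
    data = np.arange(750 * 2, dtype=np.float32).reshape(750, 2)
    print(shard_rows(data, '1, 2').shape, shard_rows(data, '2, 2').shape)
    # (375, 2) (375, 2)
```

Strided slicing keeps the shards disjoint and balanced without any coordination between envoys, as long as every envoy uses the same `worldsize` and a distinct `rank`.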
diff --git a/...l-tutorials/interactive_api/PyTorch_LinearRegression/envoy/regression_shard_descriptor.py b/...l-tutorials/interactive_api/PyTorch_LinearRegression/envoy/regression_shard_descriptor.py
@@ -0,0 +1,71 @@
+# Copyright (C) 2020-2022 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+"""Regression Shard Descriptor."""
+
+from typing import List
+
+import numpy as np
+import torch
+from sklearn.datasets import make_regression
+from sklearn.model_selection import train_test_split
+
+from openfl.interface.interactive_api.shard_descriptor import ShardDescriptor
+
+
+class RegressionShardDescriptor(ShardDescriptor):
+    """Regression Shard descriptor class."""
+
+    def __init__(self, rank_worldsize: str = '1, 1', **kwargs) -> None:
+        """
+        Initialize Regression Data Shard Descriptor.
+
+        This Shard Descriptor generates random regression data with zero-centered Gaussian noise
+        using the make_regression method from sklearn.datasets.
+        Shards data across participants using rank and world size.
+        """
+
+        self.rank, self.worldsize = tuple(int(num) for num in rank_worldsize.split(','))
+        X_train, y_train, X_test, y_test = self.generate_data()
+        self.data_by_type = {
+            'train': np.concatenate((X_train, y_train[:, None]), axis=1),
+            'val': np.concatenate((X_test, y_test[:, None]), axis=1)
+        }
+
+    def generate_data(self):
+        """Generate regression dataset with predefined params."""
+        x, y = make_regression(n_samples=1000, n_features=1, noise=14, random_state=24)
+        X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=24)
+        self.data = np.concatenate((x, y[:, None]), axis=1)
+        return X_train, y_train, X_test, y_test
+
+    def get_shard_dataset_types(self) -> List[str]:
+        """Get available shard dataset types."""
+        return list(self.data_by_type)
+
+    def get_dataset(self, dataset_type='train'):
+        """Return this envoy's shard of the dataset by type."""
+        if dataset_type not in self.data_by_type:
+            raise ValueError(f'Incorrect dataset type: {dataset_type}')
+
+        # Rank is 1-based, so this envoy takes every `worldsize`-th row
+        # starting at row `rank - 1` (e.g. rank 1 of 2 gets the even-indexed rows).
+        shard = self.data_by_type[dataset_type][self.rank - 1::self.worldsize]
+        return torch.tensor(shard, dtype=torch.float32)
+
+    @property
+    def sample_shape(self) -> List[str]:
+        """Return the sample shape info."""
+        (*x, _) = self.data[0]
+        return [str(i) for i in np.array(x, ndmin=1).shape]
+
+    @property
+    def target_shape(self) -> List[str]:
+        """Return the target shape info."""
+        (*_, y) = self.data[0]
+        return [str(i) for i in np.array(y, ndmin=1).shape]
+
+    @property
+    def dataset_description(self) -> str:
+        """Return the dataset description."""
+        return (f'Regression dataset, shard number {self.rank}'
+                f' out of {self.worldsize}')
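The tensor returned by `get_dataset` stacks the feature columns and the target column side by side, so whatever consumes it (for example the data loader in the notebook, which this diff does not show) has to split them apart again. A minimal sketch, assuming that layout; `make_loader` and its defaults are hypothetical, not part of OpenFL or this PR.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(shard: torch.Tensor, batch_size: int = 32,
                shuffle: bool = True) -> DataLoader:
    """Split a (features..., target) shard tensor into (X, y) batches.

    Every column except the last is treated as a feature; the last column
    is the regression target.
    """
    features = shard[:, :-1]
    target = shard[:, -1:]  # slice keeps a trailing dim, so y has shape (N, 1)
    return DataLoader(TensorDataset(features, target),
                      batch_size=batch_size, shuffle=shuffle)


if __name__ == '__main__':
    # Stand-in for get_dataset('train') on one of two envoys: 375 rows, 1 feature + 1 target.
    shard = torch.randn(375, 2)
    loader = make_loader(shard, batch_size=32, shuffle=False)
    x_batch, y_batch = next(iter(loader))
    print(x_batch.shape, y_batch.shape)  # torch.Size([32, 1]) torch.Size([32, 1])
    print(len(loader))  # 12
```

Keeping the target as shape `(N, 1)` rather than `(N,)` matches the output shape of a one-unit `nn.Linear`, which avoids silent broadcasting inside `nn.MSELoss`.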
diff --git a/openfl-tutorials/interactive_api/PyTorch_LinearRegression/envoy/requirements.txt b/openfl-tutorials/interactive_api/PyTorch_LinearRegression/envoy/requirements.txt
@@ -0,0 +1,7 @@
+openfl>=1.2.1
+numpy>=1.13.3
+torch>=1.13.1
+scikit-learn>=0.24.1
+mistune>=2.0.3 # not directly required, pinned by Snyk to avoid a vulnerability
+setuptools>=65.5.1 # not directly required, pinned by Snyk to avoid a vulnerability
+wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability
diff --git a/openfl-tutorials/interactive_api/PyTorch_LinearRegression/envoy/start_envoy.sh b/openfl-tutorials/interactive_api/PyTorch_LinearRegression/envoy/start_envoy.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+set -e
+ENVOY_NAME=$1
+ENVOY_CONF=$2
+
+fx envoy start -n "$ENVOY_NAME" --disable-tls --envoy-config-path "$ENVOY_CONF" -dh localhost -dp 50050
diff --git a/openfl-tutorials/interactive_api/PyTorch_LinearRegression/workspace/requirements.txt b/openfl-tutorials/interactive_api/PyTorch_LinearRegression/workspace/requirements.txt
@@ -0,0 +1,7 @@
+openfl>=1.2.1
+numpy>=1.13.3
+torch>=1.13.1
+scikit-learn>=0.24.1
+jupyterlab
+setuptools>=65.5.1 # not directly required, pinned by Snyk to avoid a vulnerability
+wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability