Add PyTorch linear regression example
    This adds a new tutorial example on distributing a linear regression task over an OpenFL cluster.

    The model is defined in PyTorch and can run on both CPU (the default) and GPU. The dataset is generated by make_regression from sklearn.datasets with predefined parameters.

    Fixes 797

    Co-authored-by: Jiang, Jiaqiu <jiaqiu.jiang@intel.com>
    Signed-off-by: He, Dan H <dan.h.he@intel.com>
    Signed-off-by: Jiang, Jiaqiu <jiaqiu.jiang@intel.com>
    Signed-off-by: Li, Qingqing <qingqing.li@intel.com>
    Signed-off-by: Wang, Le <le3.wang@intel.com>
    Signed-off-by: Wu, Caili <caili.wu@intel.com>
danhe1 committed Apr 26, 2023
1 parent 60911a6 commit 8097342
Showing 9 changed files with 554 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# PyTorch based Linear Regression Tutorial

### 1. About dataset

The dataset is a random regression problem generated with `make_regression` from `sklearn.datasets` using predefined parameters.
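
The generation step can be reproduced on its own. The sketch below reuses the parameter values from the shard descriptor in this commit (`n_samples=1000`, `n_features=1`, `noise=14`, `random_state=24`) and relies on the default 75/25 split of `train_test_split`; it only illustrates the resulting array shapes and is not part of the tutorial code.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Parameter values copied from the shard descriptor in this commit.
x, y = make_regression(n_samples=1000, n_features=1, noise=14, random_state=24)

# With no test_size given, train_test_split holds out 25% of the rows.
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=24)

print(x.shape, y.shape)             # (1000, 1) (1000,)
print(X_train.shape, X_test.shape)  # (750, 1) (250, 1)
```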

Set the following parameter in `envoy_config.yaml` to shard the dataset across participants (envoys):
- `rank_worldsize`
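
The effect of `rank_worldsize` can be sketched in plain Python. This is a minimal illustration, assuming the value is a 1-based `'rank, worldsize'` string and that each envoy takes every `worldsize`-th row starting at index `rank - 1`, as the shard descriptor in this commit does. `parse_rank_worldsize` and `take_shard` are hypothetical helper names, not part of OpenFL, and the range check is an addition for illustration.

```python
from typing import List, Tuple


def parse_rank_worldsize(value: str) -> Tuple[int, int]:
    """Parse a 'rank, worldsize' string such as '1, 2' into two ints."""
    rank, worldsize = (int(part) for part in value.split(','))
    # Added sanity check (not in the tutorial code): rank is 1-based.
    if not 1 <= rank <= worldsize:
        raise ValueError(f'rank must be in 1..{worldsize}, got {rank}')
    return rank, worldsize


def take_shard(rows: List[int], rank: int, worldsize: int) -> List[int]:
    """Return every `worldsize`-th row, starting at the 1-based `rank`."""
    return rows[rank - 1::worldsize]


rows = list(range(10))  # stand-in for the rows of the generated dataset
shard_1 = take_shard(rows, *parse_rank_worldsize('1, 2'))
shard_2 = take_shard(rows, *parse_rank_worldsize('2, 2'))
print(shard_1)  # [0, 2, 4, 6, 8]
print(shard_2)  # [1, 3, 5, 7, 9]
```

With this scheme the shards are disjoint and together cover the whole dataset, provided every rank from 1 to `worldsize` is assigned to exactly one envoy.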


### 2. About model

Simple Regression Model based on PyTorch.
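
For orientation, a linear regression model in PyTorch can be as small as a single `nn.Linear` layer. The sketch below is an assumption about what such a model might look like, not the model defined in the tutorial notebook; the class name `LinearRegression`, the synthetic data, and the hyperparameters are all chosen for illustration. It runs on CPU.

```python
import torch
from torch import nn


class LinearRegression(nn.Module):
    """Hypothetical model: one linear layer mapping n_features to 1 output."""

    def __init__(self, n_features: int = 1) -> None:
        super().__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


torch.manual_seed(0)
# Synthetic stand-in data: y = 3x + 2 plus a little Gaussian noise.
x = torch.randn(200, 1)
y = 3.0 * x + 2.0 + 0.1 * torch.randn(200, 1)

model = LinearRegression(n_features=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Plain full-batch gradient descent on the mean squared error.
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

weight = model.linear.weight.item()
bias = model.linear.bias.item()
print(f'weight={weight:.2f}, bias={bias:.2f}')  # should land close to 3 and 2
```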


### 3. How to run this tutorial (without TLS and locally as a simulation):

1. Run director:

```sh
cd director_folder
./start_director.sh
```

2. Run envoy:

Step 1: Activate virtual environment and install packages
```sh
cd envoy_folder
pip install -r requirements.txt
```
Step 2: Start the envoy
```sh
./start_envoy.sh env_instance_1 envoy_config.yaml
```

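Optional: start a second envoy:

- Copy `envoy_folder` to another place, change `rank_worldsize` in the copied `envoy_config.yaml` so that the new envoy serves a different shard (for example `2, 2` for the second of two envoys), and follow the same process as above: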

```sh
./start_envoy.sh env_instance_2 envoy_config.yaml
```

3. Run the `torch_linear_regression.ipynb` Jupyter notebook:

```sh
cd workspace
jupyter lab torch_linear_regression.ipynb
```

4. Visualization

```sh
tensorboard --logdir logs/
```
@@ -0,0 +1,6 @@
settings:
  listen_host: localhost
  listen_port: 50050
  sample_shape: ['1']  # Modify this param if experimenting with `n_features` of shard_descriptor.
  target_shape: ['1']
  envoy_health_check_period: 5  # in seconds
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx director start --disable-tls -c director_config.yaml
@@ -0,0 +1,9 @@
params:
  cuda_devices: []

optional_plugin_components: {}

shard_descriptor:
  template: regression_shard_descriptor.RegressionShardDescriptor
  params:
    rank_worldsize: 1, 2
@@ -0,0 +1,71 @@
# Copyright (C) 2020-2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
"""Regression Shard Descriptor."""

from typing import List

import numpy as np
import torch
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

from openfl.interface.interactive_api.shard_descriptor import ShardDescriptor


class RegressionShardDescriptor(ShardDescriptor):
    """Regression Shard descriptor class."""

    def __init__(self, rank_worldsize: str = '1, 1', **kwargs) -> None:
        """
        Initialize Regression Data Shard Descriptor.

        This Shard Descriptor generates random regression data with centered
        Gaussian noise using the make_regression method from sklearn.datasets.
        It shards the data across participants using rank and world size.
        """
        # The string is '<rank>, <worldsize>' with a 1-based rank, e.g. '1, 2'.
        self.rank, self.worldsize = tuple(int(num) for num in rank_worldsize.split(','))
        X_train, y_train, X_test, y_test = self.generate_data()
        # Each row holds the features followed by the target in the last column.
        self.data_by_type = {
            'train': np.concatenate((X_train, y_train[:, None]), axis=1),
            'val': np.concatenate((X_test, y_test[:, None]), axis=1)
        }

    def generate_data(self):
        """Generate regression dataset with predefined params."""
        x, y = make_regression(n_samples=1000, n_features=1, noise=14, random_state=24)
        X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=24)
        # Keep the full dataset so the shape properties can inspect one sample.
        self.data = np.concatenate((x, y[:, None]), axis=1)
        return X_train, y_train, X_test, y_test

    def get_shard_dataset_types(self) -> List[str]:
        """Get available shard dataset types."""
        return list(self.data_by_type)

    def get_dataset(self, dataset_type='train'):
        """Return a shard dataset by type."""
        if dataset_type not in self.data_by_type:
            raise ValueError(f'Incorrect dataset type: {dataset_type}')

        # Strided slice: take every `worldsize`-th row, starting at `rank - 1`.
        shard = self.data_by_type[dataset_type][self.rank - 1::self.worldsize]
        return torch.tensor(shard, dtype=torch.float32)

    @property
    def sample_shape(self) -> List[str]:
        """Return the sample shape info."""
        (*x, _) = self.data[0]
        return [str(i) for i in np.array(x, ndmin=1).shape]

    @property
    def target_shape(self) -> List[str]:
        """Return the target shape info."""
        (*_, y) = self.data[0]
        return [str(i) for i in np.array(y, ndmin=1).shape]

    @property
    def dataset_description(self) -> str:
        """Return the dataset description."""
        return (f'Regression dataset, shard number {self.rank}'
                f' out of {self.worldsize}')
@@ -0,0 +1,7 @@
openfl>=1.2.1
numpy>=1.13.3
torch>=1.13.1
scikit-learn>=0.24.1
mistune>=2.0.3 # not directly required, pinned by Snyk to avoid a vulnerability
setuptools>=65.5.1 # not directly required, pinned by Snyk to avoid a vulnerability
wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability
@@ -0,0 +1,6 @@
#!/bin/bash
set -e
ENVOY_NAME=$1
ENVOY_CONF=$2

fx envoy start -n "$ENVOY_NAME" --disable-tls --envoy-config-path "$ENVOY_CONF" -dh localhost -dp 50050
@@ -0,0 +1,7 @@
openfl>=1.2.1
numpy>=1.13.3
torch>=1.13.1
scikit-learn>=0.24.1
jupyterlab
setuptools>=65.5.1 # not directly required, pinned by Snyk to avoid a vulnerability
wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability