Quickstart SMPC

Quickstart (SMPC)¶

In this tutorial, we will describe how to use IFLeaner to complete image classification federated training under the MNIST dataset with SMPC.

our example contains two clients and one server by default. In each round of training, the client is responsible for training and uploading the model parameters to the server, the server aggregates, and sends the aggregated global model parameters to each client, and then each client updates the aggregated model parameters. Multiple rounds will be repeated.

First of all, we highly recommend to create a python virtual environment to run, you can use virtual tools such as virtualenv, pyenv, conda, etc.

Next, we can quickly install the IFLearner library with the following command:

pip install iflearner
````

Also, since we want to use PyTorch for image classification tasks on MNIST data, we need to go ahead and install the PyTorch and torchvision libraries:
```shell
pip install torch==1.7.1 torchvision==0.8.2
````

### Ifleaner Server

1.  Create a new file named `server.py`, import iflearner:
```python
from iflearner.business.homo.aggregate_server import main

if __name__ == "__main__":
    main()

You can start the server with the follow command:
```
python server.py -n 2
```
-n 2: Receive two clients for federated training

Ifleaner Client¶

Create a new file named quickstart_pytorch.py and do the following.

1. Define Model Network¶

Firstly, you need define your model network by using PyTorch.

from torch import nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, num_channels, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(num_channels, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, num_classes)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, x.shape[1]*x.shape[2]*x.shape[3])
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

2. Implement Trainer Class¶

Secondly, you need implement your trainer class, inheriting from the iflearner.business.homo.trainer.Trainer class. The class need to implement four functions, which are get, set, fit and evaluate. We also have provided a iflearner.business.homo.pytorch_trainer.PyTorchTrainer inheriting from the iflearner.business.homo.trainer.Trainer class, which has implement usual get and set functions.

You can use this class as the follow:

import torch
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from iflearner.business.homo.train_client import Controller
from iflearner.business.homo.pytorch_trainer import PyTorchTrainer

class Mnist(PyTorchTrainer):
    def __init__(self, lr=0.15, momentum=0.5) -> None:
        self._lr = lr
        self._device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
        print(f'device: {self._device}')
        self._model = Model(num_channels=1, num_classes=10).to(self._device)

        super().__init__(self._model)

        self._optimizer = optim.SGD(self._model.parameters(), lr=lr, momentum=momentum)
        self._loss = F.nll_loss

        apply_transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
        train_dataset = datasets.MNIST("./data", train=True, download=True, transform=apply_transform)
        test_dataset = datasets.MNIST("./data", train=False, download=True, transform=apply_transform)
        self._train_data = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
        self._test_data = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

    def fit(self, epoch):
        self._model.to(self._device)
        self._model.train()
        print(f"Epoch: {epoch}, the size of training dataset: {len(self._train_data.dataset)}, batch size: {len(self._train_data)}")
        for batch_idx, (data, target) in enumerate(self._train_data):
            data, target = data.to(self._device), target.to(self._device)
            self._optimizer.zero_grad()
            output = self._model(data)
            loss = self._loss(output, target)
            loss.backward()
            self._optimizer.step()

    def evaluate(self, epoch):
        self._model.to(self._device)
        self._model.eval()
        test_loss = 0
        correct = 0
        print(f"The size of testing dataset: {len(self._test_data.dataset)}")
        with torch.no_grad():
            for data, target in self._test_data:
                data, target = data.to(self._device), target.to(self._device)
                output = self._model(data)
                test_loss += self._loss(output, target, reduction='sum').item()  # sum up batch loss
                pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
                correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(self._test_data.dataset)

        print('Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)'.format(
            test_loss, correct, len(self._test_data.dataset),
            1.   * correct / len(self._test_data.dataset))) 

        return {'loss': test_loss, 'acc': correct}
 ```

#### 3. Start Ifleaner Client

Lastly, you need to write a `main` function to start your client. 

You can do it as the follow:

```python

from iflearner.business.homo.argument import parser

if __name__ == '__main__':
    args = parser.parse_args()
    print(args)

    mnist = Mnist()
    controller = Controller(args, mnist)
    controller.run()

In the main function, you need import parser from iflearner.business.homo.argument firstly and then call parser.parse_args, because we provided some common arguments that need to be parsered. If you want to add addtional arguments for yourself, you can call parser.add_argument repeatedly to add them before parser.parse_args has been called. After parsered arguments, you can create your trainer instance base on previous implemented class, and put it with args to train_client.Controller. In the end, you just need call controller.run to run your client.

You can use follow command to start the first client:

python quickstart_pytorch.py --name client01 --epochs 2 --server "0.0.0.0:50001" --peers "0.0.0.0:50012;0.0.0.0:50013"

Configure peers to use smpc, peers are configured as the listening address of all clients, and the first address is the listening address of the client

Open another terminal and start the second client:

python quickstart_pytorch.py --name client02 --epochs 2  --server "0.0.0.0:50001" --peers "0.0.0.0:50013;0.0.0.0:50012"

Configure peers to use smpc, peers are configured as the listening address of all clients, and the first address is the listening address of the client

After both clients are ready and started, we can see log messages similar to the following on either client

device: cpu 2022-08-08 19:39:37.971 | INFO 2022-08-08 19:39:37.976 | INFO 2022-08-08 19:39:37.977 | INFO 2022-08-08 19:39:37.980 | INFO 2022-08-08 19:39:37.983 | INFO 2022-08-08 19:39:37.988 | INFO 2022-08-08 19:39:37.988 | INFO 2022-08-08 19:39:48.296 | INFO 2022-08-08 19:39:48.298 | INFO 2022-08-08 19:39:48.298 | INFO 2022-08-08 19:39:49.023 | INFO 2022-08-08 19:39:49.025 | INFO 2022-08-08 19:39:49.027 | INFO 2022-08-08 19:39:50.030 | INFO 2022-08-08 19:39:51.033 | INFO Epoch: 1, the size of 2022-08-08 19:40:20.664 | INFO The size of testing dataset: Test set: Average loss: 2022-08-08 19:40:23.830 | INFO 2022-08-08 19:40:23.858 | INFO 2022-08-08 19:40:24.168 | INFO 2022-08-08 19:40:24.862 | INFO 2022-08-08 19:40:24.865 | INFO 2022-08-08 19:40:25.173 | INFO 2022-08-08 19:40:25.871 | INFO Epoch: 2, the size of 2022-08-08 19:41:00.230 | INFO The size of testing dataset: Test set: Average loss: 2022-08-08 19:41:03.992 | INFO 2022-08-08 19:41:04.020 | INFO 2022-08-08 19:41:04.367 | INFO 2022-08-08 19:41:05.024 | INFO 2022-08-08 19:41:05.027 | INFO 2022-08-08 19:41:05.374 | INFO 2022-08-08 19:41:06.032 | INFO Epoch: 3, the size of 2022-08-08 19:41:33.934 | INFO The size of testing dataset: Test set: Average loss: 2022-08-08 19:41:37.425 | INFO 2022-08-08 19:41:37.492 | INFO 2022-08-08 19:41:37.514 | INFO 2022-08-08 19:41:38.496 | INFO 2022-08-08 19:41:38.498 | INFO 2022-08-08 19:41:38.519 | INFO 2022-08-08 19:41:39.503 | INFO Epoch: 4, the size of 2022-08-08 19:42:12.085 | INFO The size of testing dataset: Test set: Average loss: 2022-08-08 19:42:15.499 | INFO 2022-08-08 19:42:15.513 | INFO 2022-08-08 19:42:15.694 | INFO 2022-08-08 19:42:16.517 | INFO 2022-08-08 19:42:16.519 | INFO 2022-08-08 19:42:16.701 | INFO 2022-08-08 19:42:17.524 | INFO Epoch: 5, the size of 2022-08-08 19:42:53.300 | INFO The size of testing dataset: Test set: Average loss: 2022-08-08 19:42:56.654 | INFO 2022-08-08 19:42:56.667 | INFO 2022-08-08 19:42:56.892 | INFO 2022-08-08 19:42:57.672 | INFO 2022-08-08 19:42:57.675 | INFO 2022-08-08 19:42:57.898 | INFO 2022-08-08 19:42:58.679 | INFO Epoch: 6, the size of 2022-08-08 19:43:26.257 | INFO The size of testing dataset: Test set: Average loss: 2022-08-08 19:43:30.128 | INFO 2022-08-08 19:43:30.143 | INFO 2022-08-08 19:43:31.048 | INFO 2022-08-08 19:43:31.148 | INFO 2022-08-08 19:43:31.151 | INFO 2022-08-08 19:43:32.055 | INFO 2022-08-08 19:43:32.153 | INFO Epoch: 7, the size of 2022-08-08 19:43:58.396 | INFO The size of testing dataset: Test set: Average loss: 2022-08-08 19:44:01.113 | INFO 2022-08-08 19:44:01.125 | INFO 2022-08-08 19:44:02.184 | INFO 2022-08-08 19:44:03.132 | INFO 2022-08-08 19:44:03.135 | INFO 2022-08-08 19:44:03.188 | INFO 2022-08-08 19:44:04.140 | INFO Epoch: 8, the size of 2022-08-08 19:44:34.161 | INFO The size of testing dataset: Test set: Average loss: 2022-08-08 19:44:37.132 | INFO 2022-08-08 19:44:37.151 | INFO 2022-08-08 19:44:37.328 | INFO 2022-08-08 19:44:38.153 | INFO 2022-08-08 19:44:38.156 | INFO 2022-08-08 19:44:38.335 | INFO 2022-08-08 19:44:39.161 | INFO Epoch: 9, the size 2022-08-08 19:45:04.166 | INFO The size of testing Test set: Average 2022-08-08 19:45:06.821 | INFO 2022-08-08 19:45:06.841 | INFO 2022-08-08 19:45:07.453 | INFO 2022-08-08 19:45:07.846 | INFO 2022-08-08 19:45:07.848 | INFO 2022-08-08 19:45:09.465 | INFO 2022-08-08 19:45:09.858 | INFO Epoch: 10, the size 2022-08-08 19:45:48.696 | INFO The size of testing Test set: Average 2022-08-08 19:45:53.514 | INFO label: FT, points: label: LT, points: label: FT, points: label: LT, points: 2022-08-08 19:45:54.253 | INFO

congratulations! You have successfully built and run your first federated learning system. The complete terminal: neno-6-1">Namespace(name='client1', epochs=10, server='0.0.0.0:50001', enable_ll=0, peers='0.0.0.0:50012;0.0.0.0:50013', cert=None) | iflearner.business.homo.train_client:run:89 - register to server | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_register, time: 4.347216000000209ms | iflearner.business.homo.train_client:run:106 - use strategy: FedAvg | iflearner.communication.peer.peer_client:get_DH_public_key:43 - Public key: b'\x01\x01\xd0\xf2\xa5\xc0-\x7f\x1b\x88\xcb\xc8\\\x91Ra,4\n\xd4]\x97\x99zs\xae7\x1cK]]\x0c\x06\x85\xa1\xb5\x82\x03.\x9a\xe0m\xa3>#\xf7(\xb3x\x89m\xfa\xfbu\x9ca\x95\xf4\x80GA\xd8z\x8fKs\xe0\x98\xe3\x7fX.\xe2Ej\x04c\x08\xcf\xdeF\'\xcc("@q[\xa5\xdf\xb4#\x1c\xd6\xd8\xd1\x05?\x06tO\xfa~Z\x12\x14\x1e\xba\xbe\xaa\xe5/}\xb1Y\xde]\xd8\\\x17\x9cE\xf3Z\xae(\xbfDsf' | iflearner.business.homo.train_client:do_smpc:73 - secret: 122099796455175621216112096188958830464477667871351715488133066776583905683428683953037290811910449486061004359077608812182806437003480898524406977052511968565063977753455848242565116696939311359584774021461025748485006418803112297959930199282888061745382056940467943791248215701671956530776589875636959329985, type: <class 'str'> | iflearner.communication.peer.peer_client:get_SMPC_random_key:56 - Random float: 0.6339090663897411 | iflearner.business.homo.train_client:do_smpc:77 - random value: 0.6339090663897411 | iflearner.communication.peer.peer_server:send:46 - IN: party: client2, message type: msg_dh_public_key | iflearner.communication.peer.peer_server:send:46 - IN: party: client2, message type: msg_smpc_random_key | iflearner.communication.peer.peer_server:send:56 - Party: client2, Random float: 0.3922334767649399 | iflearner.business.homo.train_client:do_smpc:80 - sum all random values: 0.24167558962480118 | iflearner.business.homo.train_client:run:139 - report client ready | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 2.0064270000013096ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 1.7326500000010014ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- 10000 0.1058, Accuracy: 9694/10000 (96.94%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_upload_param, time: 17.50900600000449ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_aggregate_result, time: 10.504099000002043ms | iflearner.business.homo.train_client:run:221 - ----- set ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 1.2220939999991742ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 1.2624359999975354ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- 10000 0.0998, Accuracy: 9726/10000 (97.26%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_upload_param, time: 17.794709000000353ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_aggregate_result, time: 9.483094000003689ms | iflearner.business.homo.train_client:run:221 - ----- set ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 1.9964000000101123ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 1.6533630000026278ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- 10000 0.0871, Accuracy: 9743/10000 (97.43%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_upload_param, time: 18.108728000001406ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_aggregate_result, time: 8.546628999994255ms | iflearner.business.homo.train_client:run:221 - ----- set ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 1.2796400000070207ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 0.9813249999979234ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- 10000 0.0989, Accuracy: 9705/10000 (97.05%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_upload_param, time: 2.581355999978996ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_aggregate_result, time: 8.077353999993875ms | iflearner.business.homo.train_client:run:221 - ----- set ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 1.3156910000020616ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 2.0472559999973328ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- 10000 0.0639, Accuracy: 9806/10000 (98.06%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_upload_param, time: 1.813790000028348ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_aggregate_result, time: 7.262933999982124ms | iflearner.business.homo.train_client:run:221 - ----- set ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 1.5553380000028483ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 1.6520599999978458ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- 10000 0.0753, Accuracy: 9787/10000 (97.87%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_upload_param, time: 2.0744939999985945ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_aggregate_result, time: 9.83907899998826ms | iflearner.business.homo.train_client:run:221 - ----- set ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 2.054303000022628ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 1.705947000004926ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- 10000 0.0736, Accuracy: 9768/10000 (97.68%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_upload_param, time: 1.890878999972756ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_aggregate_result, time: 9.515048000025672ms | iflearner.business.homo.train_client:run:221 - ----- set ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 1.91251899997269ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 1.6274970000154099ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- 10000 0.0646, Accuracy: 9801/10000 (98.01%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_upload_param, time: 12.143503999993754ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_aggregate_result, time: 7.2212239999771555ms | iflearner.business.homo.train_client:run:221 - ----- set ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 1.8954930000063541ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 2.6670950000493576ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- of training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- dataset: 10000 loss: 0.0694, Accuracy: 9781/10000 (97.81%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_upload_param, time: 12.79627700000674ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_aggregate_result, time: 9.497581000005084ms | iflearner.business.homo.train_client:run:221 - ----- set ----- | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_client_ready, time: 1.1865850000276623ms | iflearner.communication.homo.homo_client:notice:94 - IN: party: message type: msg_notify_training, time: 1.6427019999696313ms | iflearner.business.homo.train_client:run:149 - ----- fit <FT> ----- of training dataset: 60000, batch size: 938 | iflearner.business.homo.train_client:run:167 - ----- evaluate <FT> ----- dataset: 10000 loss: 0.0615, Accuracy: 9814/10000 (98.14%) | iflearner.business.homo.train_client:run:178 - ----- get <FT> ----- ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [0.10584523560330272, 0.09984834497272968, 0.0871084996862337, 0.09891319364532829, 0.063905619766374, 0.07528823107918725, 0.07361029261836957, 0.06460160582875542, 0.0694242621988058, 0.06149101790403947]) ([1], [0.10584523560330272]) ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [0.9694, 0.9726, 0.9743, 0.9705, 0.9806, 0.9787, 0.9768, 0.9801, 0.9781, 0.9814]) ([1], [0.9694]) | iflearner.communication.homo.homo_client:transport:59 - OUT: message type: msg_complete, time: 2.060518000007505ms source code reference for this example Quickstart_SMPC.