Homomorphic Encryption #234
-
Hi all, I am trying to add Homomorphic Encryption (HE) to my FL pipeline, but I keep getting errors that there is no TenSEAL context. My implementation is based on the HE part of the cifar10 example. Does anyone know what I am missing? Here is the whole error on the client side:
My server train configuration looks like this:
and my client config looks like this:
-
From the error message in the log, LoadResult.NO_SUCH_CONTENT, I am guessing your system does not have the TenSEAL context files. Please check that you have server_context.tenseal in the server's startup folder and client_context.tenseal in every client's startup folder.
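For a quick sanity check, something like the sketch below can confirm the files are where the runtime expects them. The workspace paths are placeholders; adjust them to your provisioned server and client folders.

```python
import os

# Placeholder paths: point these at the startup folders of your provisioned
# server and client workspaces.
startup_files = [
    "/path/to/server_workspace/startup/server_context.tenseal",
    "/path/to/client1_workspace/startup/client_context.tenseal",
]

for f in startup_files:
    status = "found" if os.path.isfile(f) else "MISSING"
    print(f"{status}: {f}")
```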
-
Please follow the instructions for creating a "secure" workspace using the provisioning tools. You can also refer to the example here.
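Just to illustrate what those context files contain (this is not a substitute for the provisioning tool, which generates and distributes them for you), a CKKS context in TenSEAL looks roughly like this; the parameters below are illustrative, not necessarily the values the example uses.

```python
import tenseal as ts

# Illustrative CKKS parameters only; the provisioning tool's HE builder
# chooses the actual values and writes the serialized contexts for you.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

# Roughly: clients hold the secret key so they can decrypt aggregated results,
# while the server only needs the public material to aggregate ciphertexts.
client_bytes = context.serialize(save_secret_key=True)
server_bytes = context.serialize(save_secret_key=False)
```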
-
I think the model is not being built when using "name" in the model persistor config. You should use "path" instead, as shown here.
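As a rough illustration of the difference (the component ids, class paths, and model names below are hypothetical, not the exact ones from the example), the persistor entry in the server config would switch from a registered name to a full dotted path:

```python
# Hypothetical excerpts of the persistor entry in config_fed_server.json,
# written as Python dicts. With "name", the class is looked up by its
# registered name; with "path", the full dotted import path is used and the
# model can likewise be resolved by path so it actually gets built.
persistor_by_name = {
    "id": "persistor",
    "name": "PTFileModelPersistor",          # may leave the model unbuilt
    "args": {"model": {"name": "MyModel"}},  # hypothetical model entry
}

persistor_by_path = {
    "id": "persistor",
    "path": "my_pkg.persistors.PTFileModelPersistor",  # hypothetical dotted path
    "args": {"model": {"path": "my_pkg.networks.MyModel"}},
}
```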
-
Great, thanks for the explanations, @holgerroth! It seems like the encryption on the client side works:
On the server side I am getting the following error. It seems like all shareables are skipped, since only WEIGHT_DIFF type DXOs are supported, which is why the final layers end up empty. I removed the expected_data_kind argument from my server's aggregator, but it did not change anything.
-
Ah. Thank you for reporting this issue. The current HE aggregator expects the DXO to contain weight differences, but that should not be required. You can make a custom copy of the HE aggregator and replace this block so that full weights are accepted as well.
Your custom aggregator can then be used via "path" instead of "name" in your config. You can either place it inside the custom folder of your app or put it on your PYTHONPATH. I will create an issue and we will fix it in the repo soon.
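As a rough sketch of the kind of check a custom copy could use (this is not the actual block from the repo, and the helper name below is made up), the idea is to accept both WEIGHTS and WEIGHT_DIFF DXOs instead of rejecting everything that is not a weight difference:

```python
from nvflare.apis.dxo import DataKind, from_shareable

# Hypothetical helper illustrating the relaxed data-kind check; a real
# aggregator would perform this inside its accept()/aggregate() logic.
def extract_accepted_dxo(shareable):
    dxo = from_shareable(shareable)
    if dxo.data_kind not in (DataKind.WEIGHT_DIFF, DataKind.WEIGHTS):
        # Skip contributions whose data kind the aggregator cannot handle.
        return None
    return dxo
```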
-
FYI, the fixing PR is here: #292
-
Great, thank you! After adding this, the encryption seems to work, but another error occurs within the model persistor:
-
Hi @LSnyd, I cannot reproduce this error with the cifar10 example and nvflare 2.0.14. I'm using the virtual environment setup described here. It looks like there could be some version conflict in your miniconda environment. @IsaacYangSLA might have more suggestions.
-
My environment looks like this.
Have you tried running the HE experiment in the cifar10 example? Do you get the same error?
-
My environment looks the same. I can't get the HE experiment in the cifar10 example running. I followed the description and did not alter any config files. It doesn't give me the same error as before: it seems like the encryption works and the clients are able to send their encrypted weights to the server, but the server doesn't move forward. The error "no task currently for client" repeats at this stage over and over, so I stop the process. Server log:
-
The configured number of clients (min_clients) is 8, so you must have at least 8 sites. Otherwise the server will wait for that many updates forever (since the train_timeout of ScatterAndGather is set to 0 in the config).
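For reference, here is a sketch of the relevant ScatterAndGather arguments; the values are illustrative, so check your config_fed_server.json for the actual ones.

```python
# Illustrative values only. With min_clients=8 and train_timeout=0, the server
# waits indefinitely for 8 results per round; lower min_clients to the number
# of sites you actually start, or set a nonzero train_timeout.
scatter_and_gather_args = {
    "min_clients": 8,
    "num_rounds": 50,
    "train_timeout": 0,
    "train_task_name": "train",
    "aggregator_id": "aggregator",
    "persistor_id": "persistor",
    "shareable_generator_id": "shareable_generator",
}
```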
-
Hi @holgerroth,
-
I see. Let me try the MONAI Trainer with HE on my side to see if I can reproduce the issue.
-
I was able to reproduce the model saving issue in 2.0.6 and opened an issue: #323
-
I provided a fix: #326. You can use this version of HEModelShareableGenerator (imported via "path") as a temporary workaround to work with full WEIGHTS.
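If it helps, this is roughly how such a temporary copy can be wired in; the module and folder names below are assumptions about a standard app layout, not taken from the PR.

```python
# Assumed app layout:
#   my_he_app/
#     config/config_fed_server.json
#     custom/he_model_shareable_generator.py   # copy of the patched class
#
# The "custom" folder of an app is importable at runtime, so the server config
# can reference the patched class by dotted path instead of by name:
shareable_generator = {
    "id": "shareable_generator",
    "path": "he_model_shareable_generator.HEModelShareableGenerator",
    "args": {},  # keep whatever arguments your current config passes
}
```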