From dc2179a208e2fbbad4d1e7fbc584402db15cd929 Mon Sep 17 00:00:00 2001
From: Matt Craddock <5796417+craddm@users.noreply.github.com>
Date: Mon, 22 Jan 2024 14:17:30 +0000
Subject: [PATCH 1/5] Add some additional multi-provider guidance

---
 docs/source/processes/data_ingress.md         | 10 +++++++++
 .../roles/project_manager/data_ingress.md     |  2 ++
 .../roles/system_manager/manage_data.md       | 22 +++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/docs/source/processes/data_ingress.md b/docs/source/processes/data_ingress.md
index 171453b05b..04ede70c63 100644
--- a/docs/source/processes/data_ingress.md
+++ b/docs/source/processes/data_ingress.md
@@ -29,6 +29,16 @@ If data sets consist of multiple files, collect them in uniquely named directori
 
 If there are multiple data providers uploading data for a single work package, each provider should use a uniquely named directory, or prepend their files with a unique name.
 
+### Avoiding data leakage
+
+If all data providers are uploading to the same storage container, then they may be able to see the files uploaded by other data providers.
+
+Although they will not be able to access or download these files, a potential issue is that sensitive information may be visible in either the file names or directory structure of the uploaded data.
+
+If possible, data providers should avoid the use of any identifying information in the filenames or directory structure of the data that they upload.
+This is not always possible, since some data providers may require identifying information to be part of filenames or directory structures.
+
+
 ### Describe the data
 
 Explaining the structure and format of the data will help researchers be most effective.
diff --git a/docs/source/roles/project_manager/data_ingress.md b/docs/source/roles/project_manager/data_ingress.md
index 8817ddc320..12bf067064 100644
--- a/docs/source/roles/project_manager/data_ingress.md
+++ b/docs/source/roles/project_manager/data_ingress.md
@@ -28,3 +28,5 @@ If ingress of new data would change the classification of a project, we suggest
 At the end of this process they should have classified the work package into one of the Data Safe Haven security tiers.
 
 Follow the guide to [data ingress](data_ingress.md) to bring all necessary code and data into the secure research environment.
+
+If there are multiple data providers, please see the guidance for {ref}`roles_system_manager_multiple_providers`
diff --git a/docs/source/roles/system_manager/manage_data.md b/docs/source/roles/system_manager/manage_data.md
index 2e2f9138e6..4688c87c14 100644
--- a/docs/source/roles/system_manager/manage_data.md
+++ b/docs/source/roles/system_manager/manage_data.md
@@ -34,6 +34,28 @@ The following steps show how to generate a temporary write-only upload token tha
 - The data provider should now be able to upload data by following {ref}`these instructions `
 - You can validate successful data ingress by logging into the SRD for the SRE and checking the `/data` volume, where you should be able to view the data that the data provider has uploaded
 
+(roles_system_manager_multiple_providers)=
+
+## Multiple data providers
+
+In some projects, there may be more than one data provider responsible for uploading data. Two potential issues that may occur are _name clashes_ and _data leakage_.
+
+If all data providers are uploading to the same storage container, then name clashes may occur. There is no protection against overwriting files during upload.
+Thus, if that different data providers upload files with the same name, then the earlier upload will be overwritten.
+This can be avoided by providing each data provider with their own subfolder on the storage container.
+
+If all data providers are uploading to the same storage container, then they may be able to see the files uploaded by other data providers.
+Although they will not be able to access or download these files, a potential issue is that sensitive information may be visible in either the file names or directory structure of the uploaded data.
+
+If possible, data providers should avoid using any identifying information in the filenames or directory structure of the data that they upload.
+This is not always possible, since some data providers may require identifying information to be part of filenames or directory structures.
+
+An alternative is to provide separate storage containers for upload for each data provider.
+These containers should have all the same access restrictions as used for a single ingress storage container.
+
+After the data has been uploaded, the {ref}`role_system_manager` can transfer the uploaded data to a single storage container accessible to {ref}`role_researcher`s from the relevant SRE, as per the normal data ingress process.
+The data-provider-specific containers should be deleted once the data has been transferred.
+
 (roles_system_manager_software_ingress)=
 
 ## Software Ingress

From 4abb6667bae414fcf25bd4f42b9eaf17c08d9e54 Mon Sep 17 00:00:00 2001
From: Matt Craddock <5796417+craddm@users.noreply.github.com>
Date: Mon, 22 Jan 2024 14:38:32 +0000
Subject: [PATCH 2/5] fix linting error

---
 docs/source/processes/data_ingress.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/source/processes/data_ingress.md b/docs/source/processes/data_ingress.md
index 04ede70c63..3e333ca9db 100644
--- a/docs/source/processes/data_ingress.md
+++ b/docs/source/processes/data_ingress.md
@@ -38,7 +38,6 @@ Although they will not be able to access or download these files, a potential is
 If possible, data providers should avoid the use of any identifying information in the filenames or directory structure of the data that they upload.
 This is not always possible, since some data providers may require identifying information to be part of filenames or directory structures.
 
-
 ### Describe the data
 
 Explaining the structure and format of the data will help researchers be most effective.

From 81a92c69131c1f69db5b7aca54288f9d24586dfb Mon Sep 17 00:00:00 2001
From: Matt Craddock
Date: Thu, 1 Feb 2024 12:27:08 +0000
Subject: [PATCH 3/5] Update docs/source/roles/system_manager/manage_data.md

Co-authored-by: Jim Madge
---
 docs/source/roles/system_manager/manage_data.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/roles/system_manager/manage_data.md b/docs/source/roles/system_manager/manage_data.md
index 4688c87c14..de279b78e6 100644
--- a/docs/source/roles/system_manager/manage_data.md
+++ b/docs/source/roles/system_manager/manage_data.md
@@ -41,7 +41,7 @@ The following steps show how to generate a temporary write-only upload token tha
 In some projects, there may be more than one data provider responsible for uploading data. Two potential issues that may occur are _name clashes_ and _data leakage_.
 
 If all data providers are uploading to the same storage container, then name clashes may occur. There is no protection against overwriting files during upload.
-Thus, if that different data providers upload files with the same name, then the earlier upload will be overwritten.
+Thus, if more than one data provider uploads files with the same path, then the earlier upload will be overwritten.
 This can be avoided by providing each data provider with their own subfolder on the storage container.
 
 If all data providers are uploading to the same storage container, then they may be able to see the files uploaded by other data providers.

From 9ae01d929bab3cfba76310a8228370c476a83c45 Mon Sep 17 00:00:00 2001
From: Matt Craddock
Date: Thu, 1 Feb 2024 12:27:15 +0000
Subject: [PATCH 4/5] Update docs/source/roles/system_manager/manage_data.md

Co-authored-by: Jim Madge
---
 docs/source/roles/system_manager/manage_data.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/roles/system_manager/manage_data.md b/docs/source/roles/system_manager/manage_data.md
index de279b78e6..9eff808585 100644
--- a/docs/source/roles/system_manager/manage_data.md
+++ b/docs/source/roles/system_manager/manage_data.md
@@ -42,7 +42,7 @@ In some projects, there may be more than one data provider responsible for uploa
 
 If all data providers are uploading to the same storage container, then name clashes may occur. There is no protection against overwriting files during upload.
 Thus, if more than one data provider uploads files with the same path, then the earlier upload will be overwritten.
-This can be avoided by providing each data provider with their own subfolder on the storage container.
+This can be avoided by providing each data provider with their own subfolder on the storage container and ensuring that each uploads only to their subfolder.
 
 If all data providers are uploading to the same storage container, then they may be able to see the files uploaded by other data providers.
 Although they will not be able to access or download these files, a potential issue is that sensitive information may be visible in either the file names or directory structure of the uploaded data.
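
To illustrate the name-clash guidance in the patch above, here is a minimal sketch that is not part of the Data Safe Haven codebase. It assumes a hypothetical mapping from each data provider to the relative paths that provider intends to upload. The helpers `find_clashes` and `prefix_with_subfolder` are illustrative names only. The sketch reports paths claimed by more than one provider, then shows that giving each provider its own subfolder removes the clash.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_clashes(uploads: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each destination path that more than one provider would write to."""
    owners: dict[str, list[str]] = defaultdict(list)
    for provider, paths in uploads.items():
        for path in paths:
            # Normalise so that "data//x.csv" and "data/x.csv" count as the same path.
            key = str(PurePosixPath(path))
            if provider not in owners[key]:
                owners[key].append(provider)
    return {path: names for path, names in owners.items() if len(names) > 1}


def prefix_with_subfolder(uploads: dict[str, list[str]]) -> dict[str, list[str]]:
    """Place every upload under a subfolder named after its provider.

    Assumes all paths are relative; an absolute path would discard the prefix.
    """
    return {
        provider: [str(PurePosixPath(provider) / path) for path in paths]
        for provider, paths in uploads.items()
    }


# Hypothetical upload plans for two providers sharing one container.
planned = {
    "provider_a": ["data/records.csv", "readme.txt"],
    "provider_b": ["data/records.csv"],
}
clashes_before = find_clashes(planned)
clashes_after = find_clashes(prefix_with_subfolder(planned))
```

With these inputs, `clashes_before` names `data/records.csv` as shared by both providers, while `clashes_after` is empty. A check like this only helps if each provider actually uploads to their own subfolder, as the revised wording notes.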

From a9da256bb8f81a3479c559f694c900c7ac7d2ef7 Mon Sep 17 00:00:00 2001
From: JimMadge
Date: Fri, 2 Feb 2024 13:38:40 +0000
Subject: [PATCH 5/5] Update SRD package versions

---
 .../packages/dbeaver-driver-versions.json | 2 +-
 .../packages/deb-azuredatastudio.version  | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/deployment/secure_research_desktop/packages/dbeaver-driver-versions.json b/deployment/secure_research_desktop/packages/dbeaver-driver-versions.json
index a865bf9060..f803f502dc 100644
--- a/deployment/secure_research_desktop/packages/dbeaver-driver-versions.json
+++ b/deployment/secure_research_desktop/packages/dbeaver-driver-versions.json
@@ -1,5 +1,5 @@
 {
- "mssql_jdbc": "12.4.2.jre8",
+ "mssql_jdbc": "12.6.0.jre8",
  "pgjdbc": "1.1.6",
  "postgis_geometry": "2023.1.0",
  "postgis_jdbc": "2023.1.0",
diff --git a/deployment/secure_research_desktop/packages/deb-azuredatastudio.version b/deployment/secure_research_desktop/packages/deb-azuredatastudio.version
index 61e5d84054..81fe580df7 100644
--- a/deployment/secure_research_desktop/packages/deb-azuredatastudio.version
+++ b/deployment/secure_research_desktop/packages/deb-azuredatastudio.version
@@ -1,4 +1,4 @@
-hash: 9e136808d5f28a13cc743951c203391dbcf19c16e7eac417ab3501fd9a962fc3
-version: 1.47.1
+hash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
+version: x64/stable
 debfile: azuredatastudio-linux-|VERSION|.deb
-remote: https://sqlopsbuilds.azureedge.net/stable/b6f7beb01f92adaa4b79b6b6f3ac704e95cafe6e/|DEBFILE|
+remote: https://azuredatastudio-update.azurewebsites.net/latest/linux-deb-x64/|DEBFILE|
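
One detail of the `deb-azuredatastudio.version` change may be worth a second look: the new `hash` value equals the SHA-256 digest of empty input, and `version` is now `x64/stable` rather than a version number. Whether this reflects an empty download during the update is only an assumption. The sketch below shows how such a value could be flagged. `parse_version_file` and `hash_is_empty_digest` are hypothetical helpers, not part of the repository, and the `key: value` parsing is inferred from the four lines visible in the hunk.

```python
import hashlib

# SHA-256 of zero bytes, computed rather than hard-coded.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def parse_version_file(text: str) -> dict[str, str]:
    """Parse 'key: value' lines, splitting on the first colon only."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def hash_is_empty_digest(fields: dict[str, str]) -> bool:
    """True when the recorded hash is the digest of an empty file."""
    return fields.get("hash", "").lower() == EMPTY_SHA256


# Field values copied from the added and removed lines of the hunk above.
new_fields = parse_version_file(
    "hash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\n"
    "version: x64/stable\n"
)
old_fields = parse_version_file(
    "hash: 9e136808d5f28a13cc743951c203391dbcf19c16e7eac417ab3501fd9a962fc3\n"
    "version: 1.47.1\n"
)
new_is_empty = hash_is_empty_digest(new_fields)
old_is_empty = hash_is_empty_digest(old_fields)
```

Here `new_is_empty` is true and `old_is_empty` is false, so a check of this kind would flag the updated file for manual review before the package list is used in a build.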