Update of previously imported observed data from another data source #785

Closed
Yuri05 opened this issue May 26, 2020 · 4 comments
Labels
Importer Observed data importer RFC Request For Comments


@Yuri05
Member

Yuri05 commented May 26, 2020

When importing observed data sets, a single import process generally results in N observed data sets being imported into a project, where N ≥ 1 depending on the grouping information.

It should be possible to update observed data sets imported from one data source by selecting another data source.

Proposed workflow:

  1. The user selects ONE observed data set in a project and then chooses "Update from new data source".

  2. The software detects whether there are other observed data sets in the project that were imported from the same data source. If YES: the user is informed that these data sets will be updated as well.

  3. The user defines a new data source.

  4. The software checks whether the new data source has the same structure (e.g. the data columns used by the import configuration) as the original one. If NOT: ERROR (update from the new data source not possible).

  5. The software checks whether the new data source has the same combinations of metadata relevant for grouping. If NOT:

    • The software checks whether there are data sets that were imported previously but are not available in the new data source. If YES: the software checks whether any of these data sets is used in the project (e.g. in a simulation, parameter identification, etc.).
      • If this is the case: ERROR, the import cannot be performed. The user is informed that those data sets must first be deleted manually from all simulations/PIs/etc.
      • Otherwise: all those data sets will be deleted from the project when the new import is finished.

  6. A preview is shown (see Preview of data in the import configuration editor #625).
    • In this use case, it will not be possible to change the import configuration in the preview.
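Steps 4 and 5 boil down to a column check followed by a set comparison of grouping keys. A minimal Python sketch, assuming each data set is identified by its grouping key written as in this issue (e.g. "Human|Brain|Plasma"); `plan_update`, `UpdatePlan` and `UpdateNotPossible` are hypothetical names, not the importer's actual API, and the usage check for data sets that would be deleted is left out:

```python
from dataclasses import dataclass


class UpdateNotPossible(Exception):
    """Hypothetical error: the new data source cannot be used for the update."""


@dataclass
class UpdatePlan:
    to_update: set[str]  # grouping keys present in the project AND in the new source
    to_delete: set[str]  # keys only in the project (candidates for deletion)
    to_add: set[str]     # keys only in the new source


def plan_update(required_columns, new_columns, existing_keys, new_keys):
    # Step 4: the new source must provide every column the import configuration uses.
    missing_columns = set(required_columns) - set(new_columns)
    if missing_columns:
        raise UpdateNotPossible(f"missing columns: {sorted(missing_columns)}")

    # Step 5: compare the combinations of grouping metadata of old and new source.
    existing, new = set(existing_keys), set(new_keys)
    return UpdatePlan(
        to_update=existing & new,
        to_delete=existing - new,
        to_add=new - existing,
    )
```

For the data of Use case 2 below, `plan_update` would place Human|Liver|Plasma in `to_delete`; before anything is deleted, the usage check from step 5 still has to pass.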

Example:

observed data is imported from

Time [min]  Organ  Compartment  Species  Concentration [mg/ml]
1           Brain  Plasma       Human    0,1
2           Brain  Plasma       Human    12
3           Brain  Plasma       Human    2
0           Liver  Plasma       Human    0,2
1           Liver  Plasma       Human    8
2           Liver  Plasma       Human    2

which results in 2 data sets: Human|Brain|Plasma and Human|Liver|Plasma
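Grouping is what turns one data source into N data sets. A sketch of how the example table above could be split by its grouping columns; the key order Species|Organ|Compartment is taken from the data set names used in this issue, decimal commas are written as Python floats, and `group_rows` is a hypothetical helper:

```python
# Columns and rows of the example table (decimal commas written as floats).
COLUMNS = ("Time [min]", "Organ", "Compartment", "Species", "Concentration [mg/ml]")
ROWS = [
    (1, "Brain", "Plasma", "Human", 0.1),
    (2, "Brain", "Plasma", "Human", 12.0),
    (3, "Brain", "Plasma", "Human", 2.0),
    (0, "Liver", "Plasma", "Human", 0.2),
    (1, "Liver", "Plasma", "Human", 8.0),
    (2, "Liver", "Plasma", "Human", 2.0),
]
GROUPING_COLUMNS = ("Species", "Organ", "Compartment")


def group_rows(columns, rows, grouping_columns):
    """Split rows into data sets keyed by their grouping metadata, e.g. "Human|Brain|Plasma"."""
    index = {name: i for i, name in enumerate(columns)}
    time_col, conc_col = index["Time [min]"], index["Concentration [mg/ml]"]
    data_sets = {}  # dicts keep insertion order, so data sets appear in file order
    for row in rows:
        key = "|".join(str(row[index[col]]) for col in grouping_columns)
        data_sets.setdefault(key, []).append((row[time_col], row[conc_col]))
    return data_sets


for name, points in group_rows(COLUMNS, ROWS, GROUPING_COLUMNS).items():
    print(name, points)
```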

Use case 1

The new data source contains the same groups (Human|Brain|Plasma and Human|Liver|Plasma), e.g.:

Time [min]  Organ  Compartment  Species  Concentration [mg/ml]
1           Brain  Plasma       Human    0,1
2           Brain  Plasma       Human    12
3           Brain  Plasma       Human    2
4           Brain  Plasma       Human    1
0           Liver  Plasma       Human    0,2
1           Liver  Plasma       Human    8
2           Liver  Plasma       Human    2
3           Liver  Plasma       Human    1

In this case, the time and concentration values of the previously imported data sets (Human|Brain|Plasma and Human|Liver|Plasma) will simply be updated with the new values.

Use case 2

Some of the previously imported data sets are not available in the new data source, e.g.:

Time [min]  Organ  Compartment  Species  Concentration [mg/ml]
1           Brain  Plasma       Human    0,1
2           Brain  Plasma       Human    12
3           Brain  Plasma       Human    2
4           Brain  Plasma       Human    1

The software checks whether Human|Liver|Plasma is used in the project.

  • If YES: ERROR, the user must first delete it from all simulations/PIs/etc. and then repeat the import procedure.
  • If NO: Human|Brain|Plasma will be updated with the new data and Human|Liver|Plasma will be deleted from the project.
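The check in this use case only needs to know which project objects reference each data set. A sketch, assuming a hypothetical usage index that maps a data set name to the simulations/PIs using it; `ImportBlocked` and `check_deletable` are illustrative names only:

```python
class ImportBlocked(Exception):
    """Hypothetical error: data sets that would be deleted are still used in the project."""


def check_deletable(missing_keys, usages):
    """Return the data sets that may be deleted, or raise if any of them is still used.

    usages: hypothetical dict mapping a data set name to the simulations/PIs using it.
    """
    blocking = {key: sorted(usages[key]) for key in sorted(missing_keys) if usages.get(key)}
    if blocking:
        details = "; ".join(f"{key} is used in {', '.join(users)}" for key, users in blocking.items())
        raise ImportBlocked(f"Import cannot be performed. Delete first: {details}")
    # Nothing references the missing data sets, so they can be removed after the import.
    return sorted(missing_keys)
```

Listing the referencing objects in the error message tells the user exactly where the data set has to be removed before repeating the import.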

Use case 3

All previously imported data sets are available in the new data source; ADDITIONALLY, new data sets have been added, e.g.:

Time [min]  Organ  Compartment  Species  Concentration [mg/ml]
1           Brain  Plasma       Human    0,1
2           Brain  Plasma       Human    12
3           Brain  Plasma       Human    2
4           Brain  Plasma       Human    1
0           Liver  Plasma       Human    0,2
1           Liver  Plasma       Human    8
2           Liver  Plasma       Human    2
3           Liver  Plasma       Human    1
0           Heart  Plasma       Human    1
1           Heart  Plasma       Human    3
2           Heart  Plasma       Human    4
3           Heart  Plasma       Human    5

In this case, the time and concentration values of the previously imported data sets (Human|Brain|Plasma and Human|Liver|Plasma) will be updated with the new values AND the new data set Human|Heart|Plasma will be added automatically.

Previous state of the discussion:

When importing observed data sets, a single import process generally results in N observed data sets being imported into a project, where N ≥ 1 depending on the grouping information.

It should be possible to update observed data sets imported from one data source by selecting another data source.

Proposed workflow:

  1. The user selects ONE observed data set in a project and then chooses "Update from new data source".

  2. The software detects whether there are other observed data sets in the project that were imported from the same data source. If YES: the user is asked whether ALL those data sets should be updated or only the selected one (to be discussed: do we need this step, or should ALL data sets be updated automatically?)

  3. The user defines a new data source.

  4. The software checks whether the new data source has the same structure (e.g. the data columns used by the import configuration) as the original one. If NOT: ERROR (update from the new data source not possible).

  5. The software checks whether the new data source has the same combinations of metadata relevant for grouping. If NOT: to be discussed. The following scenarios are possible:

    a) Option 1: ERROR (update from the new data source not possible)

    b) Option 2: Observed data sets not available in the new data source are removed from the project (complicated if some of these observed data sets are used in simulations/PIs, etc.!)

    c) Option 3: Data sets available in the new data source are updated with the new data. Data sets not available in the new data source are kept AS IS.

    d) Option 4: The user can choose between Error/Delete(?)/Keep (the previous options).

  6. A preview is shown (see Preview of data in the import configuration editor #625).

Example:

observed data is imported from

Time [min]  Organ  Compartment  Species  Concentration [mg/ml]
1           Brain  Plasma       Human    0,1
2           Brain  Plasma       Human    12
3           Brain  Plasma       Human    2
0           Liver  Plasma       Human    0,2
1           Liver  Plasma       Human    8
2           Liver  Plasma       Human    2

which results in 2 data sets: Human|Brain|Plasma and Human|Liver|Plasma

Use case 1

The new data source contains the same groups (Human|Brain|Plasma and Human|Liver|Plasma), e.g.:

Time [min]  Organ  Compartment  Species  Concentration [mg/ml]
1           Brain  Plasma       Human    0,1
2           Brain  Plasma       Human    12
3           Brain  Plasma       Human    2
4           Brain  Plasma       Human    1
0           Liver  Plasma       Human    0,2
1           Liver  Plasma       Human    8
2           Liver  Plasma       Human    2
3           Liver  Plasma       Human    1

In this case, the time and concentration values of the previously imported data sets (Human|Brain|Plasma and Human|Liver|Plasma) will simply be updated with the new values.

Use case 2

Some of the previously imported data sets are not available in the new data source, e.g.:

Time [min]  Organ  Compartment  Species  Concentration [mg/ml]
1           Brain  Plasma       Human    0,1
2           Brain  Plasma       Human    12
3           Brain  Plasma       Human    2
4           Brain  Plasma       Human    1

If the user decided to update Human|Brain|Plasma ONLY, that's fine. However, if the user decided to update ALL data sets (i.e. Human|Brain|Plasma and Human|Liver|Plasma), then it depends on how we want to proceed:

  • if we decide to go with 5a): ERROR
  • if we decide to go with 5b): Human|Brain|Plasma will be updated with the new data and Human|Liver|Plasma will be deleted from the project (complicated if some of the observed data sets are used in simulations/PIs, etc.!)
  • if we decide to go with 5c): Human|Brain|Plasma will be updated with the new data and Human|Liver|Plasma will be kept in the project AS IS
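Options 5a to 5c differ only in what happens to data sets that are missing from the new source. A sketch of the resulting project content when ALL data sets are updated; `MissingPolicy` and `data_sets_after_update` are hypothetical, and whether newly appearing groups are added is a separate `add_new` flag because this state of the discussion leaves that open. Option 5d would simply let the user pick the policy:

```python
from enum import Enum


class MissingPolicy(Enum):
    ERROR = "5a"   # abort the update
    DELETE = "5b"  # remove data sets that are missing from the new source
    KEEP = "5c"    # keep data sets that are missing from the new source AS IS


def data_sets_after_update(existing, new, policy, add_new=False):
    """Return the set of data set names in the project after updating ALL of them."""
    existing, new = set(existing), set(new)
    missing = existing - new
    if missing and policy is MissingPolicy.ERROR:
        raise ValueError(f"not in the new data source: {sorted(missing)}")
    result = existing & new  # these are updated with the new values
    if policy is MissingPolicy.KEEP:
        result |= missing
    if add_new:
        result |= new - existing
    return result
```

With the data of Use case 2, `DELETE` leaves only Human|Brain|Plasma, while `KEEP` leaves both data sets in the project.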

Use case 3

All previously imported data sets are available in the new data source; ADDITIONALLY, new data sets have been added, e.g.:

Time [min]  Organ  Compartment  Species  Concentration [mg/ml]
1           Brain  Plasma       Human    0,1
2           Brain  Plasma       Human    12
3           Brain  Plasma       Human    2
4           Brain  Plasma       Human    1
0           Liver  Plasma       Human    0,2
1           Liver  Plasma       Human    8
2           Liver  Plasma       Human    2
3           Liver  Plasma       Human    1
0           Heart  Plasma       Human    1
1           Heart  Plasma       Human    3
2           Heart  Plasma       Human    4
3           Heart  Plasma       Human    5

In this case, the time and concentration values of the previously imported data sets (Human|Brain|Plasma and Human|Liver|Plasma) will be updated with the new values. Whether the new data set Human|Heart|Plasma is added automatically depends on the decision for step 5 above.

@ju-rgen
Member

ju-rgen commented Jan 13, 2021

Before the update is finally performed, a confirmation dialog should be displayed showing how many new datasets will be added, how many old datasets will be deleted, how many datasets will be updated, and how many old datasets will remain identical.
This allows the user to stop before something unintended happens, e.g. the majority of datasets being deleted because the wrong data source was selected.
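The four counts for such a dialog follow directly from comparing the old and new data set contents. A sketch, assuming both sides are available as hypothetical dicts mapping a data set name to its data points; `UpdateSummary` and `summarize_update` are illustrative names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateSummary:
    added: int
    deleted: int
    updated: int
    unchanged: int


def summarize_update(old, new):
    """old/new map a data set name to its data, e.g. a list of (time, concentration)."""
    common = old.keys() & new.keys()  # dict key views support set operations
    updated = sum(1 for key in common if old[key] != new[key])
    return UpdateSummary(
        added=len(new.keys() - old.keys()),
        deleted=len(old.keys() - new.keys()),
        updated=updated,
        unchanged=len(common) - updated,
    )
```

A dialog built on this could, for instance, add an extra warning when `deleted` exceeds half of the existing data sets, which is the "wrong data source" situation described above.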

@ju-rgen
Member

ju-rgen commented Jan 18, 2021

Doesn't the invisible grouping by data source bear the risk that, whenever the datasets of a data source are distributed across multiple folders in a non-trivial way, the user cannot oversee what s/he is updating?

I personally find it clearer to have a view where the datasets (= time series = curves) are grouped into imports (based on an import configuration and a data source), which could perhaps be done in a hierarchical view like in #786.
There I would start the workflow at an import and call it "update data source".
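A sketch of the grouping behind such a hierarchical view, assuming each data set records the data source and import configuration it came from (both fields are hypothetical). The same lookup also answers step 2 of the proposed workflow, i.e. which other data sets share the selected one's import:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedDataSet:
    name: str
    data_source: str           # hypothetical: file the data set was imported from
    import_configuration: str  # hypothetical: id of the import configuration used


def group_by_import(data_sets):
    """Build the two-level view: import (source + configuration) -> data set names."""
    imports = defaultdict(list)
    for ds in data_sets:
        imports[(ds.data_source, ds.import_configuration)].append(ds.name)
    return dict(imports)
```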

But I admit that the impact of such an update (e.g. the update of plots) is somewhat hidden from the user anyway and requires careful follow-up in non-trivial cases.

So users should think carefully before using this feature.

@georgeDaskalakis
Contributor

The reload process currently has two options: reloading one specific dataset or reloading all the datasets that come from an Excel file. The first option should no longer be available; only reloading a whole file should be possible.

To do this, the reload process should not delete the old datasets as it currently does, but should load all the datasets from the file again and then present the user with an overview of what is currently loaded, what will be overwritten, and what will be loaded as a new dataset (because of changes/additions to the Excel file). Afterwards, the data in the datasets that are going to be overwritten has to be edited in place; that way the simulations using those datasets will not lose their references to them.
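The key point is that overwritten datasets are mutated rather than replaced, so every object holding a reference keeps seeing the same dataset. A sketch under that assumption; `ObservedDataSet`, `reload_overview` and `apply_reload` are hypothetical stand-ins, and the project is modelled as a plain dict from dataset name to object:

```python
from dataclasses import dataclass


@dataclass
class ObservedDataSet:
    """Hypothetical project object that simulations hold a reference to."""
    name: str
    values: list  # e.g. [(time, concentration), ...]


def reload_overview(project, reloaded):
    """Classify names: overwritten (in both), new (file only), not in file (project only)."""
    return {
        "overwritten": sorted(project.keys() & reloaded.keys()),
        "new": sorted(reloaded.keys() - project.keys()),
        "not_in_file": sorted(project.keys() - reloaded.keys()),
    }


def apply_reload(project, reloaded):
    """Update existing datasets in place and add new ones; nothing is deleted here."""
    for name, values in reloaded.items():
        if name in project:
            project[name].values = values  # same object, so references stay valid
        else:
            project[name] = ObservedDataSet(name, values)
```

The overview is computed before anything is changed, so the user can still cancel after seeing it.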

@georgeDaskalakis georgeDaskalakis self-assigned this Mar 9, 2021
@Yuri05
Member Author

Yuri05 commented Nov 9, 2022

Implemented as part of Importer Redesign

@Yuri05 Yuri05 closed this as completed Nov 9, 2022