Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1409 importer handling of imvalid files #1437

Merged
merged 21 commits into from
Feb 9, 2022

Conversation

abdelr
Copy link
Member

@abdelr abdelr commented Jan 11, 2022

Fixes #1409

@abdelr abdelr added the Importer Observed data importer label Jan 11, 2022
@abdelr abdelr self-assigned this Jan 11, 2022
Copy link
Member

@msevestre msevestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion to avoid side effects

@@ -113,10 +119,6 @@ public void CalculateFormat(IDataSourceFile dataSource, IReadOnlyList<ColumnInfo
throw new UnsupportedFormatException(dataSource.Path);

dataSource.AvailableFormats = AvailableFormats(dataSource.DataSheets[sheetName].RawData, columnInfos, metaDataCategories).ToList();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this code, we are modifying the dataSource given as parameter. Can I suggest to change this methods so that it does not have side effect and returns the availableFormats instead? Line 108 for instance . It is not clear at all that the format will be set in this method. Thougts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abdelr not sure if you saw this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abdelr not sure if you saw this

Sure, but this is still a work in progress since the last discussion on markers for the sheets

@abdelr abdelr marked this pull request as draft January 15, 2022 20:51
@abdelr abdelr requested a review from msevestre January 28, 2022 13:19
@abdelr abdelr marked this pull request as ready for review January 28, 2022 13:20
@abdelr abdelr marked this pull request as draft January 28, 2022 14:01
@abdelr abdelr marked this pull request as ready for review January 28, 2022 14:14
Copy link
Member

@msevestre msevestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big PR with some interesting changes. I need some more time to review

@@ -1275,6 +1277,9 @@ public static class Error
public static readonly string NaNOnData = "Data contains NaN values at imported columns. Select a different action for NaN values or clean your data.";
public static readonly string UnsupportedFileType = "The type of file that you are trying to open is not currently supported";
public static readonly string CannotRemoveBaseGridColumnStillInUse = "Cannot remove base grid column still used by other columns";
public static readonly string SimpleParseErrorMessage = "There were errors while parsing your data. Navigate to the sheets to read the concrete error.";

public static string ParseErrorMessage(IEnumerable<string> errors) => "There were errors while parsing your data: " + string.Join(". ", errors);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you could use string interpolation and not the old school +

Copy link
Member

@msevestre msevestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in ospsuite.Dimensions is suspicious

Copy link
Member

@msevestre msevestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see wha't's happening and this is ok. We are breaking a lot of encapsulation but passing Cache back and forth .
Cache is great because you can define a way to retrieve keys automatically by specifing a lambda. We are never using it. WE also never know what the key is. Is it the sheetName sometimes? what's a DataSet compared to a Sheet. Why are we adding DataSet with the sheet key as key..
Lots of questions for me...also coming from older code. We need to run resharper on those files because some basic optimization and code clean up are not done

@@ -16,15 +19,15 @@ public interface IDataSource
{
void SetDataFormat(IDataFormat dataFormat);
void SetNamingConvention(string namingConvention);
void AddSheets(Cache<string, DataSheet> dataSheets, IReadOnlyList<ColumnInfo> columnInfos, string filter);
Cache<IDataSet, List<ParseErrorDescription>> AddSheets(Cache<string, DataSheet> dataSheets, IReadOnlyList<ColumnInfo> columnInfos, string filter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unexpected signature. What does this returned cache represent? .Can you comment the interface so that we know what's going on?
In general, returning a cache is not commun weird because you are exposing the API to the consumer but it might be required

void SetMappings(string fileName, IEnumerable<MetaDataMappingConverter> mappings);
ImporterConfiguration GetImporterConfiguration();
IEnumerable<MetaDataMappingConverter> GetMappings();
Cache<string, IDataSet> DataSets { get; }
IEnumerable<string> NamesFromConvention();
NanSettings NanSettings { get; set; }
ImportedDataSet DataSetAt(int index);
void ValidateDataSourceUnits(IReadOnlyList<ColumnInfo> columnInfos);
Cache<IDataSet, List<ParseErrorDescription>> ValidateDataSourceUnits(IReadOnlyList<ColumnInfo> columnInfos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with this one.

}
}

public class MismatchingArrayLengthsParseErrorDescription : ParseErrorDescription
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have all those classes instead of using the string? associated with the message?

{
public EmptyDataSetsParseErrorDescription(IEnumerable<string> dataSetNames)
{
Message = Error.EmptyDataSet($"{string.Join(", ", dataSetNames)}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use dataSetNames.ToString(", ")

}

public static void Add<T1, T2>(Cache<T1, List<T2>> cache, T1 key, IEnumerable<T2> list)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make in an extension method (Call this CachedListExtensions) and the first one should call the other instead of duplicating.

{
if (_data.Any(dataSet => dataSet.Data.Any(pair => pair.Key.ColumnInfo.IsMandatory && pair.Value.Any(point => point.Measurement == indicator || double.IsNaN(point.Measurement)))))
throw new NanException();
return _data.Any(dataSet => dataSet.Data.Any(pair => pair.Key.ColumnInfo.IsMandatory && pair.Value.Any(point => point.Measurement == indicator || double.IsNaN(point.Measurement))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we comment a bit on this line ...please. And maybe we can break the logic a bit here because it's very complicated to decipher
Also run resharper on this file

foreach(var x in dataSource.DataSheets.Keys)
{
dataSource.AvailableFormats = CalculateFormat(dataSource, columnInfos, metaDataCategories, x).ToList();
if (dataSource.AvailableFormats?.Count > 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can it be nulll if you call ToList before?

throw new UnsupportedFormatException(dataSource.Path);

dataSource.Format = dataSource.AvailableFormats.FirstOrDefault();
return AvailableFormats(dataSource.DataSheets[sheetName].RawData, columnInfos, metaDataCategories);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could return a IReadOnly so that you don't have to TOList earlier.

@@ -16,15 +19,15 @@ public interface IDataSource
{
void SetDataFormat(IDataFormat dataFormat);
void SetNamingConvention(string namingConvention);
void AddSheets(Cache<string, DataSheet> dataSheets, IReadOnlyList<ColumnInfo> columnInfos, string filter);
Cache<IDataSet, List<ParseErrorDescription>> AddSheets(Cache<string, DataSheet> dataSheets, IReadOnlyList<ColumnInfo> columnInfos, string filter);
void SetMappings(string fileName, IEnumerable<MetaDataMappingConverter> mappings);
ImporterConfiguration GetImporterConfiguration();
IEnumerable<MetaDataMappingConverter> GetMappings();
Cache<string, IDataSet> DataSets { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this as a cache exposed. DataSet are added outside of the DataSource as it stands

_importer.AddFromFile(_configuration.Format, filterSheets(dataSheets, filter), columnInfos, this);
if (NanSettings == null || !double.TryParse(NanSettings.Indicator, out var indicator))
indicator = double.NaN;
var errors = new Cache<IDataSet, List<ParseErrorDescription>>();
foreach (var dataSet in DataSets.KeyValues)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you iterating over the keyValues? You are never using the Key

@abdelr abdelr requested a review from msevestre February 4, 2022 14:22
@abdelr
Copy link
Member Author

abdelr commented Feb 4, 2022

@msevestre I encapsulated the errors on a new class. I hope it makes more sense now?

}

void IDataSource.ValidateDataSourceUnits(IReadOnlyList<ColumnInfo> columnInfos)
ParseErrors IDataSource.ValidateDataSourceUnits(IReadOnlyList<ColumnInfo> columnInfos)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the IDataSource. here?


namespace OSPSuite.Infrastructure.Import.Core.Helpers
{
static public class CachedListHelpers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be removed right?

if (_data.Any(dataSet => dataSet.Data.Any(pair => pair.Key.ColumnInfo.IsMandatory && pair.Value.Any(point => point.Measurement == indicator || double.IsNaN(point.Measurement)))))
throw new NanException();
//returns true if for any ParsedDataSet
return _data.Any(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES!!!!

@@ -0,0 +1,3 @@
DevExpress.XtraEditors.CheckEdit, DevExpress.XtraEditors.v21.2, Version=21.2.3.0, Culture=neutral, PublicKeyToken=b88d1754d700e49a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this file

@@ -222,13 +228,12 @@ public string GetActiveFilterCriteria()

public void AddTabs(List<string> sheetNames)
{
//we should seek an alternative
importerTabControl.Images = imageCollection1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this control

if (_lastLoadedDataSets.Contains(SelectedTab) && _lastErrors.Contains(_lastLoadedDataSets[SelectedTab]))
{
labelControlError.Text = Error.ParseErrorMessage(_lastErrors.ErrorsFor(_lastLoadedDataSets[SelectedTab]).Select(x => x.Message));
layoutControlItem2.Visibility = DevExpress.XtraLayout.Utils.LayoutVisibility.Always;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename layoutControlItem2
Also use resharper to simplify some of the name space stuff

<value>163, 17</value>
</metadata>
<assembly alias="DevExpress.Utils.v21.2" name="DevExpress.Utils.v21.2, Version=21.2.3.0, Culture=neutral, PublicKeyToken=b88d1754d700e49a" />
<data name="imageCollection1.ImageStream" type="DevExpress.Utils.ImageCollectionStreamer, DevExpress.Utils.v21.2" mimetype="application/x-microsoft.net.object.bytearray.base64">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not use images like that at all. Icons are added to the ImageList and are loaded all dynamically

Copy link
Member

@msevestre msevestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The domain codes is SO Much better. Thanks
The UI code needs to be updated :
Some controls have default naming with 1 at the end
Not proper usage of icons

private void refreshErrorMessage()
{

if (_lastLoadedDataSets.Contains(SelectedTab) && _lastErrors.Contains(_lastLoadedDataSets[SelectedTab]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somehow this feels like this code belongs in the presenter.

{
importerTabControl.TabPages.Each(x =>
{
if (_lastLoadedDataSets.Contains(x.Text))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this codes for which image to show also belongs in the presenter
And hopefully, we won't have those magin number anymore 1 and 0 once we use ApplicationIcons

<value>17, 17</value>
</metadata>
<data name="useForImportCheckEdit.ToolTip" xml:space="preserve">
<value>When selected, the filter will apply to the data during the import process. When deselected, the filter only affects this view. Check documentation for more information on defining filters: &lt;href=https://docs.open-systems-pharmacology.org/shared-tools-and-example-workflows/features-of-tables#filtering&gt;https://docs.open-systems-pharmacology.org/shared-tools-and-example-workflows/features-of-tables#filtering&lt;/href&gt;</value>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tootip needs to be put into constants!

var accumulatedIndexes = 0;
while (sheet.MoveNext() && index >= 0)
{
if (sheet.Current.Data.Count() > index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sheet.Current.Data.Count() 3 times in the same while loop. Maybe cache it?

@abdelr abdelr requested a review from msevestre February 7, 2022 21:06
@abdelr
Copy link
Member Author

abdelr commented Feb 9, 2022

@msevestre Only open PR. Just adding you to the comment to make sure you know this is ready for revision.

@@ -117,7 +155,27 @@ public IEnumerable<string> NamesFromConvention()
return _importer.NamesFromConvention(_configuration.NamingConventions, _configuration.FileName, DataSets, _mappings);
}

public ImportedDataSet DataSetAt(int index)
public IDataSet DataSetAt(int index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk about this in the meeting. I don't understand what this is doing.

@msevestre
Copy link
Member

I think this is working as expected. I have my concerns regarding the deeply nested structure in DataSource.cs
I am going to merge this however and this can be addressed at some other time. I will create an issue for this

@msevestre msevestre merged commit 428ef3a into develop Feb 9, 2022
@msevestre msevestre deleted the 1409_importer_handling_of_imvalid_files branch February 9, 2022 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Importer Observed data importer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Importer: Handling of "invalid" files
2 participants