Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excel Worksheet Name Limitations #345

Merged

Conversation

travispotterBH
Copy link

@travispotterBH travispotterBH commented Jun 23, 2022

Issues addressed by this PR

Closes #344

Excel Adapter now alerts the user by recording an error when the name of a data table that is intended to be exported to excel is too long to be used as the name of a worksheet. This is favorable to automatically renaming the worksheet to something that conforms to the 31 character limit since that would be an unexpected behaviour from to view of the user .

Test files

Changelog

  • ExcelAdapter.Create() Create method modified to return an error when table name length exceeds Excel's limitations.

Additional comments

@travispotterBH travispotterBH added the type:bug Error or unexpected behaviour label Jun 23, 2022
@travispotterBH travispotterBH self-assigned this Jun 23, 2022
@travispotterBH travispotterBH changed the title Excel_Adapter-#Excel Worksheet Name Limitations Excel_Adapter-Excel Worksheet Name Limitations Jun 23, 2022
@travispotterBH
Copy link
Author

@BHoMBot check required

@bhombot-ci
Copy link

bhombot-ci bot commented Jun 24, 2022

@travispotterBH to confirm, the following checks are now queued:

  • code-compliance
  • documentation-compliance
  • project-compliance
  • core
  • null-handling
  • serialisation
  • versioning
  • installer

There are 770 requests in the queue ahead of you.

Copy link
Contributor

@FraserGreenroyd FraserGreenroyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so the issue I have with this PR is this is trying to tackle a fire without cutting off the source (between this PR and the PR on the tool which is trying to use it).

The issue is that Excel has a limit on the length of the worksheet name to 31 characters. Currently this just crashes and prevents export to Excel where a sheet name has more than 31 characters, and while this PR prevents the crash, it doesn't tackle the fire at the source. It pushes the responsibility of making sure a worksheet name does not exceed 31 characters to the user (which can be either a human user, or a code user where the code is called by another toolkit).

If Excel later changes the limit again, say shortening it to 15 characters, this PR will ensure nothing crashes, however, if we have 3 toolkits depending on this by then, we will have 3 toolkits to update to this change. This increases maintenance overhead.

Thus, the more correct way the solve this problem is to cut off the fire at the source - in this case it means this toolkit, at the point of creating the worksheet, is the one that needs to handle the character limit, rather than bucking the responsibility downstream.

For this, I would prefer we implement an algorithm which will check if the worksheet name is valid (within range, does not contain illegal characters, etc.) and if it isn't, change the worksheet name to some given conventions to make it valid (trimming length, replacing illegal characters, etc.). A warning should be given to the user that this has happened (so they don't get surprised by the change to sheet name) but it should otherwise just be handled and the workbook created.

Doing this would prevent the need for any PR on any other toolkit handling this. Other toolkits will need to provide a suitable worksheet name still, and where it's a coded name, we can make sure it's valid, but if the validity ever changes, we only then need to change it once here rather than updating several toolkits at once. Does that make sense?

travispotterBH added 2 commits June 28, 2022 10:04
@travispotterBH
Copy link
Author

travispotterBH commented Jun 28, 2022

@FraserGreenroyd

I created a new partial class called Validation which include Validation.WorksheetName. Here I added an algorithm that checks the worksheet name against a set of rules and offers an alternate name if the name does not comply. The alternate name is simply "BHoM_Export_ddMMYY_HHmmss". I am open to suggestions on the alternate name. I figured this was a good alternate since we do not know what type of tables that may be used to create worksheets. They may not always be logs.

I am unsure if Excel.Adapter is an acceptable place for the Validation partial class, and am open to comment.

https://github.com/BHoM/Excel_Toolkit/blob/7507fdf3858cd9d258b54d90d39c01a5ae9810d0/Excel_Adapter/Validation/WorksheetName.cs#L28-L100

Copy link
Contributor

@FraserGreenroyd FraserGreenroyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely closer to what I had in mind @travispotterBH 😄

Ordinarily, I would say this would need to be a Query or Modify method within the Excel_Engine part of the project, however, this toolkit's engine is relevant to its UI, not its adapter, and needs to be separated which is a different exercise out of scope of this PR. As such, I am happy with it currently existing within a Validation class instead for now as a temporary housing space.

The workflow though is a little bit involved though, and I think could be simplified into a method that looks like this:

public static string ValidateWorksheetName(string worksheetName, ...)
{
    if(worksheetName.Length <=31)
        return worksheetName;

    //Else, do some fixes
    string fixedName = worksheetName.Replace(" ", ""); //Remove spaces
    
    //So on and so forth
    return fixedName;
}

I would do away with the ModifyWorksheetName() method purely because a worksheet name with that name would not be very beneficial to users as it could be completely random compared to the function/data on that sheet, and could also be duplicated (only going to the seconds leaves the risk of 2 quick calls coming in on the same second, and although its unlikely, its a possibility).

Thus I would go for modifying the provided name to shorten it and make it legal instead, this would result in a worksheet name that is closer to the original and provide some meaning to the worksheet.

For example, if we have a worksheet name that is Revit Spaces That Do Not Have Ducts which is 35 chars long (the first example I had was exactly 31, not helpful 🤣 ), if we remove spaces we get RevitSpacesThatDoNotHaveDucts which becomes 29 chars, is valid as a result, and still provides a meaningful worksheet name for the user compared to BHoM_Export_280622163004.

An alternative is to simply truncate the worksheet name, which turns Revit Spaces That Do Not Have Ducts into Revit Spaces That Do Not Have D which is potentially still more relevant than BHoM_Export_280622163004 as it at least tells you that the sheet contains Revit spaces that don't have something (the something is the bit that becomes lost).

The algorithm that I would use for this would probably be a while loop with a counter to determine which action to take, for example:

int counter = 0;
while(fixedName.Length > 31)
{
    if(counter == 0)
        fixedName = fixedName.Replace(" ", "");
    else if(counter == 1)
        //Do something else that will trim down the name
    else if(counter == 2)
        //Do something else that will trim down the name
    else if(counter == ?) //Last option standing, run out of other ways to fix
        fixedName = fixedName.Substring(0, 31);

    counter++;
}

Does that make sense?

Excel_Adapter/CRUD/Create/Create.cs Outdated Show resolved Hide resolved
@travispotterBH
Copy link
Author

@FraserGreenroyd
Overall agreed.

Though, in the spirit of:

Thus I would go for modifying the provided name to shorten it and make it legal instead, this would result in a worksheet name that is closer to the original and provide some meaning to the worksheet.

I do want to makes sure that we are adding as much robustness as possible and capturing all the possible ways a user could give an invalid name.
image
So I don't think it is a simple as iterating to trim down to length.

I think that I should extend, rather than get rid of ModifyWorksheetName() so that it rids the string of each illegal element, while still conforming to your suggestion of a descriptive title.

Example:

'PressureCalc : Export/Log [Excel]' would be a very illegal name.
It fails the length check, valid characters checks, ending/begining with " ' " check, etc.

So I would create an algorithm that first removes illegal characters, then run checks again. Then if it is still too long, then truncates the string to fit length limit.

I also think it is worthwhile to check the workbook for any other worksheet that might be named the exact same to avoid that issue, and offer a suggestion (probably increment at the end).

@FraserGreenroyd
Copy link
Contributor

@travispotterBH yes agreed, I was happy with the methods which confirmed a name wasn't history and didn't contain illegal characters.

I'm happy with any order of how we change the string (whether we remove blanks first or something else - arguably it is better to remove illegal characters first as they'll have to be removed regardless), I'd just prefer to see it done in a workflow where the string is returned at the point it becomes legal rather than a hard/fast rule of legal or not legal going to the modify method which I feel is a bit clunky in terms of programming workflow.

That said we are getting closer to putting out the source of the fire, and would support a validate method which will check all of the issues you highlight in that screenshot 👍

@travispotterBH
Copy link
Author

travispotterBH commented Jun 28, 2022

@FraserGreenroyd Please check the most recent commit to this. I think I've covered all the cases, and conform to the naming suggestions that you noted.

Essentially I check the name for validity and temporarily log the type of invalidity. Then under ModifyWorksheetName() I check the first invalidity and run a method to resolve that invalidity. Then the IsValidName method is run again to see if the modification resolved the worksheet name to a valid name. If not, the loop continues until it is.

I've set it to check in the following order:

  1. Uniqueness
  2. Null or Whitespace
  3. Reserved Words
  4. Invalid Characters
  5. Begin/End with invalid characters (important distinction being that " ' " is not invalid unless in the first or last position of the name.
  6. Character Limit

After each check, if invalid, the invalidity is logged, so the modifications to the worksheet will also happen in the same order or the logged invalidity.

Modification attempts:
For each check, the modification that is attempted is as follows:

  1. Uniqueness
    Prepend timestamp (important to prepend since later on if we need to trim the string for character limit we do so by dropping the ending.)

  2. Null or Whitespace
    If null or blank, give generalized name of "BHoM_Export_" + timestamp

  3. Reserved Words
    if is a reserved word, , give generalized name of "BHoM_Export_" + timestamp

  4. Invalid Characters
    remove invalid characters

  5. Begin/End with invalid characters (important distinction being that " ' " is not invalid unless in the first or last position of the name.
    remove invalid characters

  6. Character Limit
    trim string to match character limits

Given that the validity check is done after each modification, if the modification then causes a different invalid state (such as prepending a timestamp which may increase the length beyond the character limit), the process continues until fully resolved to a unique name.

Example:
The invalid name I noted above:
'PressureCalc : Export/Log [Excel]' would be a very illegal name.

resolves to:
image

After a few runs it reached a point where it was legal to excel, except that it was not unique and the prepended timestamp took over.

I'll still want to test a bit more tomorrow to verify there are no bugs with this, but feel confident about the overall design of the algorithm.

Copy link
Contributor

@FraserGreenroyd FraserGreenroyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look good now @travispotterBH , nice work 😄

While there isn't anything fundamentally wrong with what's here now (as in it'll do the job), we can improve it slightly and keep it more in line with the guidelines we have for BHoM code (remembering that BHoM code is designed to be understood by engineers rather than computer scientists which does add its own unique flavour to the code base).

A key part is the use of the out keyword to obtain additional outputs from a method. On the one hand, we could use C#7 and use multiple return values from a method, but we don't currently support the use of C#7 in the code base (that's an ongoing discussion currently). So the other way to handle it also allows us to keep track of only one variable rather than 2 in the first place. For this, if our IsValidName method returns a List<ValidationError> objects, then we can infer whether the name is valid by whether or not the count of the list is 0 or not. If validationIssues.Count == 0 then we can infer we have no validation issues, and the name is thus valid. The while loop can then become a while(validationIssues.Count > 0) to achieve the same effect as you're proposing currently, but now we don't need to keep track of the bool variable as well. That said, using a list doesn't make much sense given we only work with the first item when modifying the string. So we either need to change the modification to work on all the options for invalidity, or remove the list and use a single object that returns null when everything is valid instead.

For the reserved words solution, I think it would be sufficient to simply prepend BHoM to the start rather than renaming the sheet completely. Someone wanted the reserved word as the original sheet name for a reason, so we should aim to accommodate that as much as possible within the constraints of Excel. In this case, History is reserved but BHoM History isn't from my understanding of the screenshot, so changing to BHoM History is slightly more beneficial than BHoM_Export_290622102534 for example.

I've also put some comments around some specific code parts to do a batch update on, but I think we're nearly there with this now 👍

Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
@travispotterBH
Copy link
Author

travispotterBH commented Jun 29, 2022

This is starting to look good now @travispotterBH , nice work 😄

Thanks :) :)

A key part is the use of the out keyword to obtain additional outputs from a method. On the one hand, we could use C#7 and use multiple return values from a method, but we don't currently support the use of C#7 in the code base (that's an ongoing discussion currently). So the other way to handle it also allows us to keep track of only one variable rather than 2 in the first place. For this, if our IsValidName method returns a List<ValidationError> objects, then we can infer whether the name is valid by whether or not the count of the list is 0 or not. If validationIssues.Count == 0 then we can infer we have no validation issues, and the name is thus valid. The while loop can then become a while(validationIssues.Count > 0) to achieve the same effect as you're proposing currently, but now we don't need to keep track of the bool variable as well. That said, using a list doesn't make much sense given we only work with the first item when modifying the string. So we either need to change the modification to work on all the options for invalidity, or remove the list and use a single object that returns null when everything is valid instead.

For the reserved words solution, I think it would be sufficient to simply prepend BHoM to the start rather than renaming the sheet completely. Someone wanted the reserved word as the original sheet name for a reason, so we should aim to accommodate that as much as possible within the constraints of Excel. In this case, History is reserved but BHoM History isn't from my understanding of the screenshot, so changing to BHoM History is slightly more beneficial than BHoM_Export_290622102534 for example.

Makes since! I updated to get rid of the bools and the list. I now use the switch statements to return the enum value for the issue directly. This eliminates both the out and the list, as will cleans up the code a bit. Now the trigger on the while loop is based on the enum value, and if the issue != WorksheetValidation.Valid then the code will continue until it is.

https://github.com/BHoM/Excel_Toolkit/blob/8ef28092835b04aae1236c625cbedbe6cc950b70/Excel_Adapter/Validation/WorksheetName.cs#L60-L82

https://github.com/BHoM/Excel_Toolkit/blob/8ef28092835b04aae1236c625cbedbe6cc950b70/Excel_Adapter/Validation/WorksheetName.cs#L40-L45

Copy link
Contributor

@FraserGreenroyd FraserGreenroyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change on the alphabetising of the enum and then I think this is good from a code review perspective @travispotterBH 👍 I'll run compliance checks now too just to check if we have any issues from there to consider

Excel_Adapter/Validation/Enums/WorksheetValidation.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
Excel_Adapter/Validation/WorksheetName.cs Outdated Show resolved Hide resolved
@FraserGreenroyd
Copy link
Contributor

@BHoMBot check compliance

@bhombot-ci
Copy link

bhombot-ci bot commented Jun 29, 2022

@FraserGreenroyd to confirm, the following checks are now queued:

  • code-compliance
  • documentation-compliance
  • project-compliance
  • branch-compliance
  • dataset-compliance
  • copyright-compliance

There are 3 requests in the queue ahead of you.

@FraserGreenroyd
Copy link
Contributor

Compliance looks happy @travispotterBH so just the final comments and this should be good from the code-review perspective 👍

@FraserGreenroyd
Copy link
Contributor

@BHoMBot check required

@bhombot-ci
Copy link

bhombot-ci bot commented Jun 29, 2022

@FraserGreenroyd to confirm, the following checks are now queued:

  • code-compliance
  • documentation-compliance
  • project-compliance
  • core
  • null-handling
  • serialisation
  • versioning
  • installer

@travispotterBH
Copy link
Author

@BHoMBot check dataset-compliance
@BHoMBot check ready-to-merge

@bhombot-ci
Copy link

bhombot-ci bot commented Jun 30, 2022

@travispotterBH to confirm, the following checks are now queued:

  • dataset-compliance
  • ready-to-merge

There are 9 requests in the queue ahead of you.

@travispotterBH
Copy link
Author

@BHoMBot check copyright-compliance

@bhombot-ci
Copy link

bhombot-ci bot commented Jul 1, 2022

@travispotterBH to confirm, the following checks are now queued:

  • copyright-compliance

Copy link
Author

@travispotterBH travispotterBH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed and modified per PR comments

Copy link
Contributor

@FraserGreenroyd FraserGreenroyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review now looks good, merging this to unblock a UI/Adapter split later today.

@FraserGreenroyd
Copy link
Contributor

@BHoMBot check ready-to-merge

@bhombot-ci
Copy link

bhombot-ci bot commented Jul 4, 2022

@FraserGreenroyd to confirm, the following checks are now queued:

  • ready-to-merge

@FraserGreenroyd FraserGreenroyd merged commit f23b9a2 into main Jul 4, 2022
@FraserGreenroyd FraserGreenroyd deleted the Excel_Adapter-#344-Excel-Worksheet-Name-Limits branch July 4, 2022 08:15
@FraserGreenroyd FraserGreenroyd changed the title Excel_Adapter-Excel Worksheet Name Limitations Excel Worksheet Name Limitations Aug 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Error or unexpected behaviour
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adapter: Excel limitation of 31 character for worksheet name
2 participants