-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excel Worksheet Name Limitations #345
Excel Worksheet Name Limitations #345
Conversation
@BHoMBot check required |
@travispotterBH to confirm, the following checks are now queued:
There are 770 requests in the queue ahead of you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so the issue I have with this PR is this is trying to tackle a fire without cutting off the source (between this PR and the PR on the tool which is trying to use it).
The issue is that Excel has a limit on the length of the worksheet name to 31 characters. Currently this just crashes and prevents export to Excel where a sheet name has more than 31 characters, and while this PR prevents the crash, it doesn't tackle the fire at the source. It pushes the responsibility of making sure a worksheet name does not exceed 31 characters to the user (which can be either a human user, or a code user where the code is called by another toolkit).
If Excel later changes the limit again, say shortening it to 15 characters, this PR will ensure nothing crashes, however, if we have 3 toolkits depending on this by then, we will have 3 toolkits to update to this change. This increases maintenance overhead.
Thus, the more correct way the solve this problem is to cut off the fire at the source - in this case it means this toolkit, at the point of creating the worksheet, is the one that needs to handle the character limit, rather than bucking the responsibility downstream.
For this, I would prefer we implement an algorithm which will check if the worksheet name is valid (within range, does not contain illegal characters, etc.) and if it isn't, change the worksheet name to some given conventions to make it valid (trimming length, replacing illegal characters, etc.). A warning should be given to the user that this has happened (so they don't get surprised by the change to sheet name) but it should otherwise just be handled and the workbook created.
Doing this would prevent the need for any PR on any other toolkit handling this. Other toolkits will need to provide a suitable worksheet name still, and where it's a coded name, we can make sure it's valid, but if the validity ever changes, we only then need to change it once here rather than updating several toolkits at once. Does that make sense?
updated excel toolkit ability to validate a worksheet name based on rules: https://www.keynotesupport.com/excel-basics/worksheet-names-characters-allowed-prohibited.shtml
I created a new partial class called I am unsure if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely closer to what I had in mind @travispotterBH 😄
Ordinarily, I would say this would need to be a Query or Modify method within the Excel_Engine part of the project, however, this toolkit's engine is relevant to its UI, not its adapter, and needs to be separated which is a different exercise out of scope of this PR. As such, I am happy with it currently existing within a Validation
class instead for now as a temporary housing space.
The workflow though is a little bit involved though, and I think could be simplified into a method that looks like this:
public static string ValidateWorksheetName(string worksheetName, ...)
{
if(worksheetName.Length <=31)
return worksheetName;
//Else, do some fixes
string fixedName = worksheetName.Replace(" ", ""); //Remove spaces
//So on and so forth
return fixedName;
}
I would do away with the ModifyWorksheetName()
method purely because a worksheet name with that name would not be very beneficial to users as it could be completely random compared to the function/data on that sheet, and could also be duplicated (only going to the seconds leaves the risk of 2 quick calls coming in on the same second, and although its unlikely, its a possibility).
Thus I would go for modifying the provided name to shorten it and make it legal instead, this would result in a worksheet name that is closer to the original and provide some meaning to the worksheet.
For example, if we have a worksheet name that is Revit Spaces That Do Not Have Ducts
which is 35 chars long (the first example I had was exactly 31, not helpful 🤣 ), if we remove spaces we get RevitSpacesThatDoNotHaveDucts
which becomes 29 chars, is valid as a result, and still provides a meaningful worksheet name for the user compared to BHoM_Export_280622163004
.
An alternative is to simply truncate the worksheet name, which turns Revit Spaces That Do Not Have Ducts
into Revit Spaces That Do Not Have D
which is potentially still more relevant than BHoM_Export_280622163004
as it at least tells you that the sheet contains Revit spaces that don't have something (the something is the bit that becomes lost).
The algorithm that I would use for this would probably be a while loop with a counter to determine which action to take, for example:
int counter = 0;
while(fixedName.Length > 31)
{
if(counter == 0)
fixedName = fixedName.Replace(" ", "");
else if(counter == 1)
//Do something else that will trim down the name
else if(counter == 2)
//Do something else that will trim down the name
else if(counter == ?) //Last option standing, run out of other ways to fix
fixedName = fixedName.Substring(0, 31);
counter++;
}
Does that make sense?
@FraserGreenroyd Though, in the spirit of:
I do want to makes sure that we are adding as much robustness as possible and capturing all the possible ways a user could give an invalid name. I think that I should extend, rather than get rid of Example:
So I would create an algorithm that first removes illegal characters, then run checks again. Then if it is still too long, then truncates the string to fit length limit. I also think it is worthwhile to check the workbook for any other worksheet that might be named the exact same to avoid that issue, and offer a suggestion (probably increment at the end). |
@travispotterBH yes agreed, I was happy with the methods which confirmed a name wasn't I'm happy with any order of how we change the string (whether we remove blanks first or something else - arguably it is better to remove illegal characters first as they'll have to be removed regardless), I'd just prefer to see it done in a workflow where the string is returned at the point it becomes legal rather than a hard/fast rule of legal or not legal going to the modify method which I feel is a bit clunky in terms of programming workflow. That said we are getting closer to putting out the source of the fire, and would support a validate method which will check all of the issues you highlight in that screenshot 👍 |
@FraserGreenroyd Please check the most recent commit to this. I think I've covered all the cases, and conform to the naming suggestions that you noted. Essentially I check the name for validity and temporarily log the type of invalidity. Then under I've set it to check in the following order:
After each check, if invalid, the invalidity is logged, so the modifications to the worksheet will also happen in the same order or the logged invalidity. Modification attempts:
Given that the validity check is done after each modification, if the modification then causes a different invalid state (such as prepending a timestamp which may increase the length beyond the character limit), the process continues until fully resolved to a unique name. Example: After a few runs it reached a point where it was legal to excel, except that it was not unique and the prepended timestamp took over. I'll still want to test a bit more tomorrow to verify there are no bugs with this, but feel confident about the overall design of the algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is starting to look good now @travispotterBH , nice work 😄
While there isn't anything fundamentally wrong with what's here now (as in it'll do the job), we can improve it slightly and keep it more in line with the guidelines we have for BHoM code (remembering that BHoM code is designed to be understood by engineers rather than computer scientists which does add its own unique flavour to the code base).
A key part is the use of the out
keyword to obtain additional outputs from a method. On the one hand, we could use C#7 and use multiple return values from a method, but we don't currently support the use of C#7 in the code base (that's an ongoing discussion currently). So the other way to handle it also allows us to keep track of only one variable rather than 2 in the first place. For this, if our IsValidName
method returns a List<ValidationError>
objects, then we can infer whether the name is valid by whether or not the count
of the list is 0 or not. If validationIssues.Count == 0
then we can infer we have no validation issues, and the name is thus valid. The while
loop can then become a while(validationIssues.Count > 0)
to achieve the same effect as you're proposing currently, but now we don't need to keep track of the bool
variable as well. That said, using a list doesn't make much sense given we only work with the first item when modifying the string. So we either need to change the modification to work on all the options for invalidity, or remove the list and use a single object that returns null
when everything is valid instead.
For the reserved words solution, I think it would be sufficient to simply prepend BHoM
to the start rather than renaming the sheet completely. Someone wanted the reserved word as the original sheet name for a reason, so we should aim to accommodate that as much as possible within the constraints of Excel. In this case, History
is reserved but BHoM History
isn't from my understanding of the screenshot, so changing to BHoM History
is slightly more beneficial than BHoM_Export_290622102534
for example.
I've also put some comments around some specific code parts to do a batch update on, but I think we're nearly there with this now 👍
Thanks :) :)
Makes since! I updated to get rid of the bools and the list. I now use the switch statements to return the enum value for the issue directly. This eliminates both the out and the list, as will cleans up the code a bit. Now the trigger on the while loop is based on the enum value, and if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor change on the alphabetising of the enum and then I think this is good from a code review perspective @travispotterBH 👍 I'll run compliance checks now too just to check if we have any issues from there to consider
@BHoMBot check compliance |
@FraserGreenroyd to confirm, the following checks are now queued:
There are 3 requests in the queue ahead of you. |
Compliance looks happy @travispotterBH so just the final comments and this should be good from the code-review perspective 👍 |
@BHoMBot check required |
@FraserGreenroyd to confirm, the following checks are now queued:
|
@travispotterBH to confirm, the following checks are now queued:
There are 9 requests in the queue ahead of you. |
@BHoMBot check copyright-compliance |
@travispotterBH to confirm, the following checks are now queued:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed and modified per PR comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review now looks good, merging this to unblock a UI/Adapter split later today.
@BHoMBot check ready-to-merge |
@FraserGreenroyd to confirm, the following checks are now queued:
|
Issues addressed by this PR
Closes #344
Excel Adapter now alerts the user by recording an error when the name of a data table that is intended to be exported to excel is too long to be used as the name of a worksheet. This is favorable to automatically renaming the worksheet to something that conforms to the 31 character limit since that would be an unexpected behaviour from to view of the user .
Test files
Changelog
ExcelAdapter.Create()
Create method modified to return an error when table name length exceeds Excel's limitations.Additional comments