Fix #121 #124
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -412,26 +412,86 @@ public override string ReadScalarValue(System.IO.BinaryReader br) | |||
uint size = br.ReadUInt32(); | ||||
data = br.ReadBytes((int)size); | ||||
Review comment: If it actually is possible for |
||||
|
||||
return Encoding.GetEncoding(codePage).GetString(data); | ||||
var result = Encoding.GetEncoding(codePage).GetString(data); | ||||
//result = result.Trim(new char[] { '\0' }); | ||||
|
||||
//if (this.codePage == CodePages.CP_WINUNICODE) | ||||
//{ | ||||
// result = result.Substring(0, result.Length - 2); | ||||
//} | ||||
//else | ||||
//{ | ||||
// result = result.Substring(0, result.Length - 1); | ||||
//} | ||||
|
||||
return result; | ||||
Review comment: I had taken a stab at this a few weeks ago, and ended up with this:
I thought this approach would save some allocations, rather than relying on OLEProperty.Value to Trim every time it's called. |
||||
} | ||||
|
||||
public override void WriteScalarValue(BinaryWriter bw, string pValue) | ||||
{ | ||||
data = Encoding.GetEncoding(codePage).GetBytes(pValue); | ||||
uint dataLength = (uint)data.Length; | ||||
//bool addNullTerminator = true; | ||||
|
||||
if (String.IsNullOrEmpty(pValue)) //|| String.IsNullOrEmpty(pValue.Trim(new char[] { '\0' }))) | ||||
{ | ||||
bw.Write((uint)0); | ||||
Review comment: Does the zero-length string case still need to be null terminated? The docs at https://learn.microsoft.com/en-us/openspecs/office_file_formats/ms-oshared/fac324c9-ff39-442e-bd18-1a91a723a818 sound unclear to me where they say "SHOULD specify the number of characters in the value field including the terminating NULL character".
Review comment: No. If I'm reading the spec correctly, the current MS-OLEPS states that if the string is zero length, the Characters field shouldn't be present (section 2.5), so the zero-length string IS a valid case, IMHO.
Review comment: Hmm, after reading the docs for CodePageString rather than Lpstr I was going to agree with you, but then I tried writing a .doc file with a zero-length user-defined property string using this branch, and when I tried looking at the properties with Windows Explorer, the whole of Explorer crashed :-( (though the file displays correctly in Word).
Review comment: There's different verbiage here: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-oleps/a4c32611-5b79-4965-8f50-50639c138e16 That seems to say that if the length is zero, you write nothing.
Review comment: The behavior potentially depends on whether you're treating the strings as
Review comment: No, some of these specs are really confusing to interpret. |
||||
} | ||||
else if (this.codePage == CodePages.CP_WINUNICODE) | ||||
{ | ||||
|
||||
// The string data must be null terminated, so add a null byte if there isn't one already | ||||
bool addNullTerminator = | ||||
dataLength == 0 || data[dataLength - 1] != '\0'; | ||||
data = Encoding.GetEncoding(codePage).GetBytes(pValue); | ||||
|
||||
//if (data.Length >= 2 && data[data.Length - 2] == '\0' && data[data.Length - 1] == '\0') | ||||
// addNullTerminator = false; | ||||
|
||||
uint dataLength = (uint)data.Length; | ||||
|
||||
//if (addNullTerminator) | ||||
dataLength += 2; // null terminator \u+0000 | ||||
Review comment: MS-OLEPS says this should be the length in characters (not bytes), including the null terminator but not including the padding (if any).
Review comment: Possibly the write should be
Review comment: No, MS-OLEPS for CodePageString says this:
The semantics of the Size field differ between a CodePageString with CP_WINUNICODE and a UnicodeString. |
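To make the distinction in this thread concrete, here is a minimal sketch of how the two length fields could be computed under one reading of MS-OLEPS: for a CodePageString the Size field counts bytes including the null terminator, while for a UnicodeString the Length field counts 16-bit characters including the null terminator, and neither counts padding. The class and method names are hypothetical and are not part of OpenMcdf. Returning 0 for an empty string follows the "zero length means no Characters field" reading; given the Explorer crash reported above, treat that case as unsettled.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not OpenMcdf API.
public static class OlepsStringSizes
{
    // 1200 (0x04B0) is the Windows code page identifier for UTF-16LE.
    public const int CpWinUnicode = 1200;

    // CodePageString: Size is a BYTE count that includes the null terminator
    // (2 bytes for CP_WINUNICODE, 1 byte in this sketch otherwise), excluding padding.
    public static uint CodePageStringSize(string value, int codePage)
    {
        if (string.IsNullOrEmpty(value))
            return 0; // assumption: empty string means no Characters field

        // Note: on .NET Core, many legacy code pages need
        // CodePagesEncodingProvider registered before GetEncoding succeeds.
        Encoding encoding = Encoding.GetEncoding(codePage);
        int terminatorBytes = codePage == CpWinUnicode ? 2 : 1;
        return (uint)(encoding.GetByteCount(value) + terminatorBytes);
    }

    // UnicodeString: Length is a count of 16-bit code units that includes
    // the null terminator, excluding padding.
    public static uint UnicodeStringLength(string value)
    {
        if (string.IsNullOrEmpty(value))
            return 0; // assumption: empty string means no Characters field

        return (uint)(value.Length + 1);
    }
}
```

For the same three-character string, the CodePageString size under CP_WINUNICODE is a byte count while the UnicodeString length is a character count, which is why the two write paths cannot share one length calculation.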
||||
|
||||
// var mod = dataLength % 4; // pad to multiple of 4 bytes | ||||
Review comment: Don't Unicode strings also have to be padded to 4 bytes?
Review comment: There's another set of padding code up where this is called from (
|
||||
|
||||
bw.Write(dataLength); // datalength of string + null char (unicode) | ||||
bw.Write(data); // string | ||||
|
||||
|
||||
//if (addNullTerminator) | ||||
//{ | ||||
bw.Write('\0'); // first byte of null unicode char | ||||
bw.Write('\0'); // second byte of null unicode char | ||||
//} | ||||
Review comment: So for writing the null terminator + padding, what about this for non-Unicode LPSTR ...
This way you're not issuing individual writes for each null padding byte, and it includes the null terminator. For Unicode LPWSTR it just needs minimal modification:
Review comment: Padding is generalized in the parent method. |
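As a rough illustration of the padding discussed here, the sketch below computes how many zero bytes bring a byte count up to the next multiple of 4 and writes them in a single call. It is not the parent method mentioned in the thread, and the names are hypothetical. Two details motivate it: the commented-out `4 - mod` loop in the diff would emit four extra bytes when the length is already aligned, and `BinaryWriter.Write(char)` encodes the character with the writer's encoding, so the number of bytes it emits depends on that encoding, whereas writing a byte array does not.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not OpenMcdf API.
public static class OlepsPadding
{
    // Zero bytes needed to reach the next multiple of 4.
    // The outer "% 4" makes an already aligned count yield 0 instead of 4.
    public static int PaddingBytes(long byteCount)
    {
        return (int)((4 - (byteCount % 4)) % 4);
    }

    // Writes the padding as one byte array, so the output does not depend
    // on the BinaryWriter's character encoding.
    public static void WritePadding(BinaryWriter bw, long byteCount)
    {
        int pad = PaddingBytes(byteCount);
        if (pad > 0)
            bw.Write(new byte[pad]); // new byte[] is zero-initialized
    }
}
```

The byte count passed in would be the size of the string data plus its null terminator, matching the point in the thread that the length field itself excludes padding.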
||||
|
||||
//for (int i = 0; i < (4 - mod); i++) // padding | ||||
// bw.Write('\0'); | ||||
} | ||||
else | ||||
{ | ||||
data = Encoding.GetEncoding(codePage).GetBytes(pValue); | ||||
|
||||
//if (data.Length >= 1 && data[data.Length - 1] == '\0') | ||||
// addNullTerminator = false; | ||||
|
||||
uint dataLength = (uint)data.Length; | ||||
|
||||
//if (addNullTerminator) | ||||
dataLength += 1; // null terminator \u+0000 | ||||
|
||||
var mod = dataLength % 4; // pad to multiple of 4 bytes | ||||
|
||||
bw.Write(dataLength); // datalength of string + null char (unicode) | ||||
bw.Write(data); // string | ||||
|
||||
|
||||
//if (addNullTerminator) | ||||
//{ | ||||
bw.Write('\0'); // null terminator'\0' | ||||
//} | ||||
|
||||
//for (int i = 0; i < (4 - mod); i++) // padding | ||||
// bw.Write('\0'); | ||||
} | ||||
|
||||
if (addNullTerminator) | ||||
dataLength += 1; | ||||
|
||||
bw.Write(dataLength); | ||||
bw.Write(data); | ||||
|
||||
if (addNullTerminator) | ||||
bw.Write((byte)0); | ||||
} | ||||
} | ||||
|
||||
|
@@ -449,8 +509,11 @@ public VT_LPWSTR_Property(VTPropertyType vType, int codePage, bool isVariant) : | |||
public override string ReadScalarValue(System.IO.BinaryReader br) | ||||
{ | ||||
uint nChars = br.ReadUInt32(); | ||||
data = br.ReadBytes((int)(nChars * 2)); //WChar | ||||
return Encoding.Unicode.GetString(data); | ||||
data = br.ReadBytes((int)((nChars * 2) - 2)); //WChar- null terminator | ||||
var result = Encoding.Unicode.GetString(data); | ||||
//result = result.Trim(new char[] { '\0' }); | ||||
|
||||
return result; | ||||
} | ||||
|
||||
public override void WriteScalarValue(BinaryWriter bw, string pValue) | ||||
|
@@ -460,20 +523,25 @@ public override void WriteScalarValue(BinaryWriter bw, string pValue) | |||
// The written data length field is the number of characters (not bytes) and must include a null terminator | ||||
// add a null terminator if there isn't one already | ||||
var byteLength = data.Length; | ||||
bool addNullTerminator = | ||||
byteLength == 0 || data[byteLength - 1] != '\0' || data[byteLength - 2] != '\0'; | ||||
|
||||
if (addNullTerminator) | ||||
byteLength += 2; | ||||
//bool addNullTerminator = | ||||
// byteLength == 0 || data[byteLength - 1] != '\0' || data[byteLength - 2] != '\0'; | ||||
|
||||
//if (addNullTerminator) | ||||
byteLength += 2; | ||||
|
||||
bw.Write((uint)byteLength / 2); | ||||
bw.Write(data); | ||||
|
||||
if (addNullTerminator) | ||||
{ | ||||
bw.Write((byte)0); | ||||
bw.Write((byte)0); | ||||
} | ||||
//if (addNullTerminator) | ||||
//{ | ||||
bw.Write((byte)0); | ||||
bw.Write((byte)0); | ||||
//} | ||||
|
||||
//var mod = byteLength % 4; // pad to multiple of 4 bytes | ||||
//for (int i = 0; i < (4 - mod); i++) // padding | ||||
// bw.Write('\0'); | ||||
} | ||||
} | ||||
|
||||
|
@@ -645,7 +713,7 @@ public override void WriteScalarValue(BinaryWriter bw, object pValue) | |||
} | ||||
} | ||||
|
||||
#endregion | ||||
#endregion | ||||
|
||||
} | ||||
} |
Review comment: Should it be using Trim or TrimEnd?
Review comment: In my interpretation of the specs, a full trim sounds better. Only embedded nulls are preserved in the presentation. Any thoughts on this? Thanks.
Review comment: The documentation for CodePageString says
Which sounds like you can take whichever approach you want, and I do like the idea of being able to see as much of the data as possible.
However, for comparison, it does look to me like if I use OpenMcdf to write a user property with leading and trailing nulls then neither Windows Explorer nor Word will display the rest of the data:
[image]
Review comment: Is anyone worried about performance for the check and trim on every call?
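One possible way to address the per-call cost is to drop trailing nulls once, at read time, by shortening the byte range before decoding, so that no untrimmed intermediate string is allocated. The sketch below is only an assumption about how that could look; the class name and the `charWidth` parameter (2 for UTF-16, 1 for single-byte code pages) are hypothetical, and multi-byte code pages are not considered.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not OpenMcdf API.
public static class NullTrimmedDecoder
{
    // Decodes data while ignoring trailing null characters.
    // Leading and embedded nulls are kept.
    public static string Decode(byte[] data, Encoding encoding, int charWidth)
    {
        // Assumption of this sketch: a dangling partial unit at the end is ignored.
        int length = data.Length - (data.Length % charWidth);

        // Step back over whole null units (one or two zero bytes each).
        while (length >= charWidth && IsNullUnit(data, length - charWidth, charWidth))
            length -= charWidth;

        return encoding.GetString(data, 0, length);
    }

    private static bool IsNullUnit(byte[] data, int offset, int width)
    {
        for (int i = 0; i < width; i++)
        {
            if (data[offset + i] != 0)
                return false;
        }
        return true;
    }
}
```

With this shape, a property getter could return the stored string directly, since the trimming has already happened once when the value was read.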
Review comment: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-oleps/9660cb24-953a-4e60-adf2-37cc0e779d19
So, I think you're safe with TrimEnd.
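For readers weighing the two options, the following small sketch shows how `Trim` and `TrimEnd` differ on a string with leading, embedded, and trailing nulls. The wrapper names are hypothetical; only `string.Trim` and `string.TrimEnd` come from the .NET base library.

```csharp
using System;

// Hypothetical wrappers for illustration only; not OpenMcdf API.
public static class NullTrimDemo
{
    // Removes only trailing nulls; leading and embedded nulls survive.
    public static string TrimTrailingNulls(string value)
    {
        return value.TrimEnd('\0');
    }

    // Removes leading and trailing nulls; embedded nulls survive.
    public static string TrimBothEnds(string value)
    {
        return value.Trim('\0');
    }
}
```

Given the input `"\0ab\0c\0\0"`, `TrimTrailingNulls` keeps the leading null and returns a five-character string, while `TrimBothEnds` returns the four-character `"ab\0c"`. Neither call touches the embedded null.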