-
-
Notifications
You must be signed in to change notification settings - Fork 854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Implement Tiff codec #119
[WIP] Implement Tiff codec #119
Conversation
Codecov Report
@@ Coverage Diff @@
## master #119 +/- ##
=========================================
Coverage ? 89.59%
=========================================
Files ? 1038
Lines ? 45854
Branches ? 3255
=========================================
Hits ? 41081
Misses ? 4058
Partials ? 715
Continue to review full report at Codecov.
|
Just seeding this [WIP] PR with the initial project structure. A bit of a mutant at the moment as I've only got the dotnet-preview4 tooling (.csproj based) on my dev PC at the moment, but this should all converge well before this gets merged. I've also introduced a separate unit testing project for rapid TDD, but this can be combined into the main test project before merging. Quick comment on the |
Great to see this opened! |
|
||
public TiffIfd ReadIfd(uint offset) | ||
{ | ||
InputStream.Seek(offset, SeekOrigin.Begin); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can't assume that the TIFF image starts at the beginning of the stream. (Or at least we don't do this in other codecs.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we can't assume that. Someone could have multiple images in a stream or other information. Best we can do is look for the identifier at the position we are currently at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay... The easy fix is to store the offset that we start at, and do all Seeks relative to this, i.e. InputStream.Seek(StartOffset + offset, SeekOrigin.Begin)
.
I'd really love to get rid of the Seeks entirely, but unfortunately the TIFF file-format relies heavily on referencing offsets within the file. There is no way you can tell what the next bit of data is otherwise. As an example a file layout could be,
0-8 -> TIFF file header (8 bytes) - references first IFD at offset 30000
8-10000 -> Raw image data here
30000-... -> IFD (basically an image header) - references image data at offset 8
Until you read the IFD at offset 30000, you have no way of telling what data is at offset 8 - it could be image data in an unknown compression/format, another IFD, strings from the metadata. The format doesn't have any identifiers to let you know until you have read the IFD itself.
Any clever ideas to avoid Seeks would be great though! 😄
throw new ImageFormatException($"A value of type '{entry.Type}' cannot be converted to a SignedRational."); | ||
|
||
byte[] bytes = ReadBytes(ref entry); | ||
SignedRational[] result = new SignedRational[entry.Count]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it possible to do buffered read without allocating new arrays?
You can store them as pre-allocated members, if the arrays are expected to have a known upper limit. If the limit is not known and/or or the arrays are too big, we should introduce the following utility class to reuse buffers:
class AutoGrowBuffer<T> : IDisposable
{
private PinnedBuffer<T> currentBuffer;
public AutoGrowBuffer(int initialSize)
{
this.currentBuffer = new PinnedBuffer<T>(initialSize);
}
// BufferSpan<T> should be very similar to BufferPointer<T> (or a replacement!), having a Length property. I will implement it, if we agree on the design.
public BufferSpan<T> GetBufferSpan(int length)
{
if (this.currentBuffer.Array.Length < length)
{
this.currentBuffer.Dispose();
this.currentBuffer = new PinnedBuffer<T>(length);
}
return new BufferSpan<T>(this.currentBuffer, length);
}
public void Dispose()
{
this.currentBuffer .Dispose();
}
}
// USAGE:
private AutoGrowBuffer<SignedRational> signedRationalBuffer;
// BufferSpan<T> should be very similar to BufferPointer<T> (or a replacement!), having a Length property. I will implement it, if we agree on the design.
public BufferSpan<SignedRational> ReadSignedRationalSpan(ref TiffIfdEntry entry)
{
....
BufferSpan<SignedRational> result = this.signedRationalBuffer.GetBufferSpan(entry.Count);
....
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used known max values in the past. That would be my favored approach
For additional larger values I've rented/returned the buffer and made sure I was careful reading them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like that class though. Clever idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely a place for Span-like stuff. Better encapsulation for data-chunk operations without unecessary allocations! I think I need to turn my 'BufferPointer' into 'BufferSpan'. It will be almost System.Span compatible then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea... There's definitely a place for sharing buffers in managing the performance of the existing code. I'm not sure how much benefit there will be for SignedRational
arrays - they are pretty rare in TIFF files and I included them mainly for completeness. I'll get more of a feel for this as I try out a number of sample files.
On the other hand - I'm allocating a new byte[] in the GetBytes(...)
method that should really be allocated once and reused. The returned byte[] doesn't even need to be the correct length, so I can make a good guess of the maximum likely size, and revert to a new byte[] only if there is a need to exceed this... I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! We dont need to bother with uncommon cases, I just wanted to give some general hints based on the code :)
|
||
private Int32 ToInt32(byte[] bytes, int offset) | ||
{ | ||
if (IsLittleEndian) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JimBobSquarePants has introduced several classes to deal with endianness in streams. Shouldn't we use them in Tiff decoder too?
Their implementations definitely need some optimization by eliminating super-frequent virtual calls, but we could follow an "optimize once, win everywhere" approach by using them!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like this to be a thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a definitely a good rationale for centralising this logic. I'll see if I can do this in the future once I've got the main features working.
|
||
[Theory] | ||
[MemberData(nameof(IsLittleEndianValues))] | ||
public void ReadHeader_ReadsEndianness(bool isLittleEndian) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely <3 this approach in testing!
I wish all our codecs were under this level of unit testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, great stuff. The others really should be. We should add a chore tag for issues so we can do this 😝
@Andy-Wilkinson I wanted to try the code, but I couln't find a working solution. |
Yeah - Sorry about that @antonfirsov . It is a bit of a mutant project at the with a mix of the project.json/.csproj approaches. I see that @JimBobSquarePants has merged the VS2017 branch, so I'll try to get everything consistent soon! |
@Andy-Wilkinson Hoping the merge makes things a lot easier for you! |
Thanks @JimBobSquarePants . The VS2017 merge has allowed me to bring everything together into the main ImageSharp solution. Just wishing now that I'd enabled StyleCop and Doc comments from the start! 😉 Almost there with fixing the warnings, and everything will be right going forward. |
@Andy-Wilkinson I've had a go at fixing those merge conflicts for you but I could have made a mistake. Please let me know I've I've broken things for you. |
I managed to update your fork with the latest code from our master with all test passing. I had a look at the effort, so far.. It's absolutely incredible! 👍 The API is solidifying now so we should be able to make the effort to help you out a lot more to complete this (We should have helped more earlier). There's a lot of new performance goodies on offer within the codebase that we can apply in some places (memory management for example). I'd love to get the ball rolling again so let me know when you have some free time. |
Just trying to keep this so that it builds and doesn't get too out of date. |
Just dropping you a message to see whether this PR is something that that you will be able to continue with or whether we should merge it now into our own separate branch to continue working on? Ideally we would love to have your help but if that's not possible that's totally cool also. Additionally would it be possible to re-sign the CLA? We lost a lot of signatures by accident when we globalized the tool across all our repositories. Cheers James |
Apologies that there's not been much progress recently - to be honest, I've not had time for much dev work so I've tended to drop in and out of my own experimental mini-projects when I get chance. The TIFF codec is going to require more of a concerted effort so it is probably best if you merge into a separate branch. It will open it up to multiple contributors to provide smaller PRs (I might get time for small chunks too). Cheers, PS. I've signed the CLA. |
Thanks for the update, I appreciate it and all your amazing efforts so far. I'll merge this in as-is then into a new |
[WIP] Implement Tiff codec
[WIP] Implement Tiff codec
Prerequisites
Description
This pull request implements (or will implement when complete) a TIFF decoder and encoder for ImageSharp. A new library and NuGet package will be introduced (
ImageSharp.Formats.Tiff
) in line with the existing formats.Feature Checklist
Compression Types
Photometric Interpretation Formats