Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JpegEncoder producing incorrect output on windows-arm (was working on beta0005) #1275

Closed
4 tasks done
peter-bozovic opened this issue Jul 14, 2020 · 47 comments · Fixed by #1304
Closed
4 tasks done

JpegEncoder producing incorrect output on windows-arm (was working on beta0005) #1275

peter-bozovic opened this issue Jul 14, 2020 · 47 comments · Fixed by #1304

Comments

@peter-bozovic
Copy link

peter-bozovic commented Jul 14, 2020

Prerequisites

  • I have written a descriptive issue title
  • I have verified that I am running the latest version of ImageSharp
  • I have verified if the problem exist in both DEBUG and RELEASE mode
  • I have searched open and closed issues to ensure it has not already been reported

Description

I already found the issue #871 which seems to be resolved, but as the issue was for linux-arm this one is maybe specific to windows-arm.
On beta0005, the image was saving correctly, on rc3 it is kinda grayish with squares.

Original image:
image

Resized image:
image

Steps to Reproduce

Just loading and saving the image:

// Loading originalImage
using (var targetImage = new MemoryStream())
{
    using (var image = SixLabors.ImageSharp.Image.Load(originalImage))
    {
        image.SaveAsJpeg(targetImage);
    }
}
// Saving targetImage 

System Configuration

  • ImageSharp version: 1.0.0-rc0003
  • Other ImageSharp packages and versions: N/A
  • Environment (Operating system, version and so on): Windows 10 IoT Core 10.0.17763.1217
  • .NET Framework version: UWP - Windows 10, version 1903
  • Additional information: Application running on Raspberry PI 3 B / ARM architecture
@JimBobSquarePants
Copy link
Member

@Sergio0694 @MattWhilden This um... This looks right up your street.

@Sergio0694
Copy link
Member

@peter-bozovic Well for one it's weird that you're seeing this both in DEBUG and RELEASE modes on UWP, as those use two completely different runtimes (UWP-flavored .NET Core 2.1-like runtime in DEBUG vs. .NET Native in RELEASE). First thing, what I would suggest is to first try to isolate the issue a bit more, eg.:

  • You haven't specified this, I assume you did check to be using the latest .NETCore-UWP lib? That'd be 6.2.10 I think.
  • Can you repro this with another framework? Eg. does this happen with a .NET Core 2.1/3.1 or .NET 5 console app too?
  • Can you try this with the previous RC builds too, to see if this has been there for a while or somehow started recently?

@JimBobSquarePants FYI Matthew is no longer in the .NET Native team (😢), pinging @tommcdon for more info 👍

@peter-bozovic
Copy link
Author

Yes, it's weird and sorry for such specific configuration where I found it :(

About the Relase, I'm building it without the "Compile with .NET native tool chain", so it's probably not using the .NET Native. I don't know if it's the limitation of the Windows IoT Core, or the ARM architecture, but with that option, the app doesn't build.

I don't know where to check the .NETCore-UWP ? I'm using the "Universal Windows" app project type and all I see are Min and Target versions for the SDK. It doesn't mention anywhere .NET Core or .NET Framework.

Yes, I have tested the .NET 4.5 console app and it works correctly. It also works on ASPNET Core 3.1 API website.

About other versions of the library:

  • beta0006
    image

  • beta0007
    image

  • rc0001 & rc0002 - they are the same size, assuming they produce same result
    image

  • rc0003 produced few kb smaller file (in the initial post)

@Sergio0694
Copy link
Member

About the Relase, I'm building it without the "Compile with .NET native tool chain", so it's probably not using the .NET Native. I don't know if it's the limitation of the Windows IoT Core, or the ARM architecture, but with that option, the app doesn't build.

Mh, that kinda sounds similar to #1204, in fact I've had to drop the ARM arch in my UWP app as well as I just couldn't get it to work fine with .NET Native using the latest RC builds of ImageSharp. I wonder if these two issues are somewhat related 🤔

As for the .NETCore-UWP package, you need to open the NuGet packages window for your UWP app and check this one:

image

Make sure you have the latest version installed, just to be extra sure.
Besides, staying up to date is always great regardless of ImageSharp issues 😄

@peter-bozovic
Copy link
Author

Ah yes, didn't see it in the list since it was filtered with "ImageSharp", daaah !
It's 6.2.10 :)

Sorry about the #1204, I just found for you the ARM32 architecture with Windows :D

And guess what, when I removed ImageSharp from our main solution, it's building for relase with .NET native !
ImageSharp was the culprit ...
Not good for those who really wants .NET native, but good to know there's workaround with disabling this build option ...

@JimBobSquarePants
Copy link
Member

ImageSharp was the culprit

Well no... No it wasn't. A bug/limitation in the compiler is responsible for any issues, not this library.

I'd really like to see issues like this fixed upstream. We can only build upon what the framework(s) provide us.

@Sergio0694
Copy link
Member

Yes that's the same thing I noticed, there's something in ImageSharp that's causing issues with the .NET Native compiler. I mean, that's not all that surprising considering the huge amount of heavily generic code that ImageSharp uses, that's one of the most difficult things to handle for a fully AOT compiler, in fact we've already had similar issues in the past (though those have been resolved thankfully, see #828). Right now I can't personally complain though, at least the crash seems to be isolated on ARM32, as all other archs are building just fine in UWP (including with .NET Native) as far as I can tell.

Your best bet at this point would probably be to either wait for Tom to chime in, or to prepare a .NET Native repro for the ARM32 build error (see the guide here) and send that to the .NET Native team for them to investigate 👍

@tocsoft
Copy link
Member

tocsoft commented Jul 14, 2020

@peter-bozovic can you please try saving out your test case in alternative formats, i.e. save as png/bmp too... just to verify that it is definitely the encoder and not the decoder that's at fault.

@peter-bozovic
Copy link
Author

Saving to PNG was working fine

@JimBobSquarePants
Copy link
Member

I'll be something to do with the way we translate from Rgbr32 to Rgb24. I'm curious. @peter-bozovic Does using Image<Rgb24> work with jpeg?

@peter-bozovic
Copy link
Author

peter-bozovic commented Jul 16, 2020

If you ment trying Image<Rgba32>.Load and Image<Rgb24>.Load then I'm getting the same result

@JimBobSquarePants
Copy link
Member

Thanks, that cuts down some of the options.

@tocsoft
Copy link
Member

tocsoft commented Jul 26, 2020

Ok been trying to/managed reproduce this one.

First some goodish news (may actually be bad news) if I take a dependency on the source directly it doesn't corrupt and seems to work fine. However it does break when using the nuget dependency. I wonder if it chooses different targets for package reference vs project reference.

@JimBobSquarePants
Copy link
Member

Oh wow @tocsoft that's progress! So what is the underlying target framework being used by UWP in the failing instance?

@tocsoft
Copy link
Member

tocsoft commented Jul 26, 2020

it seems to be using the netstandard1.3 build

@Sergio0694
Copy link
Member

Wait a second, that's weird - UWP should be using the .NET Standard 2.0 target, no? 🤔
I mean, when using an SDK >= 16299, like in this case.

@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Jul 26, 2020

@Sergio0694 I have no idea so any guidance on how we can get it to fail locally against the source would be great.

@Sergio0694
Copy link
Member

I'm 100% positive that UWP on SDK >= 16299 should pick the .NET Standard 2.0 target.
@tocsoft to rule out weird things, could you try double checking whether it's actually picking up the 1.3 target for you?
And if so, can you confirm that the UWP project you setup as a test is using the SDK 16299 (or above) as minimum target?

@tocsoft
Copy link
Member

tocsoft commented Jul 26, 2020

will check once I stop debugging .. though I'm note sure it actually matters too much in this case because the code part of the code where we we seems to be having issues will be identical for both targets.

@tocsoft
Copy link
Member

tocsoft commented Jul 26, 2020

oh I can get it to fail locally against source now btw... just needs ImageSharp to be in release mode

@tocsoft
Copy link
Member

tocsoft commented Jul 26, 2020

ok we end up with a difference in the encoding after the 2nd execution of

for (int i = 0; i < 4; i++)
{
int xOff = (i & 1) * 8;
int yOff = (i & 2) * 4;
// TODO: Try pushing this to the outer loop!
var currentRows = new RowOctet<TPixel>(pixelBuffer, y + yOff);
pixelConverter.Convert(frame, x + xOff, y + yOff, currentRows);
cbPtr[i] = pixelConverter.Cb;
crPtr[i] = pixelConverter.Cr;
prevDCY = this.WriteBlock(
QuantIndex.Luminance,
prevDCY,
&pixelConverter.Y,
&temp1,
&temp2,
&onStackLuminanceQuantTable,
unzig.Data);
}

ie x=0, y = 0 and i = 0 then identical results locally, i=1 then results start to differ

@antonfirsov
Copy link
Member

@tocsoft can you check if there any tests failing?

@tocsoft
Copy link
Member

tocsoft commented Jul 26, 2020

not really no way to get the tests over to the external device... well not without a significant amount of rebuilding/refactoring

@tocsoft
Copy link
Member

tocsoft commented Jul 26, 2020

from this point forward there too much pointer magic going on for me to follow/track down the exact point values start to change at the moment.

Doesn't help that its only a release build issue makes it a nightmare half the local just seem to get optimised away half the time so you can't even easily inspect half for the stuff I'm trying to debug, that's before you start adding in all the pointer mess.

@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Jul 26, 2020

That pointer code is likely a relic of the first color converter I put together years ago. I’m confident we have language features available now that we can use to to avoid using them.

Looking at the code we do a lot of pointer arithmetic where we don't need to now for Block8x8F. We have a fast indexer there now.

@JimBobSquarePants
Copy link
Member

@tocsoft I don't know if this will help but I made a branch with about 90% of the pointer stuff removed.

https://github.com/SixLabors/ImageSharp/tree/js/simplify-jpeg-encoder

@antonfirsov
Copy link
Member

@JimBobSquarePants worth a try, but I'm afraid it's more likely that the issue is other way around: GC-references are more prone to runtime bugs, than basic pointers. I suspect this code, since we already had similar issues with Rgb24 refs before:

this.pixelBlock.LoadAndStretchEdges(frame.PixelBuffer, x, y, currentRows);
Span<Rgb24> rgbSpan = this.rgbBlock.AsSpanUnsafe();
PixelOperations<TPixel>.Instance.ToRgb24(frame.GetConfiguration(), this.pixelBlock.AsSpanUnsafe(), rgbSpan);
ref float yBlockStart = ref Unsafe.As<Block8x8F, float>(ref this.Y);
ref float cbBlockStart = ref Unsafe.As<Block8x8F, float>(ref this.Cb);
ref float crBlockStart = ref Unsafe.As<Block8x8F, float>(ref this.Cr);
ref Rgb24 rgbStart = ref rgbSpan[0];
for (int i = 0; i < 64; i++)
{
ref Rgb24 c = ref Unsafe.Add(ref rgbStart, i);
this.colorTables.ConvertPixelInto(
c.R,
c.G,
c.B,
ref Unsafe.Add(ref yBlockStart, i),
ref Unsafe.Add(ref cbBlockStart, i),
ref Unsafe.Add(ref crBlockStart, i));

Other candidate for runtime bugs: Vector4

Yet another thing to consider to check is if the issue reproduces on Linux-ARM, since it's easier to investigate (and maybe CI-test in the future) than UWP.

@antonfirsov
Copy link
Member

antonfirsov commented Jul 27, 2020

Another idea: do a diff between beta5 and todays state and look for changes in code being invoked from Encode420 that involves Unsafe.Add and/or Vector4 trickery.

@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Jul 27, 2020

@antonfirsov Yeah possibly.... The changes in the branch should make it a lot easier to debug though.

Last time we "fixed" on Linux ARM the issue was here.

#915

My money is the inlining in combination with Unsafe code in the caller. I would try removing it and testing on both ARM platforms if possible.

Looking at that sample above we don't even need the Unsafe.As nor Unsafe.Add code. We could try using the indexer on Block8x8F directly. We've not had any issues elsewhere where we've used the indexer trick.

@JimBobSquarePants
Copy link
Member

@peter-bozovic @tocsoft Can you do me a favor and please run your application with a build from this branch?

https://github.com/SixLabors/ImageSharp/tree/js/simplify-jpeg-encoder

I've done a fair amount of rewriting to reuse paths we know work on the decoder. It removes a LOT of the Unsafe.XXX code but does not sacrifice performance. (I'm actually pretty impressed with the speed of our encoder considering we have zero SIMD hardware accelerated color conversion)

Fingers crossed it works... It's much easier to debug anyway.

@brianpopow Can you test it on your Linux environment also? I want to make sure I've not introduced issues.

@brianpopow
Copy link
Collaborator

@JimBobSquarePants works on linux + ARM. I just did load the matrix.png and then save it with image.SaveAsJpeg. As far as i understood this is how this should happen.

@JimBobSquarePants
Copy link
Member

Thanks @brianpopow that's reassuring. I really hope it clears up the Windows issues now. Indexing and assignment during colorspace transform is a lot more conventional now.

@antonfirsov
Copy link
Member

@brianpopow does the issue actually reproduce on Linux? I thought that platform is not affected.

@JimBobSquarePants I still put my bet on the code in YCbCrForwardConverter.

@brianpopow
Copy link
Collaborator

brianpopow commented Jul 29, 2020

@brianpopow does the issue actually reproduce on Linux? I thought that platform is not affected.

@antonfirsov sorry maybe i was unclear: I was able to load and save the image without any issue on linux + ARM.

edit: to be more specific, it works with the current master branch and the branch js/simplify-jpeg-encoder without any issue

@JimBobSquarePants
Copy link
Member

@antonfirsov That and the RgbToYCbCrTables code has rewritten in my branch.

https://github.com/SixLabors/ImageSharp/compare/js/simplify-jpeg-encoder

I've made a point of refactoring the code to use the indexers and methods we use successfully during decoding.

I wanted @brianpopow to check that there were no regressive issues caused by my changes.

@antonfirsov
Copy link
Member

antonfirsov commented Jul 29, 2020

@JimBobSquarePants the change is quite small, wouldn't be sufficient I'm afraid. For the sake of investigation I would rather suggest testing out more aggressive changes:

  1. (The changes already in place)
  2. Wire out .AsSpanUnsafe() and PixelOperations<TPixel>.Instance.ToRgb24() doing dumb, safe & slow conversion
  3. Remove all Unasafe.* trickery from LoadAndStretchEdges
  4. Even consider NoInling on the block indexer

We can then test different permutations of 0-3 on different branches to find the root cause, and decide what goes into product code in the end.

@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Jul 29, 2020

@antonfirsov I'd like to get 0 tested first before attempting anything else.

Update

Looking at AsSpanUnsafe(). I cannot see anything that could corrupt memory there; everything looks stack only. I've updated the constraint of GenericBlock8x8<T> to use unmanaged to ensure the type parameter is though (I know everything we pass is unmanaged but convention is king) as I don't know whether the runtime helpers in the Span<T> constructor exist in earlier target frameworks. It's unfortunate MemoryMarshal.CreateSpan only exists in later targets.

I've updated our method implementation to use that where we can in the branch.

        public Span<T> AsSpanUnsafe()
        {
#if SUPPORTS_CREATESPAN
            Span<GenericBlock8x8<T>> s = MemoryMarshal.CreateSpan(ref this, 1);
            return MemoryMarshal.Cast<GenericBlock8x8<T>, T>(s);
#else
            return new Span<T>(Unsafe.AsPointer(ref this), Size);
#endif
        }

@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Aug 6, 2020

@tocsoft How did you get the issue to replicate locally?

I'm running via Visual Studio in Release mode x86 and cannot get my custom branch to replicate.

I can try deploying to the PI I bought to fix this but I'd rather work locally if I can.

@tocsoft
Copy link
Member

tocsoft commented Aug 6, 2020

I had to deploy to the pi to replicate, you can just set the PI as a remote debug target for UWP apps and debug that way, its still should still just be an F5 debug experience

@JimBobSquarePants
Copy link
Member

Ok thanks!

Should I be targeting x86? I assumed ARM64 since the issue title specifically mentions ARM and you can't build ARM32 but that didn't allow me to install.

Also what folder did you load/save the images to? I'm attempting to use the following but I don't want to keep getting it wrong as building the app takes a lifetime.

private async Task InitAsync()
{
    StorageFolder picturesFolder = KnownFolders.PicturesLibrary;
    if (string.IsNullOrWhiteSpace(picturesFolder.Path))
    {
        // Fallback for running in VS
        picturesFolder = KnownFolders.SavedPictures;
    }

    using (var outStream = await picturesFolder.OpenStreamForWriteAsync("1275.jpg", CreationCollisionOption.ReplaceExisting).ConfigureAwait(false))
    using (var inStream = await picturesFolder.OpenStreamForReadAsync("1275.png").ConfigureAwait(false))
    {
        using (var image = await ISImage.LoadAsync(inStream).ConfigureAwait(false))
        {
            await image.SaveAsJpegAsync(outStream).ConfigureAwait(false);
        }
    }
}

@tocsoft
Copy link
Member

tocsoft commented Aug 6, 2020

To be fair I didn't save the image to disk I cheated and just rendered to as an image to the screen. This is what I used to reproduce https://github.com/tocsoft/ImageSharpUwpArmIssueRepro it has an image control on the screen and I just render/display it in the constructor

I can't remember exactly what settings I ended up with, I just chose the device using the debug dropdown, I think I chose remote and it popped up a screen to pick the device, once I did this VS just changed the settings as required for the device. Only add thing i ended up needing to do was force image sharp to be release build and the uwp app to be debug (I couldn't get it to attach the debugger if memory served) ... also needed to reduce target frameworks down to netstandard1.3 only as it seemed to have been getting confused with the multiple frameworks being defined.

@JimBobSquarePants
Copy link
Member

image

This is the result of an attempted installation with ARM with "Compile with .NET native tool chain" turned off.

OK... I'm stumped. I need idiot proof instructions for how to get this on my device.

@tocsoft
Copy link
Member

tocsoft commented Aug 6, 2020

I am confused on how you are trying to run it... all I did was was hit debug from VS I never tried 'installing' it or anything like that just ran from VS targeting the PI and it just work. I might have had to run it from right clicking the project and starting it from there but nothing too unusual.

@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Aug 6, 2020

I've been trying to do it via Visual Studio but I keep hitting the following issue.

Failed to connect to device '127.0.0.1' using Universal Authentication. Please verify the correct remote authentication mode is specified in the project debug settings.

So I thought I'd deploy direct to see if anything works.

@JimBobSquarePants
Copy link
Member

Ok... I'm up and running and can replicate. I'll see what I can do....

JimBobSquarePants added a commit that referenced this issue Aug 6, 2020
@JimBobSquarePants
Copy link
Member

Ok.... Looks like I've fixed it. Tested both Rgba32 and Rgb24 and everything is working.

I'm going to open a PR using the branch I've been testing with which quite a few changes to the encoder which make it easier to debug.

I only bought the one Raspberry PI (I'm gonna get another) so @brianpopow will need to double check my fix on Linux.

@brianpopow
Copy link
Collaborator

I only bought the one Raspberry PI (I'm gonna get another) so @brianpopow will need to double check my fix on Linux.

I have just tested it on Linux + ARM. Works fine, no problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants