This repository has been archived by the owner on Jun 1, 2022. It is now read-only.

how to improve compression speed #37

Open
yanliasdf789 opened this issue May 9, 2018 · 5 comments

Comments

@yanliasdf789

I tested a 1024x1024 PNG image and the project runs fine. The compression format is RGB8 and the file format is PKM, but it takes 8 seconds.
I now have 100,000 images to process this way, so how can I improve the speed? Using the "Mali Texture Compression Tool v4.3.0" in fast mode, the same image takes only 1.7 seconds.

@tommego

tommego commented May 16, 2019

I have the same problem. I have hundreds of thousands of images to process, and it is very hard to speed up the compression...

@alecazam

alecazam commented Oct 11, 2020

This library seems to use memory quite liberally just to support multithreading. All inputs from an LDR image are float4, which isn't that bad for sRGB or premultiplied LDR, but those could be stored at around 11 or 12 bits per channel.

On top of that, multiple block and encoding elements are allocated, each holding large amounts of data. I'm running single-threaded, and even for R/RG11 conversion of a 256x256 mipped texture in a release build, the timings are 12s at quality 50 and 4s at quality 49. Every other encoder I've seen for LDR 1/2-channel formats does this with a table lookup in a few milliseconds on a single thread. I don't quite understand the performance tradeoffs made in this library.

@alecazam

alecazam commented Oct 11, 2020

This seems to be the hotspot now that I'm reusing the same block and encoder.

void Block4x4Encoding_R11::CalculateR11(unsigned int a_uiSelectorsUsed, 
												float a_fBaseRadius, float a_fMultiplierRadius)
{
....
    for (float fMultiplier = fMinMultiplier; fMultiplier <= fMaxMultiplier; fMultiplier += 1.0f)
	{
		// find best selector for each pixel
		unsigned int auiBestSelectors[PIXELS];
		float afBestRedError[PIXELS];
		float afBestPixelRed[PIXELS];

        // TODO: this brute force loop does 16 x 8 = (256 calls x multiplier x base) x blocks
        // to CalcPixelError that results in 2.4s/3.9s of time spent in CalcR11, +G11 doubles that time.
        // CalcPixelError returns dx^2 + dy^2  in  my impl.

		for (unsigned int uiPixel = 0; uiPixel < PIXELS; uiPixel++)
		{
			float fBestPixelRedError = FLT_MAX;

			for (unsigned int uiSelector = 0; uiSelector < SELECTORS; uiSelector++)
			{
				float fPixelRed = DecodePixelRed(fBase * 255.0f, fMultiplier, uiTableEntry, uiSelector);

				ColorFloatRGBA frgba(fPixelRed, m_pafrgbaSource[uiPixel].fG,0.0f,1.0f);

				float fPixelRedError = CalcPixelError(frgba, 1.0f, m_pafrgbaSource[uiPixel]);

				if (fPixelRedError < fBestPixelRedError)
				{
					fBestPixelRedError = fPixelRedError;
					auiBestSelectors[uiPixel] = uiSelector;
					afBestRedError[uiPixel] = fBestPixelRedError;
					afBestPixelRed[uiPixel] = fPixelRed;
				}
			}
		}
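
One possible way to cut the cost of this inner loop, sketched under assumptions: the candidate color above keeps the source green value, so only red differs and the error reduces to a squared difference in one dimension. The eight decoded red values also depend only on the (base, multiplier, table) candidate, not on the pixel, so they can be computed once per candidate instead of once per pixel. The names `DecodeRedSketch`, `PickSelectors`, and `SelectorResult`, and the simplified decode formula, are hypothetical stand-ins for illustration; they are not Etc2Comp's `DecodePixelRed` or the exact EAC decode.

```cpp
#include <algorithm>
#include <array>
#include <cfloat>
#include <cstddef>

constexpr std::size_t kPixels = 16;    // 4x4 block
constexpr std::size_t kSelectors = 8;  // selectors per modifier table row

// Hypothetical, simplified decode (assumption): base plus scaled modifier,
// clamped to an 8-bit-style range. Not the library's DecodePixelRed.
inline float DecodeRedSketch(float base, float multiplier, int modifier) {
    float value = base + multiplier * static_cast<float>(modifier);
    return std::min(255.0f, std::max(0.0f, value));
}

struct SelectorResult {
    std::array<unsigned, kPixels> selectors{};
    float totalError = 0.0f;
};

// Picks the best selector per pixel for one (base, multiplier, table row) candidate.
inline SelectorResult PickSelectors(const std::array<float, kPixels>& sourceRed,
                                    float base, float multiplier,
                                    const std::array<int, kSelectors>& modifiers) {
    // Decode the 8 candidate values once, outside the per-pixel loop.
    std::array<float, kSelectors> decoded{};
    for (std::size_t s = 0; s < kSelectors; ++s) {
        decoded[s] = DecodeRedSketch(base, multiplier, modifiers[s]);
    }

    SelectorResult result;
    for (std::size_t p = 0; p < kPixels; ++p) {
        float bestError = FLT_MAX;
        unsigned bestSelector = 0;
        for (std::size_t s = 0; s < kSelectors; ++s) {
            // 1-D squared error; no temporary color object or error function call.
            float delta = decoded[s] - sourceRed[p];
            float error = delta * delta;
            if (error < bestError) {
                bestError = error;
                bestSelector = static_cast<unsigned>(s);
            }
        }
        result.selectors[p] = bestSelector;
        result.totalError += bestError;
    }
    return result;
}
```

This still visits every selector for every pixel, so it only removes the repeated decode and the temporary color construction from the hot path. Whether that is enough to matter would need to be measured against the timings quoted above.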

@richgel999

This library appears dead.

@Calinou

Calinou commented Dec 17, 2021

For future readers, using a library like https://github.com/wolfpld/etcpak should provide faster compression. See this benchmark: https://aras-p.info/blog/2020/12/08/Texture-Compression-in-2020/
