[Refactoring... WIP] Implement CPython-compatible gzip.decompress, restore uzlib. Enable both for atmel-samd #1274

Closed
wants to merge 3 commits into from

klardotsh

On its own this is likely of only marginal use (though I'm sure someone will find a use for it). My real motivation for this PR lies in KMKfw/kmk_firmware#52 (comment): I'd eventually like to be able to import a single gzipped Python module (arguably the less useful of the two, though I'm sure some advanced user would find a use for it; this functionality is not part of this pull request), and eventually to import modules from a ZIP archive. Since compressed ZIPs and GZIP files use the same algorithm under the hood (DEFLATE), this felt like a great starting point both to get an understanding of CircuitPython's internals (especially the buffer system) and to contribute a useful CPython backport to the project.

An example of this in use is below. The file sizes are also listed here for giggles.

(gravity) atmel-samd  » topic-modgzip * » du -h mnt/big_lorem_ipsum.txt*
3.0K	mnt/big_lorem_ipsum.txt
2.0K	mnt/big_lorem_ipsum.txt.gz
Adafruit CircuitPython 4.0.0-alpha.1-100-g2f1d594c0-dirty on 2018-10-14; Adafruit ItsyBitsy M4 Express with samd51g19
>>> import gzip

>>> with open('big_lorem_ipsum.txt', 'r') as f: f.read()
... 
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam id accumsan\nmassa, sit amet molestie lectus. Etiam lobortis, enim quis laoreet sagittis,\nsem ex porttitor leo, vel mollis dui ante vitae diam. Duis auctor nibh in nibh\nimperdiet, eget consequat arcu imperdiet. Nam vitae purus a risus eleifend\nrhoncus sit amet in tellus. Praesent et dolor faucibus, fringilla tortor vitae,\nullamcorper leo. Proin non massa erat. Integer tempor nunc vel ultricies\nvenenatis. Sed nisl quam, ultricies et rhoncus ac, vehicula in risus.\n\nMauris massa nisi, dictum eu ante vel, tincidunt efficitur tortor. Suspendisse\net elit a nibh faucibus varius. Etiam sed finibus risus. Nam nibh quam, euismod\net tellus non, facilisis venenatis nisl. Vestibulum eget iaculis metus. Donec\nquis leo blandit, porta orci sit amet, maximus quam. Vestibulum lacus tortor,\nvolutpat nec ullamcorper non, porttitor ac enim.\n\nNulla vestibulum vel risus ac euismod. Cras venenatis neque vel magna posuere\nultrices sit amet vitae leo. Vestibulum feugiat laoreet urna, eu condimentum mi\nviverra vitae. Praesent auctor, sapien non sagittis fringilla, erat ligula\nmalesuada tellus, mattis iaculis tellus eros varius sem. Quisque maximus vel\nleo eget auctor. Nullam a nunc ipsum. Fusce ac risus ante. In congue, justo at\nviverra pharetra, purus erat suscipit ante, et lacinia erat elit vel odio. Sed\nat tortor auctor, porta magna dictum, tincidunt metus. In in risus dui. Aenean\nnon leo nisi. Praesent sodales libero sit amet quam elementum molestie. Aliquam\nrhoncus, justo sit amet dapibus aliquet, massa nunc tempor ipsum, elementum\nviverra est sapien vitae felis. Vivamus a rutrum orci, sit amet molestie metus.\nPraesent dictum lorem vel felis aliquam, eu pellentesque est hendrerit.\n\nNam sed orci leo. Praesent facilisis lectus interdum elementum mollis. Cras\naliquam sed turpis vitae lobortis. Ut faucibus nisl non massa rutrum ultricies.\nDonec facilisis neque id elit tincidunt gravida. 
Nulla gravida nunc mattis\nipsum varius suscipit. Integer efficitur finibus metus, in pretium dolor\nconsequat et. Aliquam ut gravida augue. Vestibulum nisl felis, condimentum quis\nrhoncus eget, posuere eu dui. Nunc malesuada nisi nulla, vitae facilisis mi\nvestibulum vitae. In hac habitasse platea dictumst. Nam eu sem scelerisque,\negestas mi at, pharetra nisi. Praesent fringilla condimentum efficitur. Duis\naliquet nunc nec lacus venenatis efficitur.\n\nCurabitur lacinia consectetur augue, eu consectetur arcu porttitor non. Morbi\nnec posuere nibh. Suspendisse condimentum, purus at fermentum rhoncus, lectus\nerat pretium tellus, eu aliquam libero dui ac magna. Donec urna est, laoreet\nsit amet nulla ut, gravida aliquam mi. Quisque vel mattis risus. Nulla rutrum\nipsum leo, laoreet hendrerit velit pretium eu. Nullam quis dui est. Donec\nsuscipit massa in velit pellentesque, sed eleifend mauris malesuada. Phasellus\neget arcu porta, interdum dui quis, facilisis ipsum. Quisque eget feugiat est,\nluctus commodo augue. In pretium ante id augue accumsan viverra quis quis urna.\nNullam dignissim id orci id semper.\n'


>>> with open('big_lorem_ipsum.txt.gz', 'rb') as f: gzip.decompress(f.read())
... 
bytearray(b'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam id accumsan\nmassa, sit amet molestie lectus. Etiam lobortis, enim quis laoreet sagittis,\nsem ex porttitor leo, vel mollis dui ante vitae diam. Duis auctor nibh in nibh\nimperdiet, eget consequat arcu imperdiet. Nam vitae purus a risus eleifend\nrhoncus sit amet in tellus. Praesent et dolor faucibus, fringilla tortor vitae,\nullamcorper leo. Proin non massa erat. Integer tempor nunc vel ultricies\nvenenatis. Sed nisl quam, ultricies et rhoncus ac, vehicula in risus.\n\nMauris massa nisi, dictum eu ante vel, tincidunt efficitur tortor. Suspendisse\net elit a nibh faucibus varius. Etiam sed finibus risus. Nam nibh quam, euismod\net tellus non, facilisis venenatis nisl. Vestibulum eget iaculis metus. Donec\nquis leo blandit, porta orci sit amet, maximus quam. Vestibulum lacus tortor,\nvolutpat nec ullamcorper non, porttitor ac enim.\n\nNulla vestibulum vel risus ac euismod. Cras venenatis neque vel magna posuere\nultrices sit amet vitae leo. Vestibulum feugiat laoreet urna, eu condimentum mi\nviverra vitae. Praesent auctor, sapien non sagittis fringilla, erat ligula\nmalesuada tellus, mattis iaculis tellus eros varius sem. Quisque maximus vel\nleo eget auctor. Nullam a nunc ipsum. Fusce ac risus ante. In congue, justo at\nviverra pharetra, purus erat suscipit ante, et lacinia erat elit vel odio. Sed\nat tortor auctor, porta magna dictum, tincidunt metus. In in risus dui. Aenean\nnon leo nisi. Praesent sodales libero sit amet quam elementum molestie. Aliquam\nrhoncus, justo sit amet dapibus aliquet, massa nunc tempor ipsum, elementum\nviverra est sapien vitae felis. Vivamus a rutrum orci, sit amet molestie metus.\nPraesent dictum lorem vel felis aliquam, eu pellentesque est hendrerit.\n\nNam sed orci leo. Praesent facilisis lectus interdum elementum mollis. Cras\naliquam sed turpis vitae lobortis. Ut faucibus nisl non massa rutrum ultricies.\nDonec facilisis neque id elit tincidunt gravida. 
Nulla gravida nunc mattis\nipsum varius suscipit. Integer efficitur finibus metus, in pretium dolor\nconsequat et. Aliquam ut gravida augue. Vestibulum nisl felis, condimentum quis\nrhoncus eget, posuere eu dui. Nunc malesuada nisi nulla, vitae facilisis mi\nvestibulum vitae. In hac habitasse platea dictumst. Nam eu sem scelerisque,\negestas mi at, pharetra nisi. Praesent fringilla condimentum efficitur. Duis\naliquet nunc nec lacus venenatis efficitur.\n\nCurabitur lacinia consectetur augue, eu consectetur arcu porttitor non. Morbi\nnec posuere nibh. Suspendisse condimentum, purus at fermentum rhoncus, lectus\nerat pretium tellus, eu aliquam libero dui ac magna. Donec urna est, laoreet\nsit amet nulla ut, gravida aliquam mi. Quisque vel mattis risus. Nulla rutrum\nipsum leo, laoreet hendrerit velit pretium eu. Nullam quis dui est. Donec\nsuscipit massa in velit pellentesque, sed eleifend mauris malesuada. Phasellus\neget arcu porta, interdum dui quis, facilisis ipsum. Quisque eget feugiat est,\nluctus commodo augue. In pretium ante id augue accumsan viverra quis quis urna.\nNullam dignissim id orci id semper.\n')
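
The point made in the description, that compressed ZIP entries and gzip files both wrap the same DEFLATE stream, can be checked on desktop CPython. The following is only a sketch under stated assumptions: it uses CPython's `zlib`, `gzip` and `zipfile` modules (not the uzlib code in this PR), the sample data and the `lorem.txt` entry name are made up, and it assumes `gzip.compress` emits a 10-byte header with no optional fields, which the code asserts before relying on it.

```python
import gzip
import io
import struct
import zipfile
import zlib

data = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 50

# gzip container: 10-byte fixed header, raw DEFLATE stream, 8-byte trailer.
gz = gzip.compress(data)
# Magic bytes, and FLG == 0 meaning no optional header fields follow.
assert gz[:2] == b"\x1f\x8b" and gz[3] == 0
inflater = zlib.decompressobj(wbits=-15)  # negative wbits: raw DEFLATE, no container
from_gzip = inflater.decompress(gz[10:])
# The CRC-32 and length trailer is left over in inflater.unused_data.

# ZIP container: each entry has a local file header, then its raw DEFLATE data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("lorem.txt", data)  # hypothetical entry name
    info = zf.getinfo("lorem.txt")
archive = buf.getvalue()

# The local header is 30 fixed bytes; name and extra-field lengths sit at offset 26.
name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
start = info.header_offset + 30 + name_len + extra_len
raw = archive[start:start + info.compress_size]
from_zip = zlib.decompress(raw, wbits=-15)  # same raw-DEFLATE decoder as above
```

Both `from_gzip` and `from_zip` equal `data`, decoded by the same raw-DEFLATE setting. Note also that CPython's `gzip.decompress` returns `bytes`, whereas the session above shows a `bytearray`.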

extmod/modgzip.c Outdated
// information anywhere (which feels like an odd oversight, and some third
// parties seem to agree: the pure-Python gzip implementation "gzippy"
// explicitly has a Header class which can be read from)

Collaborator

Can you use uzlib_gzip_parse_header() to do the parsing?

Author

Had I known that function existed, yep, I probably could have. It'd require a bit more tinkering with moduzlib.c to rebase onto that, but... seems sane to me. Looks like they do the same "discard most everything" logic I do.

Author

This refactor was WAY easier than I thought and the final implementation is significantly cleaner, thank you for this!
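
For readers unfamiliar with what a gzip header parser has to skip, here is a rough Python illustration of the "discard most everything" logic mentioned in this thread. It follows the header layout in RFC 1952 and is not a port of `uzlib_gzip_parse_header()` or of the code in this PR; the function name `gzip_payload_offset` and the sample payload are hypothetical.

```python
import gzip
import io
import zlib

# Header flag bits from RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def gzip_payload_offset(buf):
    """Return the index where the raw DEFLATE data starts in a gzip member."""
    if len(buf) < 10 or buf[0] != 0x1F or buf[1] != 0x8B:
        raise ValueError("not a gzip stream")
    if buf[2] != 8:  # CM: 8 is DEFLATE, the only method gzip defines
        raise ValueError("unsupported compression method")
    flags = buf[3]
    pos = 10  # fixed part: magic, CM, FLG, MTIME (4 bytes), XFL, OS
    if flags & FEXTRA:  # 2-byte little-endian length, then that many bytes
        xlen = buf[pos] | (buf[pos + 1] << 8)
        pos += 2 + xlen
    if flags & FNAME:  # zero-terminated original file name
        pos = buf.index(0, pos) + 1
    if flags & FCOMMENT:  # zero-terminated comment
        pos = buf.index(0, pos) + 1
    if flags & FHCRC:  # 2-byte CRC of the header
        pos += 2
    return pos


# Build a gzip member that carries a file name, then skip its header by hand.
payload = b"Lorem ipsum dolor sit amet\n" * 20
raw = io.BytesIO()
with gzip.GzipFile(filename="big_lorem_ipsum.txt", mode="wb", fileobj=raw) as f:
    f.write(payload)
blob = raw.getvalue()

offset = gzip_payload_offset(blob)
text = zlib.decompressobj(wbits=-15).decompress(blob[offset:])
```

Everything in the header other than the flags is skipped rather than stored, which is the behaviour the comment above describes.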

extmod/modgzip.h Outdated
extern mp_obj_t mod_gzip_decompress(size_t n_args, const mp_obj_t *args);

MP_DECLARE_CONST_FUN_OBJ_VAR_BETWEEN(mod_gzip_decompress_obj);

Collaborator

Do you need this? I can't see that they are used outside the module.

Author

I basically copied this file from some other module so this header file existing is mostly a leftover. Not really needed I don't think?

return mp_obj_new_int_from_uint(crc ^ 0xffffffff);
}
STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(mod_uzlib_crc32_obj, 1, 2, mod_uzlib_crc32);

Collaborator

Looks like you've copied mod_binascii_crc32() here.
I think you can just use mod_binascii_crc32_obj() directly instead, guarded by MICROPY_PY_UBINASCII_CRC32.

Author

I did copy that function directly - I made this separate so there'd be no config clashing (if MICROPY_PY_UBINASCII_CRC32 is defined, it shouldn't impact something in uzlib, I guess?)

Happy to refactor this out to a generic crc32 function that both modules use behind their own mpconfigport.h flags, though.

Author

Admittedly this function isn't even used by my gzip implementation - it was used in my Python prototype of importing gzipped Python modules. Can also just... rip this out, though it seems nice to have in uzlib for compatibility with CPython's zlib
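
As background for the compatibility argument in this thread: on desktop CPython the same CRC-32 is exposed as both `binascii.crc32` and `zlib.crc32`. The sketch below compares the two against a plain bitwise implementation, including the final XOR with `0xffffffff` that appears in the quoted C line. The helper `crc32_bitwise` is hypothetical and written for illustration; it is not the code from `moduzlib.c` or `modubinascii.c`.

```python
import binascii
import zlib


def crc32_bitwise(data, crc=0):
    """Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320)."""
    crc ^= 0xFFFFFFFF  # undo the final inversion of a previous call, or start at all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial when that bit was set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # final inversion, as in `crc ^ 0xffffffff` above


sample = b"123456789"
results = (crc32_bitwise(sample), binascii.crc32(sample), zlib.crc32(sample))

# The optional second argument continues a running checksum, as in CPython.
chained = crc32_bitwise(b"6789", crc32_bitwise(b"12345"))
```

All three entries in `results` agree, and `chained` matches them too, which is why one shared helper behind two config flags (as proposed above) would be enough for both modules.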

@@ -23,7 +23,7 @@
 // Turn off for consistency
 #define MICROPY_CPYTHON_COMPAT (0)
 #define MICROPY_MEM_STATS (0)
-#define MICROPY_DEBUG_PRINTERS (0)
+#define MICROPY_DEBUG_PRINTERS (1)
 #define MICROPY_ENABLE_GC (1)
Collaborator

I suppose you forgot to remove this debug option before committing.

@tannewt
Member

tannewt commented Oct 17, 2018

Would you mind refactoring this into the CircuitPython shared-bindings and common-hal structure? That way it'll be listed with all of our other modules in the docs. time, os and struct have already been done here: https://github.com/adafruit/circuitpython/tree/master/shared-bindings

I think for making uzlib a subset of CPython's zlib you'll need to remove DecompIO or move it to a new module.

While you do that I'll make some space for you in the CPX crickit build.

@tannewt
Member

tannewt commented Oct 19, 2018

Now that I think about this more, let's make it M4 only. The M0 is quickly running out of code space and can't hold much code in memory anyway. Thanks!

@klardotsh
Author

Sorry for over a month of radio silence on this - I burned out for a bit there on OSS work (and on the project this PR was ultimately meant to support). I'm back and plan to fix this branch up some time in the next week or two - looks like the only conflicts to merge in are localization-related, thankfully.

@ladyada
Member

ladyada commented Nov 25, 2018

@klardotsh no rush :) whenever you're ready - take breaks whenever ya need!

@jepler
Member

jepler commented Aug 3, 2019

The cause of the build failures is that flash is too full on circuitplayground_express_crickit -- disabling the new feature for that specific board might allow the build to be green. The translations need to be refreshed again (no surprise there) and there are conflict(s) in mpconfigport.h to be resolved which shouldn't be too subtle.

@klardotsh changed the title from "Implement CPython-compatible gzip.decompress, restore uzlib. Enable both for atmel-samd" to "[Refactoring... WIP] Implement CPython-compatible gzip.decompress, restore uzlib. Enable both for atmel-samd" on Aug 13, 2019
@dhalbert added this to the Long term milestone on Oct 31, 2019
@tannewt
Member

tannewt commented Nov 19, 2019

I'm going to close this. Please reopen when the refactoring is complete. Thanks!

6 participants