-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve streamable #3031
Improve streamable #3031
Conversation
This no longer requires an extra `_parse_functions` attribute on a `Streamable`, as it may be confusing serializers or other functions that use `__annotations__`.
8ea1b38
to
fc159fe
Compare
Revert an earlier commit and error out on class creation in case a Streamable subclass is instantiated incorrectly, e.g. containing a non-serializable member. Fix cases where Streamable parent class was forgotten.
fc159fe
to
e350aae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One thing I don't understand in the first question.
|
||
def parse_bool(f): | ||
bool_byte = f.read(1) | ||
assert bool_byte is not None and len(bool_byte) == 1 # Checks for EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will these asserts activate in any run mode, or is it like C, where they can be deactivated in some run modes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert
is removed in python -OO
mode, which we don't do. So we shouldn't use assert
for flow control. However, this problem was here before this change, so it doesn't make these semantics any worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These asserts are important. If we don't have them, people can Deserialize a bytes object which contains extra data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the bool example, read(1) which returns empty would also succeed, so we need to check that the exact size was read, any time we do a read.
bytes_read = f.read(list_size) | ||
assert bytes_read is not None and len(bytes_read) == list_size | ||
return bytes_read | ||
return parse_bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is nice.
def coin_record_from_row(self, row: sqlite3.Row) -> WalletCoinRecord: | ||
coin = Coin(bytes32(bytes.fromhex(row[6])), bytes32(bytes.fromhex(row[5])), uint64.from_bytes(row[7])) | ||
return WalletCoinRecord( | ||
coin, uint32(row[1]), uint32(row[2]), bool(row[3]), bool(row[4]), WalletType(row[8]), row[9] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the type of row[9] - I notice it is the only one not explicitly cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a comment with the schema of this table would be very helpful, but I suppose this is not quite in scope for this PR
G1Element.from_bytes(bytes.fromhex(row[1])), | ||
row[3], | ||
row[4], | ||
WalletType(row[3]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WalletType: nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this broken before?
fields = {} | ||
|
||
for _, f_type in fields.items(): | ||
parse_functions.append(cls.function_to_parse_one_item(f_type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intent here to automatically add parse functions for only the primitive types that are contained as members?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general idea is to traverse the __annotations__
tree just once at class instantiation time, and put into a list the sub-parsing functions which do parsing and not introspection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach is a lot nicer than the previous one. It especially makes it easy to test the parser for each primitive type, and really cover all the edge cases. I really think we should do that. This code has better work, everything else relies on it.
I would also have expected to see tests where we ensure that some (or maybe all) network messages encode to specific bytes. Just passing a round-trip test isn't enough to ensure we don't break the protocol.
I'll take a look at adding some tests today.
@@ -74,7 +74,7 @@ class NewEndOfSubSlotVDF(Streamable): | |||
|
|||
@dataclass(frozen=True) | |||
@streamable | |||
class RequestCompactProofOfTime: | |||
class RequestCompactProofOfTime(Streamable): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the annotation could add the base class too, no? (maybe for later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotation could do the dataclass wrapping as well, but the comment in def streamable
states that this is intended for linters so they could recognize the methods of the base class as well as the __init__
method provided by dataclass
.
def coin_record_from_row(self, row: sqlite3.Row) -> WalletCoinRecord: | ||
coin = Coin(bytes32(bytes.fromhex(row[6])), bytes32(bytes.fromhex(row[5])), uint64.from_bytes(row[7])) | ||
return WalletCoinRecord( | ||
coin, uint32(row[1]), uint32(row[2]), bool(row[3]), bool(row[4]), WalletType(row[8]), row[9] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a comment with the schema of this table would be very helpful, but I suppose this is not quite in scope for this PR
row[0], | ||
bytes.fromhex(row[2]), | ||
uint32(row[0]), | ||
bytes32.fromhex(row[2]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we really store binary data in hex form in the database?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we currently do. See add_derivation_paths()
in the same file.
return bytes.decode(str_read_bytes, "utf-8") | ||
raise RuntimeError(f"Type {f_type} does not have parse") | ||
return parse_str | ||
raise NotImplementedError(f"Type {f_type} does not have parse") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this message doesn't look right to me. Shouldn't it say "Type {f_type} does not have from_bytes()"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, parse()
is the main way of supporting the streamable protocol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently from_bytes()
is used only for sized integer types. All other types that have custom serialization need to provide their own .parse()
function.
I performed the following benchmark on old and new Streamable to compare their performance: parsing the first 10000 mainnet blocks. The old result: 1.39s; the new result: 1.08s. The speed improvement is 23%. |
@@ -76,6 +76,7 @@ class RejectBlock(Streamable): | |||
|
|||
|
|||
@dataclass(frozen=True) | |||
@streamable | |||
class RequestBlocks(Streamable): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did this work before without streamable decorator?
|
||
def parse_bool(f): | ||
bool_byte = f.read(1) | ||
assert bool_byte is not None and len(bool_byte) == 1 # Checks for EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These asserts are important. If we don't have them, people can Deserialize a bytes object which contains extra data.
|
||
def parse_bool(f): | ||
bool_byte = f.read(1) | ||
assert bool_byte is not None and len(bool_byte) == 1 # Checks for EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the bool example, read(1) which returns empty would also succeed, so we need to check that the exact size was read, any time we do a read.
fields = {} | ||
for _, f_type in fields.items(): | ||
values.append(cls.parse_one_item(f_type, f)) # type: ignore | ||
values = [parse_f(f) for parse_f in PARSE_FUNCTIONS_FOR_STREAMABLE_CLASS[cls]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember why I had a try clause here, maybe it was to support Empty streamables? (for example RequestPeers) message. Have we confirmed that those still work?
G1Element.from_bytes(bytes.fromhex(row[1])), | ||
row[3], | ||
row[4], | ||
WalletType(row[3]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this broken before?
Nice work. If nothing is broken with the wallet, I think we can probable merge it (and if arvids tests pass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been running the spend bot with it, no issues
* Added new issue templates (#3033) * added new issue templates for support and requestes * spelling is hard... * Add lock, keep cache consistent (#3051) * execute task decorator * use blockchain lock * indentation * lint * execute_task * Ms.empty blocks3 (#3064) * Fix empty blocks * Remove error log * Missing imports * .header_hash instead of .prev_hash in simulator * Fix typo (#3069) * Use secure coin ids (#3095) * Add announcements for standard tx (#3023) * changed bahviour of wallet and wallet tools for standard solutions * black formatting wallet_tools * added a test for hypothetical stolen zero_output coin * swap to sha256 from sha256tree for message * fix wallet_tools version, address lint issues * correctly int_from_bytes * fix uninstantiated key in dict * Comment out broken test * Fix types (used SerializedProgram) Co-authored-by: Mariano <sorgente711@gmail.com> * cache VDF validation results (per process) (#3110) * Update GUI for plot refresh issue (#3135) Co-authored-by: Adam Kelly <aqk@aqk.im> * remove commit and not needed query (#3148) * Copyright 2021 Chia Network in LICENSE (#3153) * just filter limit (#3152) * fix plot dupe (#3154) * increase ratio (#3155) * wait to end of process to avoid zombies (#3160) * fix takes 1 positional argument but 2 were given (#3202) * Update CLI tools to use EiB when appropriate. (#3117) * Improve streamable (#3031) * Avoid importing `test_constants` as it takes a long time. * Factor out `parse_*` functions. * First crack at refactoring `Streamable.parse`. * Don't add `_parse_functions` attribute to `Streamable`. This no longer requires an extra `_parse_functions` attribute on a `Streamable`, as it may be confusing serializers or other functions that use `__annotations__`. * Fix lint problems with `black`. * Fix `parse_tuple`. * Defer some parsing failures to parse time rather than class-creation time. * Tidy up & remove some obsolete stuff. * Decorate `RequestBlocks` as `streamable`. * Fix wrong uses of Streamable class Revert an earlier commit and error out on class creation in case a Streamable subclass is instantiated incorrectly, e.g. containing a non-serializable member. Fix cases where Streamable parent class was forgotten. * Fix wrong types when creating DerivationRecord and WalletCoinRecord * additional unit tests for streamable parsers * add type annotations (#3222) Co-authored-by: Rostislav <rostislav@users.noreply.github.com> Co-authored-by: arvidn <arvid@libtorrent.org> * Tests skipping mempool (#3065) * Avoid importing `test_constants` as it takes a long time. * Some new tests that circumvents mempool. * Fix lint problems. Co-authored-by: J Eckert <sargonas@users.noreply.github.com> Co-authored-by: Yostra <straya@chia.net> Co-authored-by: Mariano Sorgente <3069354+mariano54@users.noreply.github.com> Co-authored-by: Antonio Borrero Granell <me@antoniobg.com> Co-authored-by: matt-o-how <48453825+matt-o-how@users.noreply.github.com> Co-authored-by: Mariano <sorgente711@gmail.com> Co-authored-by: Arvid Norberg <arvid@libtorrent.org> Co-authored-by: Adam Kelly <338792+aqk@users.noreply.github.com> Co-authored-by: Adam Kelly <aqk@aqk.im> Co-authored-by: Kyle Altendorf <sda@fstab.net> Co-authored-by: Joel <34619326+joelcho@users.noreply.github.com> Co-authored-by: Gene Hoffman <30377676+hoffmang9@users.noreply.github.com> Co-authored-by: Richard Kiss <him@richardkiss.com> Co-authored-by: Rostislav <rostislav@users.noreply.github.com>
No description provided.