DealProposal's Label field serializes as non-UTF8 #1248
Comments
I haven't investigated deeply, but it sounds like the cbor-gen generated code is permitting non-UTF8 bytestrings in this field. |
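For readers less familiar with Go, the underlying behaviour can be sketched as follows. A Go `string` is just a sequence of bytes, so nothing stops a non-UTF-8 value from being stored in a string field unless the code checks explicitly. The helper name `labelIsValidUTF8` is hypothetical and is not part of cbor-gen or specs-actors; this only illustrates the kind of check that would be needed.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// labelIsValidUTF8 (hypothetical helper) reports whether a label could
// legally be written as a CBOR text string, i.e. whether it is valid UTF-8.
func labelIsValidUTF8(label string) bool {
	return utf8.ValidString(label)
}

func main() {
	good := "deal-label"
	// Go happily stores arbitrary bytes in a string; no validation happens here.
	bad := string([]byte{0xff, 0xfe, 0x80})

	fmt.Println(labelIsValidUTF8(good))
	fmt.Println(labelIsValidUTF8(bad))
}
```

An encoder that writes a string field without a check like this will emit whatever bytes the string holds.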
I think @willscott might be on the bleeding edge of this one with this work. Also /cc @warpfork @ribasushi as we've collectively been circling around these concerns since Filecoin has been mildly pushing at the definition of "strings" in some situations (thanks mainly to the simple Go mapping of I'll put this on the agenda for our IPLD meeting tomorrow https://github.com/ipld/team-mgmt/#weekly-call. We can't resolve Filecoin concerns there, but @willscott et al. might have insights that help us suggest a way forward, or at least comment further and maybe document some of this. |
Do note that |
We talked about this in the IPLD team call.
|
Ok, representing this as We can't make this change until we do a major-version upgrade whereupon we can change state schema. At that point we can also migrate all deals in the chain state to encode as Changing might also be a pain for the markets modules and other software that deals in deals, which will have to handle pre-migration and post-migration values distinctly. We can't do this without buy-in from the teams who will wear that pain on the currently-in-use software. cc @hannahhoward @arajasek @whyrusleeping |
what is the reason for this being removed from v3 milestone @ZenGround0 ? |
I'm not sure of Zen's explicit reason, but the focus of v3 (and likely v4 too) is chain efficiency, attempting to remove the bottleneck that is the chain validation throughput (i.e. gas usage). It's a rather pressing problem until we can improve throughput to match miner sealing hardware. |
To add some more weight for getting this changed - it turns out that it's as difficult in JavaScript as it is in Rust to deal with this. We currently have no way to reliably round-trip these blocks in our JavaScript stack since the standard bytes->string conversion methods available all deal with invalid UTF-8 characters by inserting We're talking about strategies for doing a reach-around for special cases like this but there's no clear way to deal with it as the standard case (which is why we've hardened up on the SHOULD for Strings as UTF-8 in the IPLD Data Model https://specs.ipld.io/data-model-layer/data-model.html#kinds-reference). But it would be a special case thing where you know ahead of time that you're dealing with a block that has dodgy strings in it so you'd either explicitly ask to have strings dealt with as bytes, or request an alternative decoding scheme that can get you closer to the bytes you care about, such as |
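The round-trip problem can be illustrated with a small Go analogue. This is only a sketch: it is not the JavaScript stack described above, and `lossyRoundTrip` is a hypothetical name. It mimics a decoder that substitutes the Unicode replacement character (U+FFFD) for invalid input, which is also what Go's own string-to-rune conversion does.

```go
package main

import "fmt"

// lossyRoundTrip (hypothetical) mimics a bytes->string->bytes round trip
// through a decoder that substitutes U+FFFD for bytes that are not valid
// UTF-8. Converting to []rune decodes the input; converting back re-encodes.
func lossyRoundTrip(raw []byte) []byte {
	return []byte(string([]rune(string(raw))))
}

func main() {
	raw := []byte{'h', 'i', 0xff} // 0xff can never appear in valid UTF-8
	out := lossyRoundTrip(raw)

	fmt.Printf("in:  % x\n", raw)
	fmt.Printf("out: % x\n", out)
}
```

The three input bytes come back as five: the `0xff` is replaced by `ef bf bd`, the UTF-8 encoding of U+FFFD. The original bytes cannot be recovered, so a block re-encoded this way would hash to a different CID.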
I'll just add my motivation for this, aside from the fact that it's invalid CBOR based on the spec (https://tools.ietf.org/html/rfc7049#section-2.1). The main reason is that we can't validate that every string decoded from CBOR is valid UTF-8, which can introduce bugs in the future. I get that the transition and the removal of unsafe string decoding cannot happen soon now that it's baked into the v0 and v2 actors, but it would be nice to remove ASAP so it hopefully won't be a permanent problem in the future. If this gets changed, it allows us to validate that the whole protocol does not include this. This was the only occurrence of invalid UTF-8 when it was introduced, and the value is never being used. These interactions may not be a big deal for the Go implementations since the behaviour might be expected or deterministic, but they cause issues for other implementations trying to match. Also, external tooling or interfacing with this data may cause inconsistencies (for example through JSON, because it also only supports Unicode). Having said this, I don't have a problem if this doesn't change and understand why it wouldn't, just wanted to say my piece for why I think the change should be made. |
While not critical to do right now, I would like to get this scheduled as a fix for v4. I need some help understanding how extensive a change this is on the lotus side. @magik6k @arajasek could you estimate how much effort will need to go into supporting a change to the on-chain deal proposal format? Relatedly, are proposal CIDs used by the markets code extensively? |
So was the decision just slinging around bytes, or erroring if it's non-UTF8? |
Rereading this issue properly and the consensus to encode as bytes is better than what I just said. This has the nice property of all deal labels being unambiguously convertible in a state migration. --edit-- The fact that the proposal signature never makes it on chain makes changing the serialization to cbor byte string acceptable imo. But this will need a FIP and lots of comms to users of the storage market so that we don't break tooling. |
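The "unambiguously convertible" property can be sketched in Go as below. The types `oldProposal` and `newProposal` and the function `migrateLabel` are hypothetical stand-ins, not the actual specs-actors types or migration code. The point is only that a Go string-to-bytes conversion copies the raw bytes, so every existing label, valid UTF-8 or not, maps to exactly one byte-string value.

```go
package main

import (
	"bytes"
	"fmt"
)

// oldProposal and newProposal are hypothetical, stripped-down stand-ins
// for the pre- and post-migration deal proposal shapes.
type oldProposal struct{ Label string }
type newProposal struct{ Label []byte }

// migrateLabel copies the label's raw bytes. The conversion cannot fail
// and loses nothing, whether or not the old label was valid UTF-8.
func migrateLabel(old oldProposal) newProposal {
	return newProposal{Label: []byte(old.Label)}
}

func main() {
	raw := []byte{'d', 'e', 'a', 'l', 0xff}
	migrated := migrateLabel(oldProposal{Label: string(raw)})

	fmt.Println(bytes.Equal(migrated.Label, raw))
}
```

By contrast, a migration that kept the text-string type and enforced UTF-8 would have to decide what to do with labels that fail validation.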
Aye, going with |
ok cool got a PR ready to go/passing CI for this |
* changing the deal label to bytes not strings
* removing a todo comment for something that was actually a to-done
* wrote migration.... how do i test??
* fixing tests, determinism-gen
* left some debugging prints in there
* fixes some of the code review
* fixing one more issue
* fixing rest of code review things- all that's left isss testing
* half a test
* there's a bug in the migration, test is done
* fixing code review
Done in code, merged into next, FIP hopefully coming soon!! |
FIP 0027 is currently WIP. To read more about the approach we are taking, please read FIP 0027. There are four major work items to get this implemented for the upcoming v16 network upgrade
|
The Label field in DealProposal is a string that is not guaranteed to be UTF8 as required in the CBOR spec. This causes issues for other client implementations (like us). It will take a LOT of effort for us to work around this bug.

I would suggest somehow enforcing UTF-8 or switching the field to be arbitrary bytes instead.