Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove access to fields by index #2952

Closed
fxamacker opened this issue Nov 21, 2023 · 19 comments · Fixed by #3290 or #3432
Closed

Remove access to fields by index #2952

fxamacker opened this issue Nov 21, 2023 · 19 comments · Fixed by #3290 or #3432
Assignees
Labels
Feedback Go API Breaking Change Breaks programs which use the Cadence repo as a Go dependency Improvement Technical Debt

Comments

@fxamacker
Copy link
Member

fxamacker commented Nov 21, 2023

Issue to be solved

Accessing fields by index (instead of by name) can create dependency on sort order.

For example, issue #2558:

Some client programs are accessing fields by index rather than name. This created a dependency on the sort order of fields which surfaced when we switched from JSON-CDC to CCF for encoding events because CCF explicitly sorts as specified in CCF specification for deterministic encodings.

PR #2559 resolved issue 2558 by updating CCF Codec to make CCF Determinisitic Encoding optional. It was a stop-gap solution to avoid abruptly breaking compatibility for encoded events.

For exported values, not allowing access to fields by index can help prevent Cadence developers from unintentionally creating dependency on sort order. This can prevent unintended consequences like blocking deterministic encoding, etc.

However, this issue can be viewed as broader topic than CCF, sort order, or access by field index.

Current Status

CCF encoding is not currently sorting the fields (in order to be compatible with old programs that access fields by index).

I didn't find any programs using the sort option in CCF codec (as of Nov 28, 2023).

Suggested Solution

Programming languages before 1.0 can benefit from taking away the ability for programmers to (sometimes unintentionally) create dependencies on implementation details of compilers and interpreters.

For example, a few months before Go 1.0 (~12 years ago), https://codereview.appspot.com/5285042/ added a random offset for map iteration and changed Go language specification.

Index: doc/go_spec.html
===================================================================
--- a/doc/go_spec.html
+++ b/doc/go_spec.html
@@ -4085,7 +4085,8 @@
 </li>
 
 <li>
-The iteration order over maps is not specified.
+The iteration order over maps is not specified
+and is not guaranteed to be the same from one iteration to the next.
 If map entries that have not yet been reached are deleted during iteration,
 the corresponding iteration values will not be produced. If map entries are
 inserted during iteration, the behavior is implementation-dependent, but the

That change to Go intentionally broke some existing Go programs by taking away existing behavior that wouldn't be ideal to officially or unofficially support forever after Go 1.0 release.

In the same way, maybe we can determine if some existing features (accessing field by index) should be removed before Cadence 1.0 release.

EDIT: added Current Status section about CCF for more context.

@turbolent turbolent added Go API Breaking Change Breaks programs which use the Cadence repo as a Go dependency and removed Language Breaking Change Breaks Cadence contracts deployed on Mainnet labels Nov 27, 2023
@turbolent
Copy link
Member

Problem:

  • cadence package's composite value Go types, like Struct, Resource, etc have a Fields field, an array of the fields' values
  • The order previously matched the order of how the fields are declared in the code
  • CCF encoding now sorts the fields
  • Developers assumed the order in the Fields array matches the order of how the fields are declared in the code, so they index directly into it using constants
  • However, this will break once field values are sorted

Considered:

  • Restoring the ordering fixes the issue, but leaves the brittle code: Future changes to type in deployed contract might break server-side decoding based on hard-coded ordering
  • Code should have performed a lookup by name in composite type' fields StructType's Fields array of name/type pairs

Proposed fix:

  • Unexport Fields in composite value types (Struct, Resource, etc.)
  • Introduce field value lookup function FieldByName(name string) Value for composite value types
  • The helps developers write less brittle code (removes assumption about order of fields in deployed code)

@turbolent turbolent changed the title For exported values, determine if access to fields by index should be removed in Stable Cadence Remove access to fields by index Nov 28, 2023
@bluesign
Copy link
Contributor

bluesign commented Nov 29, 2023

Doesn't this lose information about ordering? Let's say it is an event signature from past.

@turbolent
Copy link
Member

@fxamacker I assume both values and type information will be re-ordered?

@fxamacker
Copy link
Member Author

@fxamacker I assume both values and type information will be re-ordered?

@turbolent Yes. Both would be re-ordered, and maybe both should be unexported.

@turbolent
Copy link
Member

@bluesign when would the order be required to know? Like I mentioned in my last comment, code that relies on the order is brittle anyways and should probably be updated

@bluesign
Copy link
Contributor

bluesign commented Dec 6, 2023

@bluesign when would the order be required to know? Like I mentioned in my last comment, code that relies on the order is brittle anyways and should probably be updated

Yeah I agree on that part, but we had similar problem with event encoding order before. I think latest victim was @bjartek he can chime in.

But main problem is; if we make old code ( pre 1.0 ) compatible with new deterministic ccf, then there is a risk of breaking silently. If it breaks with new encoding with some error etc, it is ok I guess.

PS: I still have a feeling order of things carry a meaning ( in event signatures for example ) maybe I am thinking too much objective-c like in my head.

@bjartek
Copy link
Contributor

bjartek commented Dec 7, 2023

Pre candidate7 the order of event field names was done alphabetically and not in definition order. Luckily it obly affected topshot and i made a manual fix

@bjartek
Copy link
Contributor

bjartek commented Dec 7, 2023

Not sure if that last comment is relevant to this thread though. Ss long as the name of fields and content of fields are ordered correctly so they match I have no issue with it.

@bluesign
Copy link
Contributor

bluesign commented Dec 7, 2023

I should have been more clear in my comment, I think order still has importance but ignoring that my main issue here is, if we have data in 2 versions ( non-sorted D0 and sorted D1 ) and functions to access ( F0 and F1 )

  • F1(D0) and F1(D1) will be ok, F0(D0) will be still ok ( as it is currently ), but the problem is F0(D1) will fail silently. ( that's why I gave example of old events in candidate7, they fail at some time silently too )

I think this is important thing to consider as Cadence is not backwards compatible, while accessing old data we need matching Cadence version.

@turbolent
Copy link
Member

But main problem is; if we make old code ( pre 1.0 ) compatible with new deterministic ccf, then there is a risk of breaking silently. If it breaks with new encoding with some error etc, it is ok I guess.

The plan isn't to make pre-1.0 compatible with new behaviour. Just like before, when interacting with the chain or its data, the appropriate Cadence version must be used (I know this isn't ideal, just pointing out the fact).

@turbolent
Copy link
Member

turbolent commented Jan 12, 2024

Doesn't this lose information about ordering? [...]

PS: I still have a feeling order of things carry a meaning ( in event signatures for example ) [...]

Events are just composites, so their field order is insignificant. We're probably not pointing that out sufficiently in documentation.

What probably adds to the assumption that event fields have some sort of order is that the syntax for event declarations is different from other composite declarations, and thus kind of suggests an order like in a parameter list.

Hyrum's law strikes again. This post from a couple days ago is very relevant: https://twitter.com/jimmykoppel/status/1744544779138978006

@bluesign
Copy link
Contributor

I think "events are composites" is a bit unfair statement especially with the last link you gave (which focuses on observability)

"Events are composites" is not observable to the user of Cadence, basically it is an implementation detail. Alternatively they could be implemented as non cadence type even. You cannot instantiate in normal cases without emit, or save them etc.

As I pointed before, from outside perspective, they are function signatures ( constructor more likely ) so order of things matter ( as we don't have named arguments )

This is not so important ( in this case ) of course.

@turbolent
Copy link
Member

@fxamacker is this purely related to CCF or also an problem with the introduction of the atree register inlining?

@turbolent
Copy link
Member

@fxamacker Can you please verify that #3290 is sufficient? Or do we also need to unexport field types?

@turbolent turbolent reopened this Apr 30, 2024
@fxamacker
Copy link
Member Author

@fxamacker Can you please verify that #3290 is sufficient? Or do we also need to unexport field types?

@turbolent I can probably take a look tomorrow at CCF after looking more closely into migration issue #3288 while it is still fresh.

BTW, I'm not sure if everyone already knows about CCF codec support for different "modes" at runtime. A single program can support different CCF-based formats, versions, etc. by using modes. A codec's mode is a set of encoding or decoding options made immutable during startup for concurrent use later.

// Programs can create and choose different CCF encoding modes (and decoding modes) at runtime.
// CCF modes are safe for concurrent use (e.g. create modes at startup and reuse from multiple goroutines).

if fooVersion == 0 {
  // Keep original sort order of fields provided by Cadence.
  b, err := ccfFoofUnsortedMode.Marshal(v)  
} else if foolVersion == 1 {
  // Sorts fields as specified in CCF Deterministic Encoding Requirements.
  b, err := ccfFooSortedMode.Marshal(v)  
} // else if etc.

CCF codec modes work like CBOR codec modes described at fxamacker/cbor Quick Start .

@fxamacker
Copy link
Member Author

@fxamacker Can you please verify that #3290 is sufficient? Or do we also need to unexport field types?

@turbolent I think we should also unexport field types for consistency with field values.

We can unexport field type and provide a new function FieldTypes() that returns map[string]Type. This way, there isn't any implied ordering for either field definitions or field values. It also gives us flexibility to change implementation details without breaking API.

@turbolent
Copy link
Member

@fxamacker Sounds good, I was thinking the same. I'll work on that tomorrow

@fxamacker
Copy link
Member Author

fxamacker commented May 8, 2024

@turbolent BTW, in PR #3290 it seems FieldsMappedByName() includes attachment as a field if present, while SearchFieldByName() only looks up field type definition. Is it OK for these two functions to differ in this way?

Sorry about reposting this comment 🙏 I accidentally deleted it from merged PR 3290 and moved it here.

@turbolent
Copy link
Member

turbolent commented May 10, 2024

@fxamacker Sounds good! I had started trying to change this, but it looks like the CCF codec relies on being able to set the type slice. Maybe we can have a short sync (not urgent, sometime next or the following week), to see how we could refactor this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feedback Go API Breaking Change Breaks programs which use the Cadence repo as a Go dependency Improvement Technical Debt
Projects
None yet
5 participants