Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symbol Token Write Support #165

Merged
merged 29 commits into from
Nov 9, 2020

Conversation

justing-bq
Copy link
Contributor

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Oct 6, 2020

Codecov Report

Merging #165 into master will decrease coverage by 0.12%.
The diff coverage is 55.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #165      +/-   ##
==========================================
- Coverage   74.90%   74.78%   -0.13%     
==========================================
  Files          25       25              
  Lines        4857     4893      +36     
==========================================
+ Hits         3638     3659      +21     
- Misses        724      734      +10     
- Partials      495      500       +5     
Impacted Files Coverage Δ
ion/symboltable.go 56.29% <21.05%> (-3.18%) ⬇️
ion/marshal.go 69.39% <66.66%> (ø)
ion/binarywriter.go 76.00% <68.75%> (-0.08%) ⬇️
ion/textwriter.go 61.45% <78.94%> (+2.25%) ⬆️
ion/reader.go 65.00% <100.00%> (-0.35%) ⬇️
ion/symboltoken.go 92.45% <100.00%> (+1.34%) ⬆️
ion/writer.go 66.66% <100.00%> (ø)
ion/bitstream.go 79.46% <0.00%> (-0.39%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d00cfc...2c8bed2. Read the comment docs.

return err
}
if err := w.WriteString(internal.GitCommit); err != nil {
return err
}

if err := w.FieldName(ion.NewSimpleSymbolToken("build_time")); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't w.FieldName() happens only if ion.NewTimestampFromStr() returns no error? Condition line 93 new

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the old code. w.FieldName("build_time") is in both the nil err case and the non-nil err case.

ion/skipper.go Outdated
@@ -546,44 +546,42 @@ func (t *tokenizer) skipLongStringHelper(handler commentHandler) error {
// false indicating this is not the end of the long string, and true for consumed '
// as we have read the closing ''' of the first long string.
func (t *tokenizer) skipEndOfLongString(handler commentHandler) (bool, bool, error) {
isConsumed := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think this can be helpful. reading return false, isConsumed, err is easier to understand compared to return false, false, err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted back my change.

@@ -202,7 +214,7 @@ func (s *sst) String() string {
buf := strings.Builder{}

w := NewTextWriter(&buf)
s.WriteTo(w)
_ = s.WriteTo(w)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think WriteTo() returns error.
Why not checking the error whether it's nil or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this were inside a function that returns an error then I would have added the error checking. But it doesn't and I'm not gonna add an error to the return type because that would block sst from satisfying the Stringer interface.
I added this explicit _ = so that Goland would no longer flag this as a warning.

@@ -281,19 +293,22 @@ func (s *bogusSST) String() string {

buf := strings.Builder{}
w := NewTextWriter(&buf)
w.Annotations(SymbolToken{Text: &ionSharedSymbolTableText, LocalSID: 9}, SymbolToken{Text: &bogusText, LocalSID: SymbolIDUnknown})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warnings, ha!? Yikes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes all of this was just to make Goland stop flagging it.

ion/writer.go Outdated

// FieldNameSymbol sets the field name symbol for the next value written.
// It may only be called while writing a struct.
func (w *writer) FieldNameSymbol(val SymbolToken) error {
if !w.IsInStruct() {
w.err = &UsageError{"Writer.FieldNameSymbol", "called when not writing a struct"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Writer.FieldName" instead of "Writer.FieldNameSymbol"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

return *val, err
case Int64:
val, err := d.r.Int64Value()
if err != nil {
return nil, err
}
return *val, err
default:
return d.r.BigIntValue()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BigIntValue also returns value, error.
We can the same check as above caseses for default too

}

if err := writeSymbol(text, w.out); err != nil {
if err := writeSymbol(a.toText(), w.out); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read the code correctly, when we have a.Text is nil and a.LocalSID is Unknown, then toText() will return an empty string. And this empty text is sent to writeSymbol.
I think if a.Text is nil and a.LocalSID is Unknown, error should be returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

return SymbolToken{Text: &text, LocalSID: SymbolIDUnknown}
}

func newSimpleSymbolTokenPtr(text string) *SymbolToken {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think we should not have this. we have NewSimpleSymbolToken() and if its pointer is needed that should be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to not have this either but without it I would need two lines to get the address.

ie.

st := NewSimpleSymbolToken(text)
&st

instead of

newSimpleSymbolTokenPtr(text)

@@ -274,10 +272,9 @@ func (d *Decimal) trunc() (int64, error) {
if err != nil {
return 0, err
}
d = ud
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need this.
ud is different from the original d. I think when we change the scale, we need to keep d with its updated version in case another operation is called on d

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider d.checkToUpscale().
In the old code, it reassigns d with d = d.upscale(0). However I discovered that this change of state is not reflected in the calling function d.trunc() which must be why d.checkToUpscale() returns the new decimal. (Yes I'm confused by this as well since I expect a pointer receiver to change state.)

@codecov-io
Copy link

codecov-io commented Oct 7, 2020

Codecov Report

Merging #165 into master will decrease coverage by 0.30%.
The diff coverage is 56.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #165      +/-   ##
==========================================
- Coverage   74.94%   74.64%   -0.31%     
==========================================
  Files          25       25              
  Lines        4865     4906      +41     
==========================================
+ Hits         3646     3662      +16     
- Misses        724      739      +15     
- Partials      495      505      +10     
Impacted Files Coverage Δ
ion/timestamp.go 68.63% <ø> (ø)
ion/unmarshal.go 67.74% <0.00%> (-0.90%) ⬇️
ion/writer.go 66.66% <ø> (ø)
ion/symboltable.go 56.12% <23.40%> (-3.35%) ⬇️
ion/readlocalsymboltable.go 66.93% <50.00%> (-1.87%) ⬇️
ion/binarywriter.go 75.75% <65.51%> (-0.32%) ⬇️
ion/marshal.go 69.39% <66.66%> (ø)
ion/textwriter.go 60.63% <78.57%> (+1.43%) ⬆️
ion/textutils.go 82.78% <83.33%> (-0.08%) ⬇️
ion/decimal.go 75.16% <100.00%> (-0.33%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21c1a81...4411767. Read the comment docs.

return w.err
}
} else {
return &UsageError{"Writer.WriteSymbol", "invalid symbol token"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some detail to the error message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return w.writeSymbolFromID("Writer.WriteSymbol", id)
}

// WriteSymbolFromString writes a symbol value given a string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be more explicit about how the provided string is going to be used; it expects the string to already be in the symbol table and will return an error if it isn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 260 to 263
// Wrap text value in single quotes if the symbol's text is a symbol identifier
// (ie. of form $n for some integer n)
// This is done to distinguish from actual symbol table mappings.
text = fmt.Sprintf("'%v'", text)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the behavior that's intended here. Why are we wrapping $37 in single quotes?

If I pass a SymbolToken with LocalSID=SymbolIDUnknown and Text="$37", I would expect WriteSymbol to try to resolve the text literal $37 (unmodified) in the symbol table. If I'd wanted to write SID 37, I would have set LocalSID=37 in the SymbolToken, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have roundtrip tests (such as ion-tests/iontestdata/good/symbols.ion with the test case '$99') that fail without any internal logic wrapping $99 in quotes.

When the text reader reads '$99', it produces the string "$99" (note the lack of single quotes) and then creates a SymbolToken using that string value. The absence of single quotes creates confusion further downstream about whether we are dealing with an actual symbol table mapping vs a non-mapping SymbolToken with this text value.

I moved the single quote wrapping logic over to textReader::nextBeforeTypeAnnotations instead of binaryWriter::WriteSymbol and textWriter::WriteSymbol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

such as ion-tests/iontestdata/good/symbols.ion with the test case '$99'

Here's a link to this test case for anyone who's interested.

When the text reader reads '$99', it produces the string "$99" (note the lack of single quotes) and then creates a SymbolToken using that string value.

This sounds like the correct behavior to me.

The absence of single quotes creates confusion further downstream about whether we are dealing with an actual symbol table mapping vs a non-mapping SymbolToken with this text value.

This is something we need to sort out; I don't think there's any writer API where the golang string $99 should be reinterpreted as anything other than a string literal.

Can you link to the function(s) that are causing the problems? Why is the reinterpretation happening at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember off the top of my head where specifically to go looking, but the root cause on this one is likely me. :)

At some point back before the introduction of SymbolToken, if the binary reader hit a symbol ID it couldn't resolve (or the text reader hit an unquoted ${n}), they'd return the string ${id} instead of an error. The writers correspondingly mapped WriteSymbol("$123") to being a symbol with that ID, bypassing the symbol table. That came at the expense of being able to read/write an actual symbol whose text is ${n}, but that wasn't a thing I thought I would ever need to do (😬😅).

Now that everything at the reader/writer level is converted to SymbolTokens, any remnants of that questionable decision can and should be removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes total sense. Thanks for the added context!

Copy link
Contributor Author

@justing-bq justing-bq Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I reworked this so the Symbol Token's text value is the unquoted $99.
However when the text writer is writing out the symbol in textutils.go:writeSymbol(), we still need to wrap this value in single quotes.

return err
}
} else {
return &UsageError{api, "invalid field name symbol token"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message should communicate why the symbol token is invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if err == nil {
return id, nil
}
if id, ok := symbolIdentifier(sym); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to understand the symbol ID interpretation behavior that we're implementing here. I'm having difficulty imagining a scenario where I'm using a binary writer and I pass it a string in the $id format ($12, $47, etc).

// (ie. of form $n for some integer n)
// This is done to distinguish from actual symbol table mappings.
if _, ok := symbolIdentifier(text); ok {
text = fmt.Sprintf("'%v'", text)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here RE: quote wrapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@@ -103,6 +112,15 @@ func symbolIdentifier(symbolText string) (int64, bool) {
return SymbolIDUnknown, false
}

// NewSimpleSymbolToken returns a Symbol Token with the given text value and undefined SID and Source.
func NewSimpleSymbolToken(text string) SymbolToken {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is subjective, but to me the phrase Simple doesn't communicate what this function will do. How about NewSymbolTokenFromString?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return *val, err
case Int64:
val, err := d.r.Int64Value()
if err != nil {
return nil, err
}
return *val, err
default:
return d.r.BigIntValue()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update Unmarshal API to support shared symbol tables.

I was playing around marshalling/ unmarshalling roundtrip with sst and it was missing.

Update L95-97 to:

func Unmarshal(data []byte, v interface{}, ssts ...SharedSymbolTable ) error {
	catalog := NewCatalog(ssts...)
	return NewDecoder(NewReaderCat(bytes.NewReader(data), catalog)).DecodeTo(v)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Copy link
Contributor

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zslayton zslayton merged commit c1fd17e into amazon-ion:master Nov 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants