forked from hohle/ion-rfc
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathion-rfc-02-data-model.nroff
259 lines (190 loc) · 8.28 KB
/
ion-rfc-02-data-model.nroff
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
.tm 2. The Ion Data Model ............................................. \n%
.ti 0
2. The Ion Data Model
The semantic basis of Ion is an abstract data model, composed of a set
of primitive types and a set of recursively-defined container types. All
types support null values and user-defined type annotations.
It's important to note that the data model is value-based and does not
include references. As a result, the data model can express data
hierarchies (values can be nested to arbitrary depth), but not general
directed graphs.
Here's an overview of the core data types:
.in 9
.ti 6
o null - A generic null value
.ti 6
o bool - Boolean values
.ti 6
o int - Signed integers of arbitrary size
.ti 6
o float - Binary-encoded floating point numbers (IEEE 64-bit)
.ti 6
o decimal - Decimal-encoded real numbers of arbitrary precision
.ti 6
o timestamp - Date/time/timezone moments of arbitrary precision
.ti 6
o string - Unicode text literals
.ti 6
o symbol - Interned, Unicode symbolic atoms (aka identifiers)
.ti 6
o blob - Binary data of user-defined encoding
.ti 6
o clob - Text data of user-defined encoding
.ti 6
o struct - Unordered collections of tagged values
.ti 6
o list - Ordered collections of values
.ti 6
o sexp - Ordered collections of values with application-defined
semantics
.in 3
.tm _ 2.1. Primitive Types ........................................... \n%
.ti 0
2.1. Primitive Types
The Ion primitive types represent scalar values including nulls,
booleans, numbers, timestamps, character sequences, and symbols.
.tm _ 2.1.1. Null Values .......................................... \n%
.ti 0
2.1.1 Null Values
Ion supports a distinct null value for every core type. The null type has a
single value, null.
As a historical aside, the null type exists primarily for compatibility
with JSON, which has only the untyped null value.
.tm _ 2.1.2. Boolean .............................................. \n%
.ti 0
2.1.2 Booleans
The bool type may have a value of true, false, or, null.
.tm _ 2.1.3. Integers ............................................. \n%
.ti 0
2.1.3 Integers
The int type consists of signed integers of arbitrary size.
.tm _ 2.1.4. Real Numbers ......................................... \n%
.ti 0
2.1.4 Real Numbers
Ion supports both binary and lossless decimal encodings of real numbers
as, respectively, types float and decimal.
.tm _ 2.1.4.1. Float ........................................... \n%
.ti 0
2.1.4.1 Floats
The float type denotes either 32-bit or 64-bit IEEE-754 floating-point
values; other sizes may be supported in future versions of this
specification.
.tm _ 2.1.4.2. Decimal ......................................... \n%
.ti 0
2.1.4.2 Decimals
Because most decimal values cannot be represented exactly in binary
floating-point, float values may change "appearance" and precision when
being read or written. The decimal type, however, has significant
precision, including trailing zeros and is preserved through
round-trips.
.tm _ 2.1.5. Timestamps ........................................... \n%
.ti 0
2.1.5 Timestamps
Timestamps represent a specific moment in time, always include a local
offset, and are capable of arbitrary precision.
.in 3
Values that are precise only to the year, month, or date are assumed to
be UTC values with unknown local offset.
Zero and negative dates are not valid, so the earliest instant in time
that can be represented as a timestamp is Jan 01, 0001. As per the W3C
note, leap seconds cannot be represented.
Two timestamps are only equivalent if they represent the same instant
with the same offset and precision.
.tm _ 2.1.6. Strings .............................................. \n%
.ti 0
2.1.6 Strings
Ion string values are Unicode character sequences of arbitrary length.
.tm _ 2.1.7. Symbols .............................................. \n%
.ti 0
2.1.7 Symbols
Symbols are much like strings, in that they are Unicode character
sequences. The primary difference is the intended semantics: symbols
represent semantic identifiers as opposed to textual literal values.
Symbols are case sensitive.
Symbols may be shared between applications out of band or stored
separately from data at rest using symbol tables.
Symbols can be used in two different ways, as values or as symbol tokens.
A symbol value can appear anywhere a value can occur, and can be
annotated like any other value. A symbol token is a symbol used as an
annotation or as a field name in a struct, and cannot be annotated.
.tm _ 2.1.8. Blobs ................................................ \n%
.ti 0
2.1.8 Blobs
The blob type allows embedding of arbitrary raw binary data. Ion treats
such data as a single (though often very large) value. It does no
processing of such data other than passing it through intact.
.tm _ 2.1.9. Clobs ................................................ \n%
.ti 0
2.1.9 Clobs
The clob type is similar to blob in that it holds uninterpreted binary
data, the difference is that the content is expected to be text. Like
blobs, clobs are a sequence of raw octets that are not given any special
interpretation. This guarantees that the value can be transmitted
without modification.
.tm _ 2.2. Container Types ........................................... \n%
.ti 0
2.2. Container Types
Ion defines three container types: structures, lists, and S-expressions.
These types are defined recursively and may contain values of any
Ion type.
.tm _ 2.2.1. Structures ........................................... \n%
.ti 0
2.2.1 Structures
Structures are unordered collections of name/value pairs. The names are
symbol tokens, and the values are unrestricted. Each name/value pair is
called a field.
When two fields in the same struct have the same name we say there are
"repeated names" or (somewhat misleadingly) "repeated fields".
Implementations must preserve all such fields, i.e., they may not
discard fields that have repeated names. However, implementations may
reorder fields (the binary format identifies structs that are sorted by
symbolID), so certain operations may lead to nondeterministic behavior.
Note that field names are symbol tokens, not symbol values, and thus may
not be annotated. The value of a field may be annotated like any
other value.
.tm _ 2.2.2. Lists ................................................ \n%
.ti 0
2.2.2. Lists
Lists are ordered collections of values. The contents of the list are
heterogeneous (that is, each element can have a different type).
Homogeneous lists are not supported by the core type system, but may be
imposed by schema validation tools.
.tm _ 2.2.3. S-Expressions ........................................ \n%
.ti 0
2.2.3. S-Expressions
An S-expression (or symbolic expression) is much like a list in that
it's an ordered collection of values. However, the notation aligns with
Lisp syntax to connote use of application semantics like function calls
or programming-language statements. As such, correct interpretation
requires a higher-level context other than the raw Ion parser and
data model.
Ion does not define the interpretation of S-expressions or any semantics
beyond the pure sequence-of-values data model.
.tm _ 2.2.4. Type Annotations ..................................... \n%
.ti 0
2.2.4. Type Annotations
Any Ion value can include one or more annotation symbols denoting the
semantics of the content. This can be used to:
.in 9
.ti 6
o Annotate individual values with schema types, for validation purposes.
.ti 6
o Associate a higher-level datatype (e.g. a Java class) during
serialization processes.
.ti 6
o Indicate the notation used within a blob or clob value.
.ti 6
o Apply other application semantics to a single value.
.in 3
When multiple annotations are present, the Ion processor will maintain
their order. Duplicate annotation symbols are allowed but discouraged.
Except for a small number of predefined system annotations, Ion itself
neither defines nor validates such annotations; that behavior is left to
applications or tools (such as schema validators).
It's important to understand that annotations are symbol tokens, not
symbol values. That means they do not have annotations themselves.
.tm _ 2.3. Value Streams ............................................. \n%
.ti 0
2.3 Value Streams
A value stream is a (potentially unbounded) sequence of Ion values in
either text or binary.