All programming languages need numbers in various forms. Each language defines its literals differently, although the C-style languages tend to follow C's formatting. I'll compare them here and see what I like.
Ada has decimal numbers by default, with no need to add any base notation, e.g. 101, 3.14, 25.4E-2, etc.
Ada can also create based numbers by prefixing the base and then enclosing the digits within hash marks, e.g. 16#fffe#, 8#754#, 2#1001#, etc.
Ada also uses the underscore to separate digits for readability, e.g. 16#DEAD_BEEF#.
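As a rough sketch of what a scanner has to do with these, here is a small Python function that converts an Ada-style based integer literal to its value. Python is used purely for illustration, and parse_ada_based is a hypothetical helper, not part of any Ada toolchain. It assumes integer literals only (no fractional part and no exponent) and that underscores may only appear between digits.

```python
import re

# Hypothetical helper for illustration only.
# Shape handled: base '#' digits '#', with single underscores between digits.
_BASED = re.compile(
    r"([0-9]+(?:_[0-9]+)*)#([0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*)#"
)


def parse_ada_based(text: str) -> int:
    """Return the value of an Ada-style based integer literal."""
    match = _BASED.fullmatch(text)
    if match is None:
        raise ValueError(f"not a based literal: {text!r}")
    base = int(match.group(1).replace("_", ""))
    if not 2 <= base <= 16:
        raise ValueError(f"base {base} not in 2-16")
    # int() rejects digits that are out of range for the base, e.g. 8#9#.
    return int(match.group(2).replace("_", ""), base)
```

For example, parse_ada_based("16#DEAD_BEEF#") gives the same value as the C literal 0xDEADBEEF, while spellings with an underscore or a space next to a hash are rejected.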
C utilises quite a simplistic and not very consistent method to define literals, e.g. 101, 0754, 0xDEADBEEF, etc. The issue with C is that you also have to add a suffix to a number to say whether it is long (l), unsigned long (ul), float (f), etc. There is no d suffix for double; a floating literal without a suffix is already a double, and l on a floating literal means long double.
C++ is a superset of C, so it takes what C has and adds more. C++14 added the 0b prefix, which makes binary literals consistent with hex literals. C++14 also added the apostrophe as a digit separator, which I imagine would make creating a scanner even more difficult than it already is.
Basically as for C, except there is no octal support.
Similar to C, but Walter Bright did add the underscore from Ada as a separator.
Go utilises something similar to ParaSail, with the addition of allowing an underscore between the prefix and the digits, e.g. 0b_1001_1110, 0x_DEAD_Beef, which I think is very readable.
Essentially the same as C.
Here's a weird one. It essentially has the prefixes that C has, but no suffixes, as the compiler can infer the size from the literal given. It also adds prefixes for decimal and dozenal (which is a new one on me), 0d and 0z respectively.
ParaSail is influenced by Ada, as it is developed by someone who has worked on Ada compilers for a long time. It includes Ada's based numerals and retains the underscore, but also adds literals similar to the C-style languages, e.g. 0b1001_1011, 0xdead_beef, etc.
Again similar to C. It includes underscores as separators, but has different suffixes that use the data types in the language, e.g. u32, i16, etc.
I like the fact that Go allows an underscore between the prefix and the digits, which aids readability. Without it, the number seems too cramped.
I know that 0x is meant to mean hex, that everyone knows that, and that the x comes from the word hexadecimal, but I do wonder why it's not 0h?
0hdead_beef
0h_dead_beef
You'd think that would be more consistent, but it does look odd.
I have always liked Ada's method, and other languages have adopted the underscore because it makes sense:
A : Integer := 16#ff_ff#; -- Fine.
B : Integer := 16#_ffff_#; -- Error.
B : Integer := 16# ffff #; -- Error.
Ada doesn't allow either an underscore or a space between the hashes and the digits. If the editor's syntax highlighting shows the hashes in a different colour, that helps with the readability of these.
A strongly typed, statically compiled language should never need any kind of suffix. Ada and Odin don't use them, and I won't either.
I could take the combination approach of ParaSail and allow both the Go-style prefix and the Ada-style based literals. But this does go against the "one way to do things."
Going with the based numerals from Ada would allow exotic base systems, such as the duodecimal noted above, base 36 and base 64. GNAT (Ada) only allows bases 2-16.
$ ~/opt/free-ada-9.2.0/bin/gnatmake -c test.adb
gcc -c test.adb
test.adb:2:19: base not 2-16
gnatmake: "test.adb" compilation error
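To get a feel for what lifting the 2-16 limit would involve, here is a minimal Python sketch. The helper parse_in_base and its digit alphabet (0-9 followed by a-z, case-insensitive) are my own assumptions for illustration, not anything taken from Ada or GNAT.

```python
import string

# Assumed digit alphabet: 0-9 then a-z, which covers bases 2 to 36.
_DIGITS = string.digits + string.ascii_lowercase


def parse_in_base(digits: str, base: int) -> int:
    """Convert a digit string in the given base (2-36) to an integer."""
    if not 2 <= base <= len(_DIGITS):
        raise ValueError(f"base {base} not in 2-{len(_DIGITS)}")
    cleaned = digits.replace("_", "").lower()
    if not cleaned:
        raise ValueError("no digits given")
    value = 0
    for ch in cleaned:
        digit = _DIGITS.find(ch)
        if digit < 0 or digit >= base:
            raise ValueError(f"{ch!r} is not a base-{base} digit")
        # Shift the accumulated value one place and add the new digit.
        value = value * base + digit
    return value
```

Base 12 and base 36 drop out of this for free. Base 64 would not: it needs a 64-symbol alphabet in which upper and lower case are different digits, and that clashes with treating dead_BEEF and DEAD_beef as the same number.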
Neither Ada nor ParaSail has keywords for the base numeric types, i.e. Integer, Float. In Ada they are defined in the package Standard, in terms of the base language features. Both languages define literals to be of a Universal literal type, either Universal_Integer or Universal_Real, with which all numeric types in the same class (integer or real) are compatible.
Ada goes further by defining Character and String inside Standard as well. Characters are an enumeration and Strings are an array of Characters. Ada Strings are currently Latin-1.
ParaSail goes further in defining Univ_Char/String<>, which are module types.
This seems to make sense where new numeric types can be defined and constrained; I cannot see any other way to handle this. These implicit types are the sets of all of their respective numbers, i.e. the set of all integers and the set of all reals.
In Orenda, integer literals can be specified with a base in either an Ada-like or a C-like manner.
// Normal decimal literals.
10
100_000
// Integers with specified bases.
2#100_0_11_11#
8#677#
16#dead_BEEF#
16# 1f #
7# 66_4#
16# FF_EE #
As can be seen in the above code, a space can be placed after the hash following the base or before the final hash, which aids readability.
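A scanner can treat that padding as pure decoration. The Python sketch below splits such a literal into its base and bare digits. The helper name split_based is hypothetical, and the grammar is only my reading of the examples above: at most one space just inside each hash, and single underscores between digits.

```python
import re

# Assumed shape, inferred from the examples only:
#   base '#' [space] digits-with-single-underscores [space] '#'
_PADDED = re.compile(r"([0-9]+)# ?([0-9A-Za-z]+(?:_[0-9A-Za-z]+)*) ?#")


def split_based(text: str):
    """Return (base, digits) with the padding and underscores removed."""
    match = _PADDED.fullmatch(text)
    if match is None:
        raise ValueError(f"not a based literal: {text!r}")
    return int(match.group(1)), match.group(2).replace("_", "")
```

With this, "16# 1f #" and "16#1f#" both come out as (16, "1f"), so the space never reaches the code that works out the value.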
In the following C-like literals, an underscore can be placed between the base prefix and the digits, again for readability.
// Binary literals.
0b1010_1100
0b_1010_1100
0b1_110_01_10
// Octal literals.
0o664
0o6_6_4
0O_664
// Hexadecimal literals.
0xdead
0x_da_ff
0XFAFF
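As a sketch of how the prefixed forms above could be evaluated, here is a small Python function. The name parse_prefixed is hypothetical, and the rules are assumptions drawn from the examples: a case-insensitive b, o or x prefix, one optional underscore straight after it, and single underscores between digits.

```python
import re

# Prefix letter to numeric base.
_PREFIX_BASES = {"b": 2, "o": 8, "x": 16}

# Assumed shape: '0' prefix-letter ['_'] digits-with-single-underscores
_PREFIXED = re.compile(r"0([bBoOxX])_?([0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*)")


def parse_prefixed(text: str) -> int:
    """Return the value of a 0b / 0o / 0x style literal."""
    match = _PREFIXED.fullmatch(text)
    if match is None:
        raise ValueError(f"not a prefixed literal: {text!r}")
    base = _PREFIX_BASES[match.group(1).lower()]
    # int() raises ValueError for digits outside the base, e.g. 0b102.
    return int(match.group(2).replace("_", ""), base)
```

Under these assumptions 0b1010_1100 and 0b_1010_1100 evaluate to the same number, as do 0o664 and 0O_664.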
Real numbers are specified in a similar manner.
3.14
400_0.14e34_0
10.2E-7
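The real literals can be sketched the same way. The Python function below is a hypothetical helper (parse_real) that assumes digits on both sides of the point, an optional signed exponent, and single underscores between digits. It returns a Decimal rather than a float because 400_0.14e34_0 is larger than a 64-bit float can hold.

```python
import re
from decimal import Decimal

_GROUP = r"[0-9]+(?:_[0-9]+)*"  # digits with single underscores between them

# Assumed shape: digits '.' digits [ ('e' | 'E') [sign] digits ]
_REAL = re.compile(rf"{_GROUP}\.{_GROUP}(?:[eE][+-]?{_GROUP})?")


def parse_real(text: str) -> Decimal:
    """Return the exact decimal value of a real literal."""
    if _REAL.fullmatch(text) is None:
        raise ValueError(f"not a real literal: {text!r}")
    return Decimal(text.replace("_", ""))
```

Whether Orenda keeps reals exact like this or maps them straight to a machine float is a separate design decision; the sketch only covers the lexical side.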
Identifiers are specified as follows.
Orenda
This_Is_An_Identifier
this_is_another_one
ThisIsAnother
thisCanBeToo
Id1.Id2.Id3
As can be seen above, qualified identifiers are composed of multiple identifiers separated with dots.
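A qualified identifier can then be checked and split with very little code. The Python sketch below uses a hypothetical helper, split_qualified, and assumes identifiers are ASCII letters, digits and underscores starting with a letter, which is all the examples above show; the real rules may well be wider given that the source is Unicode.

```python
import re

# Assumption: ASCII only, must start with a letter.
_IDENT = r"[A-Za-z][A-Za-z0-9_]*"
_QUALIFIED = re.compile(rf"{_IDENT}(?:\.{_IDENT})*")


def split_qualified(name: str) -> list:
    """Split a dotted name into its component identifiers."""
    if _QUALIFIED.fullmatch(name) is None:
        raise ValueError(f"not a qualified identifier: {name!r}")
    return name.split(".")
```

So Id1.Id2.Id3 yields three identifiers, a plain name such as Orenda yields one, and something like Id1..Id2 is rejected.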
Unicode is built into the language and all source is Unicode, encoded as UTF-8. As in C, there are certain escape sequences (to be expanded upon).
// Runes.
'\n'
'\t'
'\u20c3'
'g'
' '
'λ'
'ᱟ'
// This is a tab character ' '
'\t'
//'''
'\''
//'"'
'\"'
''
'😂'
// Strings.
""
" "
"\tHello\n"
"मैं काँच खा सकता हूँ और मुझे उससे कोई चोट नहीं पहुंचती"
"\""
//"""
//"'"
"\"Hello\" 😂 \u20c3"