Skip to content

Turn numeric or hex IDs into reversible and human friendly obfuscated strings

License

Notifications You must be signed in to change notification settings

stevegeek/encoded_id

Repository files navigation

EncodedId

Encode numerical or hex IDs into obfuscated strings that can be used in URLs.

coder = ::EncodedId::ReversibleId.new(salt: my_salt)
coder.encode(123)
# => "p5w9-z27j"
coder.encode_hex("10f8c")
# => "w72a-y0az"

The obfuscated strings are reversible (they decode them back into the original IDs).

Also supports encoding multiple IDs at once.

my_salt = "salt!"
coder = ::EncodedId::ReversibleId.new(salt: my_salt)

# One of more values can be encoded
coder.encode([78, 45])
# => "z2j7-0dmw"

# The encoded string can then be reversed back into the original IDs
coder.decode("z2j7-0dmw")
# => [78, 45]

# The decoder can be resilient to easily confused characters
coder.decode("z2j7-Odmw") # (note the capital 'o' instead of zero)
# => [78, 45]

Features

  • 🔄 encoded IDs are reversible (uses Hashids, the old site is here https://github.com/hashids/hashids.github.io))
  • 👥 supports multiple IDs encoded in one encoded string (eg 7aq6-0zqw decodes to [78, 45])
  • 🔡 supports custom alphabets for the encoded string (at least 16 characters needed)
    • by default uses a variation of the Crockford reduced character set (https://www.crockford.com/base32.html)
    • 👓 easily confused characters (eg i and j, 0 and O, 1 and I etc) are mapped to counterpart characters, to help avoid common readability mistakes when reading/sharing
    • 🤬 build in profanity limitation
  • 🤓 encoded string can be split into groups of letters to improve human-readability
    • eg nft9hr834htu as nft9-hr83-4htu
  • 🥽 supports limits on length to prevent resource exhaustion on encoding and decoding
  • configured with sensible defaults

I aim for 100% test coverage and have fuzz tested quite extensively. But please report any issues!

Experimental

  • support for encoding of hex strings (eg UUIDs), including multiple IDs encoded in one string

Performance and benchmarking

The hashids implementation is being optimised for performance and memory usage. Compared to the original hashids gem, the encoded_id gem is faster and uses less memory. The exact performance difference depends on the length of the encoded string and other factors.

Encoding comparison vs hashids gem

See bin/benchmark

hashids gem, no YJIT: length 8: 69569.5 i/s

encoded_id, no YJIT: length 8: 97085.8 i/s - 1.4x faster

or with YJIT enabled, the effect is more pronounced:

hashids gem, YJIT: length 8: 109341.5 i/s

encoded_id, YJIT: length 8: 223661.0 i/s - 2.3x faster

Memory allocations

See bin/memory_profile.

For a small input (2 integers):

hashids gem: Total allocated: 7456 bytes (120 objects)

encoded_id gem: Total allocated: 848 bytes (8 objects)

But with a 100 integer array input, the difference is huge (37x difference):

hashids gem: Total allocated: 413264 bytes (5998 objects)

encoded_id gem: Total allocated: 11024 bytes (204 objects)

Rails support encoded_id-rails

To use with Rails check out the encoded_id-rails gem.

class User < ApplicationRecord
  include EncodedId::WithEncodedId
end

User.find_by_encoded_id("p5w9-z27j")
# => #<User id: 78>

Note on security of encoded IDs (hashids)

Encoded IDs are not secure. It maybe possible to reverse them via brute-force. They are meant to be used in URLs as an obfuscation. The algorithm is not an encryption.

Please read more on https://hashids.org/

Compare to alternate Gems

Installation

Install the gem and add to the application's Gemfile by executing:

$ bundle add encoded_id

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install encoded_id

EncodedId::ReversibleId.new

To create an instance of the encoder/decoder use .new with the salt option:

coder = EncodedId::ReversibleId.new(
  # The salt is required
  salt: ...,
  # And then the following options are optional
  length: 8, 
  split_at: 4, 
  split_with: "-",
  alphabet: EncodedId::Alphabet.modified_crockford,
  hex_digit_encoding_group_size: 4 # Experimental
)

Note the salt value is required and should be a string of some length (greater than 3 characters). This is used to generate the encoded string.

It will need to be the same value when decoding the string back into the original ID. If the salt is changed, the encoded strings will be different and possibly decode to different IDs.

Options

The encoded ID is configurable. The following can be changed:

  • the length, eg 8 characters for p5w9-z27j
  • the alphabet used in it (min 16 characters)
  • and the number of characters to split the output into and the separator

length

length: the minimum length of the encoded string. The default is 8 characters.

The actual length of the encoded string can be longer if the inputs cannot be represented in the minimum length.

max_length

max_length: the maximum length of the encoded string. The default is 128 characters.

The maximum length represents both the longest encoded string that will be generated and also a limit on the maximum input length that will be decoded. If the encoded string exceeds max_length then a EncodedIdLengthError will be raised. If the input exceeds max_length then a InvalidInputError will be raised. If max_length is set to nil, then no validation, even using the default will be performed.

max_inputs_per_id

max_inputs_per_id: the maximum amount of IDs to be encoded together. The default is 32.

This maximum amount is used to limit:

  • the length of array input passed to encode
  • the length of integer array encoded in hex string(s) passed to encode_hex function. InvalidInputError wil be raised when array longer than max_inputs_per_id is provided.

alphabet

alphabet: the alphabet used in the encoded string. By default, it uses a variation of the Crockford reduced character set (https://www.crockford.com/base32.html).

alphabet must be an instance of EncodedId::Alphabet.

The default alphabet is EncodedId::Alphabet.modified_crockford.

To create a new alphabet, use EncodedId::Alphabet.new:

alphabet = EncodedId::Alphabet.new("0123456789abcdef")

EncodedId::Alphabet.new(characters, equivalences)

characters

characters: the characters of the alphabet. Can be a string or array of strings.

Note that the characters of the alphabet must be at least 16 unique characters long and must not contain any whitespace characters.

alphabet = EncodedId::Alphabet.new("ςερτυθιοπλκξηγφδσαζχψωβνμ")
coder = ::EncodedId::ReversibleId.new(salt: my_salt, alphabet: alphabet)
coder.encode(123)
# => "πφλχ-ψησω"

Note that larger alphabets can result in shorter encoded strings (but remember that length specifies the minimum length of the encoded string).

equivalences

You can optionally pass an appropriate character equivalences mapping. This is used to map easily confused characters to their counterpart.

equivalences: a hash of characters keys, with their equivalent alphabet character mapped to in the values.

Note that the characters to be mapped:

  • must not be in the alphabet,
  • must map to a character that is in the alphabet.

nil is the default value which means no equivalences are used.

alphabet = EncodedId::Alphabet.new("!@#$%^&*()+-={}", {"_" => "-"})
coder = ::EncodedId::ReversibleId.new(salt: my_salt, alphabet: alphabet)
coder.encode(123)
# => "}*^(-^}*="

split_at and split_with

For readability, the encoded string can be split into groups of characters.

split_at: specifies the number of characters to split the encoded string into. Defaults to 4.

split_with: specifies the separator to use between the groups. Default is -.

Set either to nil to disable splitting.

hex_digit_encoding_group_size

Experimental

hex_digit_encoding_group_size: specifies the number of hex digits to encode in a group. Defaults to 4. Can be between 1 and 32.

Can be used to control the size of the encoded string when encoding hex strings. Larger values will result in shorter encoded strings for long inputs, and shorter values will result in shorter encoded strings for smaller inputs.

But note that bigger values will also result in larger markers that separate the groups so could end up increasing the encoded string length undesirably.

See below section Using with hex strings for more details.

EncodedId::ReversibleId#encode

#encode(id): where id is an integer or array of integers to encode.

coder.encode(123)
# => "p5w9-z27j"

# One of more values can be encoded
coder.encode([78, 45])
# => "z2j7-0dmw"

EncodedId::ReversibleId#decode

#decode(encoded_id): where encoded_id is a string to decode.

# The encoded string can then be reversed back into the original IDs
coder.decode("z2j7-0dmw")
# => [78, 45]

Using with hex strings

Experimental (subject to incompatible changes in future versions)

# Hex input strings are also supported
coder.encode_hex("10f8c")
# => "w72a-y0az"

When encoding hex strings, the input is split into groups of hex digits, and each group is encoded separately as its integer equivalent. In other words the input is converted into an array of integers and encoded as normal with the encode method.

eg with hex_digit_encoding_group_size=1 and inpu f1, is split into f and 1, and then encoded as 15 and 1 respectively, ie encode is called with [15, 1].

To encode multiple hex inputs the encoded string contains markers to indicate the start of a new hex input. This marker is equal to an integer value which is 1 larger than the maximum value the hex digit encoding group size can represent (ie it is 2^(hex_digit_encoding_group_size * 4)).

So for a hex digit encoding group size of 4 (ie group max value is 0xFFFF), the marker is 65536

For example with hex_digit_encoding_group_size=1 for the inputs f1 and e2 encoded together, the actual encoded integer array is [15, 1, 16, 14, 2].

EncodedId::ReversibleId#encode_hex

encode_hex(hex_string) , where hex_string is a string of hex digits or an array of hex strings.

# UUIDs will result in long output strings...
coder.encode_hex("9a566b8b-8618-42ab-8db7-a5a0276401fd")
# => "5jjy-c8d9-hxp2-qsve-rgh9-rxnt-7nb5-tve7-bf84-vr"
# 
# but there is an option to help reduce this... 
coder = ::EncodedId::ReversibleId.new(salt: my_salt, hex_digit_encoding_group_size: 32)
coder.encode_hex("9a566b8b-8618-42ab-8db7-a5a0276401fd")
# => "vr7m-qra8-m5y6-dkgj-5rqr-q44e-gp4a-52"

EncodedId::ReversibleId#decode_hex

decode_hex(encoded_id) , where the output is an array of hex strings.

coder.decode_hex("5jjy-c8d9-hxp2-qsve-rgh9-rxnt-7nb5-tve7-bf84-vr")
# => ["9a566b8b-8618-42ab-8db7-a5a0276401fd"]

Development

After checking out the repo, run bin/setup to install dependencies.

Run bin/console for an interactive prompt that will allow you to experiment.

Running tests

Run bundle exec rake test to run the tests.

Type check

First install RBS dependencies:

rbs collection install

Then run:

steep check

See also

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/stevegeek/encoded_id.

License

The gem is available as open source under the terms of the MIT License.

keywords

hash ID, friendly ID, obfuscate ID, rails, ActiveRecord, model, slug, vanity URL, friendly URL

About

Turn numeric or hex IDs into reversible and human friendly obfuscated strings

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •