Skip to content

Commit

Permalink
Add sortKeys: Boolean flag to ujson/upickle APIs (#553)
Browse files Browse the repository at this point in the history
This PR allows anyone writing JSON to pass in `sortKeys = true` to make
the object keys get output in sorted order.

## Major changes:

1. Introduced a new `upickle.core.BufferedValue` type. This is a sealed
trait hierarchy that can be used to capture all the method calls
received by a `upickle.core.Visitor`, and re-play them later. It
replaces `IndexedValue`, which has JSON-specific idiosyncracies (e.g.
converting MsgPack visitor calls to their JSON equivalents) whereas
`BufferedValue` precisely captures the `Visitor` method calls unchanged.
`IndexedValue` is left for binary compatibility but no longer used
internally, even its ordering use case of buffering up things until a
`$type` tag is found has been replaced by `BufferedValue`

2. As part of the generalization from `IndexedValue` to `BufferedValue`,
we now need to handle non-JSON usage patterns such as non-string-keyed
objects (e.g. found in msgpack). This makes sorting these objects
non-trivial, as their keys can now be arbitrary `BufferedValue`s.
`BufferedValue.valueToSortKey` is a rough hack to traverse any
structured keys and make them sortable.
 
3. Intercept all `transform` calls in the primary upickle/ujson APIs
with `BufferedValue.maybeSortKeysTransform`. This checks the new
`sortKeys` boolean flag, and if true it intercepts the visitor calls,
captures them in a `BufferedValue`, recursively traverses and sorts the
`BufferedValue`'s object keys, then re-plays it back out

4. Binary compatibility stubs have been added to all methods who
received a new `sortKeys: Boolean = true` parameter. These stubs were
not marked as `@Deprecated`, because in some cases the Scala compiler
would preferentially resolve the stub over the
method-with-default-value, causing spurious deprecation warnings.

## Testing

1. Sorting functionality is covered by new `sortKeys` unit tests in
uJson and uPickle, and has been added to the `TestUtil.scala` round-trip
tests to ensure that all existing tests exercise the a subset of the
`sortKeys = true` codepaths for JSON and MsgPack and verify that the
round-tripping is still successful

## Notes

1. It took a surprising amount of code to make this work well, as it
goes against uPickle's "streaming-no-intermediate-data-structures" way
of doing things. `sortKeys = true` *has* to buffer up some kind of
intermediate JSON AST somewhere. I would expect performance to be worse
than the default zero-intermediate-datastructures visitor approach, and
so optimizing the `sortKeys = true` code paths was not a priority in
this PR. If anyone wants maximum performance, they can leave it as the
default `sortKeys = false`

2. The whole `Transformer`/`transform` abstraction in the codebase is
kind of half-baked and unsatisfactory, but I left it mostly as is for
now with only small tweaks to make things work (e.g. allowing the
implementation of `BufferedValue.maybeSortKeysTransform` to live in
`upickle.core`). Realistically, the whole name `Transformer` and the
concept it represents probably needs to be overhauled, but that's out of
scope for this PR

3. `TestUtil.scala` is getting big and messy, and I'm adding to the
mess. It's probably OK to leave for now, but it could definitely use a
cleanup
  • Loading branch information
lihaoyi authored Feb 9, 2024
1 parent 6e551ac commit 0b81593
Show file tree
Hide file tree
Showing 17 changed files with 503 additions and 66 deletions.
2 changes: 1 addition & 1 deletion build.sc
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ val sourcecode = "0.3.0"
val dottyCustomVersion = Option(sys.props("dottyVersion"))

val scala2JVMVersions = Seq(scala212, scala213)
val scalaVersions = scala2JVMVersions ++ Seq(scala3) ++ dottyCustomVersion
val scalaVersions = scala2JVMVersions// ++ Seq(scala3) ++ dottyCustomVersion

trait CommonPlatformModule extends ScalaModule with PlatformScalaModule{

Expand Down
2 changes: 1 addition & 1 deletion ujson/src/ujson/AstTransformer.scala
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ import upickle.core.compat._

import scala.language.higherKinds

trait AstTransformer[I] extends Transformer[I] with JsVisitor[I, I]{
trait AstTransformer[I] extends ujson.Transformer[I] with JsVisitor[I, I]{
def apply(t: Readable): I = t.transform(this)

def transformArray[T](f: Visitor[_, T], items: Iterable[I]) = {
Expand Down
2 changes: 2 additions & 0 deletions ujson/src/ujson/IndexedValue.scala
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,12 @@ import upickle.core.{Visitor, ObjVisitor, ArrVisitor, Abort, AbortException}
* want to work with an AST but still provide source-index error positions if
* something goes wrong
*/
@deprecated("Left for binary compatibility, use upickle.core.BufferedValue instead", "upickle after 3.1.4")
sealed trait IndexedValue {
def index: Int
}

@deprecated("Left for binary compatibility, use upickle.core.BufferedValue instead", "upickle after 3.1.4")
object IndexedValue extends Transformer[IndexedValue]{

case class Str(index: Int, value0: java.lang.CharSequence) extends IndexedValue
Expand Down
5 changes: 4 additions & 1 deletion ujson/src/ujson/Readable.scala
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,10 @@ trait Readable {
def transform[T](f: Visitor[_, T]): T
}

object Readable extends ReadableLowPri{
object Readable extends ReadableLowPri with Transformer[Readable]{
def transform[T](j: ujson.Readable, f: upickle.core.Visitor[_, T]): T = {
j.transform(f)
}
case class fromTransformer[T](t: T, w: Transformer[T]) extends Readable{
def transform[T](f: Visitor[_, T]): T = {
w.transform(t, f)
Expand Down
3 changes: 1 addition & 2 deletions ujson/src/ujson/Transformer.scala
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
package ujson
import upickle.core.Visitor

trait Transformer[I] {
def transform[T](j: I, f: Visitor[_, T]): T
trait Transformer[I] extends upickle.core.Transformer[I]{
def transformable[T](j: I) = Readable.fromTransformer(j, this)
}
117 changes: 100 additions & 17 deletions ujson/src/ujson/package.scala
Original file line number Diff line number Diff line change
@@ -1,8 +1,16 @@
import upickle.core.NoOpVisitor
import upickle.core.BufferedValue

package object ujson{
def transform[T](t: Readable, v: upickle.core.Visitor[_, T]) = t.transform(v)
def transform[T](t: Readable,
v: upickle.core.Visitor[_, T],
sortKeys: Boolean = false): T = {
BufferedValue.maybeSortKeysTransform(Readable, t, sortKeys, v)
}

// @deprecated("Binary Compatibility Stub", "upickle after 3.1.4")
def transform[T](t: Readable,
v: upickle.core.Visitor[_, T]): T = transform(t, v, sortKeys = false)
/**
* Read the given JSON input as a JSON struct
*/
Expand All @@ -16,36 +24,71 @@ package object ujson{
*/
def write(t: Value.Value,
indent: Int = -1,
escapeUnicode: Boolean = false): String = {
escapeUnicode: Boolean = false,
sortKeys: Boolean = false): String = {
val writer = new java.io.StringWriter
writeTo(t, writer, indent, escapeUnicode)
writeTo(t, writer, indent, escapeUnicode, sortKeys)
writer.toString
}

// @deprecated("Binary Compatibility Stub", "upickle after 3.1.4")
def write(t: Value.Value,
indent: Int,
escapeUnicode: Boolean): String = {
write(t, indent, escapeUnicode, sortKeys = false)
}

/**
* Write the given JSON struct as a JSON String to the given Writer
*/
def writeTo(t: Value.Value,
out: java.io.Writer,
indent: Int = -1,
escapeUnicode: Boolean = false): Unit = {
transform(t, Renderer(out, indent, escapeUnicode))
escapeUnicode: Boolean = false,
sortKeys: Boolean = false): Unit = {
transform(t, Renderer(out, indent, escapeUnicode), sortKeys)
}

// @deprecated("Binary Compatibility Stub", "upickle after 3.1.4")
def writeTo(t: Value.Value,
out: java.io.Writer,
indent: Int,
escapeUnicode: Boolean): Unit = {
writeTo(t, out, indent, escapeUnicode, sortKeys = false)
}

def writeToOutputStream(t: Value.Value,
out: java.io.OutputStream,
indent: Int = -1,
escapeUnicode: Boolean = false): Unit = {
transform(t, new BaseByteRenderer(out, indent, escapeUnicode))
escapeUnicode: Boolean = false,
sortKeys: Boolean = false): Unit = {
transform(t, new BaseByteRenderer(out, indent, escapeUnicode), sortKeys)
}

// @deprecated("Binary Compatibility Stub", "upickle after 3.1.4")
def writeToOutputStream(t: Value.Value,
out: java.io.OutputStream,
indent: Int,
escapeUnicode: Boolean): Unit = {
writeToOutputStream(t, out, indent, escapeUnicode, sortKeys = false)
}

def writeToByteArray(t: Value.Value,
indent: Int = -1,
escapeUnicode: Boolean = false) = {
escapeUnicode: Boolean = false,
sortKeys: Boolean = false): Array[Byte] = {
val baos = new java.io.ByteArrayOutputStream
writeToOutputStream(t, baos, indent, escapeUnicode)
writeToOutputStream(t, baos, indent, escapeUnicode, sortKeys)
baos.toByteArray
}

// @deprecated("Binary Compatibility Stub", "upickle after 3.1.4")
def writeToByteArray(t: Value.Value,
indent: Int,
escapeUnicode: Boolean): Array[Byte] = {
writeToByteArray(t, indent, escapeUnicode, sortKeys = false)
}

/**
* Parse the given JSON input, failing if it is invalid
*/
Expand All @@ -54,17 +97,39 @@ package object ujson{
* Parse the given JSON input and write it to a string with
* the configured formatting
*/
def reformat(s: Readable, indent: Int = -1, escapeUnicode: Boolean = false): String = {
def reformat(s: Readable,
indent: Int = -1,
escapeUnicode: Boolean = false,
sortKeys: Boolean = false): String = {
val writer = new java.io.StringWriter()
reformatTo(s, writer, indent, escapeUnicode)
reformatTo(s, writer, indent, escapeUnicode, sortKeys)
writer.toString
}

// @deprecated("Binary Compatibility Stub", "upickle after 3.1.4")
def reformat(s: Readable,
indent: Int,
escapeUnicode: Boolean): String = {
reformat(s, indent, escapeUnicode, sortKeys = false)
}
/**
* Parse the given JSON input and write it to a string with
* the configured formatting to the given Writer
*/
def reformatTo(s: Readable, out: java.io.Writer, indent: Int = -1, escapeUnicode: Boolean = false): Unit = {
transform(s, Renderer(out, indent, escapeUnicode))
def reformatTo(s: Readable,
out: java.io.Writer,
indent: Int = -1,
escapeUnicode: Boolean = false,
sortKeys: Boolean = false): Unit = {
transform(s, Renderer(out, indent, escapeUnicode), sortKeys)
}

// @deprecated("Binary Compatibility Stub", "upickle after 3.1.4")
def reformatTo(s: Readable,
out: java.io.Writer,
indent: Int,
escapeUnicode: Boolean): Unit = {
reformatTo(s, out, indent, escapeUnicode, sortKeys = false)
}
/**
* Parse the given JSON input and write it to a string with
Expand All @@ -73,16 +138,34 @@ package object ujson{
def reformatToOutputStream(s: Readable,
out: java.io.OutputStream,
indent: Int = -1,
escapeUnicode: Boolean = false): Unit = {
transform(s, new BaseByteRenderer(out, indent, escapeUnicode))
escapeUnicode: Boolean = false,
sortKeys: Boolean = false): Unit = {
transform(s, new BaseByteRenderer(out, indent, escapeUnicode), sortKeys)
}

// @deprecated("Binary Compatibility Stub", "upickle after 3.1.4")
def reformatToOutputStream(s: Readable,
out: java.io.OutputStream,
indent: Int,
escapeUnicode: Boolean): Unit = {
reformatToOutputStream(s, out, indent, escapeUnicode, sortKeys = false)
}

def reformatToByteArray(s: Readable,
indent: Int = -1,
escapeUnicode: Boolean = false) = {
escapeUnicode: Boolean = false,
sortKeys: Boolean = false): Array[Byte] = {
val baos = new java.io.ByteArrayOutputStream
reformatToOutputStream(s, baos, indent, escapeUnicode)
reformatToOutputStream(s, baos, indent, escapeUnicode, sortKeys)
baos.toByteArray
}

// @deprecated("Binary Compatibility Stub", "upickle after 3.1.4")
def reformatToByteArray(s: Readable,
indent: Int,
escapeUnicode: Boolean): Array[Byte] = {
reformatToByteArray(s, indent, escapeUnicode, sortKeys = false)
}
// End ujson
@deprecated("use ujson.Value")
type Js = Value
Expand Down
27 changes: 27 additions & 0 deletions ujson/test/src/ujson/JsonTests.scala
Original file line number Diff line number Diff line change
Expand Up @@ -114,5 +114,32 @@ object JsonTests extends TestSuite {
reader.getBufferCopyCount() ==> 0
reader.getBufferLength() ==> bytes.length
}
test("sortKeys"){
val raw = """{"d": [{"c": 0, "b": 1}], "a": 2}"""
val sorted = """{
| "a": 2,
| "d": [
| {
| "b": 1,
| "c": 0
| }
| ]
|}""".stripMargin
ujson.reformat(raw, indent = 4, sortKeys = true) ==> sorted

ujson.write(ujson.read(raw), indent = 4, sortKeys = true) ==> sorted

val baos = new java.io.ByteArrayOutputStream
ujson.writeToOutputStream(ujson.read(raw), baos, indent = 4, sortKeys = true)
baos.toString ==> sorted

val writer = new java.io.StringWriter
ujson.writeTo(ujson.read(raw), writer, indent = 4, sortKeys = true)
writer.toString ==> sorted

new String(ujson.writeToByteArray(ujson.read(raw), indent = 4, sortKeys = true)) ==> sorted

new String(ujson.reformatToByteArray(raw, indent = 4, sortKeys = true)) ==> sorted
}
}
}
9 changes: 9 additions & 0 deletions upickle/core/src-2.12/upickle/core/compat/SortInPlace.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
package upickle.core.compat

object SortInPlace {
def apply[T, B: Ordering](t: collection.mutable.ArrayBuffer[T])(f: T => B): Unit = {
val sorted = t.sortBy(f)
t.clear()
t.appendAll(sorted)
}
}
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
package upickle.core.compat

import upickle.core.compat.Factory
import upickle.core.LinkedHashMap
import upickle.core.compat.Factory

import scala.collection.mutable

Expand Down
7 changes: 7 additions & 0 deletions upickle/core/src-2.13+/upickle/core/compat/SortInPlace.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
package upickle.core.compat

object SortInPlace {
def apply[T, B: scala.Ordering](t: collection.mutable.ArrayBuffer[T])(f: T => B): Unit = {
t.sortInPlaceBy(f)
}
}
Loading

0 comments on commit 0b81593

Please sign in to comment.