Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docgen: sort symbols (fix #17910) #18560

Merged
merged 5 commits into from
Jul 25, 2021
Merged

docgen: sort symbols (fix #17910) #18560

merged 5 commits into from
Jul 25, 2021

Conversation

a-mr
Copy link
Contributor

@a-mr a-mr commented Jul 22, 2021

Fixes #17910 : nim doc sorts all symbols alphabetically now both in text and left bar TOC.
Example for os.nim:
image

cc @zetashift @timotheecour

@zetashift
Copy link
Contributor

Ah, very kind for you to tag me here but I did not have the time to delve deep, however I just want to thank you for the PR! 🚀

Comment on lines 1283 to 1286
let overloadableNames = toSeq(keys(d.tocTable[kind]))
for plainName in overloadableNames.sorted():
var overloadChoices = d.tocTable[kind][plainName]
overloadChoices.sort(cmp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite slow, can we do something about it?

Copy link
Member

@timotheecour timotheecour Jul 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it does not look quite slow, calling N sorts where N = nb overloaded symbols will be dwarfed by other costs (parsing, semcheck, etc); although this could be optimized, there is no point in premature optimization at expense of simplicity.

and indeed:
nim doc --project --doccmd:skip lib/pure/times.nim takes 1.5 seconds (excluding runnableExamples)
the cost for this block of the whole block in if groupedToc: is 0.0035 (lumping other code too, this is an upperbound), ie 1/445 of total cost for a total of 1936 overloads. There are many optimizations worth considering before considering this one.

(0.0035069999999999824, 0.0000020000000000575113, 1936, 1, "D20210723T093627.total")
diff --git a/compiler/docgen.nim b/compiler/docgen.nim
index 842f95ce7..dde9ba0bb 100644
--- a/compiler/docgen.nim
+++ b/compiler/docgen.nim
@@ -1266,7 +1266,7 @@ proc generateTags*(d: PDoc, n: PNode, r: var string) =
     if not checkForFalse(n[0][0]):
       generateTags(d, lastSon(n[0]), r)
   else: discard
-
+import std/times except getDateStr, getClockStr
 proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =
   const sectionNames: array[skModule..skField, string] = [
     "Imports", "Types", "Vars", "Lets", "Consts", "Vars", "Procs", "Funcs",
@@ -1280,10 +1280,16 @@ proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =

   proc cmp(x, y: TocItem): int = cmpIgnoreCase(x.sortName, y.sortName)
   if groupedToc:
+    var dt0 {.global.}: typeof(cpuTime() - cpuTime())
+    var tot0 {.global.}: int
+    let t = cpuTime()
+    var tot = 0
     let overloadableNames = toSeq(keys(d.tocTable[kind]))
     for plainName in overloadableNames.sorted():
       var overloadChoices = d.tocTable[kind][plainName]
+      # echo "D20210723T093748: ", $overloadChoices.len, " " , plainName
       overloadChoices.sort(cmp)
+      tot += overloadChoices.len
       var content: string
       for item in overloadChoices:
         content.add item.content
@@ -1291,6 +1297,11 @@ proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =
           "sectionid", $ord(kind), "sectionTitle", title,
           "sectionTitleID", $(ord(kind) + 50),
           "content", content, "plainName", plainName]
+    let t2 = cpuTime()
+    let dt = t2 - t
+    dt0 += dt
+    tot0 += tot
+    echo (dt0, dt, tot0, tot, "D20210723T093627.total")
   else:
     for item in d.tocSimple[kind].sorted(cmp):
       d.toc2[kind].add item.content

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many optimizations worth considering before considering this one.

I hoped somebody would be prove me wrong by the numbers, however:

  1. Code is easier to optimize when it's still fresh in the developer's head.
  2. Things add up. Today's compiler is significantly slower than a couple of years ago. And the things that made it slower are hard to optimize. At the same time, I still get no help whatsoever in the one architectural change (= IC) that would imply we can stop caring about performance in most of the compiler's code.

Copy link
Member

@timotheecour timotheecour Jul 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with 1., but I'm skeptical of 2.; big-O notation and Amdahl's law is what matters in the end and if the compiler is slower today than years ago, it's more likely to be caused by a small number of bottlenecks (that should be identified and fixed) than to a multitude of small things adding up (eg better error messages, static analysis etc).

On the same vein, optimizing PNode (eg removing comment node) will have a bigger impact on memory/performance than optimizing PSym since much more PNode's are allocated.

As for IC, I'm still convinced this is the wrong approach, and cling shows that IC can be be done with a compiler as a service (CAS) approach even for a language like C++. Even if nim's current IC approach would end up being faster than current nim full re-compilation (which isn't the case today), the overheads involved would still make CAS much faster.

Not only that, but you'd also get a REPL for free in the process, unlike with IC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"compiler as a service (CAS) approach" is inherently non-scalable wrt memory consumption. It's also completely not required, C# and Delphi (Delphi in 1990 btw) have proven beyond doubt that "save to disk" is fast enough for everything, REPL included. Plus IC forces us to eliminate phase-ordering bugs which plague the Nim compiler.

@timotheecour
Copy link
Member

timotheecour commented Jul 23, 2021

how about using postprocessLexical defined below, to sort names involving numbers in a more intuitive way:

import std/[parseutils, strformat, algorithm, strutils, sequtils]

proc postprocessLexical(a: string): string =
  var i = 0
  while i<a.len:
    let c = a[i]
    if c in {'0'..'9'}:
      var num = 0
      i += parseInt(a, num, i)
      formatValue(result, num, "0>4")
    else:
      result.add c
      i.inc

proc main()=
  type A = object
    name, sortName: string
  let names = "uint8 uint16 uint32 uint64".split.mapIt(A(name: it, sortName: it.postprocessLexical))
  doAssert names.sortedByIt(it.sortName).mapIt(it.name) == @["uint8", "uint16", "uint32", "uint64"]
main()

Copy link
Member

@timotheecour timotheecour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@a-mr
Copy link
Contributor Author

a-mr commented Jul 24, 2021

@timotheecour ,
I agree that wrong decimal sorting like uint16 < uint8 can be quite annoying.
I added a function cmpDecimalsIgnoreCase for handling of such cases, I think it may be a bit faster/more robust.

## For sorting with correct handling of cases like 'uint8' and 'uint16'.
## Also handles leading zeroes well.
runnableExamples:
doAssert cmpDecimalsIgnoreCase("uint8", "uint16") < 0
Copy link
Member

@timotheecour timotheecour Jul 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about foo08 vs foo8?
right now it gives 0, which is not intuitive because the strings are not equal.
how about instead considering 08 is longer than 8 and having:

doAssert cmpDecimalsIgnoreCase("ab8", "ab08") < 0
doAssert cmpDecimalsIgnoreCase("ab8de", "ab08c") < 0 # sanity check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, fixed.

@timotheecour
Copy link
Member

timotheecour commented Jul 25, 2021

btw, after a more careful timing so that it only measures the sort, not the other context in the block, i now get:
(0.0004810000000006198, 0.0000020000000000575113, 1313, 1, "D20210723T093627.total")
ie, 1/3000 of the total time for nim doc --project --doccmd:skip lib/pure/times.nim

so, nothing to "fix" here performance wise; not that it's a surprise, given that the cost is:

sum_ni ni * log ni <= n * log n # with n = nb_symbol
diff --git a/compiler/docgen.nim b/compiler/docgen.nim
index 3e5c1c912..2a3f6e6d7 100644
--- a/compiler/docgen.nim
+++ b/compiler/docgen.nim
@@ -1312,6 +1312,8 @@ proc generateTags*(d: PDoc, n: PNode, r: var string) =
       generateTags(d, lastSon(n[0]), r)
   else: discard

+import std/times except getDateStr, getClockStr
+
 proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =
   const sectionNames: array[skModule..skField, string] = [
     "Imports", "Types", "Vars", "Lets", "Consts", "Vars", "Procs", "Funcs",
@@ -1325,8 +1327,19 @@ proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =

   proc cmp(x, y: TocItem): int = cmpDecimalsIgnoreCase(x.sortName, y.sortName)
   if groupedToc:
+    var dt0 {.global.}: typeof(cpuTime() - cpuTime())
+    var tot0 {.global.}: int
+    var tot = 0
     let overloadableNames = toSeq(keys(d.tocTable[kind]))
-    for plainName in overloadableNames.sorted(cmpDecimalsIgnoreCase):
+    let t = cpuTime()
+    let names2 = overloadableNames.sorted(cmpDecimalsIgnoreCase)
+    let t2 = cpuTime()
+    tot += names2.len
+    let dt = t2 - t
+    dt0 += dt
+    tot0 += tot
+    echo (dt0, dt, tot0, tot, "D20210723T093627.total")
+    for plainName in names2:
       var overloadChoices = d.tocTable[kind][plainName]
       overloadChoices.sort(cmp)
       var content: string

* more compact names for non-overloaded symbols
* more predictable Ascii sort (e.g. `<` instead of `&lt;`)
@Araq Araq merged commit 10da888 into nim-lang:devel Jul 25, 2021
PMunch pushed a commit to PMunch/Nim that referenced this pull request Mar 28, 2022
* docgen: sort symbols (fix nim-lang#17910)

* add workaround + change naming

* switch to a dedicated sort comparator

* fix numbers with unequal string lengths

* dedicated `sortName` instead of `plainNameEsc`:

* more compact names for non-overloaded symbols
* more predictable Ascii sort (e.g. `<` instead of `&lt;`)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nim doc shows symbols in random/non-intuitive order, should be sorted
4 participants