-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docgen: sort symbols (fix #17910) #18560
Conversation
Ah, very kind for you to tag me here but I did not have the time to delve deep, however I just want to thank you for the PR! 🚀 |
compiler/docgen.nim
Outdated
let overloadableNames = toSeq(keys(d.tocTable[kind])) | ||
for plainName in overloadableNames.sorted(): | ||
var overloadChoices = d.tocTable[kind][plainName] | ||
overloadChoices.sort(cmp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite slow, can we do something about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no it does not look quite slow, calling N sorts where N = nb overloaded symbols will be dwarfed by other costs (parsing, semcheck, etc); although this could be optimized, there is no point in premature optimization at expense of simplicity.
and indeed:
nim doc --project --doccmd:skip lib/pure/times.nim
takes 1.5 seconds (excluding runnableExamples)
the cost for this block of the whole block in if groupedToc:
is 0.0035 (lumping other code too, this is an upperbound), ie 1/445 of total cost for a total of 1936 overloads. There are many optimizations worth considering before considering this one.
(0.0035069999999999824, 0.0000020000000000575113, 1936, 1, "D20210723T093627.total")
diff --git a/compiler/docgen.nim b/compiler/docgen.nim
index 842f95ce7..dde9ba0bb 100644
--- a/compiler/docgen.nim
+++ b/compiler/docgen.nim
@@ -1266,7 +1266,7 @@ proc generateTags*(d: PDoc, n: PNode, r: var string) =
if not checkForFalse(n[0][0]):
generateTags(d, lastSon(n[0]), r)
else: discard
-
+import std/times except getDateStr, getClockStr
proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =
const sectionNames: array[skModule..skField, string] = [
"Imports", "Types", "Vars", "Lets", "Consts", "Vars", "Procs", "Funcs",
@@ -1280,10 +1280,16 @@ proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =
proc cmp(x, y: TocItem): int = cmpIgnoreCase(x.sortName, y.sortName)
if groupedToc:
+ var dt0 {.global.}: typeof(cpuTime() - cpuTime())
+ var tot0 {.global.}: int
+ let t = cpuTime()
+ var tot = 0
let overloadableNames = toSeq(keys(d.tocTable[kind]))
for plainName in overloadableNames.sorted():
var overloadChoices = d.tocTable[kind][plainName]
+ # echo "D20210723T093748: ", $overloadChoices.len, " " , plainName
overloadChoices.sort(cmp)
+ tot += overloadChoices.len
var content: string
for item in overloadChoices:
content.add item.content
@@ -1291,6 +1297,11 @@ proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =
"sectionid", $ord(kind), "sectionTitle", title,
"sectionTitleID", $(ord(kind) + 50),
"content", content, "plainName", plainName]
+ let t2 = cpuTime()
+ let dt = t2 - t
+ dt0 += dt
+ tot0 += tot
+ echo (dt0, dt, tot0, tot, "D20210723T093627.total")
else:
for item in d.tocSimple[kind].sorted(cmp):
d.toc2[kind].add item.content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many optimizations worth considering before considering this one.
I hoped somebody would be prove me wrong by the numbers, however:
- Code is easier to optimize when it's still fresh in the developer's head.
- Things add up. Today's compiler is significantly slower than a couple of years ago. And the things that made it slower are hard to optimize. At the same time, I still get no help whatsoever in the one architectural change (= IC) that would imply we can stop caring about performance in most of the compiler's code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with 1.
, but I'm skeptical of 2.
; big-O notation and Amdahl's law is what matters in the end and if the compiler is slower today than years ago, it's more likely to be caused by a small number of bottlenecks (that should be identified and fixed) than to a multitude of small things adding up (eg better error messages, static analysis etc).
On the same vein, optimizing PNode (eg removing comment node) will have a bigger impact on memory/performance than optimizing PSym since much more PNode's are allocated.
As for IC, I'm still convinced this is the wrong approach, and cling
shows that IC can be be done with a compiler as a service (CAS) approach even for a language like C++. Even if nim's current IC approach would end up being faster than current nim full re-compilation (which isn't the case today), the overheads involved would still make CAS much faster.
Not only that, but you'd also get a REPL for free in the process, unlike with IC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"compiler as a service (CAS) approach" is inherently non-scalable wrt memory consumption. It's also completely not required, C# and Delphi (Delphi in 1990 btw) have proven beyond doubt that "save to disk" is fast enough for everything, REPL included. Plus IC forces us to eliminate phase-ordering bugs which plague the Nim compiler.
how about using import std/[parseutils, strformat, algorithm, strutils, sequtils]
proc postprocessLexical(a: string): string =
var i = 0
while i<a.len:
let c = a[i]
if c in {'0'..'9'}:
var num = 0
i += parseInt(a, num, i)
formatValue(result, num, "0>4")
else:
result.add c
i.inc
proc main()=
type A = object
name, sortName: string
let names = "uint8 uint16 uint32 uint64".split.mapIt(A(name: it, sortName: it.postprocessLexical))
doAssert names.sortedByIt(it.sortName).mapIt(it.name) == @["uint8", "uint16", "uint32", "uint64"]
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
- the ordering suggested in docgen: sort symbols (fix #17910) #18560 (comment) can be done either here or in future work
- I believe the performance concern was addressed in https://github.com/nim-lang/Nim/pull/18560/files#r675704843 and that a slowdown upper-bounded by 1/445 is not a concern
@timotheecour , |
## For sorting with correct handling of cases like 'uint8' and 'uint16'. | ||
## Also handles leading zeroes well. | ||
runnableExamples: | ||
doAssert cmpDecimalsIgnoreCase("uint8", "uint16") < 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about foo08
vs foo8
?
right now it gives 0, which is not intuitive because the strings are not equal.
how about instead considering 08 is longer than 8 and having:
doAssert cmpDecimalsIgnoreCase("ab8", "ab08") < 0
doAssert cmpDecimalsIgnoreCase("ab8de", "ab08c") < 0 # sanity check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, fixed.
btw, after a more careful timing so that it only measures the sort, not the other context in the block, i now get: so, nothing to "fix" here performance wise; not that it's a surprise, given that the cost is:
diff --git a/compiler/docgen.nim b/compiler/docgen.nim
index 3e5c1c912..2a3f6e6d7 100644
--- a/compiler/docgen.nim
+++ b/compiler/docgen.nim
@@ -1312,6 +1312,8 @@ proc generateTags*(d: PDoc, n: PNode, r: var string) =
generateTags(d, lastSon(n[0]), r)
else: discard
+import std/times except getDateStr, getClockStr
+
proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =
const sectionNames: array[skModule..skField, string] = [
"Imports", "Types", "Vars", "Lets", "Consts", "Vars", "Procs", "Funcs",
@@ -1325,8 +1327,19 @@ proc genSection(d: PDoc, kind: TSymKind, groupedToc = false) =
proc cmp(x, y: TocItem): int = cmpDecimalsIgnoreCase(x.sortName, y.sortName)
if groupedToc:
+ var dt0 {.global.}: typeof(cpuTime() - cpuTime())
+ var tot0 {.global.}: int
+ var tot = 0
let overloadableNames = toSeq(keys(d.tocTable[kind]))
- for plainName in overloadableNames.sorted(cmpDecimalsIgnoreCase):
+ let t = cpuTime()
+ let names2 = overloadableNames.sorted(cmpDecimalsIgnoreCase)
+ let t2 = cpuTime()
+ tot += names2.len
+ let dt = t2 - t
+ dt0 += dt
+ tot0 += tot
+ echo (dt0, dt, tot0, tot, "D20210723T093627.total")
+ for plainName in names2:
var overloadChoices = d.tocTable[kind][plainName]
overloadChoices.sort(cmp)
var content: string |
* more compact names for non-overloaded symbols * more predictable Ascii sort (e.g. `<` instead of `<`)
* docgen: sort symbols (fix nim-lang#17910) * add workaround + change naming * switch to a dedicated sort comparator * fix numbers with unequal string lengths * dedicated `sortName` instead of `plainNameEsc`: * more compact names for non-overloaded symbols * more predictable Ascii sort (e.g. `<` instead of `<`)
Fixes #17910 :
nim doc
sorts all symbols alphabetically now both in text and left bar TOC.Example for
os.nim
:cc @zetashift @timotheecour