Slow parsing #20
SyntaxKit is quite slow due to the way TextMate grammars are specified (regexes, nondeterminism, infinite lookahead, etc.). Personally I would only use asynchronous incremental parsing for interactive highlighting (see AttributedParsingOperation). But this takes some manual synchronization management, plus I don't know if you are familiar with NSOperation. Also, the first highlighting pass will always take long.
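As a side note on the "manual synchronization management" point, here is a minimal sketch of serializing incremental parses on a background queue, assuming the parsing work is wrapped in Operation subclasses the way AttributedParsingOperation is. The HighlightScheduler type and its API are illustrative, not part of SyntaxKit.

```swift
import Foundation

// Illustrative scheduler, not SyntaxKit API: keeps incremental parses ordered
// on a serial background queue so the main thread is never blocked.
final class HighlightScheduler {
    private let queue: OperationQueue = {
        let queue = OperationQueue()
        queue.maxConcurrentOperationCount = 1   // one parse at a time, in order
        queue.qualityOfService = .userInitiated
        return queue
    }()

    // Cancel any in-flight parse before enqueueing the next one so rapid edits
    // do not stack up stale work behind the current operation.
    func schedule(_ operation: Operation) {
        queue.cancelAllOperations()
        queue.addOperation(operation)
    }
}
```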
Yes, I'm familiar with NSOperation. What do you think if I create a repo with an asynchronous implementation?
Do you have any example for AttributedParsingOperation?
You can, but I doubt that the implementation will become a lot faster. There is a little code snippet without synchronization in #18.
At least it will not block the interface. Other question: in

```swift
let secondOperation = AttributedParsingOperation(string: newInput,
                                                 previousOperation: firstOperation,
                                                 changeIsInsertion: true,
                                                 changedRange: NSRange(location: 0, length: 13))
```

to what does the changeIsInsertion parameter correspond? Do you mean whether the text has been added at the end of the previous string or not?
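The thread does not settle what these two parameters mean. Purely as an assumption for illustration, the sketch below derives them from an NSTextStorage-style edit description (the edited range in the new string plus a length delta); nothing here is confirmed SyntaxKit behavior.

```swift
import Foundation

// Assumed interpretation only; EditDescription and parsingParameters(for:) are
// illustrative names, not SyntaxKit API.
struct EditDescription {
    let editedRange: NSRange   // range of the edit, expressed in the new string
    let changeInLength: Int    // positive for insertions, negative for deletions
}

func parsingParameters(for edit: EditDescription) -> (changeIsInsertion: Bool, changedRange: NSRange) {
    // Assumption: changeIsInsertion distinguishes insertions from deletions, and
    // changedRange covers the inserted text (or where the deleted text used to be).
    return (edit.changeInLength >= 0, edit.editedRange)
}
```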
SyntaxKit does not cache regular expression matches, if that is what you mean.
Since I call SyntaxKit from an NSTextStorage, the parser is called multiple times with the same string at init. What do you think of adding a cache so that this case would be faster?
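A rough sketch of the proposed cache, assuming the parser's output can be modeled as attribute runs; ParsedRun, CachingParser, and the parser closure are illustrative names, not SyntaxKit API.

```swift
import Foundation

// Illustrative memoization of parse results keyed by the input string, so the
// repeated calls with the same string at init hit the cache. The cache is
// unbounded to keep the sketch short; a real version would cap or invalidate it.
struct ParsedRun {
    let range: NSRange
    let attributes: [NSAttributedString.Key: Any]
}

final class CachingParser {
    private var cache: [String: [ParsedRun]] = [:]

    func parse(_ string: String, using parser: (String) -> [ParsedRun]) -> [ParsedRun] {
        if let cached = cache[string] {
            return cached
        }
        let runs = parser(string)
        cache[string] = runs
        return runs
    }
}
```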
You are free to do that. Just add a performance test case to see if it is really faster.
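For reference, a sketch of such a performance test case using XCTest's measure block; parseFixture() is a placeholder standing in for loading and parsing the actual .tex file from the test bundle.

```swift
import XCTest

// Sketch of a before/after performance test. Only the XCTest measure pattern is
// the point; parseFixture() is a placeholder for running the SyntaxKit parser
// over the large LaTeX file.
final class ParsingPerformanceTests: XCTestCase {
    private func parseFixture() {
        // Placeholder workload; a real test would parse the attached .tex file here.
        let fixture = String(repeating: "\\section{Sample} Some text.\n", count: 500)
        _ = fixture.count
    }

    func testLargeLaTeXFileParsingPerformance() {
        // measure {} runs the block ten times and reports the average, which makes
        // master vs. caching-branch comparisons straightforward.
        measure {
            self.parseFixture()
        }
    }
}
```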
I would say the problem is an algorithmic/complexity one. It is very slow.
I'll add the file as a performance test case. I tried some caching but the current performance test cases only run ~10% faster.
Better than nothing!
I added it as a test, and on my machine it takes about 1.12 seconds on master and 1.27 seconds on the caching branch (no big optimizations yet) to parse the file above.

The problem is pretty simple: regex-based grammars are the wrong way to do syntax highlighting. Take a medium-complexity tmLanguage file like Swift or LaTeX with ~50 top-level rules, and a medium-sized file with ~500 lines of code. Because of begin/end constructions, the number of lines that have to be matched is somewhere between 1.25x and 2x the actual line count of the file. Every rule needs to be matched on every line, so we get on the order of 50,000 regex matches per file, and that takes time. You can do some optimizations, like skipping some rules once you have found a really good match, but the gains are not fundamental.
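For clarity, the arithmetic behind that estimate, using only the numbers quoted above:

```swift
// Back-of-the-envelope estimate; no measurement, just the arithmetic that leads
// to the ~50,000-matches figure from the comment above.
let topLevelRules = 50
let fileLines = 500
let beginEndFactor = 1.25...2.0   // extra lines produced by begin/end constructions

let low = Double(topLevelRules * fileLines) * beginEndFactor.lowerBound
let high = Double(topLevelRules * fileLines) * beginEndFactor.upperBound
print("estimated regex matches per parse: \(Int(low)) to \(Int(high))")
// Prints roughly 31250 to 50000, which is why a full parse takes on the order of a second.
```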
I'm using this .tmLanguage (it is LaTeX). The parsing of the following text is very long (more than 1 minute). Is there a problem?
Thanks