Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use optimized kotlin grammar #4023

Closed

Conversation

dolik-rce
Copy link
Contributor

I have been developing pegof as a hobby project for past few years. Recently it became good enough to optimize Kotlin grammar, making it 3 times faster and about 2.7 times less memory hungry.

With this PR, I'd like to start a discussion on how to best use this improvement in ctags. I'd like to keep the original, "human readable" version of the grammar in the repo to ensure simple updates and readable git history. On the other hand, I don't think it would be good idea to make ctags build to depend on pegof, it is not available on all platforms. So my proposal is to allow only manual optimization and keep both original and optimized grammar in the repository. What do you think?


Here is a simple benchmark using ctags-codebase to prove that it really makes a difference:

echo "# optimized"
git checkout optimized-kotlin-peg &> /dev/null
make ctags > /dev/null
/usr/bin/time -f "time: %E\nmax memory (kb): %M" ./ctags -f optimized.ctags ../ctags-codebase/code/kotlin/libraries/**/*.kt

echo "# master"
git checkout master &> /dev/null
make ctags > /dev/null
/usr/bin/time -f "time: %E\nmax memory (kb): %M" ./ctags -f master.ctags ../ctags-codebase/code/kotlin/libraries/**/*.kt

echo "# output comparison"
diff optimized.ctags master.ctags
wc -l optimized.ctags master.ctags 

Output:

# optimized
time: 0:13.84
max memory (kb): 447792
# master
time: 0:44.18
max memory (kb): 1204216
# output comparison
31c31
< !_TAG_PROGRAM_VERSION 6.1.0   /885afcb3e/
---
> !_TAG_PROGRAM_VERSION 6.1.0   /8976ec3d2/
   92478 optimized.ctags
   92478 master.ctags
  184956 total

Copy link

codecov bot commented Jun 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.43%. Comparing base (8976ec3) to head (885afcb).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4023   +/-   ##
=======================================
  Coverage   85.43%   85.43%           
=======================================
  Files         235      235           
  Lines       56734    56734           
=======================================
  Hits        48468    48468           
  Misses       8266     8266           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@masatake
Copy link
Member

Incredible work!

At first glance, the following rules in Makefile.am are the parts we must extend:

.peg.c:
	$(packcc_verbose)$(PACKCC) -o $(top_builddir)/peg/$(*F) "$<"
.peg.h:
	$(packcc_verbose)$(PACKCC) -o $(top_builddir)/peg/$(*F) "$<"

I assume configure looks for pegof.
If it finds pegof, it defines HAVE_PEGOF.

Pseudo code:

if HAVE_PEGOF
.peg.c:
        let-pegof-generate-an-optimized-peg-file "$<" > optimized-"$<" && 
	$(packcc_verbose)$(PACKCC) -o $(top_builddir)/peg/$(*F) optimized-"$<"
.peg.h:
        let-pegof-generate-an-optimized-peg-file "$<" > optimized-"$<" && 
	$(packcc_verbose)$(PACKCC) -o $(top_builddir)/peg/$(*F) optimized-"$<"
else
# As before
.peg.c:
	$(packcc_verbose)$(PACKCC) -o $(top_builddir)/peg/$(*F) "$<"
.peg.h:
	$(packcc_verbose)$(PACKCC) -o $(top_builddir)/peg/$(*F) "$<"
endif

I'll study the command line interface of pegof.

Having the following change may be nice.

 #ifdef HAVE_PACKCC
 	/* The test harnesses use this as hints for skipping test cases */
 	{"packcc", "has peg based parser(s)"},
+#ifdef HAVE_PEGOF
+	{"pegof", "has peg based parser(s) optimized by pegof"},
+#endif
 #endif
 	{"optscript", "can use the interpreter"},
 #ifdef HAVE_PCRE2

BTW, @dolik-rce don't you run CI/CD tests like https://github.com/universal-ctags/libreadtags/pull/53/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47 ?

At least, I think you should have a build (and install) test.

@dolik-rce
Copy link
Contributor Author

Well, using extending configure with HAVE_PEGOF macro would solve the problem with unsupported platforms, but there are other potential problems. The code is not yet stable enough and I definitely wouldn't call it "production quality". I did some extensive testing for the kotlin case (since that was my initial reason to develop pegof, to speed up kotlin parsing in Geany), but using it for all other peg based grammars might be risky. Also, using it only on some systems (pegof doesn't support windows yet and I don't plan to work on it in foreseeable future) could introduce different behavior of ctags across platform, which is not good idea either.

That is why I'd actually prefer to use the manual approach. I know it is not usually considered good practice to commit generated code, but it solves the problem of not having pegof available on MacOS or windows. I can supply a script to do the optimization, to make it simpler for other contributors who maintain PEG grammars. We could probably even make a CI job that would check if the optimized grammars in the repository are up-to-date. It could even suggest updates in case that new version of pegof can generate better optimizations.

BTW, @dolik-rce don't you run CI/CD tests like https://github.com/universal-ctags/libreadtags/pull/53/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47 ?

At least, I think you should have a build (and install) test.

It's on my TODO list 🙂 As well as providing packages, at least for Arch and Debian-based distributions (which I'm familiar with).

@masatake
Copy link
Member

Well, using extending configure with HAVE_PEGOF macro would solve the problem with unsupported platforms, but there are other potential problems. The code is not yet stable enough and I definitely wouldn't call it "production quality". I did some extensive testing for the kotlin case (since that was my initial reason to develop pegof, to speed up kotlin parsing in Geany), but using it for all other peg based grammars might be risky. Also, using it only on some systems (pegof doesn't support windows yet and I don't plan to work on it in foreseeable future) could introduce different behavior of ctags across platform, which is not good idea either.

That is why I'd actually prefer to use the manual approach. I know it is not usually considered good practice to commit generated code, but it solves the problem of not having pegof available on MacOS or windows. I can supply a script to do the optimization, to make it simpler for other contributors who maintain PEG grammars. We could probably even make a CI job that would check if the optimized grammars in the repository are up-to-date. It could even suggest updates in case that new version of pegof can generate better optimizations.

Understandable but I really want to try pegof.

Years ago, I rewrote the Java parser using packcc. I was disappointed in its performance, which was about five ~ to fifteen times slower than the current crafted Java parser in parsers/c.c. At that time, I needed Pegof.

Even if Pegof is not mature, I would like to provide a way to build Pegof-powered ctags easily. In my experience, I know how the software author is encouraged from the sophisticated bug report of the software.

I will make a pull request based on the HAVE_PEGOF approach.
I really want to see the peg-based parsers run faster.

@masatake
Copy link
Member

I opened #4026. The good news is that I found a bug in pegof during work on #4026. We are enjoying open-source development now.

@dolik-rce
Copy link
Contributor Author

That's exactly why I said that pegof is far from being stable and reliable 🙂 Receiving input like this is very valuable and will make pegof better. So yes, I will enjoy it.

@masatake
Copy link
Member

Instead of merging the optimized version of the Kotolin grammar file, I integrated pegof into our build scripts. So I close this.

@dolik-rce, when you think pegof is stable enough, we can turn on pegof at https://github.com/universal-ctags/ctags-nightly-build from where we distribute ctags binary.

@masatake masatake closed this Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants