
Improve precision of tag analysis #40

Merged
merged 31 commits into from
Sep 16, 2018

Conversation

sergv
Collaborator

@sergv sergv commented Sep 16, 2018

This PR contains many small things that I've been collecting for some time now. Some errors are no longer detected, and a few new things (like type families or GADT-like newtypes (I was surprised, too)) now are. Some functions I just tried to optimise. I'm not sure how to concisely summarise what was done; the best way is probably to look at the new test cases.

One significant improvement has to do with tracking of quasiquoters. The lexer is now smarter and tries not to confuse list comprehensions like [foo|foo<-xs] with quasiquoters: it checks whether the part of the file after [foo| contains a closing quasiquoter bracket |]. If it doesn't, we report a list comprehension, and on some files we then actually start detecting the rest of the definitions.
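The lookahead described above can be sketched roughly as follows (an illustrative simplification, not the actual lexer code; the function name and string-based interface are assumptions):

```haskell
import Data.List (isInfixOf)

-- Hypothetical sketch of the disambiguation described above: after seeing an
-- opening "[name|", treat it as a quasiquoter only if a closing "|]" occurs
-- somewhere in the rest of the file; otherwise it is more likely a list
-- comprehension such as [foo | foo <- xs].
looksLikeQuasiQuote :: String -> Bool
looksLikeQuasiQuote restOfFile = "|]" `isInfixOf` restOfFile
```

The real lexer works on its own token stream rather than raw strings, but the decision rule is the same: no closing bracket anywhere ahead means it cannot be a quasiquote.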

One thing that does not fall under any of the improvements mentioned in the previous paragraphs, but which I should nonetheless mention, is the addition of a parent field to each tag. It does not appear in the output, so it should be safe w.r.t. backwards compatibility, but it is very useful when using the fast-tags package as a library to index sources. I reckon it would not slow anything down, since it's just a text field that gets populated when we know an entity (e.g. a class, type declaration or type family) may have some related entities (children).
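Conceptually, the parent link amounts to something like the record below (a hypothetical sketch; field and type names are illustrative and not fast-tags' actual API, and the real field is a text value rather than String):

```haskell
-- Illustrative shape of a tag carrying a parent link.
data TagKind = Class | Type | TypeFamily | Function | Constructor
  deriving (Show, Eq)

data Tag = Tag
  { tagName   :: String
  , tagKind   :: TagKind
  , tagLine   :: Int
  , tagParent :: Maybe String  -- enclosing class/type/family name, if any
  } deriving (Show, Eq)

-- A library consumer can then recover the children of a class or type
-- declaration by grouping on the parent field.
childrenOf :: String -> [Tag] -> [Tag]
childrenOf p = filter ((== Just p) . tagParent)
```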

sergv added 29 commits August 27, 2018 11:47
…ter starts that could be confused with list comprehensions with greater care
…present. Support more crazy bracket-based layouts when expanding semicolons
…ine strings with 0 indentation. Simplify string lexing rules a bit
I'm just as surprised to find out that they do actually exist as you
currently are.
@elaforge
Owner

Nice work! The tests for all the fixes are much appreciated. I tested against my codebase, and it's gotten a bit more accurate. Speed seems to be roughly the same.

It looks like CI is unhappy because in src/FastTags/LexerTypes.hs you need to explicitly import mempty for base with ghc 7.8.4. I think you can just put that in, maybe behind a version guard to avoid redundant import warnings. After that I think you should be able to merge yourself, so go right ahead. Or I could push the button, it's all the same to me :)
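Such a guard might look like the fragment below (a sketch; the exact version bound is the point to double-check, and MIN_VERSION_base is only defined when building with Cabal):

```haskell
{-# LANGUAGE CPP #-}
-- base < 4.8 (GHC 7.8.x) does not re-export mempty from the Prelude,
-- so import it explicitly there; skip the import on newer base to
-- avoid a redundant-import warning.
#if !MIN_VERSION_base(4,8,0)
import Data.Monoid (mempty)
#endif
```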

Unless you have some other changes in mind, I'll then bump the version and upload to hackage.

@elaforge
Owner

Just out of curiosity, what do you use the parent field for?

@sergv
Collaborator Author

sergv commented Sep 16, 2018

I've fixed the 7.8 build. Please push the button :)

I'm using the parent field for an alternative tagging approach where a server sits in the background, tokenises all files, and tracks all module headers/exports/imports/etc. Search queries thus get analysed in a specific import context, which hopefully reduces the number of ambiguous results (e.g. trying to look up insert will surely produce a lot of candidates, yet GHC knows how to resolve that, and so, hopefully, would the server).
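The import-context filtering amounts to something like this (an illustrative sketch, not the haskell-tags-server code; the function name, index shape, and plain-string module names are all assumptions):

```haskell
import qualified Data.Map.Strict as M

type ModuleName = String

-- Given the modules imported by the querying file and a global index from
-- symbol name to its definition sites, keep only the candidates that live
-- in one of the imported modules.
resolveInContext
  :: [ModuleName]                       -- modules imported by the current file
  -> M.Map String [(ModuleName, Int)]   -- symbol -> (defining module, line)
  -> String                             -- symbol being looked up
  -> [(ModuleName, Int)]
resolveInContext imports index sym =
  [ cand | cand@(m, _) <- M.findWithDefault [] sym index, m `elem` imports ]
```

A real resolver also has to follow re-exports and qualified imports, which is exactly the bookkeeping the server exists to do.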

I'm developing the server at https://github.com/sergv/haskell-tags-server. It is editor-agnostic, but I'm still struggling to pick an interaction protocol (or protocols) that would make all editors reasonably happy. For now I'm using BERT, but I still haven't produced a decent interaction with Emacs. Actually, if you have any ideas about which protocol would be reasonable or easy to support in Vim, that would be great, since as primarily a non-Vim user I have no clue.

@elaforge
Owner

The tags server idea is interesting. I have solved the same problem with the --fully-qualified flag, and it works well for my project. But that's only because I almost always import qualified, and I almost never re-export. To deal with re-exports I would have to track imports and exports as your tags server does, and if I did that I might be on the road to supporting unqualified imports... but if you have a persistent server which already does that, then I should give it a try as soon as it's ready.

I don't know about protocols; I don't use vim plugins, so I don't know if there's a consensus. I gather that Neovim added a msgpack API and does plugins via that, so if you are OK with targeting Neovim, then that is probably the way to go.

The other thing I'm thinking about is how to keep tags generation fast (and therefore incremental) across git checkouts. Currently I rebuild tags from scratch after every checkout, and it's OK for 150k lines, but it will probably get quite annoying at 1 million. I have some incomplete ideas about a global tags file for the pristine branch, plus an overlay for local changes, but any more complete ideas you might have would be welcome!

@elaforge elaforge merged commit cefab1e into elaforge:master Sep 16, 2018
@sergv
Collaborator Author

sergv commented Sep 17, 2018

That's interesting about tagging 1 million lines of code. Incremental update should likely work on a per-file basis, regenerating tags only for those files that changed on a git checkout. Otherwise there are just too many files, and lexing/analysis in fast-tags can only get so fast (maybe it could be improved 2x, but 10x will be very tough).
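The per-file idea boils down to comparing fingerprints before and after the checkout and re-tagging only the differences (a sketch under assumed names; a real fingerprint would be a content hash or mtime):

```haskell
import qualified Data.Map.Strict as M

type Fingerprint = String  -- stand-in for a real content hash

-- Re-tag only files that are new or whose fingerprint changed; files that
-- disappeared simply drop out of the new map (their tags get removed).
filesToRetag
  :: M.Map FilePath Fingerprint  -- fingerprints from the previous run
  -> M.Map FilePath Fingerprint  -- fingerprints after the checkout
  -> [FilePath]
filesToRetag old new =
  [ path | (path, fp) <- M.toList new, M.lookup path old /= Just fp ]
```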

A background server could help: it would listen for modifications via file notifications and take action in the background as soon as anything changes. Provided you don't immediately need correct tags after performing a git checkout, there will be time to update the index while you're, say, looking for the file you need.

@elaforge
Owner

I was thinking of creating tags as part of the build process, so that you reuse them for unchanged builds the same way you reuse the library .so or .a files. But there would have to be an intermediate combining step, since I don't think vim could efficiently scan hundreds of tags files. That could be another build target: given a hackage snapshot, build merged tags for the whole thing, and then you just put that one file in your search path.
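The combining step could be as simple as concatenating and re-sorting the per-package files, since vim binary-searches a tags file sorted by tag name (a sketch; real tags files also carry !_TAG_ header lines that a full merge would need to strip or regenerate):

```haskell
import Data.List (sort)

-- Merge many tags files, each given as its list of raw tag lines.
-- Tag lines start with the tag name, so plain lexicographic sort
-- yields the ordering vim expects for binary search.
mergeTagsFiles :: [[String]] -> [String]
mergeTagsFiles = sort . concat
```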

The tricky thing would be keeping track of the changes across git pulls and branch checkouts. I very frequently want some notion of generated output in source control that is still linked to a commit; here is yet another use for that! Otherwise this is just the same problem as wanting to build anything, except that build systems are generally too high-latency to ask for an update every single time you want to follow a tag.
