Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable source dirs for file parsing #386

Open
MilesMcBain opened this issue Mar 9, 2021 · 15 comments
Open

Configurable source dirs for file parsing #386

MilesMcBain opened this issue Mar 9, 2021 · 15 comments

Comments

@MilesMcBain
Copy link

MilesMcBain commented Mar 9, 2021

It is a well known issue that the languageserver only parses files other than those that it is notified are opened in the case that it detects the workspace root is a package.
#253

This means that the way that workspace symbols and definitions work are not how many people expect and seem to deviate a little from the protocol specification: https://microsoft.github.io/language-server-protocol/specification#workspace_symbol.

There is also a related issue that, even in open files, source() is not understood as something that affects the workspace namespace:
https://github.com/REditorSupport/vscode-r-lsp/issues/65

I believe a pretty cheap solution to this is to allow users to configure directories that the server should parse E.g. options(languageserver.r.source.dirs = c("R", "scripts") would cause workspace$load_all() parse R files in those two top level folders, even if it cannot identify the project as a package.

Another possibly nicer alternative is for the option to work as an exclusion list, so parsing starts at "." and any paths containing folders with names defined in the option are ignored. But there seems to a lot of inertia behind the "no parse by default" position.

The implementation would be pretty cheap:

in:

on_initialized <- function(self, params) {
    logger$info("on_initialized")
    project_root <- self$rootPath
    if (length(project_root) && is_package(project_root)) {
        # a bit like devtools::load_all()
        self$workspace$load_all(self)
        # TODO: result lint result of the package
        # lint_result <- lintr::lint_package(rootPath)
    }

workspace$load_all could take a vector of paths to parse.

E.g. workspace$load_all(self, source_paths)

So then we'd just need to build the vector of paths which is whatever languageserver.r.source.dirs says unioned with "R" in the case that the project is a package.

source_paths <- union(
  getOption("languageserver.r.source.dirs", NULL),
  if (is_package(project_root)) "R" else NULL)
) 

The change to workspace$load_all is minimal too, since file.path and list.files are already vectorised:

        load_all = function(langserver, source_paths) {
            source_dirs <- file.path(self$root, source_paths)
            files <- list.files(source_dirs, pattern = "\\.r$", ignore.case = TRUE)
            for (f in files) {
                logger$info("load ", f)
                path <- file.path(source_dir, f)
                uri <- path_to_uri(path)
                doc <- Document$new(uri, NULL, stringi::stri_read_lines(path))
                self$documents$set(uri, doc)
                # TODO: move text_sync to Workspace!?
                langserver$text_sync(uri, document = doc, parse = TRUE)
            }
            self$import_from_namespace_file()
        },

I'd also argue for recursive = TRUE in list.files.

@renkun-ken
Copy link
Member

I think the idea of using an option to control the default sources makes good sense, given that I was trying to implement #382 to handle source() calls but soon find it much trickier than expected in the following perspectives:

  1. source() could be recursive. Handling source() by opening and closing documents in workspace requires that a dependency graph be well maintained, being aware of whether each document is open or not open but sourced by other open documents. Handling this on document change could be expensive, only handle source graph on document open and close might make more sense.
  2. If the source graph is maintained, then it might make more sense to provide completion and other features based on the document and all sources it depends on. In this way, we don't need to open and close documents in the workspace to control the scope of language providers.

With these thinking, handling source() in an sensible way is not easy. An option to control default sources could solve much of the problem.

@MilesMcBain
Copy link
Author

Great!

What do you think about having an exclusion list in addition to the source dirs list? It would allow something like:

options(languageserver.r.source.dirs = ".", languageserver.excluded.source.dirs = c("renv", ".drake")) etc.

A bit of fancy footwork somewhere to avoid parsing R folder twice in some cases, possibly just by making files to be parsed unique().

@renkun-ken
Copy link
Member

I guess an important use case is to specify a regex or glob pattern to choose a collection of files, both could be handled by fs::dir_ls().

@renkun-ken
Copy link
Member

Also, I find Sys.glob() useful to specify a glob pattern to specify a collection of files:

> Sys.glob("*.R")                                                                          
character(0)

> Sys.glob("**/*.R")                                                                       
 [1] "R/call_hierarchy.R"        "R/capabilities.R"          "R/color.R"                
 [4] "R/completion.R"            "R/definition.R"            "R/diagnostics.R"          
 [7] "R/document.R"              "R/folding.R"               "R/formatting.R"           
[10] "R/handlers-general.R"      "R/handlers-langfeatures.R" "R/handlers-textsync.R"    
[13] "R/handlers-workspace.R"    "R/highlight.R"             "R/hover.R"                
[16] "R/interfaces.R"            "R/languagebase.R"          "R/languageclient.R"       
[19] "R/languageserver.R"        "R/link.R"                  "R/log.R"                  
[22] "R/namespace.R"             "R/protocol.R"              "R/references.R"           
[25] "R/rename.R"                "R/selection.R"             "R/session.R"              
[28] "R/signature.R"             "R/symbol.R"                "R/task.R"                 
[31] "R/utils.R"                 "R/workspace.R"             "man-roxygen/document.R"   
[34] "man-roxygen/id.R"          "man-roxygen/position.R"    "man-roxygen/self.R"       
[37] "man-roxygen/tdpp.R"        "man-roxygen/uri.R"         "man-roxygen/workspace.R"  
[40] "tests/testthat.R"         

> Sys.glob("R/*.R")                                                                        
 [1] "R/call_hierarchy.R"        "R/capabilities.R"          "R/color.R"                
 [4] "R/completion.R"            "R/definition.R"            "R/diagnostics.R"          
 [7] "R/document.R"              "R/folding.R"               "R/formatting.R"           
[10] "R/handlers-general.R"      "R/handlers-langfeatures.R" "R/handlers-textsync.R"    
[13] "R/handlers-workspace.R"    "R/highlight.R"             "R/hover.R"                
[16] "R/interfaces.R"            "R/languagebase.R"          "R/languageclient.R"       
[19] "R/languageserver.R"        "R/link.R"                  "R/log.R"                  
[22] "R/namespace.R"             "R/protocol.R"              "R/references.R"           
[25] "R/rename.R"                "R/selection.R"             "R/session.R"              
[28] "R/signature.R"             "R/symbol.R"                "R/task.R"                 
[31] "R/utils.R"                 "R/workspace.R"            

> Sys.glob(c("R/*.R", "tests/**/*.R"))                                                     
 [1] "R/call_hierarchy.R"                    "R/capabilities.R"                     
 [3] "R/color.R"                             "R/completion.R"                       
 [5] "R/definition.R"                        "R/diagnostics.R"                      
 [7] "R/document.R"                          "R/folding.R"                          
 [9] "R/formatting.R"                        "R/handlers-general.R"                 
[11] "R/handlers-langfeatures.R"             "R/handlers-textsync.R"                
[13] "R/handlers-workspace.R"                "R/highlight.R"                        
[15] "R/hover.R"                             "R/interfaces.R"                       
[17] "R/languagebase.R"                      "R/languageclient.R"                   
[19] "R/languageserver.R"                    "R/link.R"                             
[21] "R/log.R"                               "R/namespace.R"                        
[23] "R/protocol.R"                          "R/references.R"                       
[25] "R/rename.R"                            "R/selection.R"                        
[27] "R/session.R"                           "R/signature.R"                        
[29] "R/symbol.R"                            "R/task.R"                             
[31] "R/utils.R"                             "R/workspace.R"                        
[33] "tests/testthat/helper-utils.R"         "tests/testthat/test-call-hierarchy.R" 
[35] "tests/testthat/test-codeunits.R"       "tests/testthat/test-color.R"          
[37] "tests/testthat/test-completion.R"      "tests/testthat/test-definition.R"     
[39] "tests/testthat/test-folding.R"         "tests/testthat/test-formatting.R"     
[41] "tests/testthat/test-highlight.R"       "tests/testthat/test-hover.R"          
[43] "tests/testthat/test-langauagecilent.R" "tests/testthat/test-link.R"           
[45] "tests/testthat/test-lintr.R"           "tests/testthat/test-null-root.R"      
[47] "tests/testthat/test-references.R"      "tests/testthat/test-rename.R"         
[49] "tests/testthat/test-search.R"          "tests/testthat/test-selection.R"      
[51] "tests/testthat/test-signature.R"       "tests/testthat/test-stdio.R"          
[53] "tests/testthat/test-symbol.R"          "tests/testthat/test-tcp.R"            
[55] "tests/testthat/test-unicode-path.R"

> Sys.glob("**/language*.R")                                                               
[1] "R/languagebase.R"   "R/languageclient.R" "R/languageserver.R"

@MilesMcBain
Copy link
Author

Okay I quite like this. So the options could be globs? I'm going to have a go at an implementation and see how this feels.

@MilesMcBain
Copy link
Author

Not quite ready for a PR, but it seems to be working well: https://github.com/MilesMcBain/languageserver/tree/config-src-globs

@renkun-ken
Copy link
Member

Please feel free to send a PR anytime.

@sebffischer
Copy link

@MilesMcBain Do you intend to finish this PR? If not, what were the issues that you ran into?

@MilesMcBain
Copy link
Author

The main issue I had was writing tests for the code. The change itself wasn't too hard although code mat have drifted now.

@MilesMcBain
Copy link
Author

@sebffischer to answer the first part of the question for you, I'm not currently working on it and have no plans to pick it up.

@gowerc
Copy link

gowerc commented Feb 3, 2022

@randy3k & @renkun-ken , Just wanted to say it would be amazing if someone could pick this up again :( This limitation in particular causes pain when writing large packages with significant testthat/ code in vscode as it means you can't reliably depend on the reference finding when re-factoring code + tests.

Even if the full feature isn't implemented it would be great to have tests/ added as a default directory that is scanned alongside the R/ directory.

(obligatury thank you guys for your work, this package makes a huge difference to doing R based development)

@renkun-ken
Copy link
Member

renkun-ken commented Feb 3, 2022

I'd like to summarize what we need to consider for this issue:

  • Where should we configure this? options(), a standalone config file (e.g. .languageserver.yml), or an existing config file <project>.Rproj?
  • How should we configure this? Through a vector of regex or glob patterns? What about exclude?
  • When should we resolve this configuration? The regex or glob patterns should be resolved not only on language server startup, but also on each workspace file changes via workspace/didChangeWatchedFiles handled by workspace_did_change_watched_files, which includes creating/renaming/removing a file or directory in the workspace folder.

@gowerc
Copy link

gowerc commented Feb 3, 2022

I mean my personal preferance would be to go down the route similar to .gitignore where you just have a flat text file listing paths and use a ! prefix to indicate "not" i.e.

**/*.R
!R/xxx_*.R

@celbig
Copy link

celbig commented Jun 20, 2023

I'm digging up this issue because that's a feature that most (if not all) people of our research team want/need. The way our projects are organized makes it very difficult to use langagueserver in a useful way. I wrote an ugly fix for myself sometimes ago (pretty much recursively loading all R files in the workspace) but since several colleagues asked me for my hacky version of the package, I decided to clean it and make it publicly available.

I made the following choices for the implementation:

  • The user can add a list of folder to include through a new option languageserver.source_dirs which should be a list of folder's path
  • If languageserver.source_dirs = list() the previous behavior is conserved (the language serrver tries to detect if the workspace is a package)
  • For now, it is not possible to use globs
  • Internally, the server keeps a list of the files that are tracked (and not a list of the folders) so it would be easy to add the option for globs (through another option?)

I do have a working version in my fork at https://github.com/celbig/languageserver but no tests for now. If you are interested, I'm willing to do a proper PR with your suggestions and requirements. Considering tests, I'm not sure of what to test but I'm willing to implement whatever you find necessary.

I hope this can help enhancing this package since this feature seems to be requested regularly

@n8layman
Copy link

Just adding my voice to request this feature. Something like options(languageserver.r.source.dirs = c("R", "scripts") would be awesome. I honestly thought that was how it was supposed to work by default and spent an hour trying to troubleshoot why I couldn't see my Roxygen comments on hover or go to the definition without first opening the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants