
Support for cross- or non-standard compilers? #138

Closed
madscientist opened this issue Nov 27, 2018 · 11 comments
Labels: question (Further information is requested), wontfix (This will not be worked on)

Comments

@madscientist
Contributor

madscientist commented Nov 27, 2018

I use my own build of GCC to compile my code. I use CMake and have it generate compile_commands.json so ccls knows the complete command line. However, all the system C runtime and compiler header files are installed in a separate location, and the local build of GCC knows where to find them. The compiler never uses any of the system libc headers or system compiler headers.

Unfortunately ccls doesn't seem to understand this. After running it, if I look in the .ccls-cache/@@work@src directory for the system and compiler headers, they all come from the clang instance that ccls was built with and from the system's /usr/include. I know for sure that those are not the C/C++ headers my code is compiled with.

I'm not sure how ccls locates header files to index. I hoped it would ask the compiler itself where to look, either by parsing preprocessed output or by getting the compiler to print the set of include paths it searches for headers; either way, ccls would find the correct headers to index. However, neither approach seems to be used.

Any thoughts?

@MaskRay
Owner

MaskRay commented Nov 27, 2018

A mad scientist still needs CMake to generate Makefiles :)

> I use my own build of GCC to compile for my code. ... However, all the system C runtime and compiler header files are installed at a separate location and the local build of GCC knows where to find them

The gcc you use has an unconventional sysroot and unconventional system search paths.

> Unfortunately ccls doesn't seem to understand this; after running if I look at the .ccls-cache/@@work@src directory for the system and compiler headers, they are all from the clang instance that ccls was built with and the system's /usr/include.

Right. ccls just calls clang::createInvocationFromCommandLine to infer system search paths (e.g. /usr/include). The underlying mechanism is similar to that of clang -v -E -x c++ /dev/null:

```cpp
std::unique_ptr<CompilerInvocation> clang::createInvocationFromCommandLine(
...
  // clangDriver, which detects system search paths
  driver::Driver TheDriver(Args[0], llvm::sys::getDefaultTargetTriple(),
                           *Diags, VFS);
...
  // Similar to clang -cc1
  if (!CompilerInvocation::CreateFromArgs(...)
```

clangDriver has logic to detect system search paths the same way a system gcc does. The result may look like:

```
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/c++/8.0.1
 /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/x86_64-linux-gnu/c++/8.0.1
 /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/c++/8.0.1/backward
 /usr/local/include
 /home/ray/llvm/Release/lib/clang/8.0.0/include
 /usr/include/x86_64-linux-gnu
 /usr/include
```

For your local GCC, the first 3 directories should differ. You need to find the corresponding search paths in your GCC installation directory and specify them with -isystem. To reliably disable the system C++ search paths, you can probably pass -nostdinc++.
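The search list above can be turned into flags for clang.extraArgs mechanically. Below is a minimal Python sketch, assuming the output format shown above (one leading space per entry, optionally terminated by "End of search list."); parse_search_dirs and to_extra_args are hypothetical helpers, not part of ccls or clang.

```python
def parse_search_dirs(verbose_output):
    """Return the directories listed under '#include <...> search starts here:'.

    verbose_output is the stderr text of e.g. `g++ -v -E -x c++ /dev/null`.
    """
    dirs = []
    in_block = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_block = True
        elif in_block and line.startswith(" "):
            # Each entry is indented by one space; keep the path as printed.
            dirs.append(line.strip())
        elif in_block:
            # "End of search list." (or any unindented line) ends the block.
            break
    return dirs


def to_extra_args(dirs, nostdinc_cxx=True):
    """Build clang arguments that put `dirs` on the system include path."""
    # -nostdinc++ drops clang's own idea of the C++ standard library headers.
    args = ["-nostdinc++"] if nostdinc_cxx else []
    for d in dirs:
        args += ["-isystem", d]
    return args
```

Whether -nostdinc++ is appropriate depends on whether the replacement C++ header directories are actually among the parsed entries.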

You can add extra flags by:

I've also written some tips in #125. You can also play with --gcc-toolchain=.

@madscientist
Contributor Author

If I didn't need to support Visual Studio and Xcode project files, I probably would just use a straightforward makefile system... I've definitely banged my head for hours against things in cmake that I could have written in 10 minutes in a makefile :).

> The gcc you use has a unconventional sysroot and system search paths.

Yes, definitely. In fact I invoke a wrapper script that sets up all the sysroot etc. arguments. That's why I was hoping ccls would query the actual compiler somehow (e.g. by running <compiler> -v -E -xc++ or similar; this may have to be customized for different compilers, although it works with both GCC and clang) rather than using predefined values. Relying on clang::createInvocationFromCommandLine somewhat limits the ability of ccls to understand alternate compilers and their locations.

The issue I was hitting is that with compile_commands.json I can't manipulate the directories. It's also quite inconvenient, in our environment, to "post-process" the .json file generated by cmake. Also, our compiler and alternate sysroot are completely relocatable (you can copy the entire thing somewhere else and it works fine; in fact the compiler and sysroots are checked into a separate Git repository and cloned "somewhere" on the user's system), so I can't check in a predefined set of ccls flags: they need to be generated for each system. Finally, the invocation of ccls that does the indexing is somewhat "hidden" because it's being used with Emacs and company mode. I'll look into how to pass options through there.

I personally feel that to get the most accurate indexing with the least effort from the user (and the fewest PEBCAK issues filed :) ), it would be great for ccls to query the compiler for include directories. I realize that's much more effort than simply asking the clang library for a set of directories. I'd like to hear your thoughts on this.

Also it would be great if ccls could accept configuration in more flexible ways than just the command line (I'm afraid the initialization options wiki page didn't make it clear to me whether that is already possible): for example, from an environment variable or a ~/.cclsconfig file, in addition to the command line (which may be difficult to configure behind front-ends). Perhaps that deserves a separate issue.

Thanks for your reply!

@cmm

cmm commented Nov 27, 2018

One potential problem with the "query the damn compiler for its default include path" approach is that the bundled headers may in theory contain arbitrary compiler-specific magic that gives clang severe indigestion if the compiler in question is not clang (admittedly gcc has been fine in my experience, but perhaps I've just had good luck with gcc/clang version combinations).

@MaskRay
Owner

MaskRay commented Nov 28, 2018

You may try pointing --gcc-toolchain= at your GCC installation directory. For example, if you have /Driver/Inputs/mips_img_v2_tree/lib/gcc/mips-img-linux-gnu/4.9.2/include, use clang --gcc-toolchain=/Driver/Inputs/mips_img_v2_tree.
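The --gcc-toolchain= value in that example is the installation prefix above lib/gcc/<triple>/<version>/include. A small Python sketch of that mapping, assuming the standard lib/gcc layout; gcc_toolchain_flag is a hypothetical helper that only computes the argument and does not check whether clangDriver will recognize the triple.

```python
from pathlib import PurePosixPath


def gcc_toolchain_flag(gcc_include_dir):
    """Map <prefix>/lib/gcc/<triple>/<version>/include to --gcc-toolchain=<prefix>."""
    parts = PurePosixPath(gcc_include_dir).parts
    # Expect the tail ("lib", "gcc", <triple>, <version>, "include").
    if len(parts) < 6 or parts[-1] != "include" or parts[-5:-3] != ("lib", "gcc"):
        raise ValueError(f"not a GCC internal include dir: {gcc_include_dir}")
    return "--gcc-toolchain=" + str(PurePosixPath(*parts[:-5]))
```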

> The issue I was hitting is that by using compile_commands.json I don't have the ability to manipulate the directories.

You can post-process compile_commands.json, but that may be inconvenient. See the two initialization options clang.excludeArgs and clang.extraArgs. clang also supports --sysroot=.
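For reference, post-processing compile_commands.json amounts to rewriting every entry's command line. A minimal Python sketch, assuming each entry has either an "arguments" list or a shell-quoted "command" string (the two forms the compilation database format allows); add_flags and rewrite_database are hypothetical helpers.

```python
import json
import shlex


def add_flags(entries, extra):
    """Return copies of compile_commands.json entries with `extra` inserted after the compiler."""
    result = []
    for entry in entries:
        entry = dict(entry)  # leave the caller's data untouched
        if "arguments" in entry:
            args = list(entry["arguments"])
            entry["arguments"] = args[:1] + list(extra) + args[1:]
        else:
            # "command" is a single shell-quoted string: split, edit, re-quote.
            args = shlex.split(entry["command"])
            entry["command"] = shlex.join(args[:1] + list(extra) + args[1:])
        result.append(entry)
    return result


def rewrite_database(path, extra):
    """Rewrite a compile_commands.json file in place."""
    with open(path) as f:
        entries = json.load(f)
    with open(path, "w") as f:
        json.dump(add_flags(entries, extra), f, indent=2)
```

Since CMake regenerates the file on every configure, this step would have to be re-run each time, which is part of why clang.extraArgs tends to be more convenient.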

> And finally, the invocation of ccls to do the indexing is sort of "hidden" because it's being used with Emacs and company mode. I'll look into seeing how to pass through options there.

Use ccls-extra-init-params; see https://github.com/MaskRay/ccls/wiki/Emacs#initialization-options

> it would be great for ccls to query the compiler for include directories. I realize that's much more effort than simply asking the clang library for a set of directories. I don't know what your thoughts are on this.

clang does all the heavy lifting here. I try to avoid inventing any compiler driver logic in ccls, as the extra work can often lead to a loss of generality and get in the way.

What ccls does should not be very different from what the compiler driver clang or other clang tools such as c-index-test do:

```sh
c-index-test -index-file a.cc -resource-dir ~/llvm/Release/lib/clang/8.0.0 -isystem /xxx -I /yyy -cxx-isystem /cxx-xxx
```

If this command gives you index information, passing the same set of flags through .ccls or compile_commands.json should also let ccls index the file, and vice versa.

> Also it would be great if ccls could accept configuration in more flexible ways than just the command line (I'm afraid it wasn't exactly clear to me from the initialization option wiki page, whether that was already true or not): for example, from an environment variable or a ~/.cclsconfig file or something like that, in addition to directly on the command line (which may be difficult to configure behind front-ends). Perhaps that deserves a different issue.

The customization is provided through LSP's initialization options, which are supposed to be sent by the language client to ccls (the language server). The command line option -init= can override initialization options. This is what I do every day:

```sh
ccls -index ~/llvm -init='{"clang":{"extraArgs": ["--gcc-toolchain=/usr"]}}'
```

I don't think a ~/.cclsconfig or $XDG_CONFIG_HOME/ccls/config is really necessary, as client-side customization plus -init looks sufficient to me. And what priority should file-based initialization options have? It may cause confusion. wiki/Emacs suggests making ccls a shell script:

```sh
#!/bin/zsh
#export CCLS_TRACEME=1 # if you want to debug ccls, stop it right after main() is called
#export LD_LIBRARY_PATH=~/llvm/Release/lib # if you link against shared library libLLVM.so, not statically
exec ~/ccls/Release/ccls --log-file=/tmp/ccls.log "$@"
```

@madscientist
Contributor Author

Sorry for the lack of response, I got busy at work.

I spent most of yesterday trying to make this work. First, note that --gcc-toolchain won't help. It works for locating multiple installations of GCC in the modified layout that RPM/DEB use to install multiple compiler versions into /usr, but it fails to interpret the standard layout you get from building and installing GCC yourself (which we do) with its default directory structure. There have been multiple SO questions etc. about this over the years, but no fix in clang. The only solution I've seen on SO is cobbling together a different directory layout using symlinks: clearly not something I want to do. I ran ccls under strace to see where it was looking, just to be sure, and it wasn't looking anywhere useful.

The only useful thing about --gcc-toolchain for me is that it keeps clangDriver from using the clang C++ headers; I couldn't find any other options that would prevent that. So then I hand-crafted a set of extraArgs that used the -isystem and -cxx-isystem options to add headers I cut and pasted from the output of -E -v. This seemed to work.

This is really painful though. I have multiple different worktrees and each one can use a different instance of the compiler, so I can't hardcode these paths in my Emacs config; they need to be set per-project.

What I'd like to do is create a general environment that I can publish for other people in my group to use, but as I mentioned all the compilers are relocatable so I can't use any hardcoded paths. Perhaps I'll need to write a somewhat sophisticated front-end script to ccls to try to manage all of this. Because of the way --init encodes all of the configuration in one single JSON string, and only one --init is allowed, the wrapper will have to require jq (if it's a shell script) or be written in Python or something so it can break open the incoming --init and add in these extra options.
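Such a wrapper mostly has to decode the single JSON argument, extend it, and re-encode it. A minimal Python sketch, assuming the option is passed as a single -init=<json> or --init=<json> argument and that the extra flags belong in clang.extraArgs; add_extra_args is a hypothetical name.

```python
import json

INIT_PREFIXES = ("-init=", "--init=")


def add_extra_args(argv, extra):
    """Return a copy of ccls's argv with `extra` appended to clang.extraArgs."""
    out = list(argv)
    for i, arg in enumerate(out):
        prefix = next((p for p in INIT_PREFIXES if arg.startswith(p)), None)
        if prefix is None:
            continue
        # Decode the existing options, extend clang.extraArgs, re-encode.
        opts = json.loads(arg[len(prefix):])
        clang = opts.setdefault("clang", {})
        clang["extraArgs"] = clang.get("extraArgs", []) + list(extra)
        out[i] = prefix + json.dumps(opts, separators=(",", ":"))
        return out
    # No -init on the command line: add a fresh one.
    out.append("-init=" + json.dumps({"clang": {"extraArgs": list(extra)}},
                                     separators=(",", ":")))
    return out
```

A wrapper would then exec the real binary, e.g. os.execvp(real_ccls, [real_ccls] + add_extra_args(sys.argv[1:], extra)), where real_ccls and extra are placeholders for the per-project values.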

I did try out cquery yesterday as well: it found all the right headers for my compilers perfectly the first time without any special customizations, which is great. But it seemed to require a lot more resources, which is not great.

So I'm still thinking about which to use.

Cheers!

@MaskRay
Owner

MaskRay commented Dec 9, 2018

> There are multiple SO questions, etc. about this over the years but no fix in clang.

You can forget those SO questions and start from here: https://github.com/llvm-mirror/clang/tree/master/lib/Driver/ToolChains/Gnu.cpp#L1697. It is intricate logic that emulates standard GCC installations on some selected platforms; the code explains more (see CollectLibDirsAndTriples and ScanLibDirForGCCTriple).

> Because of the way --init encodes all of the configuration in one single JSON string, and only one --init is allowed, the wrapper will have to require jq (if it's a shell script) or be written in Python or something so it can break open the incoming --init and add in these extra options.

This is a reasonable feature request. I pushed a commit to support multiple -init=.

Initialization options are applied (deserialized to the same object) in the following order:

* "initializationOptions" from client
* first -init=
* second -init=
* ...

Scalar options will be overridden but arrays will get concatenated, e.g.

```sh
ccls -log-file=/dev/stderr -index . -init='{"clang":{"extraArgs":["a"]}}' -init='{"clang":{"extraArgs":["b"]}}'
```

results in clang.extraArgs: ["a", "b"]
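The merge rule can be modeled on plain JSON values as below. This is a hedged Python sketch of the behavior described above (objects merged key by key, arrays concatenated, scalars replaced, sources applied in the listed order), not ccls's implementation, which deserializes each JSON document into the same C++ configuration object.

```python
import json


def merge(base, override):
    """Merge `override` into a copy of `base`."""
    result = dict(base)
    for key, value in override.items():
        old = result.get(key)
        if isinstance(old, dict) and isinstance(value, dict):
            result[key] = merge(old, value)    # objects: merge recursively
        elif isinstance(old, list) and isinstance(value, list):
            result[key] = old + value          # arrays: concatenate
        else:
            result[key] = value                # scalars: later value wins
    return result


def effective_options(client_options, init_args):
    """Apply "initializationOptions" first, then each -init= value in order."""
    options = dict(client_options)
    for text in init_args:
        options = merge(options, json.loads(text))
    return options
```

With this order, the client's array entries come first in clang.extraArgs, and any scalar set by -init wins over the client's value.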

> it handled finding all the right headers for my compilers perfectly the first time without any special customizations, which is great.

cquery/src/platform.cc uses reproc to spawn a process that collects search directories from -v -xc++ /dev/null. I don't like that approach and won't do anything similar in ccls.

> But, it seemed to require a lot more resources, which is not great.

For cquery, remember to set {cacheFormat: "msgpack"}, otherwise the default "json" is selected and it is very inefficient as a serialization format.

For a fair comparison, FreeUnusedMemory(); at cquery/src/command_line.cc:270 should be tweaked a bit (the ccls counterpart is in src/pipeline.cc), as cquery aggressively calls malloc_trim(0).

@madscientist
Contributor Author

> You may forget these So questions and start from here

I see. The problem is the hardcoded set of triples: our triple is x86_64-generic-linux-gnu, which is not one of those predefined in Clang. It's a shame they decided to use an explicit list of triples rather than glob matches such as x86_64-*-linux-gnu; a hardcoded list can never be sufficient. If I go through our compiler installation and create x86_64-unknown-linux-gnu symlinks pointing to each x86_64-generic-linux-gnu, then it works. I'm not sure I want to do that in general, though.
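The glob-based lookup suggested here could look roughly like the Python sketch below, assuming the default <prefix>/lib/gcc/<triple>/<version> layout of a self-built GCC. This is not what clangDriver does (it checks fixed candidate lists in Gnu.cpp); matching_triples and find_gcc_installs are hypothetical helpers.

```python
import fnmatch
import os


def matching_triples(names, patterns):
    """Keep the directory names that match any glob pattern, e.g. x86_64-*-linux-gnu."""
    return [n for n in names if any(fnmatch.fnmatchcase(n, p) for p in patterns)]


def find_gcc_installs(prefix, patterns=("x86_64-*-linux-gnu",)):
    """Return (triple, version) pairs found under <prefix>/lib/gcc."""
    gcc_dir = os.path.join(prefix, "lib", "gcc")
    if not os.path.isdir(gcc_dir):
        return []
    installs = []
    for triple in matching_triples(sorted(os.listdir(gcc_dir)), patterns):
        triple_dir = os.path.join(gcc_dir, triple)
        if os.path.isdir(triple_dir):
            for version in sorted(os.listdir(triple_dir)):
                if os.path.isdir(os.path.join(triple_dir, version)):
                    installs.append((triple, version))
    return installs
```

Note that x86_64-*-linux-gnu does not match the Debian-style x86_64-linux-gnu, so a real list of patterns would need more than one entry.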

> For cquery, remember to set {cacheFormat: "msgpack"}

I thought I had done that, but I realized I had used cacheformat instead of cacheFormat, and it silently ignored the misspelling (another hazard of using JSON for configuration 😁). After I fixed that, the situation is definitely better.
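A silently ignored key is easy to catch before the JSON reaches the server. A minimal Python sketch, assuming you supply the list of valid dotted option names yourself (the real option names must come from the cquery or ccls documentation); unknown_keys is a hypothetical helper built on difflib.

```python
import difflib


def unknown_keys(options, known, prefix=""):
    """Yield (dotted_key, suggestion) for option keys not in `known`.

    `known` lists valid dotted names such as "clang.extraArgs"; a key is accepted
    if it is a known name or a prefix of one (e.g. "clang").
    """
    known = list(known)
    for key, value in options.items():
        dotted = prefix + key
        if dotted in known:
            continue
        if any(k.startswith(dotted + ".") for k in known):
            # A parent object such as "clang": check its children too.
            if isinstance(value, dict):
                yield from unknown_keys(value, known, dotted + ".")
            continue
        close = difflib.get_close_matches(dotted, known, n=1)
        yield dotted, (close[0] if close else None)
```

For example, checking {"cacheformat": "msgpack"} against ["cacheFormat"] reports ("cacheformat", "cacheFormat").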

There's something strange about my ccls output: cquery caches almost twice as many files. Looking at the results, ccls seems not to cache most of my header files, which is very strange because it does cache the matching .cpp file right in the same directory. In fact it caches only 67 of my 1631 header files. I see no mention of the missing files in the log output.

> * "initializationOptions" from client
> * first -init=
> * second -init=
> * ...

Awesome! I'm wondering: if later values override earlier ones, shouldn't the options from the client come last rather than first? I would think the arguments the server was started with should be overridden by the options provided by the client.

> cquery/src/platform.cc uses reproc and spawns a process to collect search directories from -v -xc++ /dev/null. I don't like that approach and won't do the similar thing in ccls.

I absolutely understand your distaste.

Nevertheless, I fear that this will keep ccls unsuitable for a number of dev environments. Embedded development and a significant number of development teams (outside of FOSS) use non-default compilers. Even when not cross-compiling, coordinated teams have three choices: (a) allow everyone to use their own native compiler and deal with the broken compiles from people using different versions, (b) force everyone to use identical OS distributions/versions so they all have the same native compiler, or (c) provide a separate compiler outside of the default locations so everyone can use the same one. By far the least hassle is (c), and that's how every development shop I've worked at does things. As shown above, if you're lucky and your environment matches something Clang expects, you can configure things with only a few extra arguments (--gcc-toolchain and --sysroot), but if not, you've got a long road ahead.

Anyway, thanks for your efforts on both ccls and cquery; it's great to have good indexing for C/C++ projects!

@MaskRay
Owner

MaskRay commented Dec 10, 2018

> shouldn't the options from the client come last rather than first?

For editors that are less flexible than Emacs (no buffer-local variables or dynamic scoping), it isn't easy to tweak the client-side configuration, and the shell wrapper that starts the language server offers more flexibility. I think it makes sense to let the flexible one (-init) override the inflexible one ("initializationOptions").

> Embedded development and a significant number of development teams (outside of FOSS) use non-default compilers. Even when not doing cross-compilation, for co-ordinated teams your choices are (a) allow everyone to use their own native compiler and deal with the broken compiles from people using different versions, (b) force everyone to use identical OS distributions/versions so they all have the same native compiler, or (c) provide a separate compiler outside of the default locations so everyone can use the same one.

Thanks for elaborating on the complexity.

This run-clang-v-to-get-search-directories approach is not bulletproof and has caused many issues before.

cquery/third_party/reproc/src/c/posix/fork.c:109:

```c
execvp(argv[0], (char **) argv);
```

I haven't tried it myself, but if I understand it correctly, argv[0] cannot differ from the executable name. This may break a use case, as clang can infer the target triple from argv[0] (you can create a symlink powerpc64le-linux-gnu-clang pointing to clang). See #107.
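To see why argv[0] matters: clang derives a target prefix from the name it was invoked as, so a tool that resolves the symlink (or re-invokes plain clang) loses that information. A simplified Python illustration, assuming only the -clang, -clang++, -gcc and -g++ suffixes; clang's real program-name handling in clangDriver recognizes more forms (e.g. versioned names), and target_prefix is a hypothetical helper.

```python
import os

# Simplified, hypothetical list of driver names that may carry a target prefix.
DRIVER_SUFFIXES = ("clang++", "clang", "g++", "gcc")


def target_prefix(argv0):
    """Return the target prefix encoded in a driver name, or None if there is none."""
    name = os.path.basename(argv0)
    for suffix in DRIVER_SUFFIXES:
        if name.endswith("-" + suffix):
            return name[: -(len(suffix) + 1)]
    return None
```

Applied to the symlink name powerpc64le-linux-gnu-clang this yields powerpc64le-linux-gnu, while the resolved target, plain clang, yields None.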

There are also users who don't specify the full pathname of the compiler driver and rely on PATH when compiling (e.g. with virtualenv-like scripts). I wonder whether this approach would still work for them.

clang emulates gcc well, but differences in nuance are unavoidable. Users have to fight with command line options if they want to use different versions of GCC or switch to clang. clangDriver is a gigantic piece of code, and ycmd has probably done some crazy twiddling on its side. I think doing as little as possible and handing everything over to clangDriver is the cleanest and most robust approach. Adding -isystem to clang.extraArgs should be acceptable.

@MaskRay
Owner

MaskRay commented Dec 10, 2018

> In fact it only caches 67 of my 1631 header files. I see no mention of the missing files in the log output.

Header files are not compiled on their own, so header files that are not included by any compiled source file may not get indexed. Also see #72 (comment).

@madscientist
Contributor Author

> Header files are not compiled on their own.

That's good info, thanks. But these headers are definitely included.

I found the issue, although I'm not sure why it happens. I had added --include=Foo.h to clang.extraArgs to force a certain header file to be included before anything else. With that argument, however, most of the indexing seems to fail, yet there are no error messages in the output or in the log. I've tried full pathnames, omitting the =, etc., and the result is always the same. The .blob for each source file is very small and contains little beyond a basic list of top-level headers: no information about anything defined in the file.

If I remove that option, I get much more complete output that seems to include everything (and takes a lot more effort to generate 😁).

@Riatre
Contributor

Riatre commented Dec 10, 2018

That said, I think writing your own hacky wrapper in your favorite scripting language, which detects the include paths and then adds them to compile_commands.json or the init options, might be a good idea for this use case: it fits the need, it can be simple because the compilers are under your control, and it won't surprise users.
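Such a wrapper can stay small. A minimal Python sketch, assuming the project's compiler prints its search list on stderr for -x c++ -E -v /dev/null (GCC and clang do) and that the directories are passed to ccls through -init= as described earlier in the thread; query_search_dirs and ccls_argv are hypothetical names, and the run parameter only exists so the subprocess call can be replaced in tests. Keep cmm's caveat in mind: GCC's internal include directory may contain headers clang cannot parse.

```python
import json
import subprocess


def query_search_dirs(compiler, lang="c++", run=subprocess.run):
    """Ask `compiler` (invoked under its own name) for its #include <...> search list."""
    proc = run([compiler, "-x", lang, "-E", "-v", "/dev/null"],
               stdout=subprocess.DEVNULL, stderr=subprocess.PIPE,
               text=True, check=True)
    dirs, in_block = [], False
    for line in proc.stderr.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_block = True
        elif in_block and line.startswith(" "):
            dirs.append(line.strip())
        elif in_block:
            break  # "End of search list."
    return dirs


def ccls_argv(ccls, compiler, user_args=(), run=subprocess.run):
    """Build a ccls command line that passes the compiler's own search dirs via -init."""
    extra = ["-nostdinc++"]  # C++ only; drop it for C-only projects
    for d in query_search_dirs(compiler, run=run):
        extra += ["-isystem", d]
    init = json.dumps({"clang": {"extraArgs": extra}}, separators=(",", ":"))
    return [ccls, *user_args, "-init=" + init]
```

Passing the compiler path exactly as the build uses it keeps argv[0] intact, which matters for drivers that infer the target from their name.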
