Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add context to better locate runtime coercions #6204

Merged
merged 44 commits into from
Jan 2, 2023

Conversation

layus
Copy link
Member

@layus layus commented Mar 4, 2022

Add tons of helpful error messages in locations with poor error reporting.
This includes mostly type errors. As these are runtime errors, the precise location of the faulty value is usually lost at that time. This translates textually the type expectations (requirements ?) hard-coded in the sources.

TODOs:

  • Also annotate coerceToString
  • Find and annotate remaining functions bearing expectations on the values. (I guess this is good enough)
  • Wrap some function calls with debug info

Fixes #6200 #6191 #6031

@layus
Copy link
Member Author

layus commented Mar 4, 2022

Have fun by starting with ./outputs/out/bin/nix-instantiate --eval --expr 'builtins.genericClosure false' and ending with ./outputs/out/bin/nix-instantiate --eval --expr 'builtins.sort (a: b: a.key < b.key) (builtins.genericClosure {startSet = [{key = 1;}]; operator = x: if x.key < 5 then [{key = x.key + 1;} {key = x.key * 2;}] else [];})' by trying to trigger all the possible type errors in between :-)

@layus layus marked this pull request as ready for review March 4, 2022 21:55
@layus
Copy link
Member Author

layus commented Mar 5, 2022

image

🎉

@layus
Copy link
Member Author

layus commented Mar 5, 2022

More examples: https://maudoux.be/changes.html

@layus
Copy link
Member Author

layus commented Mar 7, 2022

Well, it has some impact on performance, but that remains under control:

This PR:
nixos.smallContainer.time     0.762153  0.975931  0.736349  0.775949
nixos.kde.time                1.44574   1.6254    1.48982   1.59174
nixos.lapp.time               1.1987    1.3634    1.19679   1.30787
nix-env.qa.time               6.18346   6.71418   6.32466   6.43977
nix-env.qaDrv.time            44.1625   50.7246   53.6632   46.6902
nix-env.qaAggressive.time     6.31606   6.67477   9.45704   6.60786
nix-env.qaDrvAggressive.time  44.3691   48.075    60.2891   45.648

Baseline:
nixos.smallContainer.time     0.71367  0.705955  0.71822  0.715239
nixos.kde.time                1.37252  1.40253   1.47962  1.38685
nixos.lapp.time               1.12598  1.14188   1.22673  1.12473
nix-env.qa.time               6.12947  6.16216   6.56867  6.22066
nix-env.qaDrv.time            42.1374  42.1028   54.5509  42.0571
nix-env.qaAggressive.time     6.79885  6.00412   9.02374  6.36788
nix-env.qaDrvAggressive.time  45.5595  42.1364   55.0709  42.3918

output

@Ericson2314
Copy link
Member

Do you want to update TODOs, mark this draft, or save things for future PRs?

@Ericson2314
Copy link
Member

Did you consider using the existing call stack stuff?

Relatedly, it would be nice to use the positions of the subterm that is trying to be casted. But I see that was discussed at #6200

I would not be opposed to merging this as-is, and circling back to these issues, but others may disagree.

@layus
Copy link
Member Author

layus commented Mar 8, 2022

Do you want to update TODOs, mark this draft, or save things for future PRs?

I guess this is as good as I can achieve in reasonable time.

Did you consider using the existing call stack stuff?

Well, first I thought it would not work. Now I am not too sure anymore. It should be feasible, but it requires shifting around several pos and rework how we handle them. It is a bit more involved, and the try/catch syntax adds lots of clutter. Maybe recent C++ lambdas may alleviate this issue. Apparently the try{}catch may have zero cost when not used. That would definitely be the right way forward if it is the case.
So, yes, I think it could be superior if well used. It would certainly avoid mismatch between error messages and positions, as we have now.

That being said, the current version has the advantage that it requires any user of these functions to provide an explanation string, which ensures that we do not inadvertently omit them.

Relatedly, it would be nice to use the positions of the subterm that is trying to be casted. But I see that was discussed at #6200

It is extremely complex in nix, as often there is no good position for values. We do keep some positions for the keys of attribute sets, and sometimes for functions, when feasible.

For example

let
  list = [ 1 "a" {attr="value";} ];
  fun = builtins.map (x: x+1);
  afterList = builtins.seq list;
  getThird = l: builtins.elemAt l 2;
in
  afterList (getThird (fun list ++ [3]))

fails with

   1 error: cannot coerce a set to a string
   2 
   3        at /home/layus/projects/nix/complex.nix:4:26:
   4 
   5             3|   list = [ 1 "a" {attr="value";} ];
   6             4|   fun = builtins.map (x: x+1);
   7              |                          ^
   8             5|   afterList = builtins.seq list;
   9 
  10        … while evaluating anonymous lambda
  11 
  12        at /home/layus/projects/nix/complex.nix:4:23:
  13 
  14             3|   list = [ 1 "a" {attr="value";} ];
  15             4|   fun = builtins.map (x: x+1);
  16              |                       ^
  17             5|   afterList = builtins.seq list;
  18 
  19        … from call site
  20 
  21        … while evaluating 'getThird'
  22 
  23        at /home/layus/projects/nix/complex.nix:6:14:
  24 
  25             5|   afterList = builtins.seq list;
  26             6|   getThird = l: builtins.elemAt l 2;
  27              |              ^
  28             7| in
  29 
  30        … from call site
  31 
  32        at /home/layus/projects/nix/complex.nix:8:14:
  33 
  34             7| in
  35             8|   afterList (getThird (fun list ++ [3]))
  36              |              ^
  37             9|

And nowhere is there any mention of the list itself, nor the value that failed to be a proper int. And that would be difficult. Would you point list ++ [3], list = [ 1 "a" {attr="value";} ], [ 1 "a" {attr="value";} ], {attr="value";}, or something else ?

💡 One thing we could do is print the fautly values to help users get a ffeling of what happens.

I would not be opposed to merging this as-is, and circling back to these issues, but others may disagree.

I think it is ready to merge. Better merge early and improve often :-).

@Ericson2314
Copy link
Member

So, yes, I think it could be superior if well used. It would certainly avoid mismatch between error messages and positions, as we have now.

That being said, the current version has the advantage that it requires any user of these functions to provide an explanation string, which ensures that we do not inadvertently omit them.

Yeah, I think the cast functions should take the argument, and do the try-catch themselves on behalf of the caller. That still ensures one doesn't forget to provide a reason for the cast!

@Ericson2314
Copy link
Member

Very nice looking!

@layus
Copy link
Member Author

layus commented Mar 18, 2022

Except for painful conflicts... I guess this will have to wait the hackathon tomorrow.

@layus
Copy link
Member Author

layus commented Mar 18, 2022

@Ericson2314 Not sure I will be able to join the hacking session, but if you could find some time to review the changes, that would be much appreciated. I am quite happy with the result, but I lack real tests, and there are so many changes that typos can always slip in.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-26/18252/1

This caused traces 'at «none»:0: (source not available)'.
@edolstra edolstra merged commit a75b7ba into NixOS:master Jan 2, 2023
@edolstra
Copy link
Member

edolstra commented Jan 2, 2023

@layus Merged, thanks!

roberth added a commit to hercules-ci/nix that referenced this pull request Jan 18, 2023
This reverts commit a75b7ba, reversing
changes made to 9af16c5.
roberth added a commit that referenced this pull request Jan 18, 2023
Revert #6204 to fix regression, add nixpkgs/lib/tests as regression test
github-actions bot pushed a commit that referenced this pull request Jan 18, 2023
This reverts commit a75b7ba, reversing
changes made to 9af16c5.

(cherry picked from commit 9b33ef3)
edolstra added a commit that referenced this pull request Jan 18, 2023
[Backport 2.13-maintenance] Revert #6204 to fix regression, add nixpkgs/lib/tests as regression test
@roberth
Copy link
Member

roberth commented Jan 18, 2023

Unfortunately this PR has caused a regression in 2.13.0.

Symptoms include infinite recursions and some sort of regression in the module system.
I've tried to find the root cause by

  • bisecting, but the git history of this change isn't conducive to it
  • reading through the change for two potential causes
    • accidentally forcing more evaluation than necessary
    • memory corruption

I haven't managed to find it, so for the time being, I've reverted this change.
Any help would be much appreciated, as we'd like to have this change back, both because it is a very welcome change and because other PRs would not to have conflicts resolved if we don't manage to merge this soon.

@Ericson2314
Copy link
Member

Do we have any test cases for these issues? I think a PR with the "revert of the revert" + some failing tests would be a good starting point, and then hopefully @layus or another person can pick up the baton from there.

@layus
Copy link
Member Author

layus commented Jan 19, 2023

Hi, having a look right now. There are plenty of potential fixes. Finding failing tests would be tremendously useful indeed. Any help welcome. For now I am focusing on reproducing the nixpkgs-lib-tests failures.

@tomberek
Copy link
Contributor

tomberek commented Jan 19, 2023

@layus so far I have not seen a behavioral regression, only a change (and improvement in the error message) in the error message output. I added a reproducer to more quickly try out Nix versions: #7621 (comment)

@layus
Copy link
Member Author

layus commented Jan 19, 2023

So first there is indeed a change where I introduced "frame" traces (pardon the naming here). Think of them as more important trace points, like function calls and builtins.derivation calls.
This explains why the addErrorTrace thing is invisible: it happens after a function call in the trace. I have confirmed that removing this notion of frame traces fixes the nixpkgs tests errors. There are two solutions

  1. disable frame traces
  2. make addErrorTrace a frame trace.

I think 2 is better because we keep the terseness of the output, and it even makes sense, as a message that a human took time to write for debugging purposes deserves to be printed in the end :-).
Working on it.

@tomberek
Copy link
Contributor

Agreed, 2 is likely to be a better match for user expectations.

@layus
Copy link
Member Author

layus commented Jan 19, 2023

Ok, so the fix is here: layus@65c2c94
Working on a merge/rebase to master right now

layus added a commit to layus/nix that referenced this pull request Jan 19, 2023
@layus layus mentioned this pull request Jan 19, 2023
3 tasks
iFreilicht pushed a commit to iFreilicht/nix that referenced this pull request Jan 28, 2023
This reverts commit a75b7ba, reversing
changes made to 9af16c5.
iFreilicht pushed a commit to iFreilicht/nix that referenced this pull request Jan 28, 2023
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-43/25185/1

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-team-report-2022-10-2023-03/27486/1

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-06-22-nix-team-meeting-minutes-65/29643/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error-messages Confusing messages and better diagnostics
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Improve location reporting for type and type coercion errors
10 participants