Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace cabal project parsing with Parsec #8889

Open
wants to merge 41 commits into
base: master
Choose a base branch
from

Conversation

jgotoh
Copy link
Member

@jgotoh jgotoh commented Apr 1, 2023

Implements #6101, #7748. Finally ready for review!

Please include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary. (I don't think it's necessary, correct me if I'm wrong)
  • Manual QA notes have been included (not needed as this change is invisible to users)
  • Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

In this PR I implemented #6101 to replace the legacy cabal.project parser (module Distribution.Client.ProjectConfig.Legacy) with an implementation based on Parsec.
My implementation is based heavily on the existing Parsec parser of .cabal files, see module Distribution.PackageDescription.Parsec.
My goal was to recreate the exact grammar the Legacy Parser parses using the modern Parsec framework with FieldGrammars etc.
This means the Legacy Parser and Parsec Parser should always return the same ProjectConfig value for a file (at least in this PR).

About CI:
All of the validation checks of Ubuntu are running, unfortunately windows-latest ghc-9.10.1 fails, but windows ghc-9.8.2 succeeds.

The PR consists of several main parts:

Main Entrypoint Distribution.Client.ProjectConfig.readProjectFileSkeleton
The main entrypoint into the Parsec parser is in function readProjectFileSkeleton in Distribution.Client.ProjectConfig and parses a cabal.project file.
Note that the legacy parser can be executed still via readProjectFileSkeletonLegacy in the same file, I want to remove it but in another ticket because this PR is huge already.
My current implementation of readProjectFileSkeleton also calls readProjectFileSkeletonLegacy to compare the Parsec value to the legacy value, throwing errors if they are not equal.
Currently I use it to verify that the parsers' grammars match.
I will delete this functionality when the PR is ready to merge, but for now I will let the functionality stay in the PR so reviewers can try to build other projects to verify the parsers' values to not diverge.
Note that because .project files are parsed by both the Legacy parser and the Parsec parser, all warnings are currently emitted twice - once by each parser.

The function readProjectFileSkeleton currently uses an old variant of readAndParseFile from Cabal/src/Distribution/Simple/PackageDescription.hs (new variant is here) that is not based on SymbolicPaths yet but uses FilePath instead.
If we want to migrate it to use SymbolicPaths I need some help here :) Maybe this should be part of a future ticket too.

readProjectFileSkeleton calls Distribution.Client.ProjectConfig.Parsec.parseProject which leads me to the next main part:

Module Distribution.Client.ProjectConfig.Parsec

This module contains the Parsec parser of ProjectConfigs.
Function parseProject is a copy of function parseProject of the Legacy parser (Distribution.Client.ProjectConfig.Legacy.parseProject) but it now calls the Parsec version of parseProjectSkeleton.

parseProjectSkeleton
parseProjectSkeleton is a port of function parseProjectSkeleton of module Distribution.Client.ProjectConfig.Legacy.
It does not use the deprecated type Field from Distribution.Deprecated.ParseUtils anymore but Distribution.Fields.Field instead.
I tried to change as little as possible here, so we still parse the fields/sections into ProjectConfigs, import and parse other cabal.project files and process the conditional structure.

An interesting bit includes the new processing of imports (Fields with name equaling "import"):
We need to use liftPR to be able to compose ParseResults that involve IO actions (for example downloading imported cabal.project files via HTTP).
Composing two actions involves executing a ParseResult resulting in a PRState and executing another ParseResult passing in the previous PRState.
Unfortunately the PRState that is generated when executing a ParseResult does not contain the file source where the warnings/errors came from.
So we need to print any warnings/errors that came up when parsing an imported file before we return the ParseResult, otherwise we lose the source file of the warnings/errors.
I added the implementation of liftPR for Parsec ParseResults to module Distribution.Fields.ParseResult because I needed its constructor.
Please pay special attention to reviewing this implementation, as it was quite complex to develop.
It works in the tests :)

Another interesting bit is function fieldsToConfig in parseProjectSkeleton. Here we produce ProjectConfig values by parsing the current fields with a FieldGrammar. Afterwards we parse the sections (such as source-repository-package, ...) with function goSections.

FieldGrammar
The ProjectConfig FieldGrammar is defined in module Distribution.Client.ProjectConfig.FieldGrammar.
It took me some while to reverse engineer all the possible field names and find out their named field equivalents in the ProjectConfig record, but it should be complete now.
Parsing of sections such as source-repository-package is not done in here, see the next point below.
There are some fields such as projectConfigDryRun, projectConfigOnlyDeps that afaik can not be specified in a cabal.project file and need to be passed in via command line flags.
They are marked by comments in the following form: -- cli flag: projectConfigDryRun.
I also needed to add some Parsec instances that I added to the modules defining the respective type.
For example the Parsec instance of OptimisationLevel is in module Distribution.Simple.Compiler directly below the type definition.

Parsing of Sections
Parsing of sections happens in a StateT monad (SectionParser) modifying the ProjectConfig we got out of the FieldGrammar.
For the SectionParser I took a lot of inspiration again from the parser of .cabal files in module Distribution.PackageDescription.Parsec, see type SectionParser.
Currently I need to implement parsing of repository sections here as I've missed it until now but I hope it is the last type of section that is missing.

Integration Tests in cabal-testsuite/PackageTests/ProjectConfig/Parsec/cabal.test.hs
A suite of integration tests.
Parsing values and comparing them to expectations.
Note that I also run the legacy parser here to make sure my expected values do not differ from the non-Parsec implementation, this will be removed in the future.

Furthermore, I've tested the implementation by successfully building the following Haskell repositories: http2, cabal, tls, hashable, hspec, ghc-lib-parser, texmath, text, lens, megaparsec.

Future PRs
There are also other aspects that I want to address, but I want to do this in future PRs because the current one is too big already:

  • Add WarningTests/ErrorTests similar to Cabal-tests/tests/ParserTests.warningTests to test the correct output of warnings/errors of the Parsec parser
  • Use treeDiff in the integration tests to compare the parsed values with expectations for easier identification of differences
  • Remove unused code from Distribution/Client/ProjectConfig/Legacy.hs

@jgotoh
Copy link
Member Author

jgotoh commented Apr 1, 2023

@ulysses4ever as announced, here is an early PR of the changes. Still lots of work, but I have a rough outline of the ticket. I would be glad if you would take a look!

@ulysses4ever
Copy link
Collaborator

Thanks @jgotoh ! Will take a look eventually.

@gbaz you know this code better than anyone currently active, i think. Would you be able to provide guidance?

@jgotoh jgotoh force-pushed the parse-cabal-project-parsec branch 5 times, most recently from 402f607 to 9016a3a Compare April 3, 2023 06:55
@ulysses4ever
Copy link
Collaborator

@grayjay I know you’re more of a solver person, but if you happen to have some time for advising and feel comfortable in this part of the code, your contribution would be priceless.

@ulysses4ever
Copy link
Collaborator

@jgotoh there's a bi-weekly cabal devs meeting, where, I am sure, people would be delighted to hear your experience so far. The closest one is this Thursday (Apr 20th), 1 PM Eastern Time (US). The link to a Jitsi video call is posted before the meeting on #hackage at libera.chat (can be browsed using Element/Matrix or an IRC client). Are you interested?

@jgotoh
Copy link
Member Author

jgotoh commented Apr 19, 2023

@ulysses4ever Thank you very much for you invitation! I've already wondered about the cabal devs main communication channel. I am definitely interested and will be glad to attend :)

@ulysses4ever
Copy link
Collaborator

@jgotoh cool! let me know if you want me to mail you the jitsi link beforehand.

@grayjay
Copy link
Collaborator

grayjay commented Apr 20, 2023

@grayjay I know you’re more of a solver person, but if you happen to have some time for advising and feel comfortable in this part of the code, your contribution would be priceless.

I'm not familiar with this part of the code, but I tried to answer a couple questions based on my understanding of the parser behavior.

@jgotoh jgotoh force-pushed the parse-cabal-project-parsec branch from c5f7509 to 5fd9791 Compare May 20, 2023 15:31
@andreabedini
Copy link
Collaborator

@jgotoh Is there anyway I can help you with this?

@jgotoh
Copy link
Member Author

jgotoh commented May 30, 2023

@andreabedini many thanks for your offer! I will push some changes this week adding ProjectConfig.projectConfigBuildOnly. This means around 50% of parsing of ProjectConfigs is done then.
What is missing afterwards is parsing Conditionals into the CondTree structure. I've concentrated more on ProjectConfigs for now, but if you have some spare time you could support me here! Also, I will talk about the current state of the PR in the next dev meeting this thursday, you can join if you want :) see this comment: #8889 (comment)

Also ongoing small reviews of the code I push are helpful if you spot anything.

@andreabedini
Copy link
Collaborator

Also, I will talk about the current state of the PR in the next dev meeting this thursday, you can join if you want :)

Thank you for the invite, but unfortunately that would be 3:00 am here 😄 😞 I might just go through the PR in my own time and leave some comment (if there's anything I can comment on).

@jgotoh jgotoh force-pushed the parse-cabal-project-parsec branch from 96ef42c to e671cb1 Compare June 3, 2023 18:44
parsec = parsecNumJobs

parsecNumJobs :: CabalParsing m => m NumJobs
parsecNumJobs = ncpus <|> numJobs
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Distribution.Setup.Simple.Common.numJobsParser for original, non-parsec parser

@jgotoh jgotoh force-pushed the parse-cabal-project-parsec branch from 9e0127d to c75f6e2 Compare August 5, 2023 17:03
jgotoh added 26 commits January 10, 2025 22:40
Expects Invalid subsection warning by the Parsec ProjectConfig parser.
Currently the warning is issued twice:
First warning: "Warning: <ROOT>/else.project, else.project: Unrecognized
section '_' on line 3" is issued by the legacy parser,
Second warning: "Warning: dir-else/else.config:3:5: Invalid subsection
"_"" by Parsec Parser.
When we remove executing the legacy parser, we can remove the duplicate warning.
@jgotoh jgotoh force-pushed the parse-cabal-project-parsec branch from 327e618 to eededa4 Compare January 10, 2025 21:40
@philderbeast
Copy link
Collaborator

My current implementation of readProjectFileSkeleton also calls readProjectFileSkeletonLegacy to compare the Parsec value to the legacy value, throwing errors if they are not equal.

That is good for a development check. Are you able to put the comparison and the call to the legacy function behind a flag in cabal-install/cabal-install.cabal? Many of the tests in cabal-testsuite will assertOutputContains and these tests will find what they're looking for if still output by readProjectFileSkeletonLegacy.

I ran a check of this with a recent test from #10629. The test passes but then I commented out these lines and it fails:

$ git diff
diff --git a/cabal-install/src/Distribution/Client/ProjectConfig.hs b/cabal-install/src/Distribution/Client/ProjectConfig.hs
index 29bcb7605..df111de17 100644
--- a/cabal-install/src/Distribution/Client/ProjectConfig.hs
+++ b/cabal-install/src/Distribution/Client/ProjectConfig.hs
@@ -838,14 +838,14 @@ readProjectFileSkeleton
   dir@DistDirLayout{distProjectFile, distDownloadSrcDirectory}
   extensionName
   extensionDescription = do
-    legacyPcs <- readProjectFileSkeletonLegacy verbosity httpTransport dir extensionName extensionDescription
+    -- legacyPcs <- readProjectFileSkeletonLegacy verbosity httpTransport dir extensionName extensionDescription
     exists <- liftIO $ doesFileExist extensionFile
     if exists
       then do
         monitorFiles [monitorFileHashed extensionFile]
         pcs <- liftIO $ readExtensionFile verbosity extensionFile
         monitorFiles $ map monitorFileHashed (projectConfigPathRoot <$> projectSkeletonImports pcs)
-        unless (legacyPcs == pcs) (error (show callStack ++ "\nParsec: " ++ show pcs ++ "\nLegacy: " ++ show legacyPcs))
+        -- unless (legacyPcs == pcs) (error (show callStack ++ "\nParsec: " ++ show pcs ++ "\nLegacy: " ++ show legacyPcs))
         pure pcs
$ cabal run cabal-testsuite:cabal-tests -- --with-cabal=./dist-newstyle/build/x86_64-linux/ghc-9.10.1/cabal-install-3.15.0.0/x/cabal/build/cabal/cabal cabal-testsuite/PackageTests/ProjectImport/UntrimmedImport/cabal.test.hs
...
# STDERR:
cabal.test.hs: expected:
import has leading or trailing whitespace
CallStack (from HasCallStack):
  assertOn, called at src/Test/Cabal/Prelude.hs:809:24 in cabal-testsuite-3-inplace:Test.Cabal.Prelude
  assertOutputContains, called at cabal-testsuite/PackageTests/ProjectImport/UntrimmedImport/cabal.test.hs:8:3 in main:Main

@philderbeast
Copy link
Collaborator

Unfortunately the PRState that is generated when executing a ParseResult does not contain the file source where the warnings/errors came from.

With #10644, I'm using a ProjectParseResult for retaining the ProjectConfigPath of a parse error or parse warnings and that pull request includes tests that these locations are reported.

fetch pci

fetch :: FilePath -> IO BS.ByteString
fetch pci = case parseURI pci of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing the trim of #10629.

fetch pci = case parseURI $ trim pci of

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants