Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reenable Windows CI build with Artifactory support #4596

Merged
merged 37 commits into from
Oct 9, 2023

Conversation

ximinez
Copy link
Collaborator

@ximinez ximinez commented Jun 28, 2023

High Level Overview of Change

#4556 added Artifactory support to the nix builds. This extends that support to the Windows build

The good news is that the build now works. The bad news is that it introduces a new bottleneck, since the build and test takes over an hour.

Context of Change

Our Windows CI build has been disabled for a while because it was unable to successfully build Conan dependencies. Now that it can download pre-built dependencies from Artificatory, we can put it back in play.

Right now, the job is running on Github's limited hardware. I hope that we can get self-hosted Windows runners available soon. If they become available before this PR is merged, I'll push a trivial commit changing the runs-on: line.

Before / After

Before: No Windows CI jobs enabled.
After: CI will build and test a Windows release build.

Test Plan

No C++ code was changed. This only affects CI, so there is nothing to test.

Future Tasks

Making this job run on heavy Windows runners, so it doesn't take so dang long.

* Use github concurrency management
* Update to get more in sync with nix.yml
  * Don't bother caching Conan
  * Add support for Debug job (disabled for now)
@ximinez ximinez changed the title Reenable Windows build with Artifactory support Reenable Windows CI build with Artifactory support Jun 28, 2023
@ximinez ximinez requested a review from legleux June 29, 2023 15:46
@ximinez ximinez added the CI Continuous Integration Functionality label Jun 29, 2023
ximinez added 2 commits June 29, 2023 11:54
* upstream/develop:
  APIv2: add error messages for account_tx (4571)
* upstream/develop:
  ci: cancel overridden workflows (4597)
  fix: Update Handler::Condition enum values 3417 (4239)
.github/workflows/windows.yml Show resolved Hide resolved
.github/actions/dependencies/action.yml Outdated Show resolved Hide resolved
@ximinez ximinez added the Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required. label Jun 30, 2023
ximinez added 3 commits June 30, 2023 17:55
* upstream/develop:
  APIv2(account_info): handle invalid "signer_lists" value (4585)
  fix: deb package build (4591)
* upstream/develop:
  fix: add allowClawback flag for `account_info` (4590)
  build: add binary hardening compile and link flags (4603)
  add clang-format pre-commit hook (4599)
  chore: update checkout action version to v3: (4598)
* upstream/develop:
  refactor: change the return type of mulDiv to std::optional (4243)
@ximinez ximinez mentioned this pull request Jul 6, 2023
Comment on lines 66 to 67
conan remote add ripple ${{ env.CONAN_URL }} --insert 0 || \
conan remote list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hosted runner seems to be keeping some state between runs, which cause errors when the commands are run again. This was a workaround. Here's an example from my own repo: https://github.com/ximinez/rippled/actions/runs/5490473061/jobs/10005957758#step:8:14

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because the runner does not create a fresh environment for each job? I think we should consider this a blocker.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what it looks like.

@thejohnfreeman thejohnfreeman self-requested a review July 10, 2023 14:07
ximinez added 2 commits July 12, 2023 14:46
* upstream/develop:
  Introduce AMM support (XLS-30d): (4294)
  Adapt to change in Conan recipe for NuDB (4615)
  docs(CONTRIBUTING): push beta releases to `release` (4589)
* upstream/develop:
  Rename `allowClawback` flag to `allowTrustLineClawback` (4617)
  Update dependencies (4595)
@thejohnfreeman
Copy link
Collaborator

Judging by the tests, the runner has only 4 hardware threads available. We might be able to cut the build time if we can get that number up to 12 or 16.

ximinez added 6 commits July 20, 2023 14:02
* upstream/develop:
  docs: add API Changelog (4612)
  Set version to 1.12.0-b2
  BUILD: list steps after dependencies update (4623)
* upstream/develop:
  Set version to 1.12.0-rc1
  Set version to 1.12-b3
  Several changes to improve Consensus stability: (4505)
  Apply transaction batches in periodic intervals (4504)
  Asynchronously write batches to NuDB (4503)
  refactor: fix typo in FeeUnits.h (4644)
  Update Ubuntu build image (4650)
  test: add forAllApiVersions helper function (4611)
  add view updates for account SLEs (4629)
  Fix the package recipe for consumers of libxrpl (4631)
  refactor: improve checking of path lengths (4519)
  refactor: use C++20 function std::popcount (4389)
  fix(AMM): prevent orphaned objects, inconsistent ledger state: (4626)
  feat: support Concise Transaction Identifier (CTID) (XLS-37) (4418)
* upstream/develop:
  Set version to 1.12.0-rc3
  Revert "Asynchronously write batches to NuDB (4503)"
  Revert "Apply transaction batches in periodic intervals (4504)"
  Revert "Several changes to improve Consensus stability: (4505)"
* upstream/develop:
  docs: update SECURITY.md (4338)
  refactor: simplify `TxFormats` common fields logic (4637)
  APIv2(ledger_entry): return invalidParams for bad parameters (4630)
  docs(rippled-example.cfg): clarify ssl_cert vs ssl_chain (4667)
  Introduce replacement for getting and setting thread name: (4312)
  Set version to 1.12.0
  Set version to 1.12.0-rc4
  amm_info: fetch by amm account id; add AMM object entry (4682)
  AMMBid: use tecINTERNAL for 'impossible' errors (4674)
* upstream/develop:
  Several changes to improve Consensus stability: (4505)
  Apply transaction batches in periodic intervals (4504)
  Asynchronously write batches to NuDB. (4503)
  Remove CurrentThreadName.h from RippledCore.cmake (4697)
* Stolen from (4716 discussion)
* Also add note to test step that it can fail
* upstream/develop:
  docs(BUILD.md): require GCC 11 or higher (4700)
@ximinez
Copy link
Collaborator Author

ximinez commented Oct 3, 2023

@thejohnfreeman @ckeshava I think this PR is as good as it's going to get given current infrastructure.

It's an improvement (given that we have no Windows CI at all right now), and doesn't appear to do any harm. It still takes a long time to build, but cancels old jobs, so it won't tie things up indefinitely. The only place where it might cause a problem is if we need to rush something into develop, and need to wait for this build. There are ways around that, so I don't see it as a blocker.

Mind reviewing changes and signing off so we can get this merged? Thanks!

conan export external/snappy snappy/1.1.9@
conan export external/soci soci/4.0.3@
- name: install dependencies
echo outcome=$(conan user --remote ripple ${{ secrets.CONAN_USERNAME }} \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. this change obviates the need for this PR right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait, I think this change affects only the windows workflow, never mind

@intelliot
Copy link
Collaborator

note: awaiting review from @thejohnfreeman to confirm whether prior requests were addressed

* upstream/develop:
  refactor: reunify transaction common fields: (4715)
  fix typo in SECURITY.md (4662)
  docs(API-CHANGELOG): clarify account_info response (4724)
  fix: asan stack-use-after-scope in soci::use with rvalues (4676)
* upstream/develop:
  `fixDisallowIncomingV1`: allow issuers to authorize trust lines (4721)
  refactor: reduce boilerplate in applySteps: (4710)
@thejohnfreeman
Copy link
Collaborator

Thank you for all the work here @ximinez. I know this was a heavy lift even though the changeset is small.

@ximinez ximinez added the Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required. label Oct 9, 2023
@ximinez
Copy link
Collaborator Author

ximinez commented Oct 9, 2023

Thank you for all the work here @ximinez. I know this was a heavy lift even though the changeset is small.

My pleasure. I think it will save time and headaches in the long run.

* upstream/develop:
  docs(API-CHANGELOG): add `XRPFees` change (4741)
  docs(rippled-example.cfg): add P2P link compression (4753)
@intelliot
Copy link
Collaborator

note: intend to merge this shortly. giving CI a chance to run for a bit

@intelliot intelliot mentioned this pull request Oct 9, 2023
@intelliot intelliot added this to the 2.0 milestone Oct 9, 2023
@intelliot
Copy link
Collaborator

intelliot commented Oct 9, 2023

suggested commit message:

ci: reenable Windows CI build with Artifactory support (#4596)

Artifactory support was added to the `nix` builds with #4556. This
extends that support to the Windows build. Now the Windows build works;
CI will build and test a Windows release build. This only affects CI and
does not change any C++ code.

* Copy the remote setup step outcome fix from #4716 discussion
* Allow the Windows job to succeed if tests fail:
  * Currently the tests do not always pass, even on a single threaded
    run on the GitHub runners. So we are using parallel runs and mark
    the test step as allowed to fail (continue-on-error).
  * At this point, it's more important that the build succeeds than that
    the tests succeed, because:
  * We've got plenty of test coverage on the other jobs.
  * Test failures are much rarer than build failures because of
    cross-platform issues.
  * Having a test failure locally doesn't interrupt a workflow nearly as
    much as a build failure.

Note that Conan Center cannot hold the binaries we need. They do not
build the configurations we need, and they will not add them.

## Future Tasks

This introduces a new bottleneck since the build and test takes over an
hour. Speed up the job by:

* Making this job run on heavy Windows runners.
* Increasing the number of hardware threads.

@intelliot intelliot merged commit 3e08c39 into XRPLF:develop Oct 9, 2023
intelliot pushed a commit that referenced this pull request Oct 16, 2023
In Windows, we need to call `python` in order for the `pip` upgrade
command to work.

This changes the GitHub Actions Windows CI job to use the correct
command to upgrade PIP, fixing this error:

```
ERROR: To modify pip, please run the following command:
C:\hostedtoolcache\windows\Python\3.9.13\x64\python.exe -m pip install --upgrade pip
```

A future task is to make job run on heavy Windows runners so that it
doesn't take so long.

Context: #4596
intelliot pushed a commit that referenced this pull request Oct 23, 2023
The unity build speeds up compilation by bundling multiple source files
into one larger file. This reduces Windows CI build time by up to 50%.

As described in #4596, the automatic Windows builds take a very long
time. Unity builds are significantly faster - currently about 45 min,
much closer to the typical MacOS (35-40 minutes) and nix (~30 minutes)
run times.

This is intended as a stopgap solution until a more resourced and
reliable runner is available.

No C++ code was changed. This only affects CI.
intelliot pushed a commit that referenced this pull request Feb 1, 2024
* Disable the Windows CI unit tests "allowed to fail" workaround which
  was previously introduced in #4596.
* The runner hardware was upgraded, and the unit tests have been passing
  since then.
sophiax851 pushed a commit to sophiax851/rippled that referenced this pull request Jun 12, 2024
Artifactory support was added to the `nix` builds with XRPLF#4556. This
extends that support to the Windows build. Now the Windows build works;
CI will build and test a Windows release build. This only affects CI and
does not change any C++ code.

* Copy the remote setup step outcome fix from XRPLF#4716 discussion
* Allow the Windows job to succeed if tests fail:
  * Currently the tests do not always pass, even on a single threaded
    run on the GitHub runners. So we are using parallel runs and mark
    the test step as allowed to fail (continue-on-error).
  * At this point, it's more important that the build succeeds than that
    the tests succeed, because:
  * We've got plenty of test coverage on the other jobs.
  * Test failures are much rarer than build failures because of
    cross-platform issues.
  * Having a test failure locally doesn't interrupt a workflow nearly as
    much as a build failure.

Note that Conan Center cannot hold the binaries we need. They do not
build the configurations we need, and they will not add them.

## Future Tasks

This introduces a new bottleneck since the build and test takes over an
hour. Speed up the job by:

* Making this job run on heavy Windows runners.
* Increasing the number of hardware threads.
sophiax851 pushed a commit to sophiax851/rippled that referenced this pull request Jun 12, 2024
In Windows, we need to call `python` in order for the `pip` upgrade
command to work.

This changes the GitHub Actions Windows CI job to use the correct
command to upgrade PIP, fixing this error:

```
ERROR: To modify pip, please run the following command:
C:\hostedtoolcache\windows\Python\3.9.13\x64\python.exe -m pip install --upgrade pip
```

A future task is to make job run on heavy Windows runners so that it
doesn't take so long.

Context: XRPLF#4596
sophiax851 pushed a commit to sophiax851/rippled that referenced this pull request Jun 12, 2024
The unity build speeds up compilation by bundling multiple source files
into one larger file. This reduces Windows CI build time by up to 50%.

As described in XRPLF#4596, the automatic Windows builds take a very long
time. Unity builds are significantly faster - currently about 45 min,
much closer to the typical MacOS (35-40 minutes) and nix (~30 minutes)
run times.

This is intended as a stopgap solution until a more resourced and
reliable runner is available.

No C++ code was changed. This only affects CI.
sophiax851 pushed a commit to sophiax851/rippled that referenced this pull request Jun 12, 2024
* Disable the Windows CI unit tests "allowed to fail" workaround which
  was previously introduced in XRPLF#4596.
* The runner hardware was upgraded, and the unit tests have been passing
  since then.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Functionality Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants