excessive cache invalidation in ccache #1323

powderluv · 2022-08-31T16:12:50Z

Investigate why we see excessive ccache invalidation in the CI (especially when building from source).

TODO: test the local behaviour of ccache across a day's worth of PyTorch changes and validate that we see similar behaviour in the CI.

powderluv · 2022-08-31T21:03:33Z

Some data points:

anush@MacBook-Pro torch-mlir % gh api \
  -H "Accept: application/vnd.github+json" \
  /repos/llvm/torch-mlir/actions/cache/usage

{
  "full_name": "llvm/torch-mlir",
  "active_caches_size_in_bytes": 12875838677,
  "active_caches_count": 38
}

and

anush@MacBook-Pro torch-mlir % gh api \
  -H "Accept: application/vnd.github+json" \
  /repos/llvm/torch-mlir/actions/caches     
{
  "total_count": 39,
  "actions_caches": [
    {
      "id": 8561,
      "ref": "refs/heads/main",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-30T22:49:32.966Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:52:16.940000000Z",
      "created_at": "2022-08-30T22:49:41.193333300Z",
      "size_in_bytes": 260985062
    },
    {
      "id": 8608,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:51:58.992Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:52:04.766666700Z",
      "created_at": "2022-08-31T20:52:04.766666700Z",
      "size_in_bytes": 301616535
    },
    {
      "id": 8560,
      "ref": "refs/heads/main",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-30T22:33:13.536Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:50:42.793333300Z",
      "created_at": "2022-08-30T22:33:21.193333300Z",
      "size_in_bytes": 300679370
    },
    {
      "id": 8568,
      "ref": "refs/heads/main",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-30T23:42:34.745Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:50:30.930000000Z",
      "created_at": "2022-08-30T23:42:43.846666700Z",
      "size_in_bytes": 492695119
    },
    {
      "id": 8607,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:47:32.908Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:47:35.636666700Z",
      "created_at": "2022-08-31T20:47:35.636666700Z",
      "size_in_bytes": 261870709
    },
    {
      "id": 8606,
      "ref": "refs/pull/1326/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:44:48.691Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:44:53.676666700Z",
      "created_at": "2022-08-31T20:44:53.676666700Z",
      "size_in_bytes": 301386313
    },
    {
      "id": 8605,
      "ref": "refs/pull/1326/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:37:20.719Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:37:25.186666700Z",
      "created_at": "2022-08-31T20:37:25.186666700Z",
      "size_in_bytes": 261569159
    },
    {
      "id": 8580,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T04:24:54.294Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:36:24.280000000Z",
      "created_at": "2022-08-31T04:24:59.120000000Z",
      "size_in_bytes": 260998212
    },
    {
      "id": 8579,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T04:24:51.998Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:34:28.890000000Z",
      "created_at": "2022-08-31T04:24:54.323333300Z",
      "size_in_bytes": 492648749
    },
    {
      "id": 8578,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T04:24:24.603Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:34:26.246666700Z",
      "created_at": "2022-08-31T04:24:29.016666700Z",
      "size_in_bytes": 300767619
    },
    {
      "id": 8604,
      "ref": "refs/tags/oneshot-20220831.50",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:31:21.884Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:31:25.486666700Z",
      "created_at": "2022-08-31T20:31:25.486666700Z",
      "size_in_bytes": 301398205
    },
    {
      "id": 8603,
      "ref": "refs/tags/oneshot-20220831.50",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:29:16.386Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:29:19.603333300Z",
      "created_at": "2022-08-31T20:29:19.603333300Z",
      "size_in_bytes": 261610659
    },
    {
      "id": 8602,
      "ref": "refs/pull/1325/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T20:06:33.851Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:06:39.460000000Z",
      "created_at": "2022-08-31T20:06:39.460000000Z",
      "size_in_bytes": 585486202
    },
    {
      "id": 8601,
      "ref": "refs/pull/1325/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T18:52:57.135Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T18:53:05.840000000Z",
      "created_at": "2022-08-31T18:53:05.840000000Z",
      "size_in_bytes": 301395382
    },
    {
      "id": 8600,
      "ref": "refs/pull/1325/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T18:41:25.524Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T18:41:27.660000000Z",
      "created_at": "2022-08-31T18:41:27.660000000Z",
      "size_in_bytes": 261698014
    },
    {
      "id": 8599,
      "ref": "refs/pull/862/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T17:48:58.672Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T17:49:06.020000000Z",
      "created_at": "2022-08-31T17:49:06.020000000Z",
      "size_in_bytes": 586311251
    },
    {
      "id": 8598,
      "ref": "refs/pull/862/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T16:31:00.389Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T16:31:04.190000000Z",
      "created_at": "2022-08-31T16:31:04.190000000Z",
      "size_in_bytes": 301622135
    },
    {
      "id": 8597,
      "ref": "refs/pull/862/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T16:28:25.358Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T16:28:27.226666700Z",
      "created_at": "2022-08-31T16:28:27.226666700Z",
      "size_in_bytes": 261969537
    },
    {
      "id": 8596,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-Linux-torch_mlir_build_assets--2022-08-31T16:14:52.629Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T16:14:53.050000000Z",
      "created_at": "2022-08-31T16:14:53.050000000Z",
      "size_in_bytes": 5980942
    },
    {
      "id": 8595,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T13:36:30.473Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:36:37.873333300Z",
      "created_at": "2022-08-31T13:36:37.873333300Z",
      "size_in_bytes": 590135705
    },
    {
      "id": 8594,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T13:35:46.871Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:35:50.500000000Z",
      "created_at": "2022-08-31T13:35:50.500000000Z",
      "size_in_bytes": 301374991
    },
    {
      "id": 8593,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T13:31:50.375Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:31:55.260000000Z",
      "created_at": "2022-08-31T13:31:55.260000000Z",
      "size_in_bytes": 261689031
    },
    {
      "id": 8586,
      "ref": "refs/heads/ashay/mlir-python-bindings",
      "key": "ccache-Linux-torch_mlir_build_assets--2022-08-31T09:38:32.636Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:30:06.156666700Z",
      "created_at": "2022-08-31T09:38:33.676666700Z",
      "size_in_bytes": 116604009
    },
    {
      "id": 8574,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T03:12:29.752Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:23:20.493333300Z",
      "created_at": "2022-08-31T03:12:32.986666700Z",
      "size_in_bytes": 261022042
    },
    {
      "id": 8575,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T03:17:24.269Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:21:49.630000000Z",
      "created_at": "2022-08-31T03:17:27.620000000Z",
      "size_in_bytes": 300635868
    },
    {
      "id": 8581,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T04:37:24.333Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:21:44.700000000Z",
      "created_at": "2022-08-31T04:37:30.106666700Z",
      "size_in_bytes": 585284840
    },
    {
      "id": 8592,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T13:12:38.632Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:12:52.636666700Z",
      "created_at": "2022-08-31T13:12:52.636666700Z",
      "size_in_bytes": 637815608
    },
    {
      "id": 8591,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T11:25:27.423Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T11:25:33.623333300Z",
      "created_at": "2022-08-31T11:25:33.623333300Z",
      "size_in_bytes": 300637490
    },
    {
      "id": 8590,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T11:16:42.808Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T11:16:44.870000000Z",
      "created_at": "2022-08-31T11:16:44.870000000Z",
      "size_in_bytes": 261027059
    },
    {
      "id": 8589,
      "ref": "refs/pull/1321/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T10:06:34.866Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T10:06:37.880000000Z",
      "created_at": "2022-08-31T10:06:37.880000000Z",
      "size_in_bytes": 492510923
    }
  ]
}
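A listing this long is easier to read once it is tallied per ref. The following is a minimal sketch, not something from the CI scripts: `summarize_caches` is a hypothetical helper name, and it assumes the input has already been reduced to tab-separated `ref` and `size_in_bytes` pairs (for example with the `--jq` option of `gh api`, as in the usage comment).

```shell
# Sum cache count and bytes per ref.
# Input (stdin): one "ref<TAB>size_in_bytes" pair per line.
# Output: "ref<TAB>count<TAB>total_bytes", sorted by ref.
summarize_caches() {
  awk -F '\t' '
    { count[$1]++; bytes[$1] += $2 }
    END { for (r in count) printf "%s\t%d\t%.0f\n", r, count[r], bytes[r] }
  ' | LC_ALL=C sort
}

# Usage against the live API (requires an authenticated gh):
#   gh api --paginate \
#     -H "Accept: application/vnd.github+json" \
#     /repos/llvm/torch-mlir/actions/caches \
#     --jq '.actions_caches[] | [.ref, .size_in_bytes] | @tsv' \
#     | summarize_caches
```

Grouping by ref makes it visible how many caches each pull request, tag, and branch contributes to the total.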

powderluv · 2022-09-01T08:01:58Z

So it looks like we are loading really old caches instead of the most recently uploaded one.

https://github.com/llvm/torch-mlir/runs/8129353328?check_suite_focus=true restored from ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T10:06:34.866Z instead of the cache from the immediately preceding run, https://github.com/llvm/torch-mlir/actions/runs/2968703997, which finished more than an hour earlier and uploaded ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-09-01T05:17:27.524Z

So we need to debug and fix the cache restore, because pinning the build won't help if we load an old cache.
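To check by hand which key a run should have picked up, the keys can be compared by name. This is a rough sketch that assumes the key layout seen in the listing (a fixed prefix followed by an ISO-8601 timestamp); `latest_key` is a hypothetical helper. It only reports the newest key by name and does not model how the cache action chooses a key, which also depends on the ref a cache was saved under.

```shell
# Print the newest key that starts with the given prefix.
# Keys are read one per line from stdin. Because the suffix is an
# ISO-8601 timestamp, a byte-wise sort orders the keys chronologically.
latest_key() {
  awk -v p="$1" 'index($0, p) == 1' | LC_ALL=C sort | tail -n 1
}

# Usage with the two keys mentioned above:
#   printf '%s\n' \
#     'ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T10:06:34.866Z' \
#     'ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-09-01T05:17:27.524Z' \
#     | latest_key 'ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-'
```

Comparing that result with the key printed in the restore step of a run shows whether the run restored a stale cache.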

ashay · 2022-10-11T03:37:15Z

Building on the previous findings, I realized that PyTorch uses precompiled headers, which work with ccache only when certain build flags are set; we might have to upstream those flags to PyTorch.
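For reference, the settings in question would look roughly like the sketch below. It is based on the "Precompiled headers" section of the ccache manual, not on anything in PyTorch's build; `EXTRA_PCH_FLAGS` is a hypothetical variable name, and whether these settings are sufficient for PyTorch is untested.

```shell
# Assumption: values taken from the ccache manual's guidance on
# precompiled headers; not verified against the PyTorch build.
export CCACHE_SLOPPINESS="pch_defines,time_macros"

# Clang only: the extra compiler flag the ccache manual asks for when
# precompiled headers are in use. EXTRA_PCH_FLAGS is a made-up name;
# the PyTorch build would have to forward it to the compiler.
EXTRA_PCH_FLAGS="-Xclang -fno-pch-timestamp"
```

The sloppiness setting can be exported in the CI environment, but the compiler flag has to reach the compile commands that use the precompiled header, which is why it may need a change in PyTorch itself.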

However, we can perhaps work around these limitations by leveraging the fact that we don't clear the VM disk between consecutive CI runs, although we do remove the PyTorch build files. More precisely, we could change this snippet (in the package_pytorch() function of build_libtorch.sh):

  # Copy over all of the cmake files
  mv build/lib*/torch/share     libtorch/
  mv build/lib*/torch/include   libtorch/
  mv build/lib*/torch/lib       libtorch/
  # Copy over all lib files
  mv build/lib/*                libtorch/lib/
  # Copy over all include files
  mv build/include/*            libtorch/include/

to use cp -r instead of mv. Perhaps then the build system would pick up the fact that the object files are newer than the source files, thus avoiding a full rebuild. One additional small change is necessary: run git fetch only if the requested commit hash differs from the existing commit hash, so as not to change the mtime of the source files. Hopefully the broad idea makes sense. Let me know if you spot any flaws. Thanks!
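The fetch guard could look something like the sketch below. `needs_fetch` and `update_pytorch_checkout` are hypothetical names, not functions in build_libtorch.sh; `PYTORCH_ROOT` is the variable the script already uses, and the argument is assumed to be a full commit hash that the remote allows fetching directly.

```shell
# Hypothetical helper: succeeds when the checkout has to move to a
# different commit ($1 = current hash, $2 = requested hash).
needs_fetch() {
  [ "$1" != "$2" ]
}

# Sketch of the update step: skip git entirely when the requested commit
# is already checked out, so source files keep their mtimes.
update_pytorch_checkout() {
  requested="$1"
  current="$(git -C "${PYTORCH_ROOT}" rev-parse HEAD)"
  if needs_fetch "${current}" "${requested}"; then
    git -C "${PYTORCH_ROOT}" fetch --depth=1 origin "${requested}"
    git -C "${PYTORCH_ROOT}" reset --hard FETCH_HEAD
  fi
}
```

Keeping the comparison in its own function makes it possible to test the decision without a repository on disk.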

powderluv · 2022-10-11T08:29:28Z

I am OK with changing mv to cp -r to see if it helps. I actually made that change from the original PyTorch version to avoid copying and just mv for speed. So let's try that.

However, I am not sure we should assume that artifacts aren't cleared between VM invocations in the CI. I thought it was supposed to be a clean run; maybe it was a transient bug?

ashay · 2022-10-11T14:21:35Z

However I am not sure we should assume we don't clear artifacts between VM invocations in the CI.

Lucky for us, when you and Maksim wrote the build_libtorch.sh script, y'all added code paths to handle both cases: one where the PyTorch source is already checked out and one where it isn't.

checkout_pytorch() {
  if [[ ! -d "$PYTORCH_ROOT" ]]; then
    ...
  else
    cd "${PYTORCH_ROOT}"
    git fetch --depth=1 origin "${TORCH_MLIR_SRC_PYTORCH_BRANCH}"
    git reset --hard FETCH_HEAD
  fi
}

Combined with the fact that we don't pass clean: true during the checkout phase, we might be able to safely make use of the existing files. And if they don't exist or are out of date, then the script can likely perform a fresh checkout of PyTorch.

powderluv · 2022-10-12T06:55:47Z

Ahh, we added that path to support local customer forks of PyTorch source builds.

powderluv assigned powderluv and sjain-stanford Aug 31, 2022

powderluv mentioned this issue Sep 30, 2022

pin pytorch source builds nightly to speed up CI #1328

Closed

ashay mentioned this issue Oct 12, 2022

Handoff LLVM and PyTorch updates #1486

Closed
