excessive cache invalidation in ccache #1323

Open
powderluv opened this issue Aug 31, 2022 · 6 comments

@powderluv
Collaborator

Investigate why we have ccache invalidation (especially when building from source) in the CI.

TODO: test local behaviour of ccache for a day's worth of PyTorch changes and validate that we see similar behaviour on the CI.

@powderluv
Collaborator Author

some data points:

anush@MacBook-Pro torch-mlir % gh api \
  -H "Accept: application/vnd.github+json" \
  /repos/llvm/torch-mlir/actions/cache/usage

{
  "full_name": "llvm/torch-mlir",
  "active_caches_size_in_bytes": 12875838677,
  "active_caches_count": 38
}

and

anush@MacBook-Pro torch-mlir % gh api \
  -H "Accept: application/vnd.github+json" \
  /repos/llvm/torch-mlir/actions/caches
{
  "total_count": 39,
  "actions_caches": [
    {
      "id": 8561,
      "ref": "refs/heads/main",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-30T22:49:32.966Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:52:16.940000000Z",
      "created_at": "2022-08-30T22:49:41.193333300Z",
      "size_in_bytes": 260985062
    },
    {
      "id": 8608,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:51:58.992Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:52:04.766666700Z",
      "created_at": "2022-08-31T20:52:04.766666700Z",
      "size_in_bytes": 301616535
    },
    {
      "id": 8560,
      "ref": "refs/heads/main",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-30T22:33:13.536Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:50:42.793333300Z",
      "created_at": "2022-08-30T22:33:21.193333300Z",
      "size_in_bytes": 300679370
    },
    {
      "id": 8568,
      "ref": "refs/heads/main",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-30T23:42:34.745Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:50:30.930000000Z",
      "created_at": "2022-08-30T23:42:43.846666700Z",
      "size_in_bytes": 492695119
    },
    {
      "id": 8607,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:47:32.908Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:47:35.636666700Z",
      "created_at": "2022-08-31T20:47:35.636666700Z",
      "size_in_bytes": 261870709
    },
    {
      "id": 8606,
      "ref": "refs/pull/1326/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:44:48.691Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:44:53.676666700Z",
      "created_at": "2022-08-31T20:44:53.676666700Z",
      "size_in_bytes": 301386313
    },
    {
      "id": 8605,
      "ref": "refs/pull/1326/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:37:20.719Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:37:25.186666700Z",
      "created_at": "2022-08-31T20:37:25.186666700Z",
      "size_in_bytes": 261569159
    },
    {
      "id": 8580,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T04:24:54.294Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:36:24.280000000Z",
      "created_at": "2022-08-31T04:24:59.120000000Z",
      "size_in_bytes": 260998212
    },
    {
      "id": 8579,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T04:24:51.998Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:34:28.890000000Z",
      "created_at": "2022-08-31T04:24:54.323333300Z",
      "size_in_bytes": 492648749
    },
    {
      "id": 8578,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T04:24:24.603Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:34:26.246666700Z",
      "created_at": "2022-08-31T04:24:29.016666700Z",
      "size_in_bytes": 300767619
    },
    {
      "id": 8604,
      "ref": "refs/tags/oneshot-20220831.50",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:31:21.884Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:31:25.486666700Z",
      "created_at": "2022-08-31T20:31:25.486666700Z",
      "size_in_bytes": 301398205
    },
    {
      "id": 8603,
      "ref": "refs/tags/oneshot-20220831.50",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:29:16.386Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:29:19.603333300Z",
      "created_at": "2022-08-31T20:29:19.603333300Z",
      "size_in_bytes": 261610659
    },
    {
      "id": 8602,
      "ref": "refs/pull/1325/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T20:06:33.851Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:06:39.460000000Z",
      "created_at": "2022-08-31T20:06:39.460000000Z",
      "size_in_bytes": 585486202
    },
    {
      "id": 8601,
      "ref": "refs/pull/1325/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T18:52:57.135Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T18:53:05.840000000Z",
      "created_at": "2022-08-31T18:53:05.840000000Z",
      "size_in_bytes": 301395382
    },
    {
      "id": 8600,
      "ref": "refs/pull/1325/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T18:41:25.524Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T18:41:27.660000000Z",
      "created_at": "2022-08-31T18:41:27.660000000Z",
      "size_in_bytes": 261698014
    },
    {
      "id": 8599,
      "ref": "refs/pull/862/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T17:48:58.672Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T17:49:06.020000000Z",
      "created_at": "2022-08-31T17:49:06.020000000Z",
      "size_in_bytes": 586311251
    },
    {
      "id": 8598,
      "ref": "refs/pull/862/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T16:31:00.389Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T16:31:04.190000000Z",
      "created_at": "2022-08-31T16:31:04.190000000Z",
      "size_in_bytes": 301622135
    },
    {
      "id": 8597,
      "ref": "refs/pull/862/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T16:28:25.358Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T16:28:27.226666700Z",
      "created_at": "2022-08-31T16:28:27.226666700Z",
      "size_in_bytes": 261969537
    },
    {
      "id": 8596,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-Linux-torch_mlir_build_assets--2022-08-31T16:14:52.629Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T16:14:53.050000000Z",
      "created_at": "2022-08-31T16:14:53.050000000Z",
      "size_in_bytes": 5980942
    },
    {
      "id": 8595,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T13:36:30.473Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:36:37.873333300Z",
      "created_at": "2022-08-31T13:36:37.873333300Z",
      "size_in_bytes": 590135705
    },
    {
      "id": 8594,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T13:35:46.871Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:35:50.500000000Z",
      "created_at": "2022-08-31T13:35:50.500000000Z",
      "size_in_bytes": 301374991
    },
    {
      "id": 8593,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T13:31:50.375Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:31:55.260000000Z",
      "created_at": "2022-08-31T13:31:55.260000000Z",
      "size_in_bytes": 261689031
    },
    {
      "id": 8586,
      "ref": "refs/heads/ashay/mlir-python-bindings",
      "key": "ccache-Linux-torch_mlir_build_assets--2022-08-31T09:38:32.636Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:30:06.156666700Z",
      "created_at": "2022-08-31T09:38:33.676666700Z",
      "size_in_bytes": 116604009
    },
    {
      "id": 8574,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T03:12:29.752Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:23:20.493333300Z",
      "created_at": "2022-08-31T03:12:32.986666700Z",
      "size_in_bytes": 261022042
    },
    {
      "id": 8575,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T03:17:24.269Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:21:49.630000000Z",
      "created_at": "2022-08-31T03:17:27.620000000Z",
      "size_in_bytes": 300635868
    },
    {
      "id": 8581,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T04:37:24.333Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:21:44.700000000Z",
      "created_at": "2022-08-31T04:37:30.106666700Z",
      "size_in_bytes": 585284840
    },
    {
      "id": 8592,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T13:12:38.632Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:12:52.636666700Z",
      "created_at": "2022-08-31T13:12:52.636666700Z",
      "size_in_bytes": 637815608
    },
    {
      "id": 8591,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T11:25:27.423Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T11:25:33.623333300Z",
      "created_at": "2022-08-31T11:25:33.623333300Z",
      "size_in_bytes": 300637490
    },
    {
      "id": 8590,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T11:16:42.808Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T11:16:44.870000000Z",
      "created_at": "2022-08-31T11:16:44.870000000Z",
      "size_in_bytes": 261027059
    },
    {
      "id": 8589,
      "ref": "refs/pull/1321/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T10:06:34.866Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T10:06:37.880000000Z",
      "created_at": "2022-08-31T10:06:37.880000000Z",
      "size_in_bytes": 492510923
    }
  ]
}


@powderluv
Collaborator Author

So it looks like we are loading really old caches instead of the most recently uploaded one.

https://github.com/llvm/torch-mlir/runs/8129353328?check_suite_focus=true restored from ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T10:06:34.866Z instead of the cache from the immediately preceding run, https://github.com/llvm/torch-mlir/actions/runs/2968703997, which finished 1+ hr earlier and uploaded ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-09-01T05:17:27.524Z

So we need to debug and fix this caching, because pinning the build won't help if we load an old cache.
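
One way to sanity-check which cache a job could have been expected to restore is to list the keys for a single build configuration and pick the newest by its timestamp suffix. The sketch below assumes every key ends in an ISO 8601 UTC timestamp, as in the listing above, so a plain text sort orders the keys by time. newest_key is a hypothetical helper, not something that exists in the CI scripts.

```shell
# Hypothetical helper: print the newest cache key that starts with a prefix.
# Reads one key per line on stdin. Relies on the keys ending in an
# ISO 8601 UTC timestamp, which sorts chronologically as plain text.
newest_key() {
  prefix="$1"
  # index() does a fixed-string match, so '.' and '-' in the prefix
  # are not interpreted as regex characters.
  awk -v p="$prefix" 'index($0, p) == 1' | LC_ALL=C sort | tail -n 1
}

# Example input: a few keys copied from the listing above.
keys='ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-30T23:42:34.745Z
ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T10:06:34.866Z
ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:51:58.992Z
ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T20:06:33.851Z'

printf '%s\n' "$keys" | newest_key 'ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-'
```

This deliberately ignores the ref field. GitHub scopes cache restores by branch, so the newest key overall is not necessarily one that a given run is allowed to restore, and that scoping may be worth ruling out before blaming the key format.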

qedawkins pushed a commit to nod-ai/torch-mlir that referenced this issue Oct 3, 2022
Importing onnx graph fails if an output is also used by another node. This happens because the output ValueInfo will be registered, and then it will throw an error that it already exists when importing internal ValueInfos.

Solution is to import the internal ValueInfos before importing the output ValueInfos.

Resolves llvm#1376

Signed-off-by: Michael Holman <michhol@microsoft.com>
@ashay
Collaborator

ashay commented Oct 11, 2022

Building on top of the previous findings, I realized that PyTorch uses precompiled headers, which, to work with ccache, require build flags that we might have to upstream to PyTorch.

However, we can perhaps work around these limitations by leveraging the fact that we don't clear the VM disk between consecutive CI runs, although we do remove the PyTorch build files. More precisely, we could change this snippet (in the package_pytorch() function of build_libtorch.sh):

  # Copy over all of the cmake files
  mv build/lib*/torch/share     libtorch/
  mv build/lib*/torch/include   libtorch/
  mv build/lib*/torch/lib       libtorch/
  # Copy over all lib files
  mv build/lib/*                libtorch/lib/
  # Copy over all include files
  mv build/include/*            libtorch/include/

to use cp -r instead of mv. Perhaps then the build system would pick up the fact that the object files are newer than the source files, thus avoiding a full rebuild. One additional small change is needed: run git fetch only if the requested commit hash differs from the existing commit hash (so as not to change the mtime of the source files). Hopefully the broad idea makes sense. Let me know if you spot any flaws. Thanks!
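
To make the difference concrete, here is a minimal sketch of the cp -r variant run in a throwaway directory. The file names (libexample.a, example.h) and the directory layout are made up for illustration and only loosely mirror the snippet above.

```shell
# Sketch in a temporary directory; nothing here touches a real build tree.
workdir=$(mktemp -d)
mkdir -p "$workdir/build/lib" "$workdir/build/include" \
         "$workdir/libtorch/lib" "$workdir/libtorch/include"
echo obj > "$workdir/build/lib/libexample.a"
echo hdr > "$workdir/build/include/example.h"

# Copy instead of move, so build/ keeps its contents for the next run.
cp -r "$workdir"/build/lib/*     "$workdir/libtorch/lib/"
cp -r "$workdir"/build/include/* "$workdir/libtorch/include/"

# Both the packaged copy and the original build output now exist.
ls "$workdir/build/lib" "$workdir/libtorch/lib"
```

With mv, build/lib would be empty after packaging. Note that cp -r gives the copies fresh timestamps while leaving the originals in build/ untouched, which is the part an incremental rebuild would depend on.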

@powderluv
Collaborator Author

I am OK with the change from mv to cp -r to see if it helps. I actually made that change from the original PyTorch version to avoid copying and just mv for speed. So let's try that.

However I am not sure we should assume we don't clear artifacts between VM invocations in the CI. I thought it was supposed to be a clean run. Maybe it was a transient bug?

@ashay
Collaborator

ashay commented Oct 11, 2022

However I am not sure we should assume we don't clear artifacts between VM invocations in the CI.

Lucky for us, when you and Maksim wrote the build_libtorch.sh script, y'all added a code path to handle both cases, one where the PyTorch source is checked out and one where it isn't.

checkout_pytorch() {
  if [[ ! -d "$PYTORCH_ROOT" ]]; then
    ...
  else
    cd "${PYTORCH_ROOT}"
    git fetch --depth=1 origin "${TORCH_MLIR_SRC_PYTORCH_BRANCH}"
    git reset --hard FETCH_HEAD
  fi

Combined with the fact that we don't pass clean: true during the checkout phase, we might be able to safely make use of the existing files. And if they don't exist or are out of date, then the script can likely perform a fresh checkout of PyTorch.
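
The "fetch only when the commit changed" guard mentioned earlier could look roughly like the sketch below. needs_fetch, update_checkout, and REQUESTED_COMMIT are hypothetical names, not part of build_libtorch.sh. The sketch also assumes the request is a full commit hash rather than a branch name, and that the remote allows fetching by hash.

```shell
# Hypothetical helper: succeed (exit 0) only when the checked-out commit
# differs from the requested one, i.e. when a fetch is actually needed.
# Both arguments are assumed to be full (unabbreviated) commit hashes.
needs_fetch() {
  current="$1"
  requested="$2"
  [ "$current" != "$requested" ]
}

# Sketch of how the else-branch could use it. REQUESTED_COMMIT is an
# assumed variable; the real script tracks a branch, not a hash.
update_checkout() {
  current=$(git rev-parse HEAD)
  if needs_fetch "$current" "$REQUESTED_COMMIT"; then
    git fetch --depth=1 origin "$REQUESTED_COMMIT"
    git reset --hard FETCH_HEAD
  fi
}
```

Skipping the fetch and reset when the hashes match leaves the working tree alone, so source mtimes stay put and the build system has no reason to treat the sources as newer than the existing objects.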

@powderluv
Collaborator Author

Ahh, we added that path to support local customer forks of PyTorch source builds.
