Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give a pause when we iter on the index file and when we decompress a PACK file #631

Merged
merged 7 commits into from
Feb 11, 2024

Conversation

dinosaure
Copy link
Member

/cc @hannesm: a probably solution to let the opam-mirror unikernel be available even if we try to clone.

@hannesm
Copy link
Member

hannesm commented Feb 5, 2024

I tried this PR, thanks a lot. Unfortunately it doesn't solve the issue - I still don't get any reply to curl while the:

    batch_write t ~uid_ln:Hash.length ~uid_rw:Hash.of_raw_string ~map ~iter
      pck_contents index

is being evaluated. I'm at the last expression of the batch_write, namely between the two log outputs (note, in the same batch_write, there's an earlier call to iter index ~f with a different f, but that's finished in no time -- obseved below is the local f that calls into Carton.Dec):

  Log.info (fun m -> m "toplevel Mem.batch_write, now iterating");
  iter index ~f >|= fun () ->
  Log.info (fun m -> m "toplevel Mem.batch_write, iterating done")

And I also encounter some issue with opam-mirror.

What I observe is:

Solo5: trap: type=#PF ec=0x0 rip=0x4d4f20 rsp=0x3ffffb50 rflags=0x10006
Solo5: trap: cr2=0x87407b87188
Solo5: ABORT: cpu_x86_64.c:181: Fatal trap

With the disassembly of opam-mirror.hvt:

00000000004d4ee0 <free>:
  4d4ee0: 55                            pushq   %rbp
  4d4ee1: 48 89 e5                      movq    %rsp, %rbp
  4d4ee4: 53                            pushq   %rbx
  4d4ee5: 50                            pushq   %rax
  4d4ee6: 48 85 ff                      testq   %rdi, %rdi
  4d4ee9: 0f 84 fe 06 00 00             je      0x4d55ed <free+0x70d>
  4d4eef: 48 8b 4f f8                   movq    -0x8(%rdi), %rcx
  4d4ef3: 48 83 c7 f0                   addq    $-0x10, %rdi
  4d4ef7: 41 89 c9                      movl    %ecx, %r9d
  4d4efa: 41 83 e1 03                   andl    $0x3, %r9d
  4d4efe: 48 89 ca                      movq    %rcx, %rdx
  4d4f01: 48 83 e2 f8                   andq    $-0x8, %rdx
  4d4f05: 49 83 f9 01                   cmpq    $0x1, %r9
  4d4f09: 74 07                         je      0x4d4f12 <free+0x32>
  4d4f0b: 48 29 15 0e 4a 44 00          subq    %rdx, 0x444a0e(%rip)    # 0x919920 <_gm_+0x3a8>
  4d4f12: 48 8b 35 0f 4a 44 00          movq    0x444a0f(%rip), %rsi    # 0x919928 <mparams>
  4d4f19: 48 8b 04 17                   movq    (%rdi,%rdx), %rax
  4d4f1d: 48 31 f0                      xorq    %rsi, %rax
  4d4f20: 48 39 70 40                   cmpq    %rsi, 0x40(%rax)
  4d4f24: 0f 85 e9 06 00 00             jne     0x4d5613 <free+0x733>
  4d4f2a: 4c 8b 40 18                   movq    0x18(%rax), %r8
  4d4f2e: 4c 39 c7                      cmpq    %r8, %rdi

So, a page fault in free()... uhoh

@hannesm
Copy link
Member

hannesm commented Feb 5, 2024

FWIW, another code location where git blocks is in Smart_git module, the fetch_v1 function where in a Lwt.try_bind

      (fun refs ->
        pack None >>= fun () ->
        Mimic.close flow >>= fun () -> Lwt.return_ok refs))

is evaluated. The pack None is taking ~40 seconds on my laptop (with the opam-repository) during which time it is blocked..

src/git/mem.ml Outdated Show resolved Hide resolved
@palainp
Copy link
Member

palainp commented Feb 5, 2024

Sorry to jump in here (without any help regarding the page fault :( ), bellow my comments on the asm code (with some edits, I misplaced some lines).

With the disassembly of opam-mirror.hvt:

00000000004d4ee0 <free>:
  4d4ee0: 55                            pushq   %rbp
  4d4ee1: 48 89 e5                      movq    %rsp, %rbp
  4d4ee4: 53                            pushq   %rbx
  4d4ee5: 50                            pushq   %rax
  4d4ee6: 48 85 ff                      testq   %rdi, %rdi
  4d4ee9: 0f 84 fe 06 00 00             je      0x4d55ed <free+0x70d>
  4d4eef: 48 8b 4f f8                   movq    -0x8(%rdi), %rcx
  4d4ef3: 48 83 c7 f0                   addq    $-0x10, %rdi
  4d4ef7: 41 89 c9                      movl    %ecx, %r9d
  4d4efa: 41 83 e1 03                   andl    $0x3, %r9d            ; 3 could be INUSE_BITS (=1|2) in `dlmalloc.i:2280` , and this could be (((p)->head & INUSE_BITS) 
  4d4efe: 48 89 ca                      movq    %rcx, %rdx        
  4d4f01: 48 83 e2 f8                   andq    $-0x8, %rdx          
  4d4f05: 49 83 f9 01                   cmpq    $0x1, %r9             ; therefore this is the test     if (is_inuse(p))  in `dlmalloc.i:4722` defined as `(((p)->head & INUSE_BITS) != PINUSE_BIT)` with PINUSE_BIT=1
  4d4f09: 74 07                         je      0x4d4f12 <free+0x32>
  4d4f0b: 48 29 15 0e 4a 44 00          subq    %rdx, 0x444a0e(%rip)    # 0x919920 <_gm_+0x3a8>   ; this is probably `gm->internal_memory_usage -= chunksize(p);`, sjipped when not equal
  4d4f12: 48 8b 35 0f 4a 44 00          movq    0x444a0f(%rip), %rsi    # 0x919928 <mparams>
  4d4f19: 48 8b 04 17                   movq    (%rdi,%rdx), %rax
  4d4f1d: 48 31 f0                      xorq    %rsi, %rax      ;  and this could be from `mstate fm = get_mstate_for(p);` with `get_mstate_for` defined at `dlamalloc.i:3092`
  4d4f20: 48 39 70 40                   cmpq    %rsi, 0x40(%rax)   ;  the comparison could be `if (!ok_magic(fm)) {` defined as `((M)->magic == mparams.magic)`, and this means that the heap is somehow corrupted :(
  4d4f24: 0f 85 e9 06 00 00             jne     0x4d5613 <free+0x733>
  4d4f2a: 4c 8b 40 18                   movq    0x18(%rax), %r8
  4d4f2e: 4c 39 c7                      cmpq    %r8, %rdi

Co-authored-by: Hannes Mehnert <hannes@mehnert.org>
@hannesm
Copy link
Member

hannesm commented Feb 5, 2024

@palainp thanks for your comments.. this is with OCaml 4.14.1 and ocaml-solo5 0.8.2 (and solo5 0.7 series) FWIW.

@palainp
Copy link
Member

palainp commented Feb 5, 2024

@palainp thanks for your comments.. this is with OCaml 4.14.1 and ocaml-solo5 0.8.2 (and solo5 0.7 series) FWIW.

Yes I've looked against current ocaml-solo5 head which is 0.8.2, and I (correctly supposed to be Ocaml 4.14.1 :) ).

Co-authored-by: Hannes Mehnert <hannes@mehnert.org>
@dinosaure dinosaure force-pushed the yield-on-batch-write branch from bd1bb09 to 0151247 Compare February 8, 2024 16:58
@dinosaure
Copy link
Member Author

I finally did a last test on opam-mirror and this PR fixs our issue. Ready to merge.

@dinosaure dinosaure force-pushed the yield-on-batch-write branch from 67fa53c to 64b1bf5 Compare February 11, 2024 11:39
@dinosaure dinosaure merged commit a36c904 into main Feb 11, 2024
1 check passed
@dinosaure dinosaure deleted the yield-on-batch-write branch February 11, 2024 12:00
dinosaure added a commit to dinosaure/opam-repository that referenced this pull request Feb 12, 2024
CHANGES:

- Add a `Lwt.pause` to insert a cooperative point when we verify a PACK file (@dinosaure, @hannesm, mirage/ocaml-git#631)
dinosaure added a commit to dinosaure/opam-repository that referenced this pull request Feb 12, 2024
CHANGES:

- Update unikernels (@hannesm, mirage/ocaml-git#621)
- Extend the Smart protocol with `have` and `want` (@plangesd, @dinosaure, mirage/ocaml-git#626)
- Rename function to get capabilities from the client and the server (@Julow, mirage/ocaml-git#627)
- Extend the Smart protocol with a possible `done` or a `flush` (@plangesd, mirage/ocaml-git#628)
- Extend the Smart protocol with `ack` and handle empty request (@plangesd, @Julow, mirage/ocaml-git#629)
- Add missing new lines for the server side (@plangesd, mirage/ocaml-git#630)
- Be more cooperative with other services (like `http`) when we clone (@dinosaure, @hannesm, mirage/ocaml-git#631)
dinosaure added a commit that referenced this pull request May 19, 2024
Give a pause when we iter on the index file and when we decompress a PACK file
nberth pushed a commit to nberth/opam-repository that referenced this pull request Jun 18, 2024
CHANGES:

- Add a `Lwt.pause` to insert a cooperative point when we verify a PACK file (@dinosaure, @hannesm, mirage/ocaml-git#631)
nberth pushed a commit to nberth/opam-repository that referenced this pull request Jun 18, 2024
CHANGES:

- Update unikernels (@hannesm, mirage/ocaml-git#621)
- Extend the Smart protocol with `have` and `want` (@plangesd, @dinosaure, mirage/ocaml-git#626)
- Rename function to get capabilities from the client and the server (@Julow, mirage/ocaml-git#627)
- Extend the Smart protocol with a possible `done` or a `flush` (@plangesd, mirage/ocaml-git#628)
- Extend the Smart protocol with `ack` and handle empty request (@plangesd, @Julow, mirage/ocaml-git#629)
- Add missing new lines for the server side (@plangesd, mirage/ocaml-git#630)
- Be more cooperative with other services (like `http`) when we clone (@dinosaure, @hannesm, mirage/ocaml-git#631)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants