
Adding linenoise.cpp to llama-run #11252

Merged: 1 commit into ggerganov:master, Jan 18, 2025

Conversation

@ericcurtin (Collaborator):

This is a fork of linenoise that is C++17 compatible. I intend to add it to llama-run so we can do things like traverse prompt history via the up and down arrows:

https://github.com/ericcurtin/linenoise.cpp
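
For a sense of what this enables, here is a minimal REPL sketch assuming the classic linenoise C API (linenoise, linenoiseHistoryAdd, linenoiseFree); the C++17 fork's exact interface may differ slightly:

    // Minimal sketch, assuming the classic linenoise API; not the PR's code.
    #include "linenoise.h"
    #include <cstdio>

    int main() {
        while (char * line = linenoise("> ")) { // returns nullptr on EOF / Ctrl-D
            if (*line != '\0') {
                linenoiseHistoryAdd(line);      // now reachable via up/down arrows
                printf("you typed: %s\n", line);
            }
            linenoiseFree(line);                // linenoise heap-allocates each line
        }
        return 0;
    }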

@ericcurtin (Collaborator, Author):

Linked issue:

containers/ramalama#586

@rhatdan commented Jan 15, 2025:

Does this not work on Windows? I guess inside of a container it would work.

@ericcurtin (Collaborator, Author) commented Jan 15, 2025:

> Does this not work on Windows? I guess inside of a container it would work.

I couldn't find any library like linenoise that had Windows support; plenty have macOS/Linux support. But yeah, if using a container or WSL2, etc., we'd be fine.

I didn't kill native Windows support here, but you don't get the cool features, like being able to cycle through prompt history with the up and down arrows.

Comment on lines 22 to 25
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
Collaborator:

Note that this will require adding a notice in the releases.

@ericcurtin (Collaborator, Author), Jan 15, 2025:

Where can I copy that in? I think the license is compatible once we include these copyright notices.

Collaborator:

Not sure how this is usually handled; I guess we could add a file with all third-party copyright notices and copy it to the release packages.

@ericcurtin (Collaborator, Author), Jan 15, 2025:

FWIW, we have the same issue with curl if we start distributing it for Windows. From the curl license: "Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies."

@ggerganov you were speaking about shipping curl on Windows recently.

@ggerganov (Owner):

I also don't know what the common way to handle this is; probably look at how other projects do it.

Btw, each llama.cpp release already includes a copy of the full source code in this repo so I think all licenses are technically already included in the release packages? Again, we can make this more explicit if necessary - feel free to improve.

@ericcurtin (Collaborator, Author):

I'm OK with that technique; there's no standard one. Everybody does it a little differently.

Collaborator:

I am not a lawyer, but I am not convinced that's enough. We already copy our own LICENSE file to the release packages as llama.cpp.txt, so it shouldn't be a problem to also add the 3rd party licenses there.

@ericcurtin (Collaborator, Author):

@slaren can you show me the script where that copy occurs? We can just check in:

https://github.com/ericcurtin/linenoise.cpp/blob/main/LICENSE

and copy that file to linenoise.cpp.txt, maybe?

Collaborator:

@ericcurtin (Collaborator, Author):

Thanks, my searching tool skips hidden directories; sometimes I forget this. I hope it's good now; it should copy the LICENSE now, I think.

@ericcurtin force-pushed the linenoise.cpp branch 2 times, most recently from 27812c8 to 6360829 on January 15, 2025 at 20:52.
@rhatdan commented Jan 15, 2025:

Most of the time Windows will be using a container, so this should not be an issue.

@ericcurtin force-pushed the linenoise.cpp branch 3 times, most recently from 4127d6a to 1aa42d3 on January 16, 2025 at 10:38.
@ericcurtin (Collaborator, Author):

What unblocks this from getting merged, @slaren and @ggerganov?

The github-actions bot added the devops (improvements to build systems and github actions) label on Jan 16, 2025.
@ericcurtin (Collaborator, Author):

Yay, green build and licensing file copied, @slaren @ggerganov.

@@ -796,6 +796,7 @@ jobs:
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
run: |
Copy-Item LICENSE .\build\bin\Release\llama.cpp.txt
Copy-Item .\examples\run\linenoise.cpp\LICENSE .\build\bin\Release\linenoise.cpp.txt
Collaborator:

This will only affect the Windows releases, but the macOS and Linux releases should also be updated.

@ericcurtin (Collaborator, Author):

How about now?

@ggerganov (Owner):

> llama-simple-chat does not produce this sanitizer error. Only llama-run so far. This is on macOS.

I just tested on my Linux box with AddressSanitizer enabled; it does not produce an error during model loading. However, it produces an error at the end of the first response:

> Hello                                                                                                                                                                                                                                                                                                                       
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
=================================================================
==3069612==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000a9be00 at pc 0x7de3a083daa7 bp 0x7ffccd7d90d0 sp 0x7ffccd7d8878
READ of size 1 at 0x603000a9be00 thread T0
    #0 0x7de3a083daa6 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:389
    #1 0x5e13c29d5fd5 in std::char_traits<char>::length(char const*) /usr/include/c++/11/bits/char_traits.h:399
    #2 0x5e13c29da14f in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> >(char const*, std::allocator<char> const&) /usr/include/c++/11/bits/basic_string.h:536
    #3 0x7de3a05275f8 in llm_chat_apply_template(llm_chat_template, std::vector<llama_chat_message const*, std::allocator<llama_chat_message const*> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool) /llama.cpp/src/llama-chat.cpp:419
    #4 0x7de3a04b9841 in llama_chat_apply_template /llama.cpp/src/llama.cpp:9966
    #5 0x5e13c29d4af1 in apply_chat_template /llama.cpp/examples/run/run.cpp:715
    #6 0x5e13c29d5334 in apply_chat_template_with_error_handling /llama.cpp/examples/run/run.cpp:851
    #7 0x5e13c29d56f8 in chat_loop /llama.cpp/examples/run/run.cpp:942
    #8 0x5e13c29d5b14 in main /llama.cpp/examples/run/run.cpp:1002
    #9 0x7de39dc29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #10 0x7de39dc29e3f in __libc_start_main_impl ../csu/libc-start.c:392
    #11 0x5e13c29d4324 in _start (/llama.cpp/build-sanitize-addr/bin/llama-run+0x12324)

0x603000a9be00 is located 16 bytes inside of 32-byte region [0x603000a9bdf0,0x603000a9be10)
freed by thread T0 here:
    #0 0x7de3a08b724f in operator delete(void*, unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x5e13c29e6d6d in __gnu_cxx::new_allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::deallocate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, unsigned long) /usr/include/c++/11/ext/new_allocator.h:145
    #2 0x5e13c29e0508 in std::allocator_traits<std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::deallocate(std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:496
    #3 0x5e13c29dc9e7 in std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_deallocate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, unsigned long) /usr/include/c++/11/bits/stl_vector.h:354
    #4 0x5e13c29df5e0 in void std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_realloc_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/include/c++/11/bits/vector.tcc:500
    #5 0x5e13c29dbb03 in std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::push_back(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/include/c++/11/bits/stl_vector.h:1198
    #6 0x5e13c29d49e7 in add_message /llama.cpp/examples/run/run.cpp:709
    #7 0x5e13c29d56dd in chat_loop /llama.cpp/examples/run/run.cpp:941
    #8 0x5e13c29d5b14 in main /llama.cpp/examples/run/run.cpp:1002
    #9 0x7de39dc29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

previously allocated by thread T0 here:
    #0 0x7de3a08b61e7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x5e13c29eaf87 in __gnu_cxx::new_allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127
    #2 0x5e13c29e6b58 in std::allocator_traits<std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::allocate(std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:464
    #3 0x5e13c29e02a1 in std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_allocate(unsigned long) /usr/include/c++/11/bits/stl_vector.h:346
    #4 0x5e13c29df50a in void std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_realloc_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/include/c++/11/bits/vector.tcc:440
    #5 0x5e13c29dbb03 in std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::push_back(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/include/c++/11/bits/stl_vector.h:1198
    #6 0x5e13c29d49e7 in add_message /llama.cpp/examples/run/run.cpp:709
    #7 0x5e13c29d5587 in chat_loop /llama.cpp/examples/run/run.cpp:925
    #8 0x5e13c29d5b14 in main /llama.cpp/examples/run/run.cpp:1002
    #9 0x7de39dc29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: heap-use-after-free ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:389 in __interceptor_strlen
Shadow bytes around the buggy address:
  0x0c068014b770: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
  0x0c068014b780: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
  0x0c068014b790: fd fa fa fa fd fd fd fa fa fa fd fd fd fd fa fa
  0x0c068014b7a0: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
  0x0c068014b7b0: fa fa fd fd fd fa fa fa 00 00 00 00 fa fa fd fd
=>0x0c068014b7c0:[fd]fd fa fa fd fd fd fa fa fa fd fd fd fa fa fa
  0x0c068014b7d0: fd fd fd fd fa fa fd fd fd fd fa fa fd fd fd fd
  0x0c068014b7e0: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
  0x0c068014b7f0: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
  0x0c068014b800: fd fd fd fa fa fa fd fd fd fd fa fa fd fd fd fd
  0x0c068014b810: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==3069612==ABORTING

Looks similar to the issue reported by @slaren earlier on Windows.

@ericcurtin (Collaborator, Author) commented Jan 17, 2025:

I've managed to reproduce it. What's really weird is that it seems to jump around: I'm guessing the bug is before chat_loop, because ASan sometimes exposes it during the loading of the model file (that's what I've seen too). (Unless we are actually chasing two bugs.)

@ericcurtin (Collaborator, Author):

Something else I see in Valgrind:

==19055== Memcheck, a memory error detector
==19055== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==19055== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==19055== Command: build/bin/llama-run /var/home/ecurtin/.local/share/ramalama/models/ollama/smollm:135m hi
==19055==
==19055== Source and destination overlap in memcpy(0x1441e080, 0x1441e080, 2304)
==19055==    at 0x4890F3C: __GI_memcpy (vg_replace_strmem.c:1151)
==19055==    by 0x4DE4E8B: ggml_compute_forward_rms_norm_f32 (ggml-cpu.c:7041)
==19055==    by 0x4DE4F5F: ggml_compute_forward_rms_norm (ggml-cpu.c:7063)
==19055==    by 0x4DF9553: ggml_compute_forward (ggml-cpu.c:12822)
==19055==    by 0x4DFB003: ggml_graph_compute_thread (ggml-cpu.c:13857)
==19055==    by 0x4DFB73F: ggml_graph_compute._omp_fn.0 (ggml-cpu.c:14128)
==19055==    by 0x556CA77: GOMP_parallel (parallel.c:178)
==19055==    by 0x4DFB44F: ggml_graph_compute (ggml-cpu.c:14119)
==19055==    by 0x4DFC307: ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) (ggml-cpu.cpp:158)
==19055==    by 0x4F147EF: ggml_backend_graph_compute_async (ggml-backend.cpp:332)
==19055==    by 0x4F1836F: ggml_backend_sched_compute_splits(ggml_backend_sched*) (ggml-backend.cpp:1397)
==19055==    by 0x4F18EDF: ggml_backend_sched_graph_compute_async (ggml-backend.cpp:1588)
==19055==
Hello! How can I help you?
==19055==
==19055== HEAP SUMMARY:
==19055==     in use at exit: 3,576 bytes in 9 blocks
==19055==   total heap usage: 181,595 allocs, 181,586 frees, 252,685,923 bytes allocated
==19055==
==19055== LEAK SUMMARY:
==19055==    definitely lost: 0 bytes in 0 blocks
==19055==    indirectly lost: 0 bytes in 0 blocks
==19055==      possibly lost: 960 bytes in 3 blocks
==19055==    still reachable: 2,616 bytes in 6 blocks
==19055==         suppressed: 0 bytes in 0 blocks
==19055== Rerun with --leak-check=full to see details of leaked memory
==19055==
==19055== For lists of detected and suppressed errors, rerun with: -s
==19055== ERROR SUMMARY: 9 errors from 1 contexts (suppressed: 0 from 0)
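
For context, the overlap Valgrind flags above is memcpy called with identical source and destination. memcpy with overlapping arguments is undefined behavior even when src == dst; the usual fixes are guarding the call or using memmove. A minimal sketch (illustrative, not ggml's actual code):

    // Two standard fixes for an overlapping memcpy: skip the copy when
    // src == dst, or use memmove, which is defined for overlapping regions.
    #include <cstddef>
    #include <cstring>

    void copy_row(float * dst, const float * src, size_t n) {
        if (dst != src) {
            memcpy(dst, src, n * sizeof(float)); // assumes full aliasing or none
        }
        // or unconditionally: memmove(dst, src, n * sizeof(float));
    }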

@ericcurtin (Collaborator, Author):

I wonder, if I call ggml_backend_load_all() before initialising the context, will it fix this?

@ericcurtin (Collaborator, Author):

Never mind, I think ggml_backend_load_all() is OK.

@ericcurtin (Collaborator, Author):

I can confirm that when we revert 53ff6b9 (which reverts cleanly), this issue goes away.

@ericcurtin (Collaborator, Author):

Anyway, this is a simple reproducer for ASan, Valgrind, etc.:

llama-run smollm:135m hi

@slaren (Collaborator) commented Jan 17, 2025:

The sanitizer error from @ggerganov is caused by adding c_str() pointers from the strings in the std::vector to messages. std::string uses small string optimization to store short strings in the space of the object itself, without allocating more memory; this means that a pointer returned by c_str() becomes invalidated after a std::vector reallocation.
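
A minimal sketch of the failure mode, with illustrative names rather than the PR's actual code:

    // c_str() pointers taken from strings stored in a std::vector dangle
    // once the vector reallocates, because the string objects (including
    // their in-object SSO buffers) move to a new heap block and the old
    // block is freed.
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> msg_strs;
        std::vector<const char *> messages;

        msg_strs.push_back("hi");                    // short string: stored via SSO
        messages.push_back(msg_strs.back().c_str()); // points into the vector's buffer

        msg_strs.push_back("second message");        // may reallocate and move
                                                     // every string object
        // messages[0] can now dangle; reading it is the heap-use-after-free
        // that ASan reported above.
        return 0;
    }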

@ericcurtin (Collaborator, Author):

> The sanitizer error from @ggerganov is caused by adding c_str() pointers from the strings in the std::vector to messages. std::string uses small string optimization to store short strings in the space of the object itself, without allocating more memory; this means that a pointer returned by c_str() becomes invalidated after a std::vector reallocation.

Wow, it would have been a long time before I considered small string optimization as the cause. Thanks so much.

I'll try changing:

std::vector<std::string> msg_strs;

to:

std::vector<const char*> msg_strs;

I guess sometimes C++ isn't safer than C 😓

@slaren (Collaborator) commented Jan 17, 2025:

Replacing the std::vector with a std::list should also work; this would ensure that the strings never move.
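
A sketch of that fix, under the same illustrative names as above:

    // std::list allocates each node separately and never moves existing
    // nodes when elements are appended, so earlier c_str() pointers into
    // the stored strings remain valid.
    #include <list>
    #include <string>
    #include <vector>

    int main() {
        std::list<std::string> msg_strs;
        std::vector<const char *> messages;

        msg_strs.push_back("hi");
        messages.push_back(msg_strs.back().c_str());

        msg_strs.push_back("second message"); // existing nodes stay put
        // messages[0] remains valid for the lifetime of its list node
        return 0;
    }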

@ericcurtin (Collaborator, Author):

Still doesn't explain the llama_model_load_from_file bug, which seems more readily reproducible; it happens before all this.

@ericcurtin (Collaborator, Author):

Haven't been able to reproduce the other one.

@ngxson (Collaborator) commented Jan 17, 2025:

The error with model load could be a false positive, described in https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow, where one part of the app is built without ASan enabled (could be gguf.cpp in this case).

@ggerganov Did you try deleting the build folder?

The github-actions bot added the build (Compilation issues) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jan 17, 2025.
@ericcurtin (Collaborator, Author):

Nonetheless I pushed the list change 😄

This is a fork of linenoise that is C++17 compatible. I intend on
adding it to llama-run so we can do things like traverse prompt
history via the up and down arrows:

https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
@slaren (Collaborator) commented Jan 17, 2025:

> The error with model load could be a false positive, described in https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow, where one part of the app is built without ASan enabled (could be gguf.cpp in this case).

Now that you mention it, the llama.cpp files are built without sanitizer flags. This needs to be fixed as well.

@ericcurtin (Collaborator, Author):

> The error with model load could be a false positive, described in https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow, where one part of the app is built without ASan enabled (could be gguf.cpp in this case).

> @ggerganov Did you try deleting the build folder?

You could be right; things do run fine without ASan.

@ggerganov (Owner):

> The error with model load could be a false positive, described in https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow, where one part of the app is built without ASan enabled (could be gguf.cpp in this case).

> Now that you mention it, the llama.cpp files are built without sanitizer flags. This needs to be fixed as well.

I'll push a fix on master.

@ngxson (Collaborator) commented Jan 17, 2025:

For the msg_strs problem, I would prefer constructing the vector/list inside apply_chat_template because it's more aligned with a functional programming approach. common_chat_apply_template does the same thing too (though it's up to you to decide whether to use it or not).
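
A sketch of that shape, with illustrative types rather than the PR's actual code: the pointer view is built locally from an owning container that outlives the call, so nothing can reallocate underneath it.

    // Illustrative only: build the C-string view inside the helper; the
    // pointers are valid as long as `chat` outlives `view` and is not
    // mutated while `view` is in use.
    #include <string>
    #include <utility>
    #include <vector>

    struct chat_message { const char * role; const char * content; };

    static std::vector<chat_message> make_message_view(
            const std::vector<std::pair<std::string, std::string>> & chat) {
        std::vector<chat_message> view;
        view.reserve(chat.size());
        for (const auto & m : chat) {
            view.push_back({ m.first.c_str(), m.second.c_str() });
        }
        return view;
    }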

@ggerganov (Owner):

#11279 resolves the sanitizer error on macOS.

@JohannesGaessler (Collaborator):

Is my understanding correct that the issue with the GGUF code turned out to be a false positive that has now been fixed?

@ericcurtin (Collaborator, Author):

> Is my understanding correct that the issue with the GGUF code turned out to be a false positive that has now been fixed?

Yes, and I think all the issues discussed in this PR are resolved now.

@ericcurtin merged commit a1649cc into ggerganov:master on Jan 18, 2025, with 48 checks passed, and deleted the linenoise.cpp branch on January 18, 2025 at 14:42.
Labels: build (Compilation issues), devops (improvements to build systems and github actions), examples, ggml (changes relating to the ggml tensor library for machine learning)

6 participants