
[SPARKNLP-1027] llama.cpp integration #14364

Conversation

DevinTDHa (Member)

Description

This PR implements support for llama.cpp in Spark NLP.

llama.cpp is a high-performance C/C++ library designed for running Meta's LLaMA models and other large language models (LLMs) on a variety of hardware platforms.

This will enable users to run LLM inference with a variety of optimizations:

  • Hardware Optimization: Supports Apple silicon (via Metal), x86 CPUs (via AVX), and NVIDIA GPUs (via CUDA).
  • Quantization: Model quantization (1.5-bit to 8-bit) to improve inference speed and reduce memory usage.
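To give a feel for why quantization matters on memory-constrained nodes, here is a minimal back-of-the-envelope sketch (not from this PR) of the weight-storage footprint of a hypothetical 7B-parameter model at different bit-widths. It ignores runtime overhead such as the KV cache and quantization block metadata, so real GGUF files will differ somewhat:

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given bit-width.

    Weights only; ignores KV cache and per-block quantization metadata.
    """
    return n_params * bits_per_weight / 8 / (1024 ** 3)

# Illustrative 7B-parameter model at common bit-widths
n_params = 7e9
for bits in (16, 8, 4, 1.5):
    print(f"{bits:>4}-bit: {weight_memory_gib(n_params, bits):.1f} GiB")
```

For example, dropping from 16-bit to 4-bit weights cuts the weight footprint by roughly 4x, which is what lets smaller cluster nodes hold a model that would not otherwise fit in memory.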

Motivation and Context

Many users run clusters composed of many smaller nodes. This integration enables LLM inference even on such memory-constrained nodes.

How Has This Been Tested?

Local tests, Google Colab, Databricks

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@DevinTDHa DevinTDHa added new-feature Introducing a new feature dependencies Pull requests that update a dependency file labels Aug 8, 2024
@DevinTDHa DevinTDHa requested a review from maziyarpanahi August 8, 2024 14:57
@DevinTDHa DevinTDHa self-assigned this Aug 8, 2024
@DevinTDHa DevinTDHa added the DON'T MERGE Do not merge this PR label Aug 8, 2024
@DevinTDHa DevinTDHa changed the title Feature/sparknlp 1027 llama cpp integration [SPARKNLP-1027] llama.cpp integration Aug 15, 2024
@maziyarpanahi maziyarpanahi changed the base branch from master to release/550-release-candidate September 2, 2024 17:59
@coveralls commented Sep 2, 2024
@maziyarpanahi maziyarpanahi marked this pull request as ready for review September 3, 2024 10:39
Seq(llamaCppGPU)
else if (is_silicon.equals("true"))
Seq(llamaCppSilicon)
// else if (is_aarch64.equals("true"))
@DevinTDHa We don't need a special build for aarch64 or it's not supported?

@maziyarpanahi maziyarpanahi merged commit c2c0e48 into JohnSnowLabs:release/550-release-candidate Sep 5, 2024
6 checks passed