SparkNLP 1005 implement nomic embeddings #14217

prabod
Contributor

@prabod prabod commented Mar 27, 2024

This PR introduces nomic embeddings to Spark NLP.

Description

nomic-embed-text-v1 is a text encoder with an 8192-token context length that surpasses OpenAI's text-embedding-ada-002 and text-embedding-3-small on both short-context and long-context tasks.

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@prabod prabod added new-feature Introducing a new feature new model labels Mar 27, 2024
@prabod prabod self-assigned this Mar 27, 2024
@jtattersall09403

Hi, has there been any further progress on this PR? Are there any estimated timescales for when it might be completed? Thanks!

@prabod prabod force-pushed the SPARKNLP-1005-Implement-NomicEmbeddings branch from fe5537a to fcdbb6c Compare August 7, 2024 08:56
@prabod prabod marked this pull request as ready for review August 7, 2024 08:58
@prabod prabod requested a review from maziyarpanahi August 7, 2024 08:58
@maziyarpanahi maziyarpanahi changed the base branch from master to release/550-release-candidate September 1, 2024 18:12
@maziyarpanahi
Member

> Hi, has there been any further progress on this PR? Are there any estimated timescales for when it might be completed? Thanks!

We are preparing this PR to be included in the next release :)

@maziyarpanahi maziyarpanahi merged commit 803edf6 into release/550-release-candidate Sep 1, 2024
4 checks passed
@coveralls

Pull Request Test Coverage Report for Build 10656294335

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 62.422%

Totals
Change from base Build 10656276853: 0.0%
Covered Lines: 8970
Relevant Lines: 14370

💛 - Coveralls
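
As a quick sanity check on the report above, the overall coverage figure can be reproduced from the two totals. This is a minimal sketch that assumes the percentage is simply covered lines divided by relevant lines, rounded to three decimals; the variable names are illustrative and not taken from Coveralls.

```python
# Totals copied from the coverage report above.
covered_lines = 8970
relevant_lines = 14370

# Assumption: overall coverage = covered / relevant, shown as a percentage
# rounded to three decimal places.
coverage_pct = round(100 * covered_lines / relevant_lines, 3)

print(f"{coverage_pct:.3f}%")  # prints "62.422%"
```

The result matches the "Overall coverage remained the same at 62.422%" line in the report.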

Labels
new model new-feature Introducing a new feature

4 participants