[ENH] Fix issues + add sync point to test embeddings #2397

Merged
sanketkedia merged 24 commits into main from sync_point_test_embeddings on Jul 3, 2024

Conversation

sanketkedia
Contributor

@sanketkedia sanketkedia commented Jun 21, 2024

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Fixes allowed_ids and disallowed_ids so that they also account for updates, deletes, and upserts. For example, if the log contains an update that does not change a record's embedding and that record is in the query list, we currently never return it, even if it is in the top k.
    • Adds sync points to test_embeddings and increases the test timeout.
    • Adds another rule to test_embeddings for compaction.
    • Suppresses the health check warning for filtering too much.
    • Fixes the case where we try to commit and flush an empty block (which can happen due to deletes). The sparse index start key can also get changed to something other than SparseIndexDelimiter::Start. We decided to flush a dummy block if the blockfile becomes fully empty, so that our segment abstraction is uninitialized only until the first compaction; after that it is always initialized, albeit with an empty block.
    • Fixes a bug in FTS delete document where we were incorrectly panicking.
    • Fixes a bug in record segment apply materialization where we missed writing the max offset id for deletes and updates.
    • Fixes updates to the metadata segment, which were updating only the metadata and not the document.
    • No longer returns an error from the metadata segment if the document is supplied as None for an update.
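
To illustrate the allowed_ids / disallowed_ids fix described above, here is a minimal Python sketch of the idea. It is not the code in hnsw_knn.rs. It assumes that a record's copy in the compacted vector index only becomes stale when the log deletes the record or writes a new embedding for it, so an update that leaves the embedding untouched should not exclude the record from the index search. The names `LogRecord` and `stale_index_ids` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Set


@dataclass
class LogRecord:
    """Hypothetical, simplified log entry (not the real log record type)."""
    id: str
    op: str  # one of "add", "update", "upsert", "delete"
    embedding: Optional[Sequence[float]] = None


def stale_index_ids(log: Iterable[LogRecord]) -> Set[str]:
    """Return ids whose entry in the compacted index must be ignored.

    Assumption: an id is stale only if the log deletes it or carries a new
    embedding for it. A metadata-only update keeps the indexed embedding valid.
    """
    stale: Set[str] = set()
    for record in log:
        if record.op == "delete" or record.embedding is not None:
            stale.add(record.id)
    return stale


log = [
    LogRecord(id="a", op="update"),                        # metadata-only update
    LogRecord(id="b", op="update", embedding=[0.1, 0.2]),  # embedding changed
    LogRecord(id="c", op="delete"),
]
# "a" is not stale, so the index search may still return it.
print(sorted(stale_index_ids(log)))
```

Under this assumption, record "a" remains eligible for the index search, which is the case the first bullet says was previously dropped.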

Test plan

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust
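
One way to picture a sync point in a test such as test_embeddings is sketched below. It assumes that a sync point means blocking until some background work (for example, compaction) has caught up before the test checks results. The helper `wait_until` and its parameters are hypothetical, and the sync point added in this PR may work differently.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 10.0,
    poll_s: float = 0.05,
) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_s)
    # One last check so a condition met right at the deadline is not missed.
    return condition()
```

A test would call something like `wait_until(lambda: background_work_done())` before its assertions, where `background_work_done` stands in for whatever signal the system exposes. A longer test timeout, as mentioned in the description, leaves room for such waits.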

Documentation Changes

None

@sanketkedia sanketkedia requested a review from HammadB June 21, 2024 05:47

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (readability, modularity, intuitiveness)?

@sanketkedia sanketkedia force-pushed the sync_point_test_embeddings branch from 98745be to ded33d1 Compare June 24, 2024 21:46
Collaborator

@HammadB HammadB left a comment

would love to grab 5 minutes just to make sure I understand the logic changes in hnsw_knn.rs

@sanketkedia sanketkedia force-pushed the sync_point_test_embeddings branch 3 times, most recently from 1543d38 to 99f56d3 Compare July 2, 2024 02:58
Collaborator

@HammadB HammadB left a comment

looks good minus comments around mixin and the blockfile changes

@sanketkedia sanketkedia force-pushed the sync_point_test_embeddings branch from 3ecfdfa to 78df19b Compare July 2, 2024 23:21
@sanketkedia sanketkedia merged commit 6769627 into main Jul 3, 2024
65 checks passed
Ishiihara pushed a commit that referenced this pull request Jul 16, 2024