Update embeddings tools for 2025-01-30 LTS #1354

Merged
merged 30 commits into from
Feb 12, 2025

@mlin mlin commented Feb 11, 2025

A variety of minor changes and README updates to our embedding preparation tooling, made as we worked through it for the new LTS.

mlin and others added 29 commits August 26, 2024 10:36
Co-authored-by: Isaac Virshup <ivirshup@gmail.com>
Add a missing `.flatten()`
mlin commented Feb 11, 2025

NOTE: the Python unit test failures are expected to be fixed by releasing the new embeddings.

@mlin mlin changed the title from "Embeddings tools updates for 2025-01-30 LTS" to "Update embeddings tools for 2025-01-30 LTS" Feb 11, 2025
@mlin mlin marked this pull request as ready for review February 11, 2025 20:58
@mlin mlin requested review from ebezzi and ivirshup February 11, 2025 20:58
Comment on lines +80 to +81
if len(test_tokens[i]) != len(true_tokens[i]):
    assert test_tokens[i] == true_tokens[i]  # to show diff
Collaborator

This is very minor, but numpy has comparison + assertion functions under np.testing that give helpful error messages. E.g.:

In [3]: np.testing.assert_array_equal([1, 2, 3], [1, 3, 3])
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 np.testing.assert_array_equal([1, 2, 3], [1, 3, 3])

File ~/miniforge3/envs/ingestion/lib/python3.11/site-packages/numpy/_utils/__init__.py:85, in _rename_parameter.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     83             raise TypeError(msg)
     84         kwargs[new_name] = kwargs.pop(old_name)
---> 85 return fun(*args, **kwargs)

    [... skipping hidden 1 frame]

File ~/miniforge3/envs/ingestion/lib/python3.11/site-packages/numpy/testing/_private/utils.py:885, in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf, strict, names)
    880         err_msg += '\n' + '\n'.join(remarks)
    881         msg = build_err_msg([ox, oy], err_msg,
    882                             verbose=verbose, header=header,
    883                             names=names,
    884                             precision=precision)
--> 885         raise AssertionError(msg)
    886 except ValueError:
    887     import traceback

AssertionError: 
Arrays are not equal

Mismatched elements: 1 / 3 (33.3%)
Max absolute difference among violations: 1
Max relative difference among violations: 0.33333333
 ACTUAL: array([1, 2, 3])
 DESIRED: array([1, 3, 3])
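
For illustration, here is a minimal sketch of how the suggestion might be applied to the token comparison quoted above. The helper name `assert_token_lists_equal` is hypothetical and not part of this PR, and it assumes `test_tokens` and `true_tokens` are sequences of per-cell token sequences. Unlike the quoted lines, which only assert when the lengths differ, this sketch compares every element.

```python
import numpy as np


def assert_token_lists_equal(test_tokens, true_tokens):
    """Hypothetical helper: compare per-cell token sequences via np.testing."""
    # Both inputs should cover the same number of cells.
    assert len(test_tokens) == len(true_tokens)
    for i, (test, true) in enumerate(zip(test_tokens, true_tokens)):
        # On failure, numpy reports a shape mismatch or the number of
        # mismatched elements along with both arrays; err_msg is added
        # to that report so the failing index is visible.
        np.testing.assert_array_equal(
            np.asarray(test),
            np.asarray(true),
            err_msg=f"token mismatch at index {i}",
        )
```

A length mismatch surfaces as a shape mismatch in the error message, so the separate `len(...)` check before the assertion is not needed in this form.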

@mlin mlin merged commit 77fdea0 into main Feb 12, 2025
10 of 15 checks passed
@mlin mlin deleted the mlin/geneformer-dec2024 branch February 12, 2025 08:44
3 participants