Topic modeling with ~300,000 cell dataset: polars' maximum length reached #194

Open
ChenWeiyan opened this issue Dec 3, 2024 · 1 comment

Comments

@ChenWeiyan

Hi,

Thanks for the great tool!

I am working with a large scATAC-seq dataset that contains ~300,000 cells and 575,177 peaks.

The problem occurred during topic modeling, when I ran the run_cgs_models_mallet function. It failed with this error:

PanicException: polars' maximum length reached. Consider installing 'polars-u64-idx'.: TryFromIntError(())

I increased my memory from 800 GB to 1000 GB, but it didn't help. Could it be a data structure error?

I am using pycisTopic version 2.0a0.

Do you have any suggestions or an alternative solution?

Thanks,
Weiyan

@ghuls
Member

ghuls commented Dec 9, 2024

The topic modeling code was recently rewritten (polars_1xx branch), which makes it much faster and much less memory-hungry when you have many cells and regions.

See:
#106 (comment)
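
For context on the error message itself: the default Polars build indexes rows with 32-bit unsigned integers, so a single frame is limited to roughly 4.3 billion rows, and adding RAM does not raise that limit. The back-of-the-envelope sketch below uses the cell and peak counts from the report. It assumes (this is not confirmed in the thread) that the failing step builds a long-format table with one row per non-zero cell/region entry; the variable names are illustrative only.

```python
# Rough estimate of when a cell x region matrix, stored with one row per
# non-zero entry (an assumption about the failing step), would exceed the
# row limit of a default Polars build (32-bit unsigned row index).

U32_MAX_ROWS = 2**32 - 1   # largest value a 32-bit unsigned index can hold

n_cells = 300_000          # approximate, from the report
n_regions = 575_177        # peak count, from the report

# Number of entries if every cell/region pair were stored.
dense_entries = n_cells * n_regions

# Fraction of non-zero entries above which the row count no longer fits.
max_density = U32_MAX_ROWS / dense_entries

print(f"dense entries:        {dense_entries:,}")
print(f"32-bit row limit:     {U32_MAX_ROWS:,}")
print(f"max storable density: {max_density:.2%}")
```

Under that assumption, the frame overflows once slightly less than 2.5% of the matrix is non-zero. The error message's own suggestion is the polars-u64-idx package, which uses 64-bit row indices; whether it works as a drop-in replacement with pycisTopic 2.0a0 is not something this thread confirms.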
