Non-replicable results #11

Open
elcilorien opened this issue Dec 12, 2019 · 2 comments

Comments

@elcilorien

Hi, I've run into an issue where the output of the sLDA model (i.e., predicted values, topic assignments, etc.) is different when I re-run the exact same code on the exact same input data. My understanding was that if the random seed variable was unchanged, I should get the same output. This is a problem because I want to be able to go back and use the exact same model to create predictions for a new set of out-of-sample documents. Can you help me figure out what I should be doing to make sure the model doesn't change? Thanks!

@bearnshaw
Contributor

Hi @elcilorien, I'm not sure why you are experiencing this. Note that the only randomness is in create_topic_lookup and create_rands, and you can verify that both use seed if it is not None. Are you sure seed is not None?
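To illustrate the point about seed, here is a minimal sketch using only Python's standard library. It is not the library's code: `draw_topic_assignments` is a hypothetical stand-in for a seeded initialisation step, and the sizes are made up. It only shows that a fixed seed reproduces the same draws, while `seed=None` does not.

```python
import random


def draw_topic_assignments(n_tokens, n_topics, seed=None):
    """Hypothetical stand-in for a seeded random initialisation step."""
    # random.Random(None) seeds itself from OS entropy or the clock,
    # so two unseeded generators will almost surely disagree.
    rng = random.Random(seed)
    return [rng.randrange(n_topics) for _ in range(n_tokens)]


# Same seed, same inputs: the draws are identical.
a = draw_topic_assignments(1000, 150, seed=42)
b = draw_topic_assignments(1000, 150, seed=42)
same_with_seed = (a == b)

# seed=None: each call starts from a different state.
c = draw_topic_assignments(1000, 150, seed=None)
d = draw_topic_assignments(1000, 150, seed=None)
same_without_seed = (c == d)

print("identical with seed=42:  ", same_with_seed)
print("identical with seed=None:", same_without_seed)
```

If two runs with the same non-None seed still differ, the remaining suspects are outside the random draws, for example the input data itself.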

@elcilorien
Author

Hi @bearnshaw. It turns out there were actually some differences in the input files, so that was completely my fault. However, I've run into a different issue I was hoping you could help me with. I've been running sLDA models with about 5,000 documents and 150 topics, and I keep getting quite negative R-squared values. In one case it was -2. Is this a sign that the model is overfitting (is 150 too many topics?), or do my texts just have no predictive power? Thanks!
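For context on what a negative value means, here is a small sketch of the usual definition, R² = 1 - SS_res / SS_tot, in plain Python. The numbers are invented for illustration and `r_squared` is a hypothetical helper, not part of the library. Under this definition, always predicting the mean of the observed values scores 0, and anything worse than that goes negative; a value of -2 means the squared prediction error is three times the variance around the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

# Baseline: predict the mean everywhere, so SS_res equals SS_tot.
baseline = [3.0, 3.0, 3.0, 3.0, 3.0]

# Predictions that run in the wrong direction: SS_res = 40, SS_tot = 10.
poor = [5.0, 4.0, 3.0, 2.0, 1.0]

print(r_squared(y_true, baseline))  # 0.0
print(r_squared(y_true, poor))      # -3.0
```

Computing the same score on the training documents and on held-out documents is one way to tell the two explanations apart: a high training score with a negative held-out score points toward overfitting, while poor scores on both suggest the texts carry little signal for the response.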
