- Suppress messages from internal functions.
- Move quanteda from Depends to Imports.
- Fix tests for quanteda v4.2.0.
- Fix regression in 1.4.0 on Linux-like OS.
- Use configure to link the TBB library on MacOS.
- Add `adjust_alpha` as an experimental argument to optimize `alpha` automatically.
- Add `update_model` to update terms of existing models to classify documents with unseen words more accurately.
- Improve the way to convert `std::vector` to `arma::mat`.
- Fix C++ code for Armadillo v14.
- Add `perplexity()` to compute perplexity scores of fitted LDA models.
- Improve documentation.
- Fix tests on systems where the TBB library is unavailable.
- The RcppParallel package is no longer required as the TBB library in the operating system (Linux and MacOS) or Rtools (Windows) is used.
- Linux and MacOS must have the TBB library to enable parallel computing before installing this package from the source.
- Allow `alpha` and `beta` to be a vector for asymmetric Dirichlet priors.
- Remove `uniform` to simplify the computation of seed word weights.
- Add `levels` argument to better handle hierarchical dictionaries.
- Fix the error when `textmodel_seqlda()` is called.
- Save values in the Array object in double to avoid rounding error (#60).
- Add `auto_iter` to `textmodel_seededlda()` and `textmodel_lda()` to stop Gibbs sampling automatically before `max_iter` is reached.
- Add `batch_size` to `textmodel_seededlda()` and `textmodel_lda()` to enable the distributed LDA algorithm for parallel computing.
- Add the gamma parameter to `textmodel_seededlda()` and `textmodel_lda()` for sequential classification.
- Add `textmodel_seqlda()` as a shortcut for `textmodel_lda(gamma = 0.5)`.
- Improve the calculation of weights for seed words.
- Add the `regularize` argument to `divergence()` for the regularized topic divergence measure.
- Fix for deprecation in Matrix 1.5-4.
- Add `data_corpus_moviereviews` to the package to reduce dependency.
- Add `min_prob` and `select` to `topics()` for greater flexibility.
- Change the divergence measure from Kullback-Leibler to Jensen-Shannon.
- Add `weighted`, `min_size`, `select` to `divergence()` for regularized topic divergence scores.
- Change `textmodel_seededlda()` to set positive integer values to `residual`.
- Fix a bug in `textmodel_seededlda()` that ignores n-grams when `concatenator` is not "_".
- Change `topics()` to return document names.
- Add `divergence()` to optimize the number of topics or the seed words (#26).
- Add the `model` argument to `textmodel_lda()` to replace `predict()`.
- Change the `textmodel_seededlda` object to save dictionary and related settings (#18)
- Add `predict()` to identify topics of unseen documents (#9)
- Allow selecting seed words based on their frequencies using `dfm_trim()` in `textmodel_seededlda()` via `...` (#8)
- Change `topics()` to return factor with NA for empty documents
- Fix a bug in initializing LDA that leads to incorrect phi (#4 and #6)
- Implement original LDA estimator using the LDAGibbs++ library