This fork implements Probabilistic Label Trees (PLTs) in Vowpal Wabbit for extreme multi-label classification. It was merged into the main Vowpal Wabbit repository.
Our other PLT implementations are available here:
PLTs were introduced and extended in the articles listed below. Please cite the relevant articles if you use PLTs in your research.
- Marek Wydmuch, Kalina Jasinska-Kobus, Rohit Babbar, Krzysztof Dembczyński: Propensity-Scored Probabilistic Label Trees. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021.
@inproceedings{Wydmuch_at_el_2021,
author = {Wydmuch, Marek and Jasinska-Kobus, Kalina and Babbar, Rohit and Dembczynski, Krzysztof},
title = {Propensity-Scored Probabilistic Label Trees},
year = {2021},
isbn = {9781450380379},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3404835.3463084},
doi = {10.1145/3404835.3463084},
booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {2252--2256},
numpages = {5},
keywords = {label trees, recommendation, multi-label classification, missing labels, tagging, propensity model, supervised learning, extreme classification, ranking},
location = {Virtual Event, Canada},
series = {SIGIR '21}
}
- Kalina Jasinska-Kobus, Marek Wydmuch, Devanathan Thiruvenkatachari, Krzysztof Dembczyński: Online probabilistic label trees. PMLR, Volume 130: International Conference on Artificial Intelligence and Statistics, AISTATS, 2021.
@inproceedings{Jasinska-Kobus_Wydmuch_at_el_2021,
title = {Online probabilistic label trees},
author = {Jasinska-Kobus, Kalina and Wydmuch, Marek and Thiruvenkatachari, Devanathan and Dembczynski, Krzysztof},
booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
pages = {1801--1809},
year = {2021},
editor = {Banerjee, Arindam and Fukumizu, Kenji},
volume = {130},
series = {Proceedings of Machine Learning Research},
month = {13--15 Apr},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v130/jasinska-kobus21a/jasinska-kobus21a.pdf},
url = {http://proceedings.mlr.press/v130/jasinska-kobus21a.html},
}
- Kalina Jasinska-Kobus, Marek Wydmuch, Krzysztof Dembczyński, Mikhail Kuznetsov, Robert Busa-Fekete: Probabilistic Label Trees for Extreme Multi-label Classification. arXiv:2009.11218, 2020.
@misc{Jasinska-Kobus_at_el_2020,
title= {Probabilistic Label Trees for Extreme Multi-label Classification},
author= {Kalina Jasinska-Kobus and Marek Wydmuch and Krzysztof Dembczynski and Mikhail Kuznetsov and Robert Busa-Fekete},
year= {2020},
eprint= {2009.11218},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}
- Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, Robert Busa-Fekete, Krzysztof Dembczyński: A no-regret generalization of hierarchical softmax to extreme multi-label classification. NeurIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018.
@incollection{Wydmuch_at_el_2018b,
title = {A no-regret generalization of hierarchical softmax to extreme multi-label classification},
author = {Wydmuch, Marek and Jasinska, Kalina and Kuznetsov, Mikhail and Busa-Fekete, R\'{o}bert and Dembczynski, Krzysztof},
booktitle = {Advances in Neural Information Processing Systems},
volume = {31},
editor = {S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett},
pages = {6358--6368},
year = {2018},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7872-a-no-regret-generalization-of-hierarchical-softmax-to-extreme-multi-label-classification.pdf}
}
- Kalina Jasinska, Krzysztof Dembczynski, Robert Busa-Fekete, Karlson Pfannschmidt, Timo Klerx, Eyke Hullermeier: Extreme F-measure Maximization using Sparse Probability Estimates. Proceedings of The 33rd International Conference on Machine Learning, 2016.
@inproceedings{Jasinska_et_al_2016,
title = {Extreme F-measure Maximization using Sparse Probability Estimates},
author = {Kalina Jasinska and Krzysztof Dembczynski and Robert Busa-Fekete and Karlson Pfannschmidt and Timo Klerx and Eyke Hullermeier},
booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
pages = {1435--1444},
year = {2016},
editor = {Maria Florina Balcan and Kilian Q. Weinberger},
volume = {48},
series = {Proceedings of Machine Learning Research},
address = {New York, New York, USA},
publisher = {PMLR},
}
--plt arg Use PLT for multi-label learning with arg labels
--kary_tree arg (=2) Use an arg-ary tree. By default the tree is binary
--top_k arg (=1) Predict arg top labels
--threshold arg Predict labels with marginal probabilities greater than arg
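To get a feel for what the --kary_tree setting changes, the sketch below computes how deep a tree has to be to hold a given number of labels. It assumes a balanced arg-ary tree with one label per leaf, which is a simplification made here for illustration; the function name tree_depth is hypothetical and not part of vw.

```sh
# Smallest depth d such that arity^d >= num_labels.
# Assumption: a balanced tree with one label per leaf.
tree_depth() {
  labels=$1
  arity=$2
  if [ "$arity" -lt 2 ]; then
    echo "arity must be at least 2" >&2
    return 1
  fi
  depth=0
  capacity=1
  while [ "$capacity" -lt "$labels" ]; do
    capacity=$((capacity * arity))
    depth=$((depth + 1))
  done
  echo "$depth"
}

tree_depth 1000 2    # binary tree (the default arity)
tree_depth 1000 16   # 16-ary tree
```

Under this assumption, a higher arity gives a shallower tree with more children per node, so the path from the root to a label is shorter but each step along it considers more candidates.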
We recommend using --sgd with --plt for the fastest learning and the best memory efficiency.
# To train:
vw --plt <num labels> <train dataset> -f <output model> --sgd -l <learning rate> --kary_tree <tree arity> --passes <num epochs> -b <number of bits in the feature table> -c
# To test:
vw -t -i <model file> <test dataset> --top_k <k top label> -p <prediction file>
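The two command templates above could be filled in as in the following sketch. The label count, file names, and hyperparameter values are hypothetical placeholders, not recommended settings; only the flags themselves come from the templates. The script prints the commands instead of running them, so they can be reviewed first.

```sh
# Hypothetical values; replace them with ones that match your dataset.
NUM_LABELS=3993
TRAIN=train.vw
TEST=test.vw
MODEL=plt.model

# Build the training command from the template above.
train_cmd() {
  echo "vw --plt $NUM_LABELS $TRAIN -f $MODEL --sgd -l 0.5 --kary_tree 16 --passes 3 -b 24 -c"
}

# Build the test command from the template above.
test_cmd() {
  echo "vw -t -i $MODEL $TEST --top_k 5 -p predictions.txt"
}

# Print both commands for review.
train_cmd
test_cmd
```

Once the printed commands look right, they can be executed by piping the output to sh, for example with train_cmd | sh.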
More examples and scripts to replicate results on datasets from The Extreme Classification Repository can be found in the xml_experiments directory.
/*
Copyright (c) by respective owners including Yahoo!, Microsoft, and
individual contributors. All rights reserved. Released under a BSD (revised)
license as described in the file LICENSE.
*/
This is the vowpal wabbit fast online learning code. For Windows, see README.windows.txt.
These prerequisites are usually pre-installed on many platforms. However, you may need to consult your favorite package manager (yum, apt, MacPorts, brew, ...) to install missing software.
- Boost library, with the Boost::Program_Options library option enabled.
- The zlib compression library + headers. In Linux distros: package zlib-devel (Red Hat/CentOS), or zlib1g-dev (Ubuntu/Debian).
- lsb-release (RedHat/CentOS: redhat-lsb-core, Debian: lsb-release, Ubuntu: you're all set, OSX: not required)
- GNU autotools: autoconf, automake, libtool, autoheader, et al. This is not a strict prereq. On many systems (notably Ubuntu with libboost-program-options-dev installed), the provided Makefile works fine.
- (optional) git if you want to check out the latest version of vowpal wabbit, work on the code, or even contribute code to the main project.
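A quick way to see which of the command-line tools listed above are missing is sketched below. The helper name missing_tools is hypothetical, and the check only covers executables found on PATH; it says nothing about whether the Boost or zlib headers are installed.

```sh
# Print the name of every given tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the build tools mentioned in the prerequisites list.
missing_tools make git autoconf automake libtool
```

An empty result means all of the listed tools were found; any name that is printed can then be installed with your package manager.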
You can download the latest version from here. The very latest version is always available via 'github' by invoking one of the following:
## For Git-protocol (git://) interaction; note that GitHub no longer serves this unauthenticated protocol:
$ git clone git://github.com/JohnLangford/vowpal_wabbit.git
## For HTTPS-based Git interaction
$ git clone https://github.com/JohnLangford/vowpal_wabbit.git
You should be able to build vowpal wabbit on most systems with:
$ make
$ make test # (optional)
If that fails, try:
$ ./autogen.sh
$ make
$ make test # (optional)
$ make install
Note that ./autogen.sh requires automake (see the prerequisites, above).
./autogen.sh's command line arguments are passed directly to configure as if they were configure arguments and flags.
Note that ./autogen.sh will overwrite the supplied Makefile, including the Makefiles in sub-directories, so keeping a copy of the Makefiles may be a good idea before running autogen.sh. If your original Makefiles were overwritten by autogen.sh calling automake, you may always get the originals back from git using:
git checkout Makefile */Makefile
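If you would rather keep local copies than rely on git, the backup suggested above can be sketched as follows. The function names and the .orig suffix are hypothetical choices made for this example; run both functions from the top of the source tree.

```sh
# Copy the top-level Makefile and each sub-directory Makefile to <name>.orig.
backup_makefiles() {
  for f in Makefile */Makefile; do
    # Skip unmatched glob patterns and anything that is not a regular file.
    [ -f "$f" ] || continue
    cp -p "$f" "$f.orig"
  done
}

# Restore every Makefile from its .orig copy.
restore_makefiles() {
  for f in Makefile.orig */Makefile.orig; do
    [ -f "$f" ] || continue
    cp -p "$f" "${f%.orig}"
  done
}
```

Call backup_makefiles before running ./autogen.sh, and restore_makefiles afterwards if you want the supplied Makefiles back.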
Be sure to read the wiki: https://github.com/JohnLangford/vowpal_wabbit/wiki for the tutorial, command line options, etc.
The 'cluster' directory has its own documentation for cluster parallel use, and the examples at the end of test/Runtests give some example flags.
The default C++ compiler optimization flags are very aggressive. If you run into a problem, consider creating and running configure with the --enable-debug option, e.g.:
$ ./configure --enable-debug
or passing your own compiler flags via the OPTIM_FLAGS make variable:
$ make OPTIM_FLAGS="-O0 -g"
On Ubuntu/Debian/Mint and similar systems, the following sequence should work for building the latest version from GitHub:
# -- Get libboost program-options and zlib:
apt-get install libboost-program-options-dev zlib1g-dev
# -- Get the python libboost bindings (python subdir) - optional:
apt-get install libboost-python-dev
# -- Get the vw source:
git clone https://github.com/JohnLangford/vowpal_wabbit.git
# -- Build:
cd vowpal_wabbit
make
make test # (optional)
make install
If you prefer building with clang instead of gcc (much faster build and slightly faster executable), install clang and change the make step slightly:
apt-get install clang
make CXX=clang++
A statically linked vw executable that is not sensitive to boost version upgrades and can be safely copied between different Linux versions (e.g. even from Ubuntu to Red-Hat) can be built and tested with:
make CXX='clang++ -static' clean vw test # ignore warnings
OSX requires glibtools, which is available via the brew or MacPorts package managers.
brew install vowpal-wabbit
The homebrew formula for VW is located on github.
brew install libtool
brew install autoconf
brew install automake
brew install boost
brew install boost-python
## Install glibtool and other GNU autotool friends:
$ port install libtool autoconf automake
## Build Boost for Mac OS X 10.8 and below
$ port install boost +no_single +no_static +openmpi +python27 configure.cxx_stdlib=libc++ configure.cxx=clang++
## Build Boost for Mac OS X 10.9 and above
$ port install boost +no_single +no_static +openmpi +python27
Mac OS X 10.8 and below: configure.cxx_stdlib=libc++ and configure.cxx=clang++ ensure that clang++ uses the correct C++11 functionality while building Boost. Ordinarily, clang++ relies on the older GNU g++ 4.2 series header files and stdc++ library; libc++ is the clang replacement that provides newer C++11 functionality. If these flags aren't present, you will likely encounter compilation errors when compiling vowpalwabbit/cbify.cc. These error messages generally contain complaints about std::to_string and std::unique_ptr types missing.
To compile:
$ sh autogen.sh --enable-libc++
$ make
$ make test # (optional)
When using Anaconda as the source for Python the default Boost libraries used in the Makefile need to be adjusted. Below are the steps needed to install the Python bindings for VW. This should work for Python 2 and 3. Adjust the directories to match where anaconda is installed.
# create anaconda environment with boost
conda create --name vw boost
source activate vw
git clone https://github.com/JohnLangford/vowpal_wabbit.git
cd vowpal_wabbit
# edit Makefile
# change BOOST_INCLUDE to use anaconda env dir: /anaconda/envs/vw/include
# change BOOST_LIBRARY to use anaconda lib dir: /anaconda/envs/vw/lib
cd python
python setup.py install
To browse the code more easily, do make doc and then point your browser to doc/html/index.html.