This repository contains the code used for the experiments in "Fast solution to the fair ranking problem using the Sinkhorn algorithm" (PRICAI 2024)
In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate activity of item providers. Saito and Joachims (2022) devised an impact-based fair ranking method for maximizing the Nash social welfare based on fair division; however, this method, which requires solving a large-scale constrained nonlinear optimization problem, is very difficult to apply to practical-scale recommender systems. We thus propose a fast solution to the impact-based fair ranking problem. We first transform the fair ranking problem into an unconstrained optimization problem and then design a gradient ascent method that repeatedly executes the Sinkhorn algorithm. Experimental results demonstrate that our algorithm provides fair rankings of high quality and is about 1000 times faster than application of commercial optimization software.
1. install rye
MacOS
curl -sSf https://rye-up.com/get | bash
echo 'source "$HOME/.rye/env"' >> ~/.zshrc
source ~/.zshrc
Linux
curl -sSf https://rye-up.com/get | bash
echo 'source "$HOME/.rye/env"' >> ~/.bashrc
source ~/.bashrc
Windows
see install documentation
Set the dependency resolution method to uv
rye config --set-bool behavior.use-uv=true
First, Please match the index url to the environment at hand to use torch.
# pyproject.toml
dependencies = [
...,
"torch == 2.3.0+cu121", # change this to match your environment
]
[[tool.rye.sources]]
name = "torch"
url = "https://download.pytorch.org/whl/cu121" # change this to match your environment
type = "index"
Second, create virtual environment.
rye sync
. .venv/bin/activate
(This explanation is based on the repository of the kdd2022-fair-ranking-nsw by yuta-saito)
We use "Delicious" and "Wiki10-31K" from The Extreme Classification Repository. Please install the above datasets from the repository and put them under the ./data/ as follows.
root/
├── data
│ ├── delicious.txt
│ └── wiki.txt
Note that we rename the files as follows:
- Delicious_data.txt in Delicious.zip to delicious.txt
- train.txt in Wiki10.bow.zip to wiki.txt , respectively.
from src import create_generator, create_optimizer, evaluate_pi
g = create_generator(
generator_name="synthetic",
# specific generator parameters
n_query=250,
n_doc=1600,
lam=1.0,
)
rel_mat_true, rel_mat_obs = g.generate_rel_mat() # relevance matrix
expo = g.exam_func(K=10, shape="inv") # exposure
optimizer = create_optimizer(
optimizer_name="ot_nsw",
# specific optimizer parameters
eps=1,
lr=0.5,
use_amp=False,
tol=0.01,
device="cuda",
)
pi = optimizer.solve(rel_mat_obs, expo)
print(evaluate_pi(pi, rel_mat_true, expo))
from src import list_optimizers, list_generators
print(list_optimizers())
print(list_generators())
Before running an experiment, you need register wandb. please see wandb documentation for details.
You can easily perform experiments by changing the parameters because hydra is used. The following commands perform experiments with optimizer=nsw, number of documents of 100, 200, 300, and 400.
rye run python main.py -m optimizer=nsw generator.n_doc=100,200,300,400
make run-all
The following command exports the experiment results from wandb to the local directory.
rye run python tools/export_results.py --user_name {your wandb user name}
- Saito, Y., & Joachims, T. (2022, August). Fair Ranking as Fair Division: Impact-Based Individual Fairness in Ranking. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 1514-1524).