The Bayesian Echo Chamber (BEC) is a Bayesian generative model of social interaction that uncovers influence relations from time-stamped conversation data.
Please refer to
Fangjian Guo, Charles Blundell, Hanna Wallach and Katherine A. Heller. The Bayesian Echo Chamber: modeling social influence via linguistic accommodation. AISTATS 2015, San Diego, CA, USA. JMLR: W&CP volume 38.
for details of the model.
├── data a collection of datasets
├── results results are produced here
│ ├── 12-angry-men-analytics.Rmd generating report from result
│ └── Makefile compile an html report from R markdown
├── src
│ ├── bec.py main "Bayesian echo chamber" class
│ ├── bec_sampler.py a wrapper of the sampler of bec
│ ├── hawkes.py an implementation of Hawkes process
│ ├── likelihoods.py several likelihoods
│ ├── run_bec_12angrymen.py a demo script producing result for data/12-angry-men
│ ├── slice_sampler.py slice sampler
│ └── talkbankXMLparse.py parser for TalkBank XML format
└── stopwords
    └── english.stop list of stop words in English
- Run
python run_bec_12angrymen.py
to produce samples and other auxiliary files under results/12-angry-men/. For other datasets and configurations, write a customized script based on run_bec_12angrymen.py.
- Run
make
under results/ to produce an HTML report compiled from the R Markdown file 12-angry-men-analytics.Rmd. The Rmd file can be customized to analyze other datasets.
- Python modules (tested under Python 2.7)
  - numpy, scipy
  - matplotlib
  - nltk for word stemming in talkbankXMLparse.py
- R libraries for generating the report
  - knitr
  - ggplot2
  - coda
  - plyr
  - qgraph
  - pander
Conversation data is read from the TalkBank XML format. A conversation consists of several utterances, each described by four fields: speaker, content, start time and end time. A single utterance looks like the snippet below.
<u who="Juror 7" uID="#7">
<w>So</w>
<w>how</w>
<w>come</w>
<w>you</w>
<w>vote</w>
<w>not</w>
<w>guilty</w>
<media start="47.4640" end="49.3820" unit="s"/>
</u>
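To make the four fields concrete, the following is a minimal sketch of how one such utterance could be read with Python's standard xml.etree.ElementTree module. It is not the code in talkbankXMLparse.py: the function name parse_utterance and the returned dictionary layout are assumptions made for illustration, and the sketch assumes the elements carry no XML namespace, as in the snippet above (full TalkBank files may declare one). It also skips the stemming and stop-word removal that the repo performs.

```python
import xml.etree.ElementTree as ET

# One utterance, copied from the snippet above.
SNIPPET = """
<u who="Juror 7" uID="#7">
  <w>So</w>
  <w>how</w>
  <w>come</w>
  <w>you</w>
  <w>vote</w>
  <w>not</w>
  <w>guilty</w>
  <media start="47.4640" end="49.3820" unit="s"/>
</u>
"""


def parse_utterance(u):
    """Return speaker, words, start and end time of a <u> element.

    Hypothetical helper for illustration; assumes un-namespaced tags.
    """
    media = u.find("media")
    return {
        "speaker": u.get("who"),
        "words": [w.text for w in u.findall("w")],
        "start": float(media.get("start")),
        "end": float(media.get("end")),
    }


utterance = parse_utterance(ET.fromstring(SNIPPET.strip()))
print(utterance)
```

For a whole file, the same helper could be applied to every u element found in the parsed tree, provided the tag names are adjusted for any namespace the file declares.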
Currently, the following datasets are provided under the data/ directory.
- 12 Angry Men: transcribed from the subtitles of the 1957 movie.
- SCOTUS: oral arguments from 50 years of the United States Supreme Court, obtained from TalkBank.
- synthetic: a synthetic example with 3 agents speaking with a vocabulary of 20, with time stamps generated from a Hawkes process and contents generated from the BEC model.
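As a rough illustration of how time stamps like those in the synthetic dataset can be generated, the sketch below simulates a 3-agent Hawkes process with Ogata's thinning method. It is not the implementation in hawkes.py: the exponential kernel, the parameter names mu, alpha and beta, and all numeric values are assumptions chosen for the example. Only the time stamps are simulated here; the word contents, which the synthetic dataset draws from the BEC model, are not.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a multivariate Hawkes process by Ogata's thinning.

    Assumed intensity of agent j at time t:
        mu[j] + sum over past events (t_k, i) of
                alpha[i][j] * beta * exp(-beta * (t - t_k))
    so alpha[i][j] is the expected number of events by j that are
    directly triggered by one event of i. Returns (time, agent) pairs.
    """
    rng = random.Random(seed)
    n = len(mu)
    excite = [0.0] * n  # decaying excitation currently felt by each agent
    t, events = 0.0, []
    while True:
        # Intensities only decay between events, so the current total
        # is a valid upper bound until the next accepted event.
        lam_bar = sum(mu) + sum(excite)
        wait = rng.expovariate(lam_bar)
        t += wait
        if t > horizon:
            break
        decay = math.exp(-beta * wait)
        excite = [e * decay for e in excite]
        lam = [mu[j] + excite[j] for j in range(n)]
        # Accept with probability sum(lam) / lam_bar and attribute the
        # event to agent i with probability lam[i] / sum(lam).
        u = rng.random() * lam_bar
        cum = 0.0
        for i in range(n):
            cum += lam[i]
            if u < cum:
                events.append((t, i))
                for j in range(n):
                    excite[j] += alpha[i][j] * beta
                break
    return events


# Illustrative parameters only; row sums below 1 keep the process stable.
mu = [0.5, 0.5, 0.5]
alpha = [[0.0, 0.3, 0.1],
         [0.2, 0.0, 0.2],
         [0.1, 0.4, 0.0]]
events = simulate_hawkes(mu, alpha, beta=1.0, horizon=50.0)
print(len(events), events[:5])
```

The diagonal of alpha is set to zero here so that agents excite only each other; that is a choice made for this example and not a statement about the model's parametrization.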
This repo is maintained by Richard Guo. We also acknowledge the earlier contribution of Juston Moore to bec.py, likelihoods.py and slice_sampler.py.