# Legislators: Estimating Legislators' Ideal Points From Voting Histories {#legislators}
```{r legislators_setup,message=FALSE}
library("pscl")
library("tidyverse")
library("forcats")
library("stringr")
library("rstan")
library("sn")
```
Recorded votes in legislative settings (roll calls) are often used to recover
the underlying preferences of legislators.[^legislators-src] Political
scientists analyze roll call data using the Euclidean spatial voting model:
each legislator ($i = 1, ..., n$) has a preferred policy position ($x_i$, a
point in low-dimensional Euclidean space), and each vote ($j = 1, ..., m$)
amounts to a choice between "Aye" and "Nay" locations, $q_j$ and $Y_j$,
respectively. Legislators are assumed to choose on the basis of utility
maximization, with utility decreasing in the distance between the
legislator's preferred position and each location.
In these models, the only observed data are votes, and the analyst wants to
model those votes as a function of legislator-specific ($\xi_i$) and vote-specific ($\alpha_j$, $\beta_j$) parameters.
The vote of legislator $i$ on roll call $j$ ($y_{i,j}$) is a function of
the legislator's ideal point ($\xi_i$, the preferred position written $x_i$ above), the vote's cutpoint ($\alpha_j$),
and the vote's discrimination ($\beta_j$):
$$
\begin{aligned}[t]
y_{i,j} &\sim \mathsf{Bernoulli}(\pi_{i,j}) \\
\pi_{i,j} &= \frac{1}{1 + \exp(-\mu_{i,j})} \\
\mu_{i,j} &= \beta_j \xi_i - \alpha_j
\end{aligned}
$$
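As an illustration of this data-generating process, the following sketch simulates a small roll-call matrix. The parameter values, the seed, and all `sim_`-prefixed names are made up for this example and are not taken from the Senate data.

```{r}
set.seed(1234)
# hypothetical ideal points for 5 legislators and parameters for 3 roll calls
sim_xi <- c(-1.5, -0.5, 0, 0.5, 1.5)
sim_alpha <- c(0, 0.5, -0.5)
sim_beta <- c(2, -1, 0.5)
# mu[i, j] = beta[j] * xi[i] - alpha[j]; rows are legislators, columns are votes
sim_mu <- sweep(sim_xi %o% sim_beta, 2, sim_alpha, "-")
# the inverse-logit link gives the probability of a "Yea"
sim_pi <- plogis(sim_mu)
# one Bernoulli draw per legislator and roll call
sim_y <- matrix(rbinom(length(sim_pi), size = 1, prob = sim_pi),
                nrow = nrow(sim_pi))
sim_y
```

Roll calls with a large $|\beta_j|$ separate legislators on opposite sides of the cutpoint sharply, while those with $\beta_j$ near zero produce votes that are nearly unrelated to the ideal points.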
## Identification
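Ideal point models, like many latent space models, are not identified without further restrictions.
The derivations below write the linear predictor as $\mu_{i,j} = \alpha_j + \beta_j \xi_i$, which differs from the definition above only in the sign of $\alpha_j$; the argument is the same under either convention.
In particular, there are three types of invariance:

1. Additive aliasing (addition invariance)
1. Multiplicative aliasing (scale invariance)
1. Rotation (reflection) invariance

Scale invariance, for any constant $c \neq 0$: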
$$
\begin{aligned}[t]
\mu_{i,j} &= \alpha_j + \beta_j \xi_i \\
&= \alpha_j + \left(\frac{\beta_j}{c}\right) \left(\xi_i c \right) \\
&= \alpha_j + \beta^*_j \xi^*_i
\end{aligned}
$$
Addition invariance, for any constant $c$:
$$
\begin{aligned}[t]
\mu_{i,j} &= \alpha_j + \beta_j \xi_i \\
&= \alpha_j - \beta_j c + \beta_j c + \beta_j \xi_i \\
&= (\alpha_j - \beta_j c) + \beta_j (\xi_i + c) \\
&= \alpha_j^{*} + \beta_j \xi^{*}_i
\end{aligned}
$$
Rotation invariance:
$$
\begin{aligned}[t]
\mu_{i,j} &= \alpha_j + \beta_j \xi_i \\
&= \alpha_j + \beta_j (-1) (-1) \xi_i \\
&= \alpha_j + (-\beta_j) (-\xi_i) \\
&= \alpha_j + \beta_j^{*} \xi^{*}_i
\end{aligned}
$$
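Example: the Bernoulli log-likelihood of a small vote matrix is the same under the original parameters and under each of the three transformations.
```{r legislators_identification}
xi <- c(-1, -0.5, 0.5, 1)
alpha <- c(1, 0, -1)
beta <- c(-0.5, 0, 0.5)
# votes: rows are roll calls, columns are legislators
y <- matrix(c(1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1), 3, 4)
# any non-zero constant; k = 1 would make the scale transformation trivial
k <- 2
# Bernoulli log-likelihood for a matrix of linear predictors
loglik <- function(mu) sum(dbinom(y, 1, plogis(mu), log = TRUE))
list(loglik(alpha + beta %o% xi),
loglik(alpha + -beta %o% -xi),
loglik((alpha - beta * k) + beta %o% (xi + k)),
loglik(alpha + (beta / k) %o% (xi * k)))
```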
For each of the following restrictions, which types of invariance does it resolve?
1. Fix one element of $\beta$.
1. Fix one element of $\xi$.
1. Fix one element of $\alpha$.
1. Fix two elements of $\alpha$.
1. Fix two elements of $\xi$.
1. Fix two elements of $\beta$.
## 109th Senate
This example models the voting of the [109th U.S. Senate](https://en.wikipedia.org/wiki/109th_United_States_Congress).
Votes for the 109th Senate are included in the **pscl** package:
```{r}
data("s109", package = "pscl")
```
The `s109` object is not a data frame, so see its documentation for information about its structure.
```{r}
s109
```
This data includes all
[roll-call](https://en.wikipedia.org/wiki/Voting_methods_in_deliberative_assemblies)
votes, that is, votes in which the response of each senator is recorded.
For simplicity, the ideal point model uses binary responses, but the `s109`
data includes multiple [codes](http://voteview.com/senate109.htm) for responses
to roll calls.

| Code | Response |
|------|----------|
| 0 | not a member |
| 1 | Yea |
| 2 | Paired Yea |
| 3 | Announced Yea |
| 4 | Announced Nay |
| 5 | Paired Nay |
| 6 | Nay |
| 7 | Present (some Congresses, also not used some Congresses) |
| 8 | Present (some Congresses, also not used some Congresses) |
| 9 | Not Voting |

In the data processing, we will aggregate the responses into "Yea" (codes 1, 2, and 3), "Nay" (codes 4, 5, and 6), and missing values (all other codes).
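The recoding rule can be sketched on a handful of codes. The `toy_codes` vector is hypothetical, and base R `ifelse()` is used here only to keep the example self-contained.

```{r}
# hypothetical vote codes for six senators on one roll call
toy_codes <- c(1, 6, 9, 3, 5, 0)
# Yea-type codes (1-3) become TRUE, Nay-type codes (4-6) become FALSE,
# and everything else (not a member, present, not voting) becomes NA
toy_yea <- ifelse(toy_codes %in% 1:3, TRUE,
                  ifelse(toy_codes %in% 4:6, FALSE, NA))
toy_yea
```

Observations recoded to `NA` carry no information about the vote and are dropped before the model is fit.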
The vote-level data also get two derived indicators:

- `close`: non-lopsided votes, those with between 35% and 65% yeas, in which the parties are likely to whip members.
- `lopsided`: lopsided votes as defined in W-NOMINATE (which drops them), those with less than 2.5% or more than 97.5% yeas.

```{r}
s109_vote_data <- as.data.frame(s109$vote.data) %>%
mutate(rollcall = paste(session, number, sep = "-"),
passed = result %in% c("Confirmed", "Agreed To", "Passed"),
votestotal = yeatotal + naytotal,
yea_pct = yeatotal / (yeatotal + naytotal),
unanimous = yea_pct %in% c(0, 1),
close = yea_pct >= 0.35 & yea_pct <= 0.65,
lopsided = yea_pct < 0.025 | yea_pct > 0.975) %>%
filter(!unanimous) %>%
select(-unanimous) %>%
mutate(.rollcall_id = row_number())
s109_legis_data <- as.data.frame(s109$legis.data) %>%
rownames_to_column("legislator") %>%
mutate(.legis_id = row_number(),
party = fct_recode(party,
"Democratic" = "D",
"Republican" = "R",
"Independent" = "Indep"))
s109_votes <- s109$votes %>%
as.data.frame() %>%
rownames_to_column("legislator") %>%
gather(rollcall, vote, -legislator) %>%
# recode to Yea (TRUE), Nay (FALSE), or missing
mutate(yea = NA,
yea = if_else(vote %in% c(1, 2, 3), TRUE, yea),
yea = if_else(vote %in% c(4, 5, 6), FALSE, yea)
) %>%
filter(!is.na(yea)) %>%
inner_join(dplyr::select(s109_vote_data, rollcall, .rollcall_id), by = "rollcall") %>%
inner_join(dplyr::select(s109_legis_data, legislator, party, .legis_id), by = "legislator")
partyline <-
s109_votes %>%
group_by(.rollcall_id, party) %>%
summarise(yea = mean(yea)) %>%
spread(party, yea) %>%
ungroup() %>%
mutate(partyline = NA_character_,
partyline = if_else(Republican < 0.1 & Democratic > 0.9,
"Democratic", partyline),
partyline = if_else(Republican > 0.9 & Democratic < 0.1,
"Republican", partyline)) %>%
rename(pct_yea_D = Democratic, pct_yea_R = Republican) %>%
select(-Independent)
s109_vote_data <-
left_join(s109_vote_data, partyline, by = ".rollcall_id")
```
## Identification by Fixing Legislator's Ideal Points
Identification of latent space models can be challenging. The first method for identifying ideal point models is to fix the ideal points of two legislators.
The two values are arbitrary, but choosing legislators at opposite ends of the ideological dimension of
interest helps the substantive interpretation.
Since we know, or expect, *a priori* that the primary ideological dimension is liberal-conservative [@PooleRosenthal2000a], I'll fix the ideal points of the two
party leaders in that Congress.
In the 109th Congress, the Republican party was the majority party; [Bill Frist](https://en.wikipedia.org/wiki/Bill_Frist) (Tennessee) was the majority (Republican) leader, and [Harry Reid](https://en.wikipedia.org/wiki/Harry_Reid) (Nevada) was the minority (Democratic) leader:
$$
\begin{aligned}[t]
\xi_\text{FRIST (R TN)} & = 1 \\
\xi_\text{REID (D NV)} & = -1
\end{aligned}
$$
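To see why two fixed ideal points are enough to pin down location, scale, and direction, note that any set of parameter values can be mapped by an affine transformation onto one that satisfies both constraints without changing the linear predictor. The sketch below uses made-up values, the $\alpha_j + \beta_j \xi_i$ form of the linear predictor from the identification section, and hypothetical `anc_`-prefixed names; legislators 4 and 1 stand in for the two party leaders.

```{r}
# hypothetical unanchored values: 4 legislators, 3 roll calls
anc_xi <- c(0.3, 1.1, 2.0, 2.6)
anc_alpha <- c(1, 0, -1)
anc_beta <- c(-0.5, 0.2, 0.5)
# legislator 4 plays the role fixed at +1, legislator 1 the role fixed at -1
anc_b <- 2 / (anc_xi[4] - anc_xi[1])
anc_a <- 1 - anc_b * anc_xi[4]
# transformed ideal points and the compensating roll-call parameters
anc_xi_new <- anc_a + anc_b * anc_xi
anc_beta_new <- anc_beta / anc_b
anc_alpha_new <- anc_alpha - anc_beta_new * anc_a
# the two anchors land on -1 and 1, and the linear predictor is unchanged
anc_xi_new[c(1, 4)]
all.equal(anc_alpha + anc_beta %o% anc_xi,
          anc_alpha_new + anc_beta_new %o% anc_xi_new)
```

Because the map is determined uniquely by the two anchors, no other shift, rescaling, or reflection of the ideal points is compatible with both constraints.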
All remaining ideal points, along with the item difficulty and discrimination parameters, are given weakly informative priors,
$$
\begin{aligned}[t]
\xi_{i} &\sim \mathsf{Normal}(\zeta, \tau) \\
\zeta &\sim \mathsf{Normal}(0, 10) \\
\tau &\sim \mathsf{HalfCauchy}(0, 5) \\
\alpha_{j} &\sim \mathsf{Normal}(0, 5) \\
\beta_{j} &\sim \mathsf{Normal}(0, 2.5) && j \in 1, \dots, m
\end{aligned}
$$
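One way to get a feel for these priors is to simulate from them and inspect the implied vote probabilities. This is only a rough prior predictive sketch: the seed, the number of draws, and the `pp_` names are arbitrary, the half-Cauchy draw is obtained as the absolute value of a Cauchy draw, and the roll-call scales are the values passed to Stan later in this section (`alpha_scale = 5`, `beta_scale = 2.5`).

```{r}
set.seed(42)
pp_n <- 1000
# hyperpriors for the location and scale of the ideal points
pp_zeta <- rnorm(pp_n, 0, 10)
pp_tau <- abs(rcauchy(pp_n, 0, 5))
# one ideal point and one roll call per prior draw
pp_xi <- rnorm(pp_n, pp_zeta, pp_tau)
pp_alpha <- rnorm(pp_n, 0, 5)
pp_beta <- rnorm(pp_n, 0, 2.5)
# implied probability of a "Yea" under the model's linear predictor
pp_prob <- plogis(pp_beta * pp_xi - pp_alpha)
quantile(pp_prob, c(0.05, 0.25, 0.5, 0.75, 0.95))
```

The linear predictor is very diffuse under these priors, so many of the simulated probabilities should be expected to fall close to 0 or 1.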
```{r legislators_mod_ideal_point_1,results='hide',cache.extra=tools::md5sum("stan/ideal_point_1.stan")}
mod_ideal_point_1 <- stan_model("stan/ideal_point_1.stan")
```
```{r}
mod_ideal_point_1
```
Create a data frame with the fixed values for identification.
Additionally, set initial values of ideal points: Republicans at `xi = 1`, Democrats at `xi = -1`, and independents at `xi = 0`.
This may help speed up convergence.
```{r legislators_xi_init_1}
xi_1 <-
s109_legis_data %>%
mutate(
xi = if_else(legislator == "FRIST (R TN)", 1,
if_else(legislator == "REID (D NV)", -1, NA_real_)),
init = if_else(party == "Republican", 1,
if_else(party == "Democratic", -1, 0)))
```
Define and set up the data passed to the Stan model:
```{r legislators_data_1}
legislators_data_1 <-
within(list(), {
y <- as.integer(s109_votes$yea)
y_idx_leg <- as.integer(s109_votes$.legis_id)
y_idx_vote <- as.integer(s109_votes$.rollcall_id)
Y_obs <- length(y)
N <- max(s109_votes$.legis_id)
K <- max(s109_votes$.rollcall_id)
# priors
alpha_loc <- 0
alpha_scale <- 5
beta_loc <- 0
beta_scale <- 2.5
N_xi_obs <- sum(!is.na(xi_1$xi))
idx_xi_obs <- which(!is.na(xi_1$xi))
xi_obs <- xi_1$xi[!is.na(xi_1$xi)]
N_xi_param <- sum(is.na(xi_1$xi))
idx_xi_param <- which(is.na(xi_1$xi))
tau_scale <- 5
zeta_loc <- 0
zeta_scale <- 10
})
```
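The votes are passed in long format: each element of `y` is the vote cast by the legislator given in `y_idx_leg` on the roll call given in `y_idx_vote`, so missing votes are simply absent. The sketch below shows how such index vectors pick out the parameters for each observation. The toy values and `ix_` names are hypothetical, and the linear predictor follows the model stated at the start of the chapter, which is assumed (not verified here) to match `stan/ideal_point_1.stan`.

```{r}
# toy long-format data: 5 observed votes from 3 legislators on 2 roll calls
ix_y <- c(1L, 0L, 1L, 1L, 0L)
ix_leg <- c(1L, 2L, 3L, 1L, 3L)   # legislator 2 did not vote on roll call 2
ix_vote <- c(1L, 1L, 1L, 2L, 2L)
# hypothetical parameter values
ix_xi <- c(1, -1, 0.5)
ix_alpha <- c(0, 0.5)
ix_beta <- c(1, -2)
# one linear predictor per observed vote, looked up by index
ix_mu <- ix_beta[ix_vote] * ix_xi[ix_leg] - ix_alpha[ix_vote]
# Bernoulli log-likelihood of the observed votes
ix_loglik <- sum(dbinom(ix_y, 1, plogis(ix_mu), log = TRUE))
ix_mu
```

Storing the data this way avoids carrying a mostly rectangular matrix with missing cells into the model.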
```{r legislators_init_1}
legislators_init_1 <- list(
list(xi_param = xi_1$init[is.na(xi_1$xi)])
)
```
```{r legislators_fit_1,results='hide'}
legislators_fit_1 <-
sampling(mod_ideal_point_1, data = legislators_data_1,
chains = 1, iter = 500,
init = legislators_init_1,
refresh = 100,
pars = c("alpha", "beta", "xi"))
```
Extract the ideal point data:
```{r legislator_summary_1}
legislator_summary_1 <-
bind_cols(s109_legis_data,
as_tibble(summary(legislators_fit_1, pars = "xi")$summary)) %>%
mutate(legislator = fct_reorder(legislator, mean))
```
```{r legislator_plot_1,fig.height=8,fig.width=4,fig.cap="Estimated Ideal Points of the Senators of the 109th Congress"}
ggplot(legislator_summary_1,
aes(x = legislator, y = mean,
ymin = `2.5%`, ymax = `97.5%`, colour = party)) +
geom_pointrange() +
coord_flip() +
scale_color_manual(values = c(Democratic = "blue", Independent = "gray", Republican = "red")) +
labs(y = expression(xi[i]), x = "", colour = "Party") +
theme(legend.position = "bottom")
```
## Identification by Fixing Legislator's Signs
We can identify the scale and location of the latent dimension by fixing the mean and standard deviation of the distribution of the legislators' ideal points.
One way to do this is to give the ideal points a standard normal prior,
$$
\xi_i \sim \mathsf{Normal}(0, 1)
$$
This does not exactly fix the mean and standard deviation of $\xi$ in any particular simulation.
Only as the number of legislators grows, $n \to \infty$, do $\mathrm{mean}(\xi) \to 0$ and $\mathrm{sd}(\xi) \to 1$.
This "soft-identification" will also overestimate the uncertainty in the ideal points of legislators (see paper by ... ? ).
So in the estimation, I do the following to ensure that in each simulation, $\xi$ has exactly mean 0 and standard deviation 1.
$$
\begin{aligned}[t]
\xi_i^{*} &\sim \mathsf{Normal}(0, 1) \\
\xi_i &= \frac{\xi^{*}_i - \mathrm{mean}(\xi^{*})}{\mathrm{sd}(\xi^{*})}
\end{aligned}
$$
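The difference between the soft and the exact constraint can be seen on a single simulated draw. This is a minimal sketch with a hypothetical vector of ten raw ideal points and `std_` names chosen for the example; note that `sd()` in R uses the $n - 1$ denominator.

```{r}
set.seed(99)
# hypothetical raw ideal points for 10 legislators in a single draw
std_raw <- rnorm(10)
# center and scale so this draw has exactly mean 0 and standard deviation 1
std_xi <- (std_raw - mean(std_raw)) / sd(std_raw)
c(mean = mean(std_xi), sd = sd(std_xi))
# the raw draw itself satisfies these constraints only approximately
c(mean = mean(std_raw), sd = sd(std_raw))
```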
However, this does not identify the direction of the latent dimension.
The direction can be identified by restricting the sign of either a single legislator's ideal point ($\xi_i$) or a single roll call's discrimination parameter ($\beta_j$).
Instead of fixing the ideal points of two legislators, we can use skew-normal priors to push a parameter to one side of zero. For the discrimination parameters, such a prior is
$$
\beta_j \sim \mathsf{SkewNormal}(0, 2.5, d_j )
$$
where
$$
d_j =
\begin{cases}
-50 & \text{if Democratic party line vote} \\
0 & \text{not a party line vote} \\
50 & \text{if Republican party line vote}
\end{cases} .
$$
The skew-normal distribution is an extension of the normal distribution with an additional skewness parameter,
$$
y \sim \mathsf{SkewNormal}(\mu, \sigma, \alpha)
$$
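As $|\alpha| \to \infty$, the skew-normal approaches a half-normal distribution. Here $\alpha$ denotes the skewness parameter (the role played by $d_j$ above), not the roll-call parameter $\alpha_j$.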
```{r}
map_df(c(-50, 0, 50),
function(alpha) {
tibble(x = seq(-4, 4, by = 0.1),
density = dsn(x, alpha = alpha),
alpha = alpha)
}) %>%
ggplot(aes(x = x, y = density, colour = factor(alpha))) +
geom_line()
```
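To put a number on how close a skewness of 50 is to a half-normal, the sketch below computes the prior mass on the "wrong" side of zero. It writes out the standard skew-normal density (location 0, scale 1) directly rather than calling the **sn** package, and the `sn_` names are hypothetical.

```{r}
# standard skew-normal density, 2 * phi(x) * Phi(skew * x)
sn_density <- function(x, skew) 2 * dnorm(x) * pnorm(skew * x)
# probability mass below zero by numerical integration; with skew = 50 the
# density below -1 is numerically zero, so integrating over (-1, 0) suffices
sn_mass_below <- integrate(sn_density, lower = -1, upper = 0, skew = 50)$value
# closed-form value, 1/2 - atan(skew) / pi
sn_mass_exact <- 0.5 - atan(50) / pi
c(numerical = sn_mass_below, exact = sn_mass_exact)
```

Less than 1% of the prior mass lies below zero, so the prior acts as a soft sign restriction rather than a hard constraint.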
```{r legislators_mod_ideal_point_3,results='hide',cache.extra=tools::md5sum("stan/ideal_point_3.stan")}
mod_ideal_point_3 <- stan_model("stan/ideal_point_3.stan")
```
```{r}
mod_ideal_point_3
```
Instead of fixing ideal points, the data below set the skewness parameter
of the skew-normal prior to 50 for Bill Frist and to 0 for every other senator, so that $\xi_{\text{FRIST (R TN)}} > 0$ with prior probability close to one.
```{r legislators_data_2}
legislators_data_2 <-
within(list(), {
y <- as.integer(s109_votes$yea)
y_idx_leg <- as.integer(s109_votes$.legis_id)
y_idx_vote <- as.integer(s109_votes$.rollcall_id)
Y_obs <- length(y)
N <- max(s109_votes$.legis_id)
K <- max(s109_votes$.rollcall_id)
# priors
alpha_loc <- 0
alpha_scale <- 5
beta_loc <- 0
beta_scale <- 2.5
xi_skew <- if_else(s109_legis_data$legislator == "FRIST (R TN)", 50, 0)
})
```
```{r legislators_init_2}
legislators_init_2 <- function(chain_id) {
list(xi_raw = if_else(s109_legis_data$party == "Republican", 1,
if_else(s109_legis_data$party == "Democratic", -1, 0)))
}
```
```{r legislators_fit_2,results='hide'}
legislators_fit_2 <- sampling(mod_ideal_point_3,
data = legislators_data_2,
init = legislators_init_2,
chains = 1, iter = 500,
pars = c("alpha", "beta", "xi"))
```
## Identification by Discrimination Parameters' Signs
Alternatively, we can identify the location and scale of the latent dimension by standardizing the
legislators' ideal points,
$$
\begin{aligned}[t]
\xi_i^{*} &\sim \mathsf{Normal}(0, 1) \\
\xi_i &= \frac{\xi^{*}_i - \mathrm{mean}(\xi^{*})}{\mathrm{sd}(\xi^{*})} ,
\end{aligned}
$$
and identify the rotation of latent dimensions by fixing the sign of the discrimination parameter of a single roll-call vote ($\beta_j$).
```{r legislators_mod_ideal_point_2,results='hide',cache.extra=tools::md5sum("stan/ideal_point_2.stan")}
mod_ideal_point_2 <- stan_model("stan/ideal_point_2.stan")
```
```{r}
mod_ideal_point_2
```
As before, we will restrict the sign using a skew-normal distribution with a large skewness parameter.
Theoretically, it does not matter which parameter is restricted, but it can be useful for both interpretation and computation to restrict the sign of the parameter that *ex ante* you expect to be far from zero.
Since, in this case, it is expected that the primary dimension is liberal-conservative, and that the current Republican/Democratic parties divide on those lines, we will choose
a bill that splits on party lines.
We'll restrict the sign of $\beta_{\text{2-169}}$, the discrimination parameter of a roll-call vote that split perfectly on party
lines: 55 Republican senators in favor and 43 Democratic senators opposed.
Of the roll-call votes that perfectly split on party lines, this had the most total votes cast.
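The selection rule just described can be sketched as follows. The `pl_votes` data frame is a made-up stand-in that only borrows the column names of `s109_vote_data`; it is not the actual Senate data.

```{r}
# hypothetical roll-call summaries; not the actual Senate data
pl_votes <- data.frame(
  rollcall = c("a", "b", "c", "d"),
  pct_yea_D = c(0, 0.4, 0, 1),
  pct_yea_R = c(1, 0.9, 1, 0),
  votestotal = c(95, 99, 98, 97),
  stringsAsFactors = FALSE
)
# roll calls on which every Republican voted Yea and every Democrat voted Nay
pl_split <- pl_votes[pl_votes$pct_yea_R == 1 & pl_votes$pct_yea_D == 0, ]
# among those, the roll call with the most votes cast
pl_choice <- pl_split$rollcall[which.max(pl_split$votestotal)]
pl_choice
```

Choosing the perfectly split vote with the most votes cast means the restricted parameter is informed by as many senators as possible.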
```{r legislators_data_3}
legislators_data_3 <-
within(list(), {
y <- as.integer(s109_votes$yea)
y_idx_leg <- as.integer(s109_votes$.legis_id)
y_idx_vote <- as.integer(s109_votes$.rollcall_id)
Y_obs <- length(y)
N <- max(s109_votes$.legis_id)
K <- max(s109_votes$.rollcall_id)
# priors
alpha_loc <- 0
alpha_scale <- 5
beta_loc <- rep(0, K)
beta_scale <- rep(2.5, K)
beta_skew <- if_else(s109_vote_data$rollcall == "2-169", 50, 0)
})
```
```{r legislators_init_3}
legislators_init_3 <- function(chain_id) {
list(beta = if_else(s109_vote_data$partyline %in% "Republican", 1,
if_else(s109_vote_data$partyline %in% "Democratic", -1,
0)),
alpha = plogis(s109_vote_data$yea_pct),
xi_raw = if_else(s109_legis_data$party == "Republican", 1,
if_else(s109_legis_data$party == "Democratic", -1, 0)))
}
```
```{r legislators_fit_3,results='hide'}
legislators_fit_3 <- sampling(mod_ideal_point_2,
data = legislators_data_3,
init = legislators_init_3,
chains = 1, iter = 500,
pars = c("alpha", "beta", "xi"))
```
## Questions
- Plot the distributions of $\alpha$ and $\beta$.
- Estimate the model with improper priors for $\alpha$, $\beta$, and $\xi$. What happens?
- Estimate the model that fixes the distribution of legislators, but
does not fix the signs of legislators. Run two chains. In one chain use
the following starting values, $\xi_i = 1$ for all Democratic and
Independent Senators, $\xi_i = -1$ for all Republican Senators; in the
other use $\xi_i = -1$ for all Democratic and Independent Senators, and
$\xi_i = 1$ for all Republican Senators. Visualize the densities of
various $\xi_i$ points. Look for evidence of bimodality.
- Extend the model to $K > 1$ dimensions.
[^legislators-src]: Example derived from Simon Jackman, "Legislators: estimating legislators' ideal points from voting histories (roll call data)", *BUGS Examples,* 2007-07-24, [URL](https://web-beta.archive.org/web/20070724034141/http://jackman.stanford.edu:80/mcmc/legislators.odc).