Analyzing US survey data using an LPM

Dataset

TODO: explain CCES

Exploring data using an LPM: Mutual Information and Conditional Probability

We have access to all this data, now what? We can start by assessing the relationships between variables. One way to do this is to compute mutual information, a measure of the dependence between two variables: it is zero if and only if the variables are independent, and positive otherwise. By plotting the mutual information between all pairs of variables, we can get a sense of how strongly different variables are related:

What is the strength of dependence between all pairs of variables?

Select var1, var2, mi
From Select
Mutual Information
Between var1 and var2
Under model as mi,
var1,
var2
From schema
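
As a rough illustration of what the mutual information values in this query measure, the following Python sketch computes mutual information directly from a small joint probability table. It is not how the LPM computes the quantity; the function name `mutual_information` and both toy tables are assumptions made up for this example, not CCES data.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint table given as nested lists."""
    total = sum(sum(row) for row in joint)
    p = [[cell / total for cell in row] for row in joint]  # normalize to probabilities
    px = [sum(row) for row in p]                # marginal of the row variable
    py = [sum(col) for col in zip(*p)]          # marginal of the column variable
    mi = 0.0
    for i, row in enumerate(p):
        for j, pxy in enumerate(row):
            if pxy > 0:                         # zero cells contribute nothing
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

# Hypothetical toy tables (illustrative only, not CCES data).
independent = [[0.15, 0.35],
               [0.15, 0.35]]   # p(x, y) = p(x) * p(y) in every cell
dependent = [[0.4, 0.1],
             [0.1, 0.4]]       # mass concentrated on the diagonal

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

The independent table gives a value that is zero up to floating-point error, while the dependent table gives a strictly positive value, matching the property described above.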

A few relationships stand out as especially strong. TODO For instance, if we're interested in which variables are strongly associated with whether someone supports a ban on assault rifles, we can see from the graph above that the most strongly associated variable is whether the respondent approves of the president:

What is the probability that someone supports a ban
on assault rifles given their approval of the president?

Probability of support_ban_assault_rifle
Under model
Given approval_president
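
To make the conditional probability query above more concrete, here is a minimal Python sketch of the underlying arithmetic: the probability of one variable given another is the joint probability divided by the marginal probability of the conditioning value. The function `conditional_distribution`, the outcome labels, and all of the numbers are hypothetical and chosen only for illustration; they are not estimates from CCES or output from the model.

```python
def conditional_distribution(joint, given_value):
    """P(target | given = given_value) from a dict {(given, target): prob}."""
    # Keep only the outcomes consistent with the conditioning value.
    matching = {t: p for (g, t), p in joint.items() if g == given_value}
    total = sum(matching.values())  # marginal probability of given_value
    return {t: p / total for t, p in matching.items()}

# Made-up joint probabilities over (approval_president, support_ban_assault_rifle).
joint = {
    ("approve", "support"): 0.10,
    ("approve", "oppose"): 0.30,
    ("disapprove", "support"): 0.45,
    ("disapprove", "oppose"): 0.15,
}

given_approve = conditional_distribution(joint, "approve")
given_disapprove = conditional_distribution(joint, "disapprove")
print(given_approve["support"], given_disapprove["support"])
```

In this toy table the probability of support changes substantially with the conditioning value, which is the kind of pattern a high mutual information value points to.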

Conditional mutual information: finding interactions between multiple variables

LPMs allow us to go beyond the relationships between pairs of variables. We can also look at relationships among three variables by computing conditional mutual information: the mutual information between two variables, conditioned on a third variable. In particular, we can ask in which cases the relationship between two variables changes once we condition on a third one. To answer this question, we'll expand the above example on support for banning assault rifles by computing the difference between conditional mutual information and mutual information:

How does the dependence between supporting a ban on assault rifles
and other variables change after conditioning?

Select var, condition_var, cmi - mi
From Select
Mutual Information
Between support_ban_assault_rifle and var
Under model Given condition_var as cmi,
Mutual Information
Between support_ban_assault_rifle and var
Under model as mi,
var,
condition_var
From schema
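
The sign of the difference between conditional mutual information and mutual information can be illustrated with a small Python sketch on two textbook toy distributions. This is only a sketch under stated assumptions: the helper names (`marginal`, `mutual_information`, `conditional_mutual_information`) and both toy distributions are made up for this example and have nothing to do with CCES or with how the LPM evaluates the query.

```python
import math
from collections import defaultdict

def marginal(joint, keep):
    """Sum a joint {(x, y, z): prob} down to the axes listed in keep."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out

def mutual_information(joint):
    """I(X; Y) in nats, with Z summed out."""
    pxy = marginal(joint, (0, 1))
    px, py = marginal(joint, (0,)), marginal(joint, (1,))
    return sum(p * math.log(p / (px[(x,)] * py[(y,)]))
               for (x, y), p in pxy.items() if p > 0)

def conditional_mutual_information(joint):
    """I(X; Y | Z) in nats."""
    pz = marginal(joint, (2,))
    pxz, pyz = marginal(joint, (0, 2)), marginal(joint, (1, 2))
    return sum(p * math.log(p * pz[(z,)] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), p in joint.items() if p > 0)

# Toy case 1: X and Y are independent fair coins and Z = X XOR Y.
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
# Toy case 2: X and Y are both exact copies of a fair coin Z.
copy = {(z, z, z): 0.5 for z in (0, 1)}

xor_diff = conditional_mutual_information(xor) - mutual_information(xor)
copy_diff = conditional_mutual_information(copy) - mutual_information(copy)
print(xor_diff, copy_diff)
```

In the XOR case the two variables look unrelated until the third is known, so the difference is positive; in the copy case the third variable accounts for all of the dependence, so the difference is negative.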

Two kinds of relationship stand out:

- Large negative values (dark red) indicate that the relationship between two variables is much weaker once we condition on a third variable.
- Large positive values (dark blue) indicate that the relationship between two variables is much stronger once we condition on a third variable.

Zooming in on the latter case, we see that the relationship between support for banning assault rifles and approval of the current member of the state's congress is much stronger once we condition on which party has a majority in the state senate. Here is what the relationship looks like conditioned on the party that has a majority in that state's senate:

What is the relationship between supporting a ban on assault rifles
and approving of the current state congress members, conditioned
on the party that has a majority in the state's senate?

Probability of support_ban_assault_rifle
Under model
Given approval_current_member_congress,
which_party_has_majority_in_my_state_senate