panelView: Visualizing Panel Data

The panelView package has two main functionalities: (1) it visualizes the treatment and missing-value statuses of each observation in a panel/time-series-cross-sectional (TSCS) dataset; and (2) it plots the outcome variable (either continuous or discrete) in a time-series fashion.

We develop this package in the belief that it is always a good idea to understand your raw data better before conducting statistical analyses.

Authors: Licheng Liu (MIT); Yiqing Xu (Stanford)

Date: July 21, 2021

Version: 1.1.5 (GitHub); 1.1.5 (CRAN)

Reference: Liu, Licheng and Yiqing Xu (2018). “panelView: an R package for visualizing panel data.” Available at https://bit.ly/panelview4r.

R code used in this demonstration can be downloaded from here.

Updates in v.1.1.5

Fix bugs. CRAN release.

Installation
Plot treatment status and missing values (type = "treat")
Plot an outcome variable (or any variable) in a panel dataset (type = "outcome")
Update log

Installation

You can install the panelView package from CRAN:

install.packages('panelView')

You can also install the up-to-date development version from GitHub:

install.packages('devtools', repos = 'http://cran.us.r-project.org') # if not already installed
devtools::install_github('xuyiqing/panelView')

panelView depends on the following packages, which are installed automatically along with panelView; you can also install them manually:

require(ggplot2)  
require(gridExtra)

Plotting Treatment Conditions and Missing Values

First, we show how to visualize the treatment conditions and missing values in a panel dataset. The treatment indicator may be dichotomous or have more than 2 levels. We first load the panelView package, which ships three datasets.

library(panelView)
data(panelView)
ls()

## [1] "capacity" "simdata"  "turnout"

Using the turnout dataset (a balanced panel), we show the treatment status of Election Day Registration (EDR) in each state in a given year (Xu 2017). The first variable on the right-hand-side of the regression formula is designated as the treatment indicator. Including covariates may change the plot because of missing values in these covariates. The index option specifies the unit (group) and time indicators. We can change the labels (titles) of x- and y-axes through xlab and ylab, respectively.

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), xlab = "Year", ylab = "State")
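
Because covariates on the right-hand side can add missing cells to the plot, it can help to count incomplete rows before plotting. The following is a minimal base-R sketch on a made-up data frame (the toy data and its column names are hypothetical, not one of the shipped datasets). It only illustrates row-wise missingness and a balance check, not panelView's internal logic.

```r
# Toy panel: 3 units x 4 periods (hypothetical data for illustration only)
toy <- expand.grid(unit = c("A", "B", "C"), year = 2001:2004,
                   stringsAsFactors = FALSE)
toy$y <- seq_len(nrow(toy))                              # outcome
toy$d <- as.integer(toy$unit == "A" & toy$year >= 2003)  # treatment
toy$x <- 1                                               # covariate
toy$x[toy$unit == "B" & toy$year == 2002] <- NA          # one missing value

# A balanced panel has exactly one row for every unit-period pair
is_balanced <- all(table(toy$unit, toy$year) == 1)

# Complete rows when only the outcome and treatment are used ...
n_without_x <- sum(complete.cases(toy[, c("y", "d")]))
# ... and when the covariate is included as well
n_with_x <- sum(complete.cases(toy[, c("y", "d", "x")]))

print(is_balanced)
print(c(without_x = n_without_x, with_x = n_with_x))
```

If the two counts differ, adding that covariate to the formula can be expected to change the plot.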

We can use the by.timing option to sort units by the timing of receiving the treatment (then by the total number of periods exposed to the treatment). Users can also specify the background color using the background option and use various cex options to adjust font sizes for text appearing in the main title (cex.main), axes (cex.axis), axis labels (cex.lab), and legend (cex.legend).

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), xlab = "Year", ylab = "State", by.timing = TRUE, legend.labs = c("No EDR", "EDR"), background = "white", cex.main = 20, cex.axis= 8, cex.lab = 12, cex.legend = 12)

## Specified labels in the order of: Under Control, Under Treatment.
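
The sort order described above can be sketched in base R. The treatment matrix below is made up, and the tie-breaking direction (more treated periods first) and the placement of never-treated units last are our assumptions for illustration; panelView's own implementation may differ.

```r
# Toy treatment matrix: rows = units, columns = periods (hypothetical data)
D <- rbind(
  A = c(0, 0, 1, 1),
  B = c(0, 0, 0, 0),
  C = c(0, 1, 1, 1),
  E = c(0, 0, 1, 0)
)
colnames(D) <- 2001:2004

# First period (column position) in which each unit is treated;
# Inf for units that are never treated
first_treated <- apply(D, 1, function(d) {
  if (any(d == 1)) which(d == 1)[1] else Inf
})

# Total number of treated periods per unit
n_treated <- rowSums(D)

# Sort by timing first, then by exposure (assumed: more exposure first)
ord <- order(first_treated, -n_treated)
sorted_units <- rownames(D)[ord]
print(sorted_units)
```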

We can use the main option to change the title of the plot, use the axis.lab.gap option to change the gaps between labels on the x- and y-axes, and use the legend.labs option to change the labels shown in the legend. For DID-type TSCS data with a dichotomous treatment indicator, we can distinguish the pre- and post-treatment periods for treated units by specifying pre.post = TRUE (which was the default prior to v.1.1.0):

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), xlab = "Year", ylab = "State", pre.post = TRUE)

Again, we can change the labels in the legend. Make sure the length of legend.labs is the same as the number of treatment statuses (in this case, 3).

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), xlab = "Year", ylab = "State", by.timing = TRUE, pre.post = TRUE, legend.labs = c("Control States", "Treated States (before EDR)", "Treated States (after EDR)"))

## Specified labels in the order of: Controls, Treated (Pre), Treated (Post).
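
To see why three labels are needed here, the sketch below derives the three statuses from a made-up 0/1 treatment matrix. It assumes a staggered-adoption setting without treatment reversals and is only an illustration of the idea, not panelView's internal code.

```r
# Toy treatment matrix: rows = units, columns = periods (hypothetical data)
D <- rbind(
  A = c(0, 0, 1, 1),
  B = c(0, 0, 0, 0),
  C = c(0, 1, 1, 1)
)

# Units that are treated in at least one period
ever_treated <- rowSums(D) > 0

# Start with every cell as a control, then overwrite cells of treated units
status <- matrix("Controls", nrow(D), ncol(D), dimnames = dimnames(D))
status[ever_treated, ] <- "Treated (Pre)"
status[D == 1] <- "Treated (Post)"

print(status)
print(table(status))
```

The number of distinct values in status is the length that legend.labs should have.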

We can remove the labels on the y-axis (or x-axis) by specifying axis.lab = "time" (or "unit"). Setting axis.lab = "off" will remove labels on both axes. The default is axis.lab = "both".

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), main = "EDR Reform", axis.lab = "time")

Instead of specifying a formula, we can also directly give the variable name of the treatment indicator:

panelView(D = "policy_edr", data = turnout, index = c("abb","year"), xlab = "Year", ylab = "State", pre.post = TRUE, main = "EDR Reform", axis.lab = "time")

We can change the colors of the bricks for the controls and treated using the color option. If pre.post = TRUE, colors should be specified in the order of “treated-pre”, “treated-post” and “control” (plus “missing” if the dataset contains missing values); if pre.post = FALSE, in the order of “control”, “treated”:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

turnout %>% panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, color = c("#B0C4DE","white"), by.timing = TRUE, index = c("abb","year"), xlab = "Year", ylab = "State")

## Specified colors in the order of: Under Control, Under Treatment.

For a panel dataset in which the treatment may switch on and off, we do not differentiate between pre- and post-treatment statuses. To demonstrate how panelView can be used in a more general setting, the following plot uses the capacity dataset, which is used to investigate the effect of democracy, the treatment, on state capacity, the outcome (Wang and Xu 2018). From the figure below, we see quite a few cases of democratic reversals and that there are many missing values.

panelView(Capacity ~ demo + lnpop + lngdp, data = capacity, index = c("ccode", "year"), axis.lab.gap = c(2,10), main = "Democracy and State Capacity")

Sorting units based on the first period a unit receives the treatment gives a more appealing visual:

panelView(Capacity ~ demo + lnpop + lngdp, data = capacity, index = c("ccode", "year"), axis.lab.gap = 2, main = "Democracy and State Capacity: Treatment Status", by.timing = TRUE, axis.lab = "time")

Plotting a subset of units

Sometimes a dataset has many units and we only want to take a peek at a subset of them. panelView allows users to specify the units to be shown through the show.id option (units in their alphabetical order) or the id option (original unit ids recorded in the “unit” variable). In the following figure, we plot the treatment statuses of the first 25 units.

panelView(Capacity ~ demo + lnpop + lngdp, data = capacity, index = c("ccode", "year"), axis.lab.gap = c(2,0), main = "Democracy and State Capacity", show.id = c(1:25))
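
The difference between the two options can be illustrated without plotting. The base-R sketch below uses made-up state abbreviations and assumes, as described above, that show.id refers to positions in the alphabetically sorted list of units while id refers to the unit identifiers themselves.

```r
# Hypothetical unit identifiers as they appear in a long-format panel
ids <- c("TX", "AL", "CT", "AR", "TX", "AL")

# Distinct units in alphabetical order
sorted_units <- sort(unique(ids))

# Selecting by position (the idea behind show.id = 1:2)
by_position <- sorted_units[1:2]

# Selecting by name (the idea behind id = c("CT", "TX"))
by_name <- intersect(c("CT", "TX"), sorted_units)

print(by_position)
print(by_name)
```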

If the formula is not provided, we can use the argument D to specify the treatment indicator. For example:

panelView(D = "demo", data = capacity, index = c("ccode", "year"), axis.lab.gap = c(2,0), main = "Democracy and State Capacity", show.id = c(1:25), type = "treat")

Below we plot the treatment statuses of the next 25 units by calling their names, i.e., their ccode. Note that several countries are removed from the plot due to missing values in the specified variables.

panelView(Capacity ~ demo + lnpop + lngdp, data = capacity, index = c("ccode", "year"), axis.lab.gap = c(2,0), main = "Democracy and State Capacity", id = unique(capacity$ccode)[26:50], by.timing = TRUE)

## List of units removed from dataset: 260 265 342 346 347

Ignoring treatment conditions

Starting from v.1.0.3, we allow users to omit the treatment variable in a "treat" plot, in which case, the plot will show missing and non-missing values only.

capacity %>% panelView(Capacity ~ 1, index = c("ccode","year"), axis.lab="off")

Alternatively, if right-hand-side variables are included, we can specify ignore.treat=TRUE. Variables in the formula will change the plot by introducing missing values.

capacity %>%  panelView(Capacity ~ demo, ignore.treat = TRUE, index = c("ccode","year"), axis.lab= "off")

We can also directly specify the variable name of the outcome Y to show its missing patterns (no formula is supplied):

capacity %>% panelView(Y = "Capacity", index = c("ccode", "year"), axis.lab.gap = c(2,10))
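
The information shown in such a plot can also be tabulated directly. The following base-R sketch builds a unit-by-period matrix of observed versus missing outcome values from a made-up data frame (the data and column names are hypothetical); it is a companion check, not a replacement for the plot.

```r
# Toy long-format panel with some missing outcomes (hypothetical data)
toy <- data.frame(
  unit = rep(c("A", "B", "C"), each = 3),
  year = rep(2001:2003, times = 3),
  y    = c(1.2, NA, 0.7,  0.4, 0.9, 1.1,  NA, NA, 2.0)
)

# TRUE where the outcome is observed, FALSE where it is missing
observed <- tapply(!is.na(toy$y), list(toy$unit, toy$year), any)
print(observed)

# Share of observed periods for each unit
share_observed <- rowMeans(observed)
print(share_observed)
```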

More than Two Treatment Conditions

Starting from v.1.1.0, panelView supports TSCS data with more than 2 treatment levels. For example, we create a measure of regime type with three treatment levels.

demo2 <- rep(0, dim(capacity)[1])
demo2[which(capacity$polity2 < -0.5)] <- -1
demo2[which(capacity$polity2 > 0.5)] <- 1
capacity$demo2 <- demo2

panelView(Capacity ~ demo2 + lngdp, data = capacity, index = c("ccode", "year"), axis.lab.gap = c(2,10), main = "Regime Type")

## 3 treatment levels.
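
One detail of the recoding above is worth checking on your own data: because demo2 starts at 0 and which() skips missing values, any observation with a missing polity2 ends up in the middle category. The sketch below reproduces the recoding on a made-up vector and shows one possible way to keep those observations missing instead (the vector and the name demo2_na are hypothetical).

```r
# Hypothetical polity2 scores, including one missing value
polity2 <- c(-10, -3, -0.5, 0, 0.5, 4, 10, NA)

# Same recoding as in the text: 0 by default, -1 below -0.5, 1 above 0.5
demo2 <- rep(0, length(polity2))
demo2[which(polity2 < -0.5)] <- -1
demo2[which(polity2 > 0.5)] <- 1
print(table(demo2, useNA = "ifany"))

# Variant that keeps missing scores as NA rather than coding them as 0
demo2_na <- ifelse(is.na(polity2), NA, demo2)
print(table(demo2_na, useNA = "ifany"))
```

Whether missing scores should be treated as a separate case depends on the application, so this is a suggestion rather than a correction.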

library(RColorBrewer) 
mycol<-brewer.pal(3,"Set1")[c(1,3,2)]
panelView(Capacity ~ demo2, data = capacity, index = c("ccode", "year"), axis.lab.gap = c(2), main = "Regime Type", axis.lab = c("time"), color = mycol, legend.labs = c("Autocracy", "Hybrid", "Democracy"))

## 3 treatment levels.
## Specified colors in the order of: Treatment level: -1, Treatment level: 0, Treatment level: 1.
## Specified labels in the order of: Treatment level: -1, Treatment level: 0, Treatment level: 1.

This is equivalent to:

panelView(D = "demo2", data = capacity, index = c("ccode", "year"), axis.lab.gap = c(2), main = "Regime Type", axis.lab = c("time"), color = mycol, legend.labs = c("Autocracy", "Hybrid", "Democracy"))

If the number of treatment levels is greater than 5, then the treatment indicator will be regarded as a continuous variable. We can remove the grid lines by specifying gridOff = TRUE:

panelView(Capacity ~ polity2 + lngdp, data = capacity, index = c("ccode", "year"), axis.lab.gap = c(2,10), main = "Regime Type", gridOff = TRUE)

## 21 treatment levels.
## Continuous treatment.
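
The rule stated above (more than 5 levels means continuous) can be checked ahead of time by counting distinct non-missing values. The helper treatment_kind() below is a hypothetical name of ours and only mirrors the rule as described in the text; it is not a panelView function.

```r
# Hypothetical helper: classify a treatment variable by its number of levels
treatment_kind <- function(d, max_levels = 5) {
  n_levels <- length(unique(d[!is.na(d)]))  # distinct non-missing values
  if (n_levels > max_levels) "continuous" else "discrete"
}

print(treatment_kind(c(0, 1, 1, 0, NA)))  # two levels
print(treatment_kind(-10:10))             # 21 levels
```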

We can also change the (start and end) colors using the option color:

panelView(Capacity ~ polity2 + lngdp, data = capacity, index = c("ccode", "year"), axis.lab.gap = c(2,10), main = "Regime Type", color = c("yellow", "red"), background = "white")

## 21 treatment levels.
## Continuous treatment.

Plotting a Variable in a Panel Dataset

The second functionality of panelView is to show the raw outcome variable of a panel dataset in a time-series fashion. The syntax is very similar except that we need to specify type = "outcome". We can control the ranges of the data to be shown by supplying xlim and ylim. Different colors represent different treatment conditions.

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), type = "outcome", main = "EDR Reform and Turnout", ylim = c(0,100), xlab = "Year", ylab = "Turnout")

Similarly, we can specify the labels shown in the legend using legend.labs. Note that we can also turn off legends by specifying legendOff = TRUE.

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), type = "outcome", main = "EDR Reform and Turnout", legend.labs = c("Control States","Treated States (before EDR)", "Treated States (after EDR)"))

We can also use the black and white theme by specifying theme.bw = TRUE.

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), type = "outcome", main = "EDR Reform and Turnout", theme.bw = TRUE)

And we can change the colors for observations under different treatment statuses using the option color.

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), type = "outcome", main = "EDR Reform and Turnout", color = c("#FC8D6280", "red", "#99999950"), legend.labs = c("Control States","Treated States (before EDR)", "Treated States (after EDR)"), theme.bw = TRUE)

## Specified colors in the order of "treated (pre)", "treated (post)", "control".

Again, we can specify which unit(s) we want to take a look at:

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), type = "outcome", main = "EDR Reform and Turnout (AL, AR, CT)", id = c("AL", "AR", "CT"))

To better understand the data, sometimes we want to plot the outcome based on whether the treatment status has changed during the observed time period. We can simply add the option by.group = TRUE. The algorithm will analyze the data and automatically put each unit into one of several groups, e.g., (1) always treated, (2) always in control, (3) treatment status changed. Users can adjust the font sizes of the title and subtitles using the cex.main and cex.main.sub options, respectively.

panelView(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb","year"), type = "outcome", main = "EDR Reform and Turnout", by.group = TRUE, cex.main = 20, cex.main.sub = 15)
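
The grouping described above can be reproduced by hand to see how many units fall into each group. The sketch below uses a made-up data frame and a hypothetical helper classify(); the group names follow the text, but the code is only an illustration and not the algorithm panelView uses.

```r
# Toy long-format panel: 3 units observed over 4 periods (hypothetical data)
toy <- data.frame(
  unit = rep(c("A", "B", "C"), each = 4),
  d    = c(1, 1, 1, 1,  0, 0, 0, 0,  0, 0, 1, 1)
)

# Hypothetical helper: assign a unit to a group from its treatment history
classify <- function(d) {
  if (all(d == 1)) {
    "Always treated"
  } else if (all(d == 0)) {
    "Always under control"
  } else {
    "Treatment status changed"
  }
}

groups <- tapply(toy$d, toy$unit, classify)
print(groups)
print(table(groups))
```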

Ignore treatment status

Starting from v.1.0.3, we allow users to omit the treatment indicator. Therefore, panelView can in fact be used to visualize any variable in a panel dataset.

panelView(turnout ~ 1, data = turnout, index = c("abb","year"), type = "outcome", main = "Turnout", ylim = c(0,100), xlab = "Year", ylab = "Turnout")

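Alternatively, we can set ignore.treat = TRUE when the formula includes right-hand-side variables, which may change the plot by introducing missing values. We can also supply the outcome name directly through Y, without a formula: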

panelView(Y = "turnout", data = turnout, index = c("abb","year"), type = "outcome", main = "Turnout", ylim = c(0,100), xlab = "Year", ylab = "Turnout")

Discrete outcomes

We can accommodate discrete variables by setting outcome.type = "discrete". Below is an example using the simdata dataset, in which the outcome variable takes three values: 0, 1, and 2.

panelView(Y ~ D, data = simdata, index = c("id", "time"), by.group = FALSE, outcome.type = "discrete", type = "outcome", xlim = c(8, 15))

We split the sample based on changes in treatment status and use the black and white theme:

panelView(Y ~ D, data = simdata, index = c("id", "time"), by.group = TRUE, outcome.type = "discrete", type = "outcome",  xlim = c(8, 15), theme.bw = TRUE)

If the treatment indicator has more than 2 treatment levels or is a continuous variable (e.g. polity2), then treatment status will not be shown on the "outcome" plot:

panelView(Capacity ~ polity2 + lngdp, data = capacity, index = c("ccode", "year"), 
          main = "Measuring State Capacity", type = "outcome", legendOff = TRUE)

## 21 treatment levels.

Update Log

Updates in v.1.0.3

Allow users to change the color of bricks in the “missing” plot.
Allow users to leave the treatment blank in both the “missing” and “raw” plots.

Updates in v.1.0.4

Allow users to plot treated units on top of control units in the “missing” plot.
Streamline the color option for both the “missing” and “raw” plots.

Updates in v.1.0.5

Fix typos. CRAN release.

Updates in v.1.1.2

Change the plot type: we now use "treat" ("missing" in earlier versions) to plot treatment status and "outcome" ("raw" in earlier versions) to plot raw outcomes.
Allow >2 treatment levels.
Add a new option pre.post to distinguish pre- and post-treatment observations for treated units in a DID setting.
Replace options by.treatment with by.timing and treatment with ignore.treat for easier interpretations.
Add fontsize options.

Updates in v.1.1.4

Add a new option treat.type to control whether the treatment variable should be seen as a continuous (treat.type = "continuous") or discrete (treat.type = "discrete") variable.

Please report bugs and let us know if you have any suggestions! -> yiqingxu [at] ucsd.edu