4  Network Structure

Note

The type = "network" feature is currently available only on the development branch. Install it with:

devtools::install_github("xuyiqing/panelView@dev")

When estimating models with multiple sets of fixed effects, the connectivity structure of the data determines what can be identified. As Correia (2016) shows, the fixed-effect estimation problem is equivalent to solving a linear system on a graph, and the structure of that graph has direct consequences for estimation and inference.

Two features of this structure are particularly important: connected components and singletons, both defined in Section 4.1 below.

The panelView package visualizes them with the type = "network" option, constructing a \(k\)-partite graph from \(k \geq 2\) sets of fixed effects.

4.1 Network elements

In the network plot, each distinct level of a fixed-effect dimension becomes a node. In a standard unit \(\times\) time panel, there is one node for each unit and one node for each time period. Different fixed-effect dimensions are distinguished by shape: circles for the first dimension (e.g., units), squares for the second (e.g., time periods), triangles for the third, and so on.

Each observation in the data creates an edge (link) between the nodes it connects. For example, if unit \(A\) is observed at time \(t\), an edge is drawn between node \(A\) and node \(t\). If the same combination appears multiple times (duplicate observations), the edge becomes thicker to reflect the count.

In short,

  • nodes represent fixed-effect levels (units, time periods, firms, etc.);
  • edges represent observed combinations.

The resulting plot reveals connected components, singletons, and duplicate observations at a glance.

  • Connected components: groups of nodes that are linked to one another through some chain of edges. Nodes in different components share no observations; each component is enclosed in its own shaded convex hull.
  • Singletons: nodes with exactly one edge (degree 1), highlighted with a colored glow ring.
  • Duplicate observations: when the same combination of fixed-effect levels appears more than once, the edge becomes thicker to reflect the count.
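
The first two diagnostics can be reproduced by hand on a small example. The following is a minimal sketch using igraph directly on hypothetical toy data; the "unit:" and "time:" name prefixes are an illustrative convention, not necessarily how panelView labels its nodes.

```r
library(igraph)

# toy unit x time panel (hypothetical data, no duplicate rows)
toy <- data.frame(unit = c("A", "A", "B", "B", "C", "D"),
                  time = c(1, 2, 1, 2, 2, 9))

# prefix the labels so that levels of different dimensions never collide
edges <- data.frame(from = paste0("unit:", toy$unit),
                    to   = paste0("time:", toy$time))

# one node per fixed-effect level, one edge per observation
g <- graph_from_data_frame(edges, directed = FALSE)

# connected components: {A, B, C, 1, 2} and {D, 9}
n_comp <- components(g)$no

# singletons: nodes with exactly one edge (degree 1)
deg <- degree(g)
singletons <- sort(names(deg)[deg == 1])

n_comp
singletons
```

Here unit C, unit D, and period 9 each have a single edge, and the pair {D, 9} forms its own component.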

We first load the package.

library(panelView)
data(panelView)

4.2 Basic usage: Unit \(\times\) Time

With the standard panel structure (unit \(\times\) time), the function constructs a bipartite graph: units are shown as circles, time periods as squares, and edges connect each unit to the periods in which it is observed.

4.2.1 Balanced panel

The turnout dataset is a balanced panel of 47 US states over 24 election years. Because every state is observed in every year, the graph is a complete bipartite graph with a single connected component and no singletons.

panelview(turnout, formula = ~ 1,
          index = c("abb", "year"),
          type = "network",
          main = "Turnout: Balanced Panel")
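
Completeness can also be checked numerically: in a complete bipartite graph, the number of distinct unit-period pairs equals the number of units times the number of periods (47 × 24 = 1,128 for turnout). A minimal base-R sketch on hypothetical toy data follows; is_complete_bipartite() is an illustrative helper, not part of panelView.

```r
# hypothetical helper: TRUE when every combination of index levels is observed
is_complete_bipartite <- function(df, index) {
  n_pairs  <- nrow(unique(df[, index]))
  n_levels <- vapply(df[, index], function(x) length(unique(x)), integer(1))
  n_pairs == prod(n_levels)
}

# toy balanced panel: 3 units x 4 periods
balanced <- expand.grid(unit = c("A", "B", "C"), time = 1:4,
                        stringsAsFactors = FALSE)

full_ok    <- is_complete_bipartite(balanced, c("unit", "time"))
dropped_ok <- is_complete_bipartite(balanced[-1, ], c("unit", "time"))

full_ok     # all 12 pairs present
dropped_ok  # one pair removed, so the graph is no longer complete
```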

4.2.2 Unbalanced panel with singletons

In many applied settings, panels are unbalanced: some units are observed in only a subset of periods. Units or periods with only one connection (degree 1) are called singletons. Correia (2016) shows that singletons can be removed without affecting the estimation of multi-way fixed effects; the removal is iterative because dropping one singleton can leave another node with a single edge.

We construct a simple example where some units appear in only one period:

set.seed(42)

## start with a 20-unit, 8-period balanced panel
sim_unbalanced <- expand.grid(unit = paste0("U", 1:20), time = 2001:2008,
                              stringsAsFactors = FALSE)

## randomly drop 40% of observations to create an unbalanced panel
sim_unbalanced <- sim_unbalanced[sample(nrow(sim_unbalanced),
                                        round(nrow(sim_unbalanced) * 0.6)), ]

## add two units that each appear in only one period (singletons)
sim_unbalanced <- rbind(sim_unbalanced,
                        data.frame(unit = "U21", time = 2003))
sim_unbalanced <- rbind(sim_unbalanced,
                        data.frame(unit = "U22", time = 2007))

## add two units in a separate time range (disconnected component)
sim_unbalanced <- rbind(sim_unbalanced,
                        data.frame(unit = "U23", time = 2050),
                        data.frame(unit = "U24", time = 2050),
                        data.frame(unit = "U24", time = 2051))
p.network <- panelview(sim_unbalanced, formula = ~ 1,
                       index = c("unit", "time"),
                       type = "network",
                       main = "Unbalanced Panel with Singletons")

Singletons are highlighted with a crimson glow. The function returns graph diagnostics invisibly. The $singletons element is a data frame listing each singleton along with the fixed-effect dimension it belongs to:

p.network$singletons

The $n_components element reports the number of connected components, that is, groups of units and time periods that share no observations with any other group:

## two components: the main group and the {U23, U24, 2050, 2051} cluster
p.network$n_components
#> [1] 2
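
The iterative removal of singletons can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the routine used by panelView or by any estimation package: drop_singletons() is a hypothetical helper, and it counts rows per level, which matches the degree-1 definition only when the data contain no duplicate rows.

```r
# hypothetical helper: repeatedly drop rows whose level in any index
# dimension occurs only once, until no such rows remain
drop_singletons <- function(df, index) {
  repeat {
    keep <- rep(TRUE, nrow(df))
    for (v in index) {
      n_obs <- table(df[[v]])
      keep  <- keep & as.vector(n_obs[as.character(df[[v]])]) > 1
    }
    if (all(keep)) return(df)
    df <- df[keep, , drop = FALSE]
  }
}

# toy panel in which removing one singleton creates the next one
toy <- data.frame(unit = c("A", "A", "B", "B", "C", "D", "D"),
                  time = c(1, 2, 1, 2, 3, 2, 3))

pruned <- drop_singletons(toy, c("unit", "time"))
pruned
```

In this toy panel the removal cascades: dropping unit C leaves period 3 with one observation, dropping that observation leaves unit D with one, and only the four observations of units A and B in periods 1 and 2 survive.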

4.2.3 Missingness in data

When a formula such as Y ~ D + X is supplied with type = "network", observations with missing values in any of the specified variables are dropped before the graph is constructed. This way the graph reflects only the observations usable for estimation.

sim_missing <- data.frame(
  unit = rep(c("A", "B", "C"), each = 4),
  time = rep(2001:2004, 3),
  Y    = c(1, NA, 3, 4,   5, 6, NA, 8,   9, 10, 11, 12),
  D    = c(0, 0, 1, 1,    0, 0, 0, NA,   1, 1, 1, 1)
)

## ~1 keeps all 12 observations (missingness in Y/D is ignored)
p.all <- panelview(sim_missing, ~ 1,
                   index = c("unit", "time"), type = "network")


## Y ~ D drops rows where Y or D is NA (3 rows dropped → 9 edges)
p.yd <- panelview(sim_missing, Y ~ D,
                  index = c("unit", "time"), type = "network",
                  main = "Missingness from formula: Y ~ D")

cat("Edges with ~1:", igraph::ecount(p.all$graph), "\n")
#> Edges with ~1: 12
cat("Edges with Y ~ D:", igraph::ecount(p.yd$graph), "\n")
#> Edges with Y ~ D: 9
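
The edge count can be anticipated without plotting. The sketch below assumes, as described above, that rows with a missing value in any formula variable are dropped (listwise deletion); it uses only base R and repeats the data so that it runs on its own.

```r
sim_missing <- data.frame(
  unit = rep(c("A", "B", "C"), each = 4),
  time = rep(2001:2004, 3),
  Y    = c(1, NA, 3, 4,   5, 6, NA, 8,   9, 10, 11, 12),
  D    = c(0, 0, 1, 1,    0, 0, 0, NA,   1, 1, 1, 1)
)

# variables named in the formula
vars <- all.vars(Y ~ D)

# rows with no missing value in any of those variables
usable <- complete.cases(sim_missing[, vars])

sum(usable)
#> [1] 9

# the unit-time pairs that would be missing from the graph
sim_missing[!usable, c("unit", "time")]
```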

4.3 Layout options

Three layout algorithms are available via the layout parameter:

  • "fr" (default): Fruchterman–Reingold force-directed layout. Good for revealing cluster structure.
  • "bipartite": Two-row layout with each fixed-effect dimension on a separate horizontal line. Best for small panels.
  • "circle": Nodes arranged on a circle.

sim_small <- data.frame(
  unit = c("A","A","A","B","B","C","C","D"),
  time = c(1, 2, 3, 1, 2, 3, 4, 5)
)

panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network",
          layout = "fr", show.labels = "all", main = "FR layout")

panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network",
          layout = "bipartite", show.labels = "all", 
          main = "Bipartite layout")

panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network",
          layout = "circle", show.labels = "all", 
          main = "Circle layout")
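
To experiment with comparable layouts outside panelview(), igraph offers layout_with_fr(), layout_as_bipartite(), and layout_in_circle(). Whether panelview() calls these functions internally is an assumption, not something stated in this chapter; the sketch only shows what each layout returns for the same small graph, and the node-name prefixes are illustrative.

```r
library(igraph)

sim_small <- data.frame(
  unit = c("A", "A", "A", "B", "B", "C", "C", "D"),
  time = c(1, 2, 3, 1, 2, 3, 4, 5)
)

edges <- data.frame(from = paste0("unit:", sim_small$unit),
                    to   = paste0("time:", sim_small$time))
g <- graph_from_data_frame(edges, directed = FALSE)

# layout_as_bipartite() reads the logical vertex attribute "type"
V(g)$type <- grepl("^time:", V(g)$name)

xy_fr     <- layout_with_fr(g)       # force-directed coordinates
xy_bip    <- layout_as_bipartite(g)  # two horizontal rows
xy_circle <- layout_in_circle(g)     # evenly spaced on a circle

# each layout is a matrix with one row per node and two columns (x, y)
dim(xy_fr)
```

Any of these matrices can be passed to plot(g, layout = ...) for a quick look.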

4.4 Multi-way fixed effects

Many empirical settings involve more than two sets of fixed effects (\(k \geq 3\)). For example, matched employer–employee data has worker, firm, and year fixed effects simultaneously. The index parameter accepts 3 or more column names for the network type.

Each fixed-effect dimension is rendered with a distinct shape and color:

| Dimension | Shape         | Default color |
|-----------|---------------|---------------|
| 1st       | Circle        | Steel blue    |
| 2nd       | Square        | Burnt orange  |
| 3rd       | Triangle      | Sage green    |
| 4th       | Diamond       | Purple        |
| 5th       | Inv. triangle | Amber         |

4.4.1 Three-way example: worker \(\times\) firm \(\times\) year

sim_workers <- data.frame(
  worker = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol", "Dave"),
  firm   = c("Google", "Meta", "Google", "Apple", 
             "Meta", "Apple", "Netflix"),
  year   = c(2020, 2021, 2020, 2021, 2020, 2021, 2022)
)
p.workers <- panelview(sim_workers, formula = ~ 1,
                       index = c("worker", "firm", "year"),
                       type = "network",
                       show.labels = "all",
                       main = "3-way FE: Worker x Firm x Year")

Dave works only at Netflix and only in 2022, forming a separate connected component from the main group. The $n_components element confirms the number of connected components in the graph:

## main group (Alice, Bob, Carol with Google, Meta, Apple in 2020-2021)
## and the isolated {Dave, Netflix, 2022} cluster
p.workers$n_components
#> [1] 2
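
The component count can be cross-checked with igraph. The sketch assumes that each observation contributes one edge for every pair of fixed-effect dimensions (worker-firm, worker-year, firm-year); this is one natural way to build a \(k\)-partite graph and may differ from panelView's internal construction. The name prefixes are illustrative.

```r
library(igraph)

sim_workers <- data.frame(
  worker = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol", "Dave"),
  firm   = c("Google", "Meta", "Google", "Apple",
             "Meta", "Apple", "Netflix"),
  year   = c(2020, 2021, 2020, 2021, 2020, 2021, 2022)
)
index <- c("worker", "firm", "year")

# assumption: one edge per observation for every pair of dimensions
pairs <- combn(index, 2, simplify = FALSE)
edges <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(from = paste(p[1], sim_workers[[p[1]]], sep = ":"),
             to   = paste(p[2], sim_workers[[p[2]]], sep = ":"))
}))

# simplify() collapses repeated edges such as year:2020 with firm:Google
g    <- simplify(graph_from_data_frame(edges, directed = FALSE))
comp <- components(g)

comp$no
#> [1] 2

# node names grouped by component
split(names(comp$membership), comp$membership)
```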

4.4.2 Four-way example: student \(\times\) teacher \(\times\) school \(\times\) year

sim_schools <- data.frame(
  student = c("S1","S1","S2","S2","S3","S3","S4","S5"),
  teacher = c("T1","T2","T1","T3","T2","T3","T4","T5"),
  school  = c("A","A","A","A","B","B","C","D"),
  year    = c(2020,2021,2020,2021,2020,2021,2022,2022)
)

panelview(sim_schools, ~ 1,
          index = c("student", "teacher", "school", "year"),
          type = "network", show.labels = "all",
          main = "4-way FE: Student x Teacher x School x Year")

4.5 Weighted edges

Theoretical results for standard panel data methods typically assume that each combination of fixed-effect indices (e.g., unit and time) uniquely identifies an observation. In many empirical settings, however, this assumption does not hold. For example, in matched employer–employee data, a worker may appear at the same firm in multiple records within the same year. Researchers should diagnose such cases before estimation.

The network plot aggregates duplicate edges and renders them with proportionally thicker lines. To illustrate, we take the turnout dataset and deliberately duplicate some state-year observations:

## take a subset of turnout and create duplicates
sim_turnout_dup <- turnout[turnout$year <= 1940, c("abb", "year")]

## duplicate some state-year pairs: MN appears 3x in 1920, WI 2x in 1924
sim_turnout_dup <- rbind(sim_turnout_dup,
  data.frame(abb = "MN", year = 1920),
  data.frame(abb = "MN", year = 1920),
  data.frame(abb = "WI", year = 1924)
)
p.dup <- panelview(sim_turnout_dup, ~ 1, index = c("abb", "year"),
                   type = "network",
                   main = "Turnout Subset with Duplicate Observations")

The thicker edges for the MN–1920 and WI–1924 pairs are clearly visible. The $multi_edges element is a data frame with one row per duplicated combination, with columns for each fixed-effect dimension and a count column:

p.dup$multi_edges
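
A table of the same shape (index columns plus a count) can be built in base R, which is handy for checking duplicates before plotting. This is a sketch on hypothetical toy data; the exact column order and row order of $multi_edges may differ.

```r
# toy state-year records: MN-1920 appears three times, WI-1924 twice
obs <- data.frame(abb  = c("MN", "MN", "MN", "WI", "WI", "IA"),
                  year = c(1920, 1920, 1920, 1924, 1924, 1920))

# count the rows for each (abb, year) combination
counts <- aggregate(count ~ abb + year,
                    data = transform(obs, count = 1L),
                    FUN  = sum)

# keep only combinations that occur more than once
dups <- counts[counts$count > 1, ]
dups
```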

4.6 Customization

4.6.1 Edge visibility

For dense graphs, the default edge transparency may need adjustment. Use edge.alpha (0–1) and edge.width (in mm) to control edge appearance:

panelview(turnout, ~ 1, index = c("abb", "year"), type = "network",
          edge.alpha = 0.8, edge.width = 0.4,
          main = "Custom: edge.alpha = 0.8, edge.width = 0.4")

4.6.2 Custom colors

Supply a vector of colors (one per fixed-effect dimension) via the color parameter:

panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network",
          color = c("#6366F1", "#10B981"), show.labels = "all",
          main = "Custom Colors: Indigo + Emerald")

4.6.3 Singleton color

The color used to highlight singletons can be changed with singleton.color:

panelview(sim_unbalanced, ~ 1, index = c("unit", "time"), type = "network",
          singleton.color = "#FF8C00",
          main = "Custom Singleton Color (Dark Orange)")

4.6.4 Node size and edge width

Use node.size to control node size and edge.width for edge thickness:

panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network",
          node.size = 5, edge.width = 1, show.labels = "all",
          main = "Larger Nodes and Thicker Edges")

4.6.5 Other options

  • show.singletons = FALSE: do not highlight singletons.
  • highlight.components = FALSE: do not draw convex hulls around connected components.
  • show.labels = "singletons": label only singleton nodes (useful for large panels).
  • legendOff = TRUE: hide the legend.
  • main = "": suppress the title.

4.7 Accessing diagnostics

When type = "network", panelview() invisibly returns a list with the full graph structure. This allows programmatic inspection beyond what the plot shows.

p.network <- panelview(sim_unbalanced, ~ 1,
                       index = c("unit", "time"), type = "network",
                       main = "")

4.7.1 Identifying singletons in the data

The $singletons element is a data frame with one column per fixed-effect dimension, plus a singleton_fe column indicating which dimension the singleton belongs to. Each row shows the singleton node and its connected FE levels:

p.network$singletons

The singleton_fe column tells you which dimension each singleton node belongs to. To extract just the singleton units:

p.network$singletons[p.network$singletons$singleton_fe == "unit", ]

4.7.2 Identifying duplicate observations

The $multi_edges data frame has one row per duplicated fixed-effect combination, with columns named after the fixed-effect dimensions and a count column:

p.dup$multi_edges

To find the original rows in the data that correspond to duplicated combinations:

## rows where (abb, year) is a duplicated combination
dup_idx <- duplicated(sim_turnout_dup[, c("abb", "year")]) |
           duplicated(sim_turnout_dup[, c("abb", "year")], fromLast = TRUE)
sim_turnout_dup[dup_idx, ]

4.7.3 Full return value

| Element      | Description                                                                                                   |
|--------------|---------------------------------------------------------------------------------------------------------------|
| graph        | An igraph object for further analysis                                                                         |
| singletons   | Data frame with one column per FE dimension + singleton_fe: rows are singleton nodes with their connected FE levels |
| multi_edges  | Data frame with FE columns + count: duplicated combinations                                                   |
| components   | Component membership vector                                                                                   |
| n_components | Number of connected components                                                                                |
| plot         | The ggplot2 object for further customization                                                                  |

The igraph object can be used for additional graph analysis:

library(igraph)

## degree distribution
deg <- degree(p.network$graph)
summary(deg)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   1.000   3.000   5.000   5.941   7.750  17.000

## number of connected components
p.network$n_components
#> [1] 2
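
A common follow-up is to restrict the data to the largest connected component. The sketch below does this with igraph on a hypothetical toy panel; the "unit:" and "time:" vertex names are an assumption made for the example, so the lookup would need to be adapted to the vertex names actually stored in p.network$graph.

```r
library(igraph)

# toy panel: unit D in period 9 is disconnected from the rest
toy <- data.frame(unit = c("A", "A", "B", "B", "C", "D"),
                  time = c(1, 2, 1, 2, 2, 9))

g <- graph_from_data_frame(
  data.frame(from = paste0("unit:", toy$unit),
             to   = paste0("time:", toy$time)),
  directed = FALSE
)

comp    <- components(g)
largest <- which.max(comp$csize)   # id of the component with the most nodes

# a row belongs to the largest component if its unit node does
in_largest <- unname(comp$membership[paste0("unit:", toy$unit)] == largest)
toy_main   <- toy[in_largest, ]
toy_main
```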