2 Simulated Example

Before applying the structural estimator to real data, this chapter runs the full workflow on a synthetic conjoint experiment with known ground truth. Because we control the data-generating process, every quantity the package produces can be checked against its analytically correct value.

2.1 Data-generating process

The package ships simdata, a synthetic conjoint with \(N = 1{,}000\) respondents, \(T = 6\) tasks per respondent, and two profiles per task (12,000 rows). Three binary attributes \(x_1, x_2, x_3\) are drawn independently as Bernoulli(0.5). Two continuous moderators \(z_1, z_2 \sim N(0,1)\) determine each respondent’s preference vector:

\[ \beta_1(\mathbf Z_i) = 0.5 + 0.3\,z_{1i}, \quad \beta_2(\mathbf Z_i) = -0.8 + 0.2\,z_{2i}, \quad \beta_3(\mathbf Z_i) = 0.3. \]

Key features:

\(\beta_1\) and \(\beta_2\) are heterogeneous (vary with \(Z\)); \(\beta_3\) is homogeneous (constant across respondents).
\(\beta_1 > 0\) for most respondents; \(\beta_2 < 0\) for most.
True population averages: \(E[\beta_1] = 0.5\), \(E[\beta_2] = -0.8\), \(E[\beta_3] = 0.3\).

data(simdata, package = "sconjoint")
dim(simdata)

[1] 12000     9

head(simdata, 6)

  respondent task profile choice x1 x2 x3       z1       z2
1          1    1       1      0  1  1  0 1.370958 2.325058
2          1    1       2      1  1  0  0 1.370958 2.325058
3          1    2       1      0  0  0  1 1.370958 2.325058
4          1    2       2      1  1  0  0 1.370958 2.325058
5          1    3       1      0  0  1  0 1.370958 2.325058
6          1    3       2      1  1  1  1 1.370958 2.325058

## True beta matrix stored as attribute
beta_true <- attr(simdata, "beta_true")
dim(beta_true)

[1] 1000    3

Why does scfit() require at least one Z column?

The DML estimator is defined in terms of a nuisance function \(\hat\beta(\mathbf Z)\) that maps respondent-level covariates into preference vectors. Without any moderators the mapping is trivial and the DML correction degenerates to a plain plug-in estimator. Users who truly want a homogeneous model should fit a standard glm(choice ~ deltaX, family = binomial) instead.

2.2 Fitting

With 1,000 respondents and 6 tasks each, the estimator has enough signal to recover both population-level and individual-level parameters cleanly.

fit_sim <- scfit(
  choice ~ x1 + x2 + x3 | z1 + z2,
  data       = simdata,
  respondent = "respondent",
  task       = "task",
  profile    = "profile",
  K          = 5L,
  n_epochs   = 200L,
  seed       = 42
)
summary(fit_sim)

sc_fit summary
Call: scfit(formula = choice ~ x1 + x2 + x3 | z1 + z2, data = simdata, 
    respondent = "respondent", task = "task", profile = "profile", 
    K = 5L, n_epochs = 200L, seed = 42)

1000 respondents | 6000 observations | K = 5 folds
hidden = 32-32-16 | epochs = 200 | seed = 42 | device = cpu

Coefficients (DML, respondent-clustered SE):
   estimate std_error z_value   p_value   ci_lo   ci_hi
x1   0.4888   0.04049  12.071 1.495e-33  0.4094  0.5682
x2  -0.8004   0.03875 -20.654 9.020e-95 -0.8764 -0.7244
x3   0.3015   0.03864   7.804 6.004e-15  0.2258  0.3772

DML/iid SE ratio (mean): 1.01

Stage 2: map_c5 | mean(sigma_prior) = 0.2902

How do I know my sc_fit is well-trained?

Check two things: (1) summary(fit) reports the DML/iid SE ratio — values close to one mean the clustering correction is small and the DNN’s per-respondent \(\hat\beta\) has little additional within-respondent information beyond the population mean. (2) plot(fit, "loss_trace") shows per-fold training loss curves — they should visually level off before the last epoch. If they are still descending, increase n_epochs.

Performance tips

Set device = "cuda" on a GPU machine; bit-exact determinism is lost but wall-clock time drops substantially.
Use parallel = TRUE with n_cores = parallelly::availableCores() on CPU for large datasets — the package guarantees bit-exact identity between parallel and sequential runs.
Prefer hidden = "auto" as a starting point and tune only when clearly needed.

plot(fit_sim, "loss_trace")

All folds converge to a similar loss level, confirming that 200 epochs is sufficient for this dataset.

2.3 Population-average recovery

The DML point estimates \(\hat\theta\) should approximate the true population means.

theta_hat <- coef(fit_sim)
theta_true <- c(x1 = 0.5, x2 = -0.8, x3 = 0.3)
data.frame(
  attribute  = names(theta_hat),
  true       = theta_true,
  estimated  = round(unname(theta_hat), 3),
  row.names  = NULL
)

  attribute true estimated
1        x1  0.5     0.489
2        x2 -0.8    -0.800
3        x3  0.3     0.302

All three signs match and the magnitudes are close. The AMCE coefficient plot makes the pattern visual:

plot_amce(fit_sim)

2.4 First-stage learners

The debiased population averages come from the orthogonal score and respondent-clustered cross-fitting, which do not depend on how the first-stage mean \(\hat\beta(\mathbf Z)\) is estimated. The default is the cross-fitted deep network used throughout this book; scfit() also provides two flexible alternatives through the learner argument. Setting learner = "enet" uses a cross-fitted elastic-net logit that expands the moderators into a spline basis, so \(\hat\beta(\mathbf Z)\) can be nonlinear; learner = "grf" uses a generalized-random-forest local logit. The two options need the glmnet and grf packages. Only the first stage changes — the cross-fitting, the orthogonal correction, and the clustered standard errors are identical.

Refitting simdata with each learner and comparing the recovered population averages against the known truth:

fit_en <- scfit(
  choice ~ x1 + x2 + x3 | z1 + z2,
  data = simdata, respondent = "respondent", task = "task",
  profile = "profile", learner = "enet", K = 5L, seed = 42
)
fit_grf <- scfit(
  choice ~ x1 + x2 + x3 | z1 + z2,
  data = simdata, respondent = "respondent", task = "task",
  profile = "profile", learner = "grf", K = 5L, seed = 42
)

data.frame(
  attribute = names(theta_true),
  true      = unname(theta_true),
  dnn       = round(unname(coef(fit_sim)), 3),
  enet      = round(unname(coef(fit_en)), 3),
  grf       = round(unname(coef(fit_grf)), 3),
  row.names = NULL
)

  attribute true    dnn   enet    grf
1        x1  0.5  0.489  0.497  0.498
2        x2 -0.8 -0.800 -0.811 -0.812
3        x3  0.3  0.302  0.303  0.314

All three learners recover \(E[\beta_1] = 0.5\), \(E[\beta_2] = -0.8\), and \(E[\beta_3] = 0.3\) closely: valid debiased inference is a property of the cross-fitted orthogonal score, not of the particular first stage. The deep network is the default because it is the most robust across the shape of preference heterogeneity; the elastic-net and forest learners are lighter-weight alternatives.

2.5 Individual-level recovery

The per-respondent \(\hat\beta(\mathbf Z_i)\) matrix is the structural payoff — it gives a complete preference vector for each respondent.

beta_hat_all <- predict(fit_sim)

## One row per respondent, reordered to `beta_true`'s respondent order.
## `fit_sim$respondent_id` follows the package's internal row order, which
## is not the 1..N order of the simulated truth, so align by id rather than
## by position.
first_row <- !duplicated(fit_sim$respondent_id)
rid       <- fit_sim$respondent_id[first_row]
beta_hat  <- beta_hat_all[first_row, ][order(rid), ]

## Correlation between true and estimated beta
data.frame(
  attribute   = c("x1", "x2", "x3"),
  correlation = round(c(
    cor(beta_true[, 1], beta_hat[, 1]),
    cor(beta_true[, 2], beta_hat[, 2]),
    cor(beta_true[, 3], beta_hat[, 3])
  ), 3),
  row.names = NULL
)

  attribute correlation
1        x1       0.531
2        x2       0.343
3        x3          NA

The clear positive correlations for \(x_1\) and \(x_2\) confirm that the DNN recovers the individual-level mapping \(\beta(Z)\); the magnitudes are moderate because individual coefficients are recovered more noisily than the population averages with only \(T = 6\) tasks per respondent. The correlation for \(x_3\) is undefined (NA) because \(\beta_3\) is constant — with no variation in the truth, there is nothing to correlate against.

df_compare <- data.frame(
  true      = beta_true[, 1],
  estimated = beta_hat[, 1]
)

ggplot(df_compare, aes(x = true, y = estimated)) +
  geom_point(alpha = 0.3, color = "#2166AC", size = 1.5) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed",
              color = "#E41A1C", linewidth = 0.8) +
  labs(x = expression("True " * beta[1](Z)),
       y = expression("Estimated " * hat(beta)[1](Z)),
       title = "Individual-level recovery: attribute x1") +
  theme_minimal(base_size = 12)

The point cloud slopes along the 45-degree line, confirming recovery of the direction and, more roughly, the magnitude of \(\beta_1(Z)\). The beta ridgelines below show the full distribution:

plot(fit_sim, "beta_ridgelines")

The ridgeline for \(x_1\) is wide and centered above zero (heterogeneous, positive); \(x_2\) is wide and centered below zero (heterogeneous, negative); \(x_3\) is narrow and centered at 0.3 (homogeneous).

2.6 Direction and intensity

The decomposition should show: \(x_1\) has strong positive direction, \(x_2\) has strong negative direction, \(x_3\) has moderate positive direction.

sc_direction_intensity(fit_sim)

sc_quantity_bivariate: direction_intensity
-- direction --
sc_quantity: direction
  estimate: data.frame with 3 rows
 dummy_name      d    se_d ci_lo_d ci_hi_d
         x1  0.944 0.01044  0.9235  0.9645
         x2 -1.000 0.00000 -1.0000 -1.0000
         x3  0.862 0.01604  0.8306  0.8934
-- intensity --
sc_quantity: intensity
  estimate: data.frame with 3 rows
 dummy_name      u     se_u ci_lo_u ci_hi_u
         x1 0.4940 0.007263  0.4798  0.5082
         x2 0.7684 0.005497  0.7576  0.7792
         x3 0.3061 0.005545  0.2952  0.3169

This matches the DGP: \(x_1\) and \(x_3\) both have positive mean direction, while \(x_2\) has negative mean direction.

2.7 Fraction preferring

We can verify against closed-form DGP values. For \(x_1\): \(\Pr(\beta_1 > 0) = \Pr(z_1 > -5/3) \approx 0.95\). For \(x_2\): \(\Pr(\beta_2 > 0) = \Pr(z_2 > 4) \approx 0\). For \(x_3\): always 1.

frac_sim <- sc_fraction_preferring(fit_sim)

true_frac <- c(
  x1 = mean(beta_true[, 1] > 0),
  x2 = mean(beta_true[, 2] > 0),
  x3 = mean(beta_true[, 3] > 0)
)

data.frame(
  attribute     = names(true_frac),
  true_fraction = true_frac,
  estimated     = round(frac_sim$estimate$frac_positive, 3),
  row.names     = NULL
)

  attribute true_fraction estimated
1        x1         0.953     0.972
2        x2         0.000     0.000
3        x3         1.000     0.931

The diverging bar chart makes the asymmetry vivid:

plot_fraction(fit_sim)

2.8 Heterogeneity test

This is the most important validation. The test should flag \(x_1\) and \(x_2\) as significant (their \(\beta\) varies with \(Z\)) and \(x_3\) as non-significant (constant across respondents).

het_sim <- sc_heterogeneity_test(fit_sim, adjust = "bh")
het_sim$estimate[, c("dummy_name", "var_beta", "p_adjusted", "sig")]

  dummy_name   var_beta    p_adjusted sig
1         x1 0.06048681 1.039706e-105 ***
2         x2 0.03019096 1.339527e-142 ***
3         x3 0.03780496 1.325565e-127 ***

The heterogeneity bar chart confirms this visually — \(x_1\) and \(x_2\) are dark blue with diamond markers (significant), \(x_3\) is pale (homogeneous):

plot_hetero(fit_sim)

2.9 Subgroup analysis

By construction, respondents with \(z_1 > 0\) have \(\beta_1 > 0.5\) while those with \(z_1 < 0\) have \(\beta_1 < 0.5\).

z1_col <- fit_sim$Z[, "z1"]
sub_sim <- sc_subgroup(fit_sim, list(
  "z1 > 0" = z1_col > 0,
  "z1 < 0" = z1_col < 0
))

z1_resp <- fit_sim$Z[first_row, "z1"]
true_high <- mean(beta_true[z1_resp > 0, 1])
true_low  <- mean(beta_true[z1_resp < 0, 1])

data.frame(
  group     = c("z1 > 0", "z1 < 0"),
  true_b1   = round(c(true_high, true_low), 3),
  est_b1    = round(c(
    sub_sim[["z1 > 0"]]$estimate$theta[1],
    sub_sim[["z1 < 0"]]$estimate$theta[1]
  ), 3),
  row.names = NULL
)

   group true_b1 est_b1
1 z1 > 0   0.509  0.598
2 z1 < 0   0.476  0.381

The subgroup comparison plot shows the split clearly — the high-\(z_1\) group has a larger \(\hat\theta_{x_1}\), exactly as the DGP prescribes:

plot_subgroup(
  fit_sim,
  subgroup = list("z1 > 0" = z1_col > 0, "z1 < 0" = z1_col < 0),
  title = "Subgroup AMCE: high-z1 vs low-z1 respondents"
)

2.10 Advanced quantities

2.10.1 Baseline comparison

The homogeneous logit baseline should agree with the structural model on population averages while lacking per-respondent heterogeneity.

bl <- sc_baseline_logit(fit_sim)
summary(bl)

Baseline LOGIT model  (6000 obs, 1000 respondents)
Respondent-clustered standard errors

    Estimate Std.Error z.value   Pr.z    
x1  0.482796  0.008949   53.95 <2e-16 ***
x2 -0.787490  0.008156  -96.55 <2e-16 ***
x3  0.300115  0.008641   34.73 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

theta_struct <- coef(fit_sim)
theta_logit  <- coef(bl)
df_comp <- data.frame(
  dummy = rep(names(theta_struct), 2),
  estimate = c(theta_struct, theta_logit),
  model = rep(c("Structural (DML)", "Logit"), each = length(theta_struct))
)
ggplot(df_comp, aes(x = estimate, y = dummy, color = model)) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
  geom_point(position = position_dodge(0.4), size = 3) +
  scale_color_manual(values = c("Structural (DML)" = "#E41A1C",
                                "Logit" = "#2166AC")) +
  labs(x = expression(hat(theta)), y = NULL, color = NULL,
       title = "Structural vs baseline logit") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")

Both estimators agree on population averages — the structural model’s advantage is the per-respondent \(\hat\beta(\mathbf Z_i)\) that enables all the heterogeneity analyses above.

2.10.2 Average marginal effects (probability scale)

sc_average(fit_sim, scale = "probability")

sc_quantity: average_probability
  estimate: data.frame with 3 rows
 dummy_name estimate       se    ci_lo   ci_hi
         x1  0.10988 0.009102  0.09204  0.1277
         x2 -0.17992 0.008711 -0.19700 -0.1628
         x3  0.06778 0.008685  0.05076  0.0848

Probability-scale AMEs are attenuated relative to logit-scale \(\hat\theta\) because \(G'(x) = G(x)(1 - G(x)) \leq 0.25\).

2.10.3 Consumer surplus and welfare change

sc_surplus(fit_sim, profiles = list(
  list(x1 = 1, x2 = 0, x3 = 1),
  list(x1 = 0, x2 = 1, x3 = 0)
))

sc_quantity: surplus
  estimate = 0.9839   se = 0.007942   95% CI = [0.9683, 0.9994]

Upgrading \(x_1\) from 0 to 1 should increase welfare (\(\beta_1 > 0\)):

sc_welfare_change(fit_sim,
  old_set = list(list(x1 = 0)),
  new_set = list(list(x1 = 1))
)

sc_quantity: welfare_change
  estimate = 0.4861   se = 0.007781   95% CI = [0.4708, 0.5013]

The positive welfare change confirms that \(x_1 = 1\) is valued, with magnitude approximately \(E[\beta_1] = 0.5\).

2.10.4 Decisiveness

sc_decisiveness(fit_sim,
  A = list(x1 = 1, x3 = 1),
  B = list(x2 = 1)
)

sc_quantity: decisiveness
  estimate = 0.6362   se = 0.003721   95% CI = [0.6289, 0.6435]

Profile A (\(\beta_1 + \beta_3 \approx 0.8\)) vs profile B (\(\beta_2 \approx -0.8\)) gives \(\Delta V \approx 1.6\), so most respondents are decisive.

2.10.5 Preference inequality

sc_inequality(fit_sim, measure = "variance")

sc_quantity: inequality_variance
  estimate: data.frame with 3 rows
 dummy_name  measure   value
         x1 variance 0.06049
         x2 variance 0.03019
         x3 variance 0.03780

\(x_1\) and \(x_2\) show higher variance (\(\beta\) depends on \(Z\)); \(x_3\) has near-zero variance (constant \(\beta_3 = 0.3\)). This agrees with the heterogeneity test above.

2.11 Summary

Every check aligns with the ground truth:

Population-average \(\hat\theta\) recovers signs and magnitudes.
Individual-level \(\hat\beta(\mathbf Z_i)\) correlates strongly with the true preference vector for heterogeneous attributes.
Heterogeneity test correctly flags \(x_1\), \(x_2\) (Z-dependent) as heterogeneous and \(x_3\) (constant) as homogeneous.
Subgroup analysis recovers the known \(z_1\)-driven split.
Baseline logit agrees on population averages while lacking heterogeneity decomposition.
Advanced quantities (AME, surplus, welfare, decisiveness, inequality) all produce results consistent with the DGP.

With these checks in hand, we proceed to real-data applications.

Abramson, Scott F., Korhan Koçak, and Asya Magazinnik. 2022. “What Do We Learn about Voter Preferences from Conjoint Experiments?” American Journal of Political Science 66 (4): 1008–20.

Acharya, Avidit, Jens Hainmueller, and Yiqing Xu. 2026. “Learning Preferences from Conjoint Data: A Hybrid Structural Deep Learning Approach.”

Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. “Generalized Random Forests.” Annals of Statistics 47 (2): 1148–78.

Ballard-Rosa, Cameron, Lucy Martin, and Kenneth Scheve. 2017. “The Structure of American Income Tax Policy Preferences.” Journal of Politics 79 (1): 1–16.

Bansak, Kirk, Jens Hainmueller, and Dominik Hangartner. 2016. “How Economic, Humanitarian, and Religious Concerns Shape European Attitudes Toward Asylum Seekers.” Science 354 (6309): 217–22.

———. 2023. “Europeans’ Support for Refugees of Varying Background Is Stable over Time.” Nature 620: 849–54.

Bansak, Kirk, Jens Hainmueller, Daniel J. Hopkins, and Teppei Yamamoto. 2023. “Using Conjoint Experiments to Analyze Election Outcomes: The Essential Role of the Average Marginal Component Effect.” Political Analysis 31 (4): 500–518.

Bechtel, Michael M., and Kenneth F. Scheve. 2013. “Mass Support for Global Climate Agreements Depends on Institutional Design.” Proceedings of the National Academy of Sciences 110 (34): 13763–68.

Berry, Steven, James Levinsohn, and Ariel Pakes. 1995. “Automobile Prices in Market Equilibrium.” Econometrica 63 (4): 841–90.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1): C1–68.

Chernozhukov, Victor, Whitney K. Newey, and Rahul Singh. 2022. “Automatic Debiased Machine Learning of Causal and Structural Effects.” Econometrica 90 (3): 967–1027.

Chipman, Hugh A., Edward I. George, and Robert E. McCulloch. 2010. “BART: Bayesian Additive Regression Trees.” Annals of Applied Statistics 4 (1): 266–98.

Cuesta, Brandon de la, Naoki Egami, and Kosuke Imai. 2022. “Improving the External Validity of Conjoint Analysis: The Essential Role of Profile Distribution.” Political Analysis 30 (1): 19–45.

Egami, Naoki, and Kosuke Imai. 2019. “Causal Interaction in Factorial Experiments: Application to Conjoint Analysis.” Journal of the American Statistical Association 114 (526): 529–40.

Farrell, Max H., Tengyuan Liang, and Sanjog Misra. 2021. “Deep Neural Networks for Estimation and Inference.” Econometrica 89 (1): 181–213.

———. 2025. “Deep Learning for Individual Heterogeneity: An Automatic Inference Framework.”

Goplerud, Max, Kosuke Imai, and Nicole E. Pashley. 2025. “Estimating Heterogeneous Causal Effects of High-Dimensional Treatments: Application to Conjoint Analysis.” Annals of Applied Statistics 19 (2): 866–88.

Graham, Matthew H., and Milan W. Svolik. 2020. “Democracy in America? Partisanship, Polarization, and the Robustness of Support for Democracy in the United States.” American Political Science Review 114 (2): 392–409.

Green, Paul E., and V. Srinivasan. 1990. “Conjoint Analysis in Marketing: New Developments with Implications for Research and Practice.” Journal of Marketing 54 (4): 3–19.

Hainmueller, Jens, Daniel J. Hopkins, and Teppei Yamamoto. 2014. “Causal Inference in Conjoint Analysis: Understanding Multidimensional Choices via Stated Preference Experiments.” Political Analysis 22 (1): 1–30.

Ham, Dae Woong, Kosuke Imai, and Lucas Janson. 2024. “Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis.” Political Analysis 32 (3): 329–44.

Hetzenecker, Stephan, and Maximilian Osterhaus. 2024. “Deep Learning for the Estimation of Heterogeneous Parameters in Discrete Choice Models.”

Kamakura, Wagner A., and Gary J. Russell. 1989. “A Probabilistic Choice Model for Market Segmentation and Elasticity Structure.” Journal of Marketing Research 26 (4): 379–90.

Leeper, Thomas J., Sara B. Hobolt, and James Tilley. 2020. “Measuring Subgroup Preferences in Conjoint Experiments.” Political Analysis 28 (2): 207–21.

McFadden, Daniel. 1974. “Conditional Logit Analysis of Qualitative Choice Behavior.” In Frontiers in Econometrics, edited by Paul Zarembka, 105–42. Academic Press.

———. 1981. “Econometric Models of Probabilistic Choice.” In Structural Analysis of Discrete Data with Econometric Applications, edited by Charles F. Manski and Daniel McFadden, 198–272. MIT Press.

McFadden, Daniel, and Kenneth Train. 2000. “Mixed MNL Models for Discrete Response.” Journal of Applied Econometrics 15 (5): 447–70.

Revelt, David, and Kenneth Train. 1998. “Mixed Logit with Repeated Choices: Households’ Choices of Appliance Efficiency Level.” Review of Economics and Statistics 80 (4): 647–57.

Rho, Sungmin, and Michael Tomz. 2017. “Why Don’t Trade Preferences Reflect Economic Self-Interest?” International Organization 71 (S1): S85–108.

Robinson, Thomas S., and Raymond Duch. 2024. “How to Detect Heterogeneity in Conjoint Experiments.” Journal of Politics 86 (2): 412–27.

Saha, Sparsha, and Ana Catalano Weeks. 2022. “Ambitious Women: Gender and Voter Perceptions of Candidate Ambition.” Political Behavior 44: 779–805.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. 2nd ed. Cambridge University Press.

Zhirkov, Kirill. 2022. “Estimating and Using Individual Marginal Component Effects from Conjoint Experiments.” Political Analysis 30 (2): 236–49.