8 Advanced options

This chapter is a reference for arguments to scfit() and the sc_* quantity functions that most users will never need to change. Read it when you need to deviate from defaults: alternative Stage-2 estimators, manual tuning of training hyperparameters, parallel execution, or reproducibility audits.

The four example chapters (Saha-Weeks, Graham-Svolik, Ballard-Rosa, Bechtel-Scheve) all use the defaults. This chapter explains what those defaults are and when you might override them.

8.1 Stage 2: choosing the empirical-Bayes refinement

scfit() runs a Stage-2 refinement on top of the Stage-1 DNN cross-fit by default. The Stage-2 output is what sc_fit$beta_hat carries and what every sc_* quantity function reads. The DML population estimates \(\hat\theta\) and clustered SEs are bit-exactly invariant across Stage-2 choices on the same seed — they are always computed from the Stage-1 DNN.

The four stage2 choices:

| Value | What it does | When to use |
|---|---|---|
| "map_c5" (default) | Paper EnsC5. Trains a 2nd DNN with stage2_seed, averages with the first DNN, runs a Newton MAP solver per respondent with a score-based diagonal prior scaled by \(1/5\). | The paper’s recommended default. Best individual-\(\beta\) correlation under realistic conjoint designs. |
| "none" | Skip Stage 2. beta_hat is the raw Stage-1 cross-fitted DNN. | Reproduce v0.1 behavior, or when you want the raw DNN view for downstream code that does its own refinement. |
| "varref" | Use \(0.5 \cdot \mathrm{Var}_i(\hat\beta_{\text{ens},i,k})\) as the diagonal prior variance (floored at varref_floor, default \(10^{-3}\)). Otherwise identical to MAP. | Recommended for continuous-attribute designs (e.g. Ballard-Rosa tax rates); see varref_floor below. |
| "mixed_logit" | DNN-offset BLUP via lme4::glmer. Treats the DNN prediction as a fixed offset and fits respondent-specific random slopes by Laplace approximation. | When you want a likelihood-based BLUP as a robustness check. Paper §A.4 notes this is more competitive when \(T\) is large, \(p\) is modest, and \(\mathbf{Z}\) already explains most heterogeneity. |

## v0.1 behavior
fit_dnn <- scfit(formula, data, ..., stage2 = "none")

## Paper EnsC5 default
fit_map <- scfit(formula, data, ..., stage2 = "map_c5")

## Mixed-logit alternative (requires lme4)
fit_ml  <- scfit(formula, data, ..., stage2 = "mixed_logit")
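
The "varref" prior variance from the table above can be written out in a few lines of base R. This is a minimal sketch, not the package's internal code: beta_dnn and beta_dnn2 are simulated stand-ins for the two cross-fitted DNN outputs, rows are assumed to index respondents, and varref_prior() is a hypothetical helper name.

```r
## Sketch only: simulated stand-ins for the two DNN outputs
## (rows = respondents, columns = coefficients). No package calls.
set.seed(1)
n_resp <- 200
p <- 3

## Column k is drawn with sd c(1, 0.2, 0.01)[k], so column 3 has almost no spread.
beta_dnn  <- matrix(rnorm(n_resp * p, sd = c(1, 0.2, 0.01)), n_resp, p, byrow = TRUE)
beta_dnn2 <- beta_dnn + matrix(rnorm(n_resp * p, sd = 0.02), n_resp, p)

## Ensemble average of the two DNN views.
beta_ens <- (beta_dnn + beta_dnn2) / 2

## Hypothetical helper: half the across-respondent variance, floored.
varref_prior <- function(beta_ens, varref_floor = 1e-3) {
  pmax(0.5 * apply(beta_ens, 2, var), varref_floor)
}

sigma_prior <- varref_prior(beta_ens)
sigma_prior
```

The third coefficient has very little across-respondent spread, so its prior variance is set by the floor rather than by the data. That is the case varref_floor exists to handle.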

The mixed-logit option calls lme4::glmer() and is wrapped in a deterministic fallback: on convergence failure it sets stage2_method = "mixed_logit_failed", emits a warning() with the underlying glmer message, and sets beta_hat <- beta_hat_dnn. Quantity functions continue to work; they simply read the unrefined DNN matrix.
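
The fallback described above follows a common tryCatch pattern. The sketch below reproduces its shape in base R. refine_or_fall_back() and the always-failing refiner are hypothetical and stand in for the glmer call, so this illustrates the control flow and is not the package's source.

```r
## Hypothetical wrapper: try a Stage-2 refinement, fall back to the raw
## DNN matrix on error and record that the refinement failed.
refine_or_fall_back <- function(beta_hat_dnn, refine_fun) {
  tryCatch(
    list(beta_hat = refine_fun(beta_hat_dnn), stage2_method = "mixed_logit"),
    error = function(e) {
      warning("Stage 2 failed, using raw DNN betas: ",
              conditionMessage(e), call. = FALSE)
      list(beta_hat = beta_hat_dnn, stage2_method = "mixed_logit_failed")
    }
  )
}

beta_hat_dnn <- matrix(c(0.2, -0.1, 0.4, 0.3), nrow = 2)

## Stand-in for a refinement that does not converge.
always_fails <- function(b) stop("model failed to converge")

res <- suppressWarnings(refine_or_fall_back(beta_hat_dnn, always_fails))
res$stage2_method
```

Downstream code written against this pattern can branch on stage2_method rather than checking whether beta_hat changed.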

8.1.1 Inspecting what Stage 2 produced

fit$stage2_method     # one of: none, map_c5, varref, mixed_logit, mixed_logit_failed
fit$beta_hat          # Stage-2-refined task-level betas (what quantities read)
fit$beta_hat_dnn      # Stage-1 single-DNN betas (always present in v0.2)
fit$beta_hat_dnn2     # 2nd-DNN cross-fit (NULL if stage2 = "none")
fit$beta_hat_ens      # ensemble average of dnn + dnn2 (NULL if stage2 = "none")
fit$beta_hat_resp     # respondent-level Stage-2 betas (N_resp x p)
fit$sigma_prior       # diagonal prior variance used in MAP (length p)
fit$sigma_post_diag   # mean posterior variance from MAP Hessian (length p)
fit$stage2_warnings   # character vector of glmer warnings, if any
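
To make the relationship between sigma_prior and sigma_post_diag concrete, here is a generic Newton MAP update for a binary logit with a diagonal Gaussian prior. It is a textbook sketch under stated assumptions (prior centered at a starting vector beta0, prior variances supplied directly). It does not reproduce the package's score-based prior or its solver, and newton_map() is a hypothetical name.

```r
## Generic Newton MAP for one respondent: logit likelihood plus a
## diagonal Gaussian prior N(beta0, diag(sigma_prior)). Illustration only.
newton_map <- function(dX, y, beta0, sigma_prior, n_iter = 25L, tol = 1e-10) {
  beta <- beta0
  prior_prec <- diag(1 / sigma_prior, nrow = length(sigma_prior))
  for (it in seq_len(n_iter)) {
    p_hat <- plogis(drop(dX %*% beta))
    ## Gradient of the log posterior.
    grad <- crossprod(dX, y - p_hat) - prior_prec %*% (beta - beta0)
    ## Negative Hessian: X'WX plus the prior precision.
    neg_hess <- crossprod(dX * (p_hat * (1 - p_hat)), dX) + prior_prec
    step <- drop(solve(neg_hess, grad))
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  ## Posterior variance approximated from the last negative Hessian.
  list(beta = beta, post_var_diag = diag(solve(neg_hess)))
}

set.seed(2)
n_tasks <- 8
p <- 3
dX <- matrix(sample(c(-1, 0, 1), n_tasks * p, replace = TRUE), n_tasks, p)
y <- rbinom(n_tasks, 1, 0.5)
beta0 <- c(0.5, -0.2, 0.1)
sigma_prior <- rep(0.5, p)

map <- newton_map(dX, y, beta0, sigma_prior)
rbind(prior = sigma_prior, posterior = map$post_var_diag)
```

Because the data can only add information, each posterior variance is no larger than its prior variance. A very small prior variance pins the estimate to beta0.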

8.2 which_beta: hybrid vs raw DNN view on quantities

Every sc_* quantity function now accepts which_beta = c("hybrid", "dnn"). The default "hybrid" reads fit$beta_hat (whatever Stage 2 produced). Passing "dnn" reads fit$beta_hat_dnn (the raw Stage-1 view). When stage2 = "none" the two matrices are numerically identical, so both values return the same result and the helper falls back to the raw DNN view without a warning.

## Same fit, two views of the same distribution
sc_fraction_preferring(fit)                     # MAP-shrunk
sc_fraction_preferring(fit, which_beta = "dnn") # raw DNN

When the paper text says “the structural model reveals that X% of respondents prefer Y”, it means the MAP-shrunk fraction (the "hybrid" default). The raw DNN distribution of respondent-level coefficients is usually wider because unrefined coefficients carry more sampling noise, so the raw DNN fraction can differ from the MAP-shrunk one.
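
A toy simulation shows why the two views can disagree. Everything below is simulated and assumed for illustration: the shrinkage weight w is arbitrary, linear shrinkage toward the mean stands in for the MAP refinement, and "prefers" is taken to mean a positive coefficient.

```r
## Toy illustration: noisy raw estimates vs. the same estimates shrunk
## toward their mean. Not the package's MAP step.
set.seed(3)
n_resp <- 2000
beta_true <- rnorm(n_resp, mean = 0.3, sd = 0.5)
beta_raw  <- beta_true + rnorm(n_resp, sd = 0.8)   # unrefined: extra noise

w <- 0.6                                           # arbitrary shrinkage weight
beta_shrunk <- w * beta_raw + (1 - w) * mean(beta_raw)

## Spread of each view, and the fraction with a positive coefficient.
c(sd_raw = sd(beta_raw), sd_shrunk = sd(beta_shrunk))
c(frac_raw = mean(beta_raw > 0), frac_shrunk = mean(beta_shrunk > 0))
```

Shrinkage narrows the distribution and moves respondents near zero toward the majority sign, so the two fractions generally differ. This is why a qualitative flip between views is worth checking.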

8.3 Reproducibility: seed and stage2_seed

scfit() ships a bit-exact determinism guarantee:

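## All three calls return bit-identical beta_hat
fit_seq <- scfit(..., seed = 42, stage2_seed = 12345L, parallel = FALSE)
fit_2c  <- scfit(..., seed = 42, stage2_seed = 12345L, parallel = TRUE, n_cores = 2L)
fit_4c  <- scfit(..., seed = 42, stage2_seed = 12345L, parallel = TRUE, n_cores = 4L)
stopifnot(identical(fit_seq$beta_hat, fit_2c$beta_hat),
          identical(fit_seq$beta_hat, fit_4c$beta_hat))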

seed controls Stage 1; stage2_seed controls the 2nd DNN in the Stage-2 ensemble. They are independent — changing stage2_seed does not perturb theta or vcov (those are computed from the Stage-1 cross-fit alone), but it does shift the MAP beta_hat through the 2nd DNN’s contribution to the ensemble. The default stage2_seed = 12345L matches the paper’s prototype.

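If seed = NULL (the default), Stage 1 is non-deterministic. stage2_seed is still respected, so the 2nd DNN is seeded the same way on every run, but beta_hat still varies from run to run because the ensemble also depends on the Stage-1 DNN. Set both seed and stage2_seed when you need bit-exact results.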

8.4 normalize_deltaX: scale-aware MAP regularization

New in v0.2.1. The v0.2 score-based MAP prior is calibrated assuming \(\mathrm{Var}(\Delta X_k) \approx 1\). For factor dummies under typical randomization this is approximately true. For designs with continuous attributes on very different scales — e.g. the Ballard-Rosa tax-rate brackets in percentage points (0–50) — the un-standardized prior becomes loose by the same factor as \(\mathrm{Var}(\Delta X_k)\), and the MAP solver leaves per-respondent estimates with extreme tails.

normalize_deltaX = TRUE divides each deltaX column by its sample SD before training / Lambda(Z) / DML / MAP, then un-standardizes all coefficient-bearing slots at assembly time. The user-facing surface (coef(fit), vcov(fit), fit$beta_hat, etc.) is on the original units; only the internal pipeline runs on a common-variance scale.

## Recommended for designs with continuous attributes on different
## scales (e.g. price in USD + age in years + 0/1 dummies)
fit <- scfit(formula, data, ..., normalize_deltaX = TRUE)

## The SDs used are kept on the fit for diagnostics:
fit$sd_dx

Default is FALSE, matching the paper’s stated runtime (which keeps attributes on their original units, e.g. tax rates in percentage points). Pass normalize_deltaX = TRUE when \(\mathrm{Var}(\Delta X_k) \gg 1\) for any continuous attribute. For pure factor-dummy designs the option is a near-no-op.
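
The standardize-then-un-standardize round trip can be checked on a small example. The sketch below uses an unpenalized stats::glm() logit as a stand-in for the full pipeline, with simulated data and illustrative variable names. It only demonstrates that dividing a coefficient fitted on standardized columns by the column SD recovers the original-unit coefficient.

```r
## Simulated attribute differences: one continuous column on a wide scale
## (percentage points) and one dummy-difference column.
set.seed(4)
n <- 500
dX <- cbind(tax   = runif(n, -50, 50),
            dummy = sample(c(-1, 0, 1), n, replace = TRUE))
y <- rbinom(n, 1, plogis(0.03 * dX[, "tax"] + 0.8 * dX[, "dummy"]))

## Standardize each column by its sample SD (kept for un-standardizing).
sd_dx  <- apply(dX, 2, sd)
dX_std <- sweep(dX, 2, sd_dx, "/")

## No intercept: choices depend on attribute differences only.
fit_raw <- glm(y ~ dX - 1,     family = binomial())
fit_std <- glm(y ~ dX_std - 1, family = binomial())

## Coefficients on the standardized scale, mapped back to original units.
beta_back <- unname(coef(fit_std)) / unname(sd_dx)
rbind(original_units = unname(coef(fit_raw)), round_trip = beta_back)
```

For an unpenalized fit the two rows agree up to numerical tolerance. Standardization changes the estimates only once a scale-dependent prior or penalty enters, which is the situation this option targets.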

8.5 sc_design_diagnostic(): which recovery tiers does my design support?

New in v0.2.1. Estimates per-coefficient \(\hat R^2_{Z,k}\) from the MAP posterior and maps the design to four recovery tiers per paper §6 heuristics:

| Tier | Condition | Quantities recoverable |
|---|---|---|
| mean & aggregate | any reasonable design | AMCE, average vote shares |
| distributional | T ≥ 5 and \(\bar R^2_Z\) ≥ 0.35 | preference shares, importance rankings |
| individual-level | T ≥ 8 and \(\bar R^2_Z\) ≥ 0.55 | respondent-specific quantities |
| ratio (MRS / WTP) | T ≥ 10 and \(\bar R^2_Z\) ≥ 0.55 and N ≥ 5000 | individual MRS / WTP |

diag <- sc_design_diagnostic(fit)
print(diag)
## Top / bottom R^2_Z attributes also reported -- helps spot which
## dummies are best-pinned by Z and which rely most on T for recovery.

Requires stage2 != "none" (needs the MAP Hessian’s posterior variance). The v0.2.1 estimator is flagged experimental = TRUE in print output because it has not yet been validated against the paper’s 5,760-cell simulation grid.
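
The thresholds in the tier table above can be encoded directly, which is handy for a quick check at the design stage before any model is fitted. recovery_tiers() is a hypothetical helper written for this illustration. It is not part of the package and does not estimate \(\bar R^2_Z\), which must be supplied.

```r
## Hypothetical helper: apply the tier thresholds from the table.
## n_tasks = T, r2z_bar = mean R^2_Z, n_resp = N.
recovery_tiers <- function(n_tasks, r2z_bar, n_resp) {
  c(
    mean_aggregate = TRUE,  # "any reasonable design" in the table
    distributional = n_tasks >= 5  && r2z_bar >= 0.35,
    individual     = n_tasks >= 8  && r2z_bar >= 0.55,
    ratio          = n_tasks >= 10 && r2z_bar >= 0.55 && n_resp >= 5000
  )
}

tiers <- recovery_tiers(n_tasks = 8, r2z_bar = 0.60, n_resp = 1500)
tiers
```

With 8 tasks, a mean \(\bar R^2_Z\) of 0.60, and 1,500 respondents, the first three tiers are supported and the ratio tier is not.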

8.6 Training hyperparameters

scfit() exposes the DNN training loop. Defaults are tuned to handle realistic conjoint sample sizes (1k–10k respondents). The arguments worth knowing:

| Argument | Default | Effect |
|---|---|---|
| K | 10L | Number of respondent-clustered cross-fitting folds. Higher = less Stage-1 overfit; lower = faster. Below K = 3 is not recommended. |
| n_epochs | 1000L | Adam epochs per fold (paper v13 default). Tutorial chapters use 200–400 for render speed. Monitor the loss trace via plot(fit, "loss_trace"). |
| learning_rate | 0.01 | Adam step size. Lower if the loss diverges; higher if convergence is slow. |
| weight_decay | "adaptive" | L2 penalty on DNN weights, applied through the Adam optimizer. "adaptive" uses the paper’s v13 rule \(K_{adaptive}/NT\) with \(K_{adaptive}=15\) if \(NT/p<300\) else \(25\); pass a fixed numeric (e.g. 1e-4) to override. The resolved value is reported on fit$weight_decay_used. |
| varref_floor | 1e-3 | Lower bound on the prior variance when stage2 = "varref" (continuous-attribute designs). The default matches the paper’s Ballard-Rosa setting; raising it over-shrinks per-respondent estimates. Ignored for other stage2 choices. |
| ridge_lambda | 1e-4 | Ridge penalty on the local information matrix \(\Lambda(Z)\) used in DML. Independent from weight_decay. |
| hidden | "auto" | Hidden-layer widths. "auto" picks the paper v13 base c(32L, 32L, 16L) for any \(N \cdot T \ge 2000\) (with a c(128L, 64L, 64L) override at \(p \ge 40\) and \(NT \ge 80{,}000\)); you can pass an integer vector directly. |

fit <- scfit(formula, data, ...,
             K = 10L, n_epochs = 1000L,
             learning_rate = 0.005, weight_decay = 5e-4,
             ridge_lambda = 1e-3,
             hidden = c(128L, 64L, 32L))
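
The "adaptive" weight-decay rule is simple enough to compute by hand when deciding whether a fixed override is in the same range. The sketch below reads \(K_{adaptive}/NT\) as \(K_{adaptive}/(N \cdot T)\), which is an interpretation of the rule as stated in the table. adaptive_weight_decay() is a hypothetical helper, not a package function.

```r
## Hypothetical helper: the adaptive rule as stated in the table,
## with NT taken to mean n_resp * n_tasks.
adaptive_weight_decay <- function(n_resp, n_tasks, p) {
  nt <- n_resp * n_tasks
  k_adaptive <- if (nt / p < 300) 15 else 25
  k_adaptive / nt
}

wd_small <- adaptive_weight_decay(n_resp = 1000, n_tasks = 5,  p = 20)  # NT/p = 250
wd_large <- adaptive_weight_decay(n_resp = 2000, n_tasks = 10, p = 20)  # NT/p = 1000
c(small = wd_small, large = wd_large)
```

On a real fit, compare the hand-computed value against fit$weight_decay_used, which reports what was actually applied.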

8.7 Parallel execution

Cross-fitting parallelizes across folds. Use parallel = TRUE and set n_cores to the number of physical cores you want to use. The bit-exact determinism guarantee holds across worker counts (folds are seeded by (seed, fold_id), never by which worker picks them up).

fit <- scfit(formula, data, ...,
             parallel = TRUE, n_cores = 4L,
             seed = 42, stage2_seed = 12345L)

The bit-exact guarantee is CPU-only. device = "cuda" runs the DNN on a GPU and is faster on large designs but is not deterministic.
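
Seeding each fold by (seed, fold_id) is what makes the result independent of scheduling. The sketch below imitates that idea in base R under two assumptions: the rule that combines seed and fold_id is made up for illustration, and a shuffled processing order stands in for workers picking up folds in arbitrary order.

```r
## Stand-in for training one fold: the RNG state depends only on
## (seed, fold_id), never on when or where the fold runs.
fold_result <- function(seed, fold_id) {
  set.seed(seed * 1000L + fold_id)  # hypothetical (seed, fold_id) -> RNG seed rule
  rnorm(3)
}

## Process folds in the given order, then reassemble by fold id.
run_folds <- function(seed, fold_order) {
  out <- lapply(fold_order, function(k) fold_result(seed, k))
  out[order(fold_order)]
}

in_order <- run_folds(42L, 1:5)
shuffled <- run_folds(42L, c(3L, 5L, 1L, 4L, 2L))
identical(in_order, shuffled)  # TRUE
```

If the RNG stream were shared across folds, the shuffled run would produce different draws per fold and the comparison would fail.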

8.8 keep_modules: forward-pass prediction on new respondents

By default, scfit() retains the trained per-fold torch::nn_module objects on the returned sc_fit, which enables forward-pass prediction on new \(\mathbf Z\):

fit <- scfit(formula, data, ..., keep_modules = TRUE)  # default

## Forward-pass on new respondents
new_Z <- matrix(stats::rnorm(50 * length(fit$z_names)),
                nrow = 50, ncol = length(fit$z_names))
beta_new <- predict(fit, newdata = new_Z, type = "beta")

Set keep_modules = FALSE to halve the in-memory size of the fit when forward-pass prediction is not needed.

8.9 When to override defaults: a quick decision guide

| Situation | Default | Override |
|---|---|---|
| Routine paper-quality conjoint analysis | use defaults | none |
| Code targeting a specific published v0.1 reproduction | stage2 = "map_c5" | stage2 = "none" |
| Comparing the MAP refinement to a likelihood-based alternative | use defaults | also run stage2 = "mixed_logit"; check stage2_warnings |
| Sample very small (\(N < 100\) respondents) | K = 10L | drop to K = 3L (and accept higher Stage-1 variance) |
| Need GPU speedup | device = "cpu" | device = "cuda" (gives up bit-exact determinism) |
| Distributional claims about polarization or fraction-preferring | which_beta = "hybrid" | sanity-check with which_beta = "dnn"; if results flip qualitatively, design is too sparse for individual-level claims |
| Forward-pass on out-of-sample \(\mathbf Z\) | keep_modules = TRUE | none; the default does what you want |