8 Advanced options

This chapter is a reference for arguments to scfit() and the sc_* quantity functions that most users will never need to change. Read it when you need to deviate from defaults: alternative Stage-2 estimators, manual tuning of training hyperparameters, parallel execution, or reproducibility audits.

The four example chapters (Saha-Weeks, Graham-Svolik, Ballard-Rosa, Bechtel-Scheve) all use the defaults. This chapter explains what those defaults are and when you might override them.

8.1 Stage 2: choosing the empirical-Bayes refinement

scfit() runs a Stage-2 refinement on top of the Stage-1 DNN cross-fit by default. The Stage-2 output is what sc_fit$beta_hat carries and what every sc_* quantity function reads. The DML population estimates $\hat\theta$ and clustered SEs are bit-exactly invariant across Stage-2 choices on the same seed — they are always computed from the Stage-1 DNN.

The four stage2 choices:

Value	What it does	When to use
`"map_c5"` (default)	Paper EnsC5. Trains a 2nd DNN with `stage2_seed`, averages with the first DNN, runs a Newton MAP solver per respondent with a score-based diagonal prior scaled by $1/5$.	The paper’s recommended default. Best individual-$\beta$ correlation under realistic conjoint designs.
`"none"`	Skip Stage 2. `beta_hat` is the raw Stage-1 cross-fitted DNN.	Reproduce v0.1 behavior, or when you want the raw DNN view for downstream code that does its own refinement.
`"varref"`	Use $0.5 \cdot \mathrm{Var}_i(\hat\beta_{\text{ens},i,k})$ as the diagonal prior variance (floored at `varref_floor`, default $10^{-3}$). Otherwise identical to MAP.	Recommended for continuous-attribute designs (e.g. Ballard-Rosa tax rates); see `varref_floor` below.
`"mixed_logit"`	DNN-offset BLUP via `lme4::glmer`. Treats the DNN prediction as a fixed offset and fits respondent-specific random slopes by Laplace approximation.	When you want a likelihood-based BLUP as a robustness check. Paper §A.4 notes this is more competitive when $T$ is large, $p$ is modest, and $\mathbf{Z}$ already explains most heterogeneity.
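The `"varref"` prior variance described in the table can be sketched in base R. This is an illustration of the stated formula only, not the package's internal code: `varref_prior()` is a hypothetical helper, `beta_ens` is a simulated stand-in for the respondent-level ensemble matrix, and the use of the sample variance (denominator $n - 1$) is an assumption.

```r
## Hypothetical helper: diagonal prior variance as described for stage2 = "varref".
## beta_ens: N_resp x p matrix of ensemble-averaged respondent-level coefficients.
varref_prior <- function(beta_ens, varref_floor = 1e-3) {
  v <- 0.5 * apply(beta_ens, 2, stats::var)  # 0.5 * Var_i(beta_ens[i, k]) per column k
  pmax(v, varref_floor)                      # floor small variances at varref_floor
}

set.seed(1)
## Simulated stand-in: column 2 is nearly constant, so the floor binds there.
beta_ens <- cbind(rnorm(200, sd = 1), rnorm(200, sd = 0.01), rnorm(200, sd = 3))
sigma_prior <- varref_prior(beta_ens)
print(sigma_prior)
```

In this simulated example the second coefficient barely varies across respondents, so its prior variance is set by the floor rather than by the data.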

## v0.1 behavior
fit_dnn <- scfit(formula, data, ..., stage2 = "none")

## Paper EnsC5 default
fit_map <- scfit(formula, data, ..., stage2 = "map_c5")

## Mixed-logit alternative (requires lme4)
fit_ml  <- scfit(formula, data, ..., stage2 = "mixed_logit")

The mixed-logit option calls lme4::glmer() and is wrapped in a deterministic fallback: on convergence failure it sets stage2_method = "mixed_logit_failed", emits a warning() with the underlying glmer message, and sets beta_hat <- beta_hat_dnn. Quantity functions continue to work; they simply read the unrefined DNN matrix.
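The fallback behavior described above follows a common R pattern, sketched here under stated assumptions. `refine_with_fallback()` and `failing()` are hypothetical names, and the sketch treats a convergence failure as an R error; how scfit() actually detects failure inside glmer is not shown in this chapter.

```r
## Hypothetical sketch of a deterministic fallback around a refinement step.
## refine_fn is any function that may fail; beta_hat_dnn is the unrefined matrix.
refine_with_fallback <- function(beta_hat_dnn, refine_fn) {
  tryCatch(
    list(beta_hat = refine_fn(beta_hat_dnn), stage2_method = "mixed_logit"),
    error = function(e) {
      ## Surface the underlying message, then fall back to the DNN matrix.
      warning("Stage 2 failed: ", conditionMessage(e), call. = FALSE)
      list(beta_hat = beta_hat_dnn, stage2_method = "mixed_logit_failed")
    }
  )
}

beta_dnn <- matrix(c(0.2, -0.1, 0.4, 0.3), nrow = 2)
failing  <- function(b) stop("convergence failure (simulated)")
res <- suppressWarnings(refine_with_fallback(beta_dnn, failing))
res$stage2_method
```

Because the fallback returns a matrix of the same shape, downstream code that reads `beta_hat` needs no special case for the failure path.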

8.1.1 Inspecting what Stage 2 produced

fit$stage2_method     # one of: none, map_c5, varref, mixed_logit, mixed_logit_failed
fit$beta_hat          # Stage-2-refined task-level betas (what quantities read)
fit$beta_hat_dnn      # Stage-1 single-DNN betas (always present in v0.2)
fit$beta_hat_dnn2     # 2nd-DNN cross-fit (NULL if stage2 = "none")
fit$beta_hat_ens      # ensemble average of dnn + dnn2 (NULL if stage2 = "none")
fit$beta_hat_resp     # respondent-level Stage-2 betas (N_resp x p)
fit$sigma_prior       # diagonal prior variance used in MAP (length p)
fit$sigma_post_diag   # mean posterior variance from MAP Hessian (length p)
fit$stage2_warnings   # character vector of glmer warnings, if any

8.2 `which_beta`: hybrid vs raw DNN view on quantities

Every sc_* quantity now accepts which_beta = c("hybrid", "dnn"). The default "hybrid" reads fit$beta_hat (whatever Stage 2 produced). Passing "dnn" reads fit$beta_hat_dnn (the raw Stage-1 view). When stage2 = "none" the two are numerically identical and the helper silently falls back.

## Same fit, two views of the same distribution
sc_fraction_preferring(fit)                     # MAP-shrunk
sc_fraction_preferring(fit, which_beta = "dnn") # raw DNN

When the paper text says “the structural model reveals that X% of respondents prefer Y”, it means the MAP-shrunk fraction (the "hybrid" default). The raw DNN view usually gives a wider distribution of respondent-level coefficients, because unrefined estimates carry more sampling noise, so the fraction it reports can differ from the MAP-shrunk one.
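A small simulation can illustrate why a shrunk view is narrower than a raw one. It is a sketch under simplifying assumptions: the data are simulated, the noise and prior variances are chosen arbitrarily, and the linear shrinkage toward the grand mean stands in for the package's MAP solver rather than reproducing it.

```r
set.seed(42)
n_resp <- 2000
beta_true <- rnorm(n_resp, mean = 0.3, sd = 0.5)   # simulated true coefficients
beta_raw  <- beta_true + rnorm(n_resp, sd = 1)     # noisy "raw" estimates

## Illustrative linear shrinkage toward the grand mean (not the package's MAP solver).
shrink <- 0.25 / (0.25 + 1)                        # prior var / (prior var + noise var)
beta_shrunk <- mean(beta_raw) + shrink * (beta_raw - mean(beta_raw))

## Spread of each view, then the share of positive coefficients in each view.
c(sd_raw = sd(beta_raw), sd_shrunk = sd(beta_shrunk))
c(frac_raw = mean(beta_raw > 0), frac_shrunk = mean(beta_shrunk > 0),
  frac_true = mean(beta_true > 0))
```

The shrunk standard deviation is the raw one multiplied by `shrink`, so any fraction computed against a fixed threshold moves with it.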

8.3 Reproducibility: `seed` and `stage2_seed`

scfit() ships a bit-exact determinism guarantee:

## All three return identical(beta_hat, ...) output
scfit(..., seed = 42, stage2_seed = 12345L, parallel = FALSE)
scfit(..., seed = 42, stage2_seed = 12345L, parallel = TRUE, n_cores = 2)
scfit(..., seed = 42, stage2_seed = 12345L, parallel = TRUE, n_cores = 4)

seed controls Stage 1; stage2_seed controls the 2nd DNN in the Stage-2 ensemble. They are independent — changing stage2_seed does not perturb theta or vcov (those are computed from the Stage-1 cross-fit alone), but it does shift the MAP beta_hat through the 2nd DNN’s contribution to the ensemble. The default stage2_seed = 12345L matches the paper’s prototype.

If seed = NULL (the default), Stage 1 is non-deterministic. stage2_seed is still respected, so the 2nd DNN remains seeded, but beta_hat as a whole is reproducible only when seed is also set.

8.4 `normalize_deltaX`: scale-aware MAP regularization

New in v0.2.1. The v0.2 score-based MAP prior is calibrated assuming $\mathrm{Var}(\Delta X_k) \approx 1$. For factor dummies under typical randomization this is approximately true. For designs with continuous attributes on very different scales — e.g. the Ballard-Rosa tax-rate brackets in percentage points (0–50) — the un-standardized prior becomes loose by the same factor as $\mathrm{Var}(\Delta X_k)$, and the MAP solver leaves per-respondent estimates with extreme tails.

normalize_deltaX = TRUE divides each deltaX column by its sample SD before training / Lambda(Z) / DML / MAP, then un-standardizes all coefficient-bearing slots at assembly time. The user-facing surface (coef(fit), vcov(fit), fit$beta_hat, etc.) is on the original units; only the internal pipeline runs on a common-variance scale.

## Recommended for designs with continuous attributes on different
## scales (e.g. price in USD + age in years + 0/1 dummies)
fit <- scfit(formula, data, ..., normalize_deltaX = TRUE)

## The SDs used are kept on the fit for diagnostics:
fit$sd_dx

Default is FALSE, matching the paper’s stated runtime (which keeps attributes on their original units, e.g. tax rates in percentage points). Pass normalize_deltaX = TRUE on continuous-attribute designs where $\mathrm{Var}(\Delta X_k) \gg 1$. For pure factor-dummy designs the option is a near-no-op.
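The unit bookkeeping behind normalize_deltaX (divide by the column SD, fit, then un-standardize) can be sketched with a plain logit in base R. The data and variable names below are simulated assumptions, and an unpenalized glm() stands in for the package's pipeline, so the sketch shows only that coefficients return to original units, not the effect on the MAP prior.

```r
set.seed(7)
n <- 500
## Simulated attribute differences on very different scales (assumed example).
dX <- cbind(tax = runif(n, -50, 50), dummy = sample(c(-1, 0, 1), n, replace = TRUE))
y  <- rbinom(n, 1, plogis(0.03 * dX[, "tax"] + 0.5 * dX[, "dummy"]))

sd_dx  <- apply(dX, 2, sd)            # per-column sample SDs
dX_std <- sweep(dX, 2, sd_dx, "/")    # internal common-variance scale

fit_std   <- glm(y ~ dX_std - 1, family = binomial())
beta_std  <- unname(coef(fit_std))
beta_orig <- beta_std / sd_dx         # un-standardize back to original units

## Same model fit directly on the original units, for comparison.
beta_direct <- unname(coef(glm(y ~ dX - 1, family = binomial())))
all.equal(unname(beta_orig), beta_direct, tolerance = 1e-6)
```

Without a penalty the two routes agree; the option matters in the package because the MAP prior is not scale-invariant.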

8.5 `sc_design_diagnostic()`: which recovery tiers does my design support?

New in v0.2.1. Estimates per-coefficient $\hat R^2_{Z,k}$ from the MAP posterior and maps the design to four recovery tiers per paper §6 heuristics:

Tier	Condition	Quantities recoverable
mean & aggregate	any reasonable design	AMCE, average vote shares
distributional	T ≥ 5 and $\bar R^2_Z$ ≥ 0.35	preference shares, importance rankings
individual-level	T ≥ 8 and $\bar R^2_Z$ ≥ 0.55	respondent-specific quantities
ratio (MRS / WTP)	T ≥ 10 and $\bar R^2_Z$ ≥ 0.55 and N ≥ 5000	individual MRS / WTP

diag <- sc_design_diagnostic(fit)
print(diag)
## Top / bottom R^2_Z attributes also reported -- helps spot which
## dummies are best-pinned by Z and which rely most on T for recovery.

Requires stage2 != "none" (needs the MAP Hessian’s posterior variance). The v0.2.1 estimator is flagged experimental = TRUE in print output because it has not yet been validated against the paper’s 5,760-cell simulation grid.
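The tier thresholds in the table above can be written as a small lookup, which may help when planning a design before any fit exists. `recovery_tiers()` is a hypothetical helper that only encodes the table; it does not estimate $\bar R^2_Z$ and is not part of the package.

```r
## Hypothetical helper encoding the tier thresholds from the table above.
## n_tasks = T, r2_bar = mean R^2_Z, n_resp = N.
recovery_tiers <- function(n_tasks, r2_bar, n_resp) {
  c(
    mean_aggregate = TRUE,  # "any reasonable design"
    distributional = n_tasks >= 5  && r2_bar >= 0.35,
    individual     = n_tasks >= 8  && r2_bar >= 0.55,
    ratio          = n_tasks >= 10 && r2_bar >= 0.55 && n_resp >= 5000
  )
}

## Example: 8 tasks, mean R^2_Z of 0.60, 2,000 respondents.
tiers <- recovery_tiers(n_tasks = 8, r2_bar = 0.60, n_resp = 2000)
print(tiers)
```

In this example the first three tiers are supported and the ratio tier is not, because both the task count and the sample size fall below its thresholds.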

8.6 Training hyperparameters

scfit() exposes the DNN training loop. Defaults are tuned to handle realistic conjoint sample sizes (1k–10k respondents). The arguments worth knowing:

Argument	Default	Effect
`K`	`10L`	Number of respondent-clustered cross-fitting folds. Higher = less Stage-1 overfit; lower = faster. Below `K = 3` is not recommended.
`n_epochs`	`1000L`	Adam epochs per fold (paper v13 default). Tutorial chapters use 200–400 for render speed. Monitor the loss trace via `plot(fit, "loss_trace")`.
`learning_rate`	`0.01`	Adam step size. Lower if the loss diverges; higher if convergence is slow.
`weight_decay`	`"adaptive"`	L2 penalty on DNN weights, applied through the Adam optimizer. `"adaptive"` uses the paper’s v13 rule $K_{adaptive}/NT$ with $K_{adaptive}=15$ if $NT/p<300$ else $25$; pass a fixed numeric (e.g. `1e-4`) to override. The resolved value is reported on `fit$weight_decay_used`.
`varref_floor`	`1e-3`	Lower bound on the prior variance when `stage2 = "varref"` (continuous-attribute designs). The default matches the paper’s Ballard-Rosa setting; raising it over-shrinks per-respondent estimates. Ignored for other `stage2` choices.
`ridge_lambda`	`1e-4`	Ridge penalty on the local information matrix $\Lambda(Z)$ used in DML. Independent from `weight_decay`.
`hidden`	`"auto"`	Hidden-layer widths. `"auto"` picks the paper v13 base `c(32L, 32L, 16L)` for any $N \cdot T \ge 2000$ (with a `c(128L, 64L, 64L)` override at $p \ge 40$ and $NT \ge 80{,}000$); you can pass an integer vector directly.

fit <- scfit(formula, data, ...,
             K = 10L, n_epochs = 1000L,
             learning_rate = 0.005, weight_decay = 5e-4,
             ridge_lambda = 1e-3,
             hidden = c(128L, 64L, 32L))
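The `"adaptive"` weight-decay rule from the table can be worked through by hand. The helper below is a hypothetical transcription of the stated formula, useful for checking what value to expect on `fit$weight_decay_used`; it is not the package's own code.

```r
## Hypothetical helper for the adaptive rule stated in the table:
## weight_decay = K_adaptive / (N * T), K_adaptive = 15 if N*T/p < 300, else 25.
adaptive_weight_decay <- function(n_resp, n_tasks, p) {
  nt <- n_resp * n_tasks
  k_adaptive <- if (nt / p < 300) 15 else 25
  k_adaptive / nt
}

wd_small <- adaptive_weight_decay(n_resp = 1000, n_tasks = 5,  p = 20)  # NT/p = 250  -> 15 / 5000
wd_large <- adaptive_weight_decay(n_resp = 2000, n_tasks = 10, p = 20)  # NT/p = 1000 -> 25 / 20000
c(wd_small = wd_small, wd_large = wd_large)
```

The penalty falls as $NT$ grows, so larger designs are regularized less per observation.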

8.7 Parallel execution

Cross-fitting parallelizes across folds. Use parallel = TRUE and set n_cores to the number of physical cores you want to use. The bit-exact determinism guarantee holds across worker counts (folds are seeded by (seed, fold_id), never by which worker picks them up).

fit <- scfit(formula, data, ...,
             parallel = TRUE, n_cores = 4L,
             seed = 42, stage2_seed = 12345L)

The bit-exact guarantee is CPU-only. device = "cuda" runs the DNN on a GPU and is faster on large designs but is not deterministic.
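The reason worker count does not matter on CPU is that each fold's randomness depends only on (seed, fold_id). The sketch below illustrates that idea in base R; `fold_draw()` and the specific seed-mixing formula are assumptions made for illustration, not the package's scheme.

```r
## Hypothetical sketch: derive each fold's RNG state from (seed, fold_id) only.
fold_draw <- function(seed, fold_id, n = 3) {
  set.seed(seed * 1000L + fold_id)   # illustrative mapping, an assumption
  rnorm(n)                           # stand-in for weight initialization
}

seed <- 42L
in_order  <- lapply(1:4, function(k) fold_draw(seed, k))
shuffled  <- c(3L, 1L, 4L, 2L)       # a different processing order
out_order <- lapply(shuffled, function(k) fold_draw(seed, k))

## Reassemble by fold_id: results match regardless of processing order.
same <- identical(in_order, out_order[order(shuffled)])
same
```

Any scheduler that hands folds to workers in a different order produces the same per-fold draws, as long as results are reassembled by fold_id.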

8.8 `keep_modules`: forward-pass prediction on new respondents

By default, scfit() retains the trained per-fold torch::nn_module objects on the returned sc_fit, which enables forward-pass prediction on new $\mathbf Z$:

fit <- scfit(formula, data, ..., keep_modules = TRUE)  # default

## Forward-pass on new respondents
new_Z <- matrix(stats::rnorm(50 * length(fit$z_names)),
                nrow = 50, ncol = length(fit$z_names))
beta_new <- predict(fit, newdata = new_Z, type = "beta")

Set keep_modules = FALSE to halve the in-memory size of the fit when forward-pass prediction is not needed.

8.9 When to override defaults: a quick decision guide

Situation	Default	Override
Routine paper-quality conjoint analysis	use defaults	none
Code targeting a specific published v0.1 reproduction	`stage2 = "map_c5"`	`stage2 = "none"`
Comparing the MAP refinement to a likelihood-based alternative	use defaults	also run `stage2 = "mixed_logit"`; check `stage2_warnings`
Very small sample ($N < 100$ respondents)	`K = 10L`	drop to `K = 3L` (and accept higher Stage-1 variance)
Need GPU speedup	`device = "cpu"`	`device = "cuda"` (gives up bit-exact determinism)
Distributional claims about polarization or fraction-preferring	`which_beta = "hybrid"`	sanity-check with `which_beta = "dnn"`; if results flip qualitatively, design is too sparse for individual-level claims
Forward-pass on out-of-sample $\mathbf Z$	`keep_modules = TRUE`	none — the default does what you want
