skills/experimental-design/SKILL.md
The design of a study — how units are assigned to conditions, what is held constant, what is varied, and in what structure — determines what questions the data can answer. No analysis can rescue a confounded or pseudoreplicated design after the fact. This skill is about the decisions made before data collection: picking a design that isolates the effect of interest, randomizing to license causal claims, blocking to remove known nuisance variation, and structuring multi-factor experiments so effects are estimable rather than tangled together.
The three ideas behind almost every good design (Fisher's principles) are randomization, replication, and blocking (local control).
This skill helps you choose among design types, generate the actual randomization or DOE layout (with reproducible scripts), and avoid the structural mistakes that make data uninterpretable.
uv pip install "numpy>=1.26" "pandas>=2.0" pyDOE3
pyDOE3 is the maintained successor to pyDOE/pyDOE2 and supplies factorial,
fractional-factorial, Plackett-Burman, central-composite, Box-Behnken, and
Latin-hypercube generators. The bundled scripts wrap it to return designs in real
factor units with named columns and randomized run order.
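To make "real factor units with named columns and randomized run order" concrete, here is a minimal standard-library sketch of the idea. It is not the bundled scripts' implementation and does not call pyDOE3; the function name `two_level_design` is hypothetical and used only for illustration.

```python
import itertools
import random


def two_level_design(factors, seed=None):
    """Return a 2^k full factorial as a list of dicts in real units.

    `factors` maps a factor name to its (low, high) range.
    Hypothetical helper for illustration, not part of the bundled scripts.
    """
    names = list(factors)
    runs = []
    # Coded design: every combination of -1 (low) and +1 (high).
    for coded in itertools.product((-1, 1), repeat=len(names)):
        run = {}
        for name, level in zip(names, coded):
            low, high = factors[name]
            # Map the coded level back to the real-world setting.
            run[name] = low if level == -1 else high
        runs.append(run)
    # Shuffle run order with a seeded generator so the schedule is reproducible.
    random.Random(seed).shuffle(runs)
    return runs


factors = {"temp_C": (20, 60), "conc_mM": (1, 10), "pH": (6, 8)}
design = two_level_design(factors, seed=42)
for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

Keeping the coded-to-real mapping inside the generator means the run sheet handed to the bench already shows the settings to dial in, and re-running with the same seed regenerates the same order.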
Start from the question and the structure of your units, not from a favorite design.
What are you trying to learn?
│
├─ Compare a few predefined conditions (A vs B vs C)?
│   ├─ Units independent, possibly with a known nuisance factor (day, batch, site)?
│   │     → Completely randomized (no nuisance) or RANDOMIZED BLOCK design.
│   ├─ Each unit can receive every condition in sequence (washout possible)?
│   │     → CROSSOVER / repeated-measures design (more power, watch carry-over).
│   └─ You can only randomize groups, not individuals (schools, clinics)?
│         → CLUSTER-randomized design (analyze at the cluster level; see pseudoreplication).
│
├─ Screen MANY factors (5+) to find the few that matter?
│     → FRACTIONAL FACTORIAL or PLACKETT-BURMAN screening design.
│
├─ Quantify main effects AND interactions among a handful of factors?
│     → FULL 2^k FACTORIAL design.
│
├─ Find the settings that OPTIMIZE a response (curvature matters)?
│     → RESPONSE-SURFACE design: central composite or Box-Behnken.
│
└─ Explore a simulation/computer model over a continuous space?
      → SPACE-FILLING design: Latin hypercube.
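The cluster branch above says to analyze at the cluster level. The sketch below illustrates what that means with made-up numbers: individual measurements are collapsed to one mean per cluster, and the comparison then has as many units as there are clusters, not individuals. The data and variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (cluster, arm, individual measurement).
records = [
    ("clinic1", "treatment", 5.0), ("clinic1", "treatment", 5.5), ("clinic1", "treatment", 6.0),
    ("clinic2", "treatment", 6.0), ("clinic2", "treatment", 6.5), ("clinic2", "treatment", 7.0),
    ("clinic3", "control", 4.0), ("clinic3", "control", 4.5), ("clinic3", "control", 5.0),
    ("clinic4", "control", 5.0), ("clinic4", "control", 5.5), ("clinic4", "control", 6.0),
]

# Group individual values by cluster and remember each cluster's arm.
values_by_cluster = defaultdict(list)
arm_of_cluster = {}
for cluster, arm, value in records:
    values_by_cluster[cluster].append(value)
    arm_of_cluster[cluster] = arm

# One summary value per cluster: the cluster is the experimental unit.
cluster_means = {c: mean(v) for c, v in values_by_cluster.items()}

# Compare arms using cluster means only.
means_by_arm = defaultdict(list)
for cluster, cluster_mean in cluster_means.items():
    means_by_arm[arm_of_cluster[cluster]].append(cluster_mean)
effect = mean(means_by_arm["treatment"]) - mean(means_by_arm["control"])

# Degrees of freedom for a two-arm comparison: units minus 2.
naive_df = len(records) - 2          # treats individuals as independent (pseudoreplication)
cluster_df = len(cluster_means) - 2  # honest count: clusters are the units
print(f"effect={effect:.2f}, naive df={naive_df}, cluster-level df={cluster_df}")
```

The point estimate is similar either way here; what changes is how much independent evidence the design actually provides, which is why the number of clusters, not the number of people, drives power.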
Detailed guidance per branch:
- references/randomization_and_blocking.md
- references/factorial_and_doe.md
- references/design_types.md
- references/sequential_and_adaptive.md

Two scripts produce ready-to-use, reproducible layouts. Run them from the skill's
scripts/ directory or add it to sys.path. Everything is seeded so the exact
schedule can be archived and regenerated — a requirement for trial registration
and good lab practice.
scripts/randomization.py

from randomization import (
    simple_randomization, block_randomization,
    stratified_block_randomization, cluster_randomization,
    assign_factorial_runs, arm_balance,
)
# Permuted blocks keep the arms balanced throughout enrollment (use for n < ~100
# or sequential intake — simple randomization can drift out of balance with small n)
sched = block_randomization(n=60, arms=["treatment", "control"], seed=42)
# Balance a prognostic variable across arms by randomizing within each stratum
sched = stratified_block_randomization(
    {"siteA": 30, "siteB": 30}, arms=["drug", "placebo"], ratio=(2, 1), seed=42
)
# Randomize whole clusters, not individuals (the cluster is the unit)
sched = cluster_randomization(["clinic1", "clinic2", "clinic3", "clinic4"], seed=42)
arm_balance(sched) # sanity-check the counts per arm
sched.to_csv("allocation_schedule.csv", index=False)
Choosing among them: simple is fine for large n but can produce imbalance with
small n; block guarantees balance throughout; stratified block additionally
balances a known prognostic factor; cluster is mandatory when the intervention
is delivered at a group level. See references/randomization_and_blocking.md.
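The contrast between simple and permuted-block randomization can be sketched in a few lines of standard-library Python. This is an illustration of the technique under assumed defaults (equal allocation, fixed block size), not the code in `scripts/randomization.py`; the function names `permuted_block_schedule`, `simple_schedule`, and `max_imbalance` are hypothetical.

```python
import random
from collections import Counter


def permuted_block_schedule(n, arms, block_size=None, seed=None):
    """Allocate n units using shuffled blocks that each contain every arm equally."""
    rng = random.Random(seed)
    if block_size is None:
        block_size = 2 * len(arms)
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    schedule = []
    while len(schedule) < n:
        # Each block holds the same number of slots per arm, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n]


def simple_schedule(n, arms, seed=None):
    """Allocate each unit independently, like a coin flip."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n)]


def max_imbalance(schedule, arm_a, arm_b):
    """Largest absolute difference in counts between two arms at any point in intake."""
    gap = worst = 0
    for arm in schedule:
        if arm == arm_a:
            gap += 1
        elif arm == arm_b:
            gap -= 1
        worst = max(worst, abs(gap))
    return worst


arms = ["treatment", "control"]
blocked = permuted_block_schedule(60, arms, block_size=4, seed=42)
simple = simple_schedule(60, arms, seed=42)

print("blocked counts:", dict(Counter(blocked)))
print("simple counts: ", dict(Counter(simple)))
print("worst running imbalance, blocked:", max_imbalance(blocked, *arms))
print("worst running imbalance, simple: ", max_imbalance(simple, *arms))
```

With blocks of 4 and two arms, the running imbalance can never exceed 2, whatever the seed. Simple randomization has no such bound, which is what matters when enrollment may stop early.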
scripts/doe_designs.py

from doe_designs import (
    full_factorial, two_level_factorial, fractional_factorial,
    plackett_burman, central_composite, box_behnken, latin_hypercube,
)
# Factors as real-world (low, high) ranges -> design comes back in real units
factors = {"temp_C": (20, 60), "conc_mM": (1, 10), "pH": (6, 8)}
# Full 2^3: all main effects + all interactions (8 runs), run order randomized
design = two_level_factorial(factors, seed=42)
# Screen 7 factors cheaply (main effects only)
many = {f"factor_{i}": (0, 1) for i in range(7)}
design = plackett_burman(many, seed=42)
# Optimize over 2 factors with curvature (response-surface)
design = central_composite({"temp_C": (20, 60), "conc_mM": (1, 10)}, seed=42)
design.to_csv("experimental_runs.csv", index=False)
Run order is randomized by default so factors aren't confounded with time/drift
(machine warm-up, reagent aging). See references/factorial_and_doe.md for picking
generators, reading the alias structure, and choosing resolution.
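To show what a generator and an alias structure look like in practice, here is a small standard-library sketch of a 2^(4-1) half fraction built with the generator D = ABC. It does not use the bundled `fractional_factorial` function, whose signature is not shown here; the helper `column` is hypothetical.

```python
import itertools
import math
import random

# Base factors A, B, C form a full 2^3 design in coded units (-1 / +1).
runs = []
for a, b, c in itertools.product((-1, 1), repeat=3):
    # Generator D = ABC: the fourth factor's level is the product of the others.
    runs.append({"A": a, "B": b, "C": c, "D": a * b * c})


def column(design, *names):
    """Elementwise product of the named columns, i.e. the contrast for that effect."""
    return [math.prod(run[name] for name in names) for run in design]


def dot(u, v):
    """Inner product of two contrast columns; zero means the effects are orthogonal."""
    return sum(x * y for x, y in zip(u, v))


# Defining relation I = ABCD, so two-factor interactions are aliased in pairs.
ab_aliased_with_cd = column(runs, "A", "B") == column(runs, "C", "D")
# Main effects stay clear of two-factor interactions (resolution IV).
a_clear_of_bc = dot(column(runs, "A"), column(runs, "B", "C")) == 0

print("AB aliased with CD:", ab_aliased_with_cd)
print("A orthogonal to BC:", a_clear_of_bc)

# Randomize run order before execution so factor settings are not tied to time.
random.Random(42).shuffle(runs)
```

Eight runs cover four factors, at the cost that AB and CD (and the other pairs) cannot be separated. That trade is what "choosing resolution" refers to.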
These are structural — they can't be fixed in analysis, only in design.
references/design_types.md.
randomization.py / doe_designs.py, seeded.

- scripts/randomization.py: seeded allocation schedules (simple_randomization, block_randomization, stratified_block_randomization, cluster_randomization, assign_factorial_runs, arm_balance).
- scripts/doe_designs.py: DOE matrices in real units (full_factorial, two_level_factorial, fractional_factorial, plackett_burman, central_composite, box_behnken, latin_hypercube).
- references/randomization_and_blocking.md: randomization methods, blocking, stratification, controls, blinding, batch/plate layout.
- references/factorial_and_doe.md: factorial and fractional designs, resolution and aliasing, screening, and response-surface methodology.
- references/design_types.md: completely randomized, randomized block, crossover, repeated-measures, split-plot, Latin-square, cluster, and nested designs; the pseudoreplication problem in depth.
- references/sequential_and_adaptive.md: group-sequential designs, alpha spending, interim stopping, and adaptive sample-size re-estimation.