e1 NPU quantization pipeline

The quantization pipeline produces calibration manifests consumed by the elizanpu IREE backend. Six formats target the NPU's hardware opcodes:

| Format | Hardware path | Default use |
| --- | --- | --- |
| PTQ INT8 (per-channel weights / per-tensor activations) | `GEMM_S8`, `DOT4_S8` | dense default for most CNN / small transformer |
| AWQ INT4 weight-only | `DOT8_S4` | LLM weights (best PPL at 3-4 bit) |
| GPTQ INT4 weight-only | `DOT8_S4` | fallback for non-LLM small-batch |
| FP8 E4M3 | `DOT4_FP8_E4M3` (scalar contract today; tensor path BLOCKED) | long-context LLM where INT8/INT4 PPL degrades |
| 2:4 structured sparse INT4 | `SDOT4_S4_2_4` | dense matmul layers with 50% magnitude pruning |
| INT2 BitNet | `DOT16_S2` (scalar contract today; tensor path BLOCKED) | experimental ultra-low-precision LLM |
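
To make the "per-channel weights / per-tensor activations" wording of the PTQ INT8 row concrete, here is a minimal sketch of symmetric INT8 scale computation in plain Python. It is an illustration under stated assumptions, not the repository's calibrator: the function names, the symmetric [-127, 127] range, and the max-abs calibration rule are all assumptions chosen for the example.

```python
from typing import List, Tuple

INT8_MAX = 127  # assumed symmetric range [-127, 127]


def _quantize(value: float, scale: float) -> int:
    """Round to the nearest integer step and clamp to the INT8 range."""
    q = round(value / scale)
    return max(-INT8_MAX, min(INT8_MAX, q))


def quantize_weights_per_channel(
    weights: List[List[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Quantize a 2-D weight matrix with one scale per output channel (row)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Guard against an all-zero channel, where any scale works.
        scale = max_abs / INT8_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        q_rows.append([_quantize(w, scale) for w in row])
    return q_rows, scales


def activation_scale_per_tensor(samples: List[List[float]]) -> float:
    """Derive a single scale for a whole activation tensor from calibration samples."""
    max_abs = max(abs(x) for batch in samples for x in batch)
    return max_abs / INT8_MAX if max_abs > 0 else 1.0


weights = [[0.3, -1.0, 0.2], [0.02, 0.01, -0.04]]
q_weights, w_scales = quantize_weights_per_channel(weights)
act_scale = activation_scale_per_tensor([[0.0, 2.0], [-4.0, 1.0]])
```

The second row has much smaller magnitudes than the first, so a per-channel scale lets it use the full INT8 range instead of collapsing to a few small integers, which is the usual motivation for per-channel weight scales.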

Manifest schemas

Every calibrator emits a JSON manifest that carries a versioned schema string.

The IREE backend dispatches on the schema string at compile time.
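
The dispatch-on-schema idea can be sketched as follows. This is a hedged illustration only: the schema strings (`e1.quant.ptq_int8/v1`, `e1.quant.awq_int4/v1`), the manifest field names, and the handler functions are hypothetical and are not taken from the actual manifest format or the backend, which does this at compile time rather than in Python. Only the opcode names come from the table above.

```python
import json
from typing import Callable, Dict

# Hypothetical schema strings; the real ones are defined by the calibrators.
PTQ_INT8_SCHEMA = "e1.quant.ptq_int8/v1"
AWQ_INT4_SCHEMA = "e1.quant.awq_int4/v1"


def make_manifest(schema: str, tensors: dict) -> dict:
    """Build a manifest dict; the 'schema' and 'tensors' keys are assumed names."""
    return {"schema": schema, "tensors": tensors}


def lower_ptq_int8(manifest: dict) -> str:
    """Stand-in for lowering a PTQ INT8 manifest; returns the target opcode."""
    return "GEMM_S8"


def lower_awq_int4(manifest: dict) -> str:
    """Stand-in for lowering an AWQ INT4 manifest; returns the target opcode."""
    return "DOT8_S4"


# One handler per exact schema string, including its version suffix.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    PTQ_INT8_SCHEMA: lower_ptq_int8,
    AWQ_INT4_SCHEMA: lower_awq_int4,
}


def dispatch(manifest_json: str) -> str:
    """Parse a manifest and route it by its schema string."""
    manifest = json.loads(manifest_json)
    schema = manifest.get("schema")
    handler = HANDLERS.get(schema)
    if handler is None:
        # Unknown or newer schema versions fail loudly instead of being guessed at.
        raise ValueError(f"unsupported manifest schema: {schema!r}")
    return handler(manifest)


ptq_json = json.dumps(make_manifest(PTQ_INT8_SCHEMA, {"conv1.weight": {"scales": [0.01]}}))
opcode = dispatch(ptq_json)
```

Keying the table on the full versioned string means a schema bump is rejected until a handler for the new version is registered, which is one plausible reason to version the string in the first place.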

All six calibrators are committed under compiler/quantization/.
Unit tests pass (8/8) in repo CI without torch installed.
Real PyTorch model integration: BLOCKED on torch + IREE inside the canonical Linux container.
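
One common way to keep unit tests runnable without torch is to keep the calibration math on plain Python values and import torch lazily, only on the real-model path. The sketch below shows that pattern; it is an assumption about how such a split could look, not a description of the code under compiler/quantization/, and the names `module_available`, `minmax_range`, and `tensor_to_list` are hypothetical.

```python
import importlib.util
from typing import List, Tuple


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


HAVE_TORCH = module_available("torch")


def minmax_range(values: List[float]) -> Tuple[float, float]:
    """Torch-free calibration core: observed (min, max) of a flat list of values."""
    return min(values), max(values)


def tensor_to_list(tensor) -> List[float]:
    """Adapter for the real-model path; torch is imported only when it is called."""
    if not HAVE_TORCH:
        raise RuntimeError("torch is not installed; real-model calibration is unavailable")
    import torch  # deferred so that importing this module never requires torch

    return torch.as_tensor(tensor).flatten().tolist()


lo, hi = minmax_range([0.5, -2.0, 1.5])
```

With this split, tests exercise `minmax_range` directly on lists, and only the integration path that feeds real model tensors needs torch to be present.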