LEARNINGS.md
Date: 2026-02-10
Implemented end-to-end swish.beta support for the CPU, MPS, and CUDA GPU_REF paths, added non-one-beta coverage to the unit and integration tests, and integrated beta support into the MFA Swish kernels, with a special case for beta == 1 that keeps the original kernel code path unchanged.
- lib/nnc/ccv_nnc.h (swish.beta in ccv_nnc_cmd_param_t)
- _beta:
- lib/nnc/cmd/swish/ccv_nnc_swish.c
- lib/nnc/cmd/ccv_nnc_cmd_easy.h
- cmd.info.swish.beta:
- lib/nnc/cmd/swish/ccv_nnc_swish_cpu_ref.c
- lib/nnc/cmd/swish/mps/ccv_nnc_swish_mps.m
- cmd.info.swish.beta in forward/backward.
- beta == 1.
- beta != 1 in MPSGraph fallback.
- beta == 1).
- lib/nnc/cmd/swish/gpu/ccv_nnc_swish_gpu_ref.cu
- cmd.info.swish.beta in forward/backward dispatch.
- beta == 1.
- beta != 1.
- lib/nnc/mfa/ccv_nnc_mfa_swish.hpp
- lib/nnc/mfa/ccv_nnc_mfa_swish.cpp
- lib/nnc/mfa/kernels/SwishDescriptor.hpp
- lib/nnc/mfa/kernels/SwishDescriptor.cpp
- lib/nnc/mfa/kernels/SwishKernel.hpp
- lib/nnc/mfa/kernels/SwishKernel.cpp
- beta == 1 in MFA kernel source:
- beta != 1.
- test/unit/nnc/swish.tests.c
- TEST_CASE("swish with non-one beta")
- TEST_CASE("swish gradient with non-one beta")
- test/int/nnc/swish.tests.c
- TEST_CASE("mps swish gradient with non-one beta in half precision")
- TEST_CASE("swish gradient with non-one beta in half precision") (GPU_REF)

test/unit/nnc
- make swish.tests -j4 && ./swish.tests -> pass

test/int/nnc
- ./swish.tests -> pass (8/8, with 3 expected skips for MPS on non-macOS).

Date: 2026-03-06
Implemented forward MPS support for EWPOW / EWSIN / EWCOS, added MFA sigmoid, migrated the remaining non-attention MFAv2 wrappers (gemv, depalettize, adam, normalization) to the Descriptor / Kernel model, and then renamed lib/nnc/mfa/v2 to lib/nnc/mfa/kernels with v2_cache renamed to kernel_cache.
EWEXP was the right precedent for new forward-only MPS elementwise ops:
- MPSGraph first.
- EWPOW, EWSIN, and EWCOS were added that way.
- test/int/nnc/mpsblas.tests.c
- MPSGraph, not MFA:
- lib/nnc/cmd/sigmoid/mps/ccv_nnc_sigmoid_mps.m
- 1 / (1 + exp(-x)) form.
- g * y * (1 - y)
- gemv
- depalettize
- adam
- normalization
- attention was intentionally left alone
- depalettize must keep the old qbits == 5, qbits == 6, and qbits == 8 shader behavior exactly.
- normalization MFA still only covers layer_norm and rmsnorm, matching prior behavior.
- group_norm stays on MPSGraph.
- group_norm on MPS was never wired to MFA in the backend.
- lib/nnc/mfa/v2 -> lib/nnc/mfa/kernels
- context->v2_cache -> context->kernel_cache
- lib/BUILD.bazel already uses glob(["nnc/mfa/**/*.cpp", "nnc/mfa/**/*.inc"])
- glob(["nnc/mfa/**/*.hpp"])
- Package.swift
- bin/nnc/* kernel generator / utility sources
- ./mpsblas.tests <substring>
- ./mpsdnn.tests <substring>
- mpsblas.tests
- mpsdnn.tests
- v2 to kernels should be performance-neutral if:

test/int/nnc
- make debug -j4 -> pass
- ./mpsblas.tests gemv -> pass
- ./mpsblas.tests depalettize -> pass
- ./mpsdnn.tests -> pass (83/83)
- ./mpsblas.tests -> pass (68/68)
- bazel build //lib:nnc_mfa_compat did not validate the rename because this checkout currently lacks a resolved @local_config_ccv repository.
- nnc/mfa/**, so there was no explicit v2 path to update there.