documentation/debugging-methodology.md
This document outlines best practices for debugging complex cross-platform build failures and other intricate issues in SkiaSharp. These guidelines help avoid common pitfalls that lead to wasted time and compounding errors.
Problem: Jumping to fixes when errors appear rather than understanding the system first.
Solution: Before any fix, answer these questions:
Before any investigation, determine:
Maintain a running log during debugging sessions:
| Time/Commit | Change Made | Result | Notes |
|---|---|---|---|
| baseline | none | macOS ✓, Windows ✗ | |
| commit abc | added define X | macOS ✗, Windows ✗ | X broke macOS! |
Never lose track of cause and effect.
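A log like the one above is easier to keep current if appending a row takes one command. The following is a minimal sketch, not part of the SkiaSharp tooling: the function name log_change and the log file path are hypothetical.

```shell
# Hypothetical helper: append one row to a Markdown debugging log.
# usage: log_change <logfile> <commit> <change> <result> [notes]
log_change() {
  # Write the table header only when the log is missing or empty.
  if [ ! -s "$1" ]; then
    printf '| Time/Commit | Change Made | Result | Notes |\n|---|---|---|---|\n' > "$1"
  fi
  # Append the row; the notes column is optional.
  printf '| %s | %s | %s | %s |\n' "$2" "$3" "$4" "${5:-}" >> "$1"
}
```

For example, `log_change debug-log.md abc "added define X" "macOS fail, Windows fail" "X broke macOS"` records the second row of the table above.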
When platform X works and platform Y fails, the difference between X and Y IS the answer. Focus investigation there.
Example: If ARM64 builds pass but x86/x64 fail, immediately ask "what's x86-specific?" (e.g., AVX2, SSE instructions, different compiler flags).
For cross-platform issues, understand exactly which code paths are active on each platform:
Example of tracing preprocessor logic:
// cpu.h - trace this for each platform:
#if defined(_MSC_VER) && _MSC_VER >= 1700 // Windows MSVC/clang-cl: YES; macOS clang: NO
#define WEBP_MSC_AVX2 // therefore defined only on Windows MSVC/clang-cl
#endif
#if defined(__AVX2__) || defined(WEBP_MSC_AVX2) // Evaluate for each platform
#define WEBP_USE_AVX2
#endif
Different compilers define different macros:
- macOS clang: does not define _MSC_VER or __AVX2__ by default
- Windows clang-cl: defines _MSC_VER for MSVC compatibility
- Windows MSVC: defines _MSC_VER, may define __AVX2__ with /arch:AVX2

Don't assume - verify with a minimal test or documentation if uncertain.
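One way to run such a minimal test with GCC or Clang is to dump the predefined macros with the -dM -E options and check for the names of interest. The sketch below assumes that kind of dump on standard input; the helper name check_macros is hypothetical, and the equivalent invocation for other compiler drivers is not covered here.

```shell
# Hypothetical helper: report whether each named macro appears in a
# "-dM -E" style macro dump read from standard input.
check_macros() {
  dump=$(cat)
  for m in "$@"; do
    # Match the exact macro name, followed by a space, "(" or end of line.
    if printf '%s\n' "$dump" | grep -Eq "^#define ${m}( |\\(|\$)"; then
      echo "$m: defined"
    else
      echo "$m: not defined"
    fi
  done
}
```

With GCC or Clang this can be fed directly, for example `cc -dM -E -x c /dev/null | check_macros _MSC_VER __AVX2__`. Repeat the command with the exact flags each platform's build uses, since flags change which macros are predefined.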
#if defined(X) vs #if X
#define FEATURE 0
#if defined(FEATURE) // TRUE - macro EXISTS
// This code IS compiled!
#endif
#if FEATURE // FALSE - value is 0
// This code is NOT compiled
#endif
Setting FEATURE=0 does NOT disable code guarded by defined(FEATURE).
If you need to apply a fix to "all platforms just to be safe," you probably don't understand the problem yet. Broad fixes:
Prefer surgical fixes that target exactly the affected platforms.
Running parallel builds that write to the same output file causes race conditions. When building multiple architectures:
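One possible way to avoid the race, sketched under the assumption that each architecture can be given its own output directory, is shown below. The function build_arch is a hypothetical stand-in for the real build step, not an actual SkiaSharp command.

```shell
# Hypothetical stand-in for a real per-architecture build step.
# It writes only inside its own architecture directory.
build_arch() {
  mkdir -p "$2/$1"
  echo "built for $1" > "$2/$1/build.log"
}

# usage: build_all <output-root> <arch>...
# Builds run in parallel, but no two builds share an output file.
build_all() {
  out_root=$1
  shift
  for arch in "$@"; do
    build_arch "$arch" "$out_root" &
  done
  wait  # block until every background build has finished
}
```

If the build cannot be redirected to separate directories, running the architectures one after another (dropping the `&`) removes the race at the cost of build time.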
Never say an error is "safe to ignore" without explaining exactly WHY. If you can't explain why it's safe, it's not safe.
State hypotheses explicitly and test them:
Example:
- Hypothesis: "The compiler defines _MSC_VER, triggering AVX2 code paths"
- Test: "Verify that _MSC_VER is defined by examining the preprocessor output"
- Result: the preprocessor output shows _MSC_VER=1900

undefined symbol Errors

When you see undefined symbol: xxx errors, the symbol is missing from the linked libraries.
# Compare linked libraries between platforms
docker run --rm -v $(pwd):/work debian:bookworm-slim bash -c \
"apt-get update -qq && apt-get install -y -qq binutils >/dev/null && \
echo '=== x64 ===' && readelf -d /work/output/native/linux/x64/libSkiaSharp.so | grep NEEDED && \
echo && echo '=== ARM64 ===' && readelf -d /work/output/native/linux/arm64/libSkiaSharp.so | grep NEEDED"
If a library appears in one but not the other, that's your root cause.
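When the NEEDED lists are long, the comparison can be automated. The sketch below assumes the output of readelf -d for each platform has been saved to a text file; the helper names needed_libs and missing_libs are hypothetical.

```shell
# Extract the library names from "readelf -d" output on standard input.
needed_libs() {
  sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p' | sort -u
}

# usage: missing_libs <dump-a> <dump-b>
# Print libraries that dump-a needs but dump-b does not.
missing_libs() {
  a=$(mktemp)
  b=$(mktemp)
  needed_libs < "$1" > "$a"
  needed_libs < "$2" > "$b"
  comm -23 "$a" "$b"  # lines unique to the first sorted list
  rm -f "$a" "$b"
}
```

For example, after saving `readelf -d` output for the x64 and ARM64 libraries to x64.txt and arm64.txt, `missing_libs x64.txt arm64.txt` lists what the ARM64 build failed to link.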
The ninja file may have -lfoo but the linker silently skips it if it can't find the library:
# Check ninja file for expected libraries
grep "libs = " externals/skia/out/linux/arm64/obj/SkiaSharp.ninja
# Check if library exists in cross-compile sysroot
docker run --rm skiasharp-linux-gnu-cross-arm64 bash -c \
"ls -la /usr/aarch64-linux-gnu/lib/libfontconfig*"
Common issue: The -dev package provides only a symlink (libfoo.so -> libfoo.so.1.2.3). The actual .so.1.2.3 file is in the runtime package (libfoo1), so the symlink is broken when only the -dev package is installed.
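A quick way to spot this situation is to list symlinks whose targets do not exist. This is a minimal sketch; the function name find_broken_symlinks is hypothetical, and the directory to scan depends on the sysroot layout.

```shell
# usage: find_broken_symlinks <directory>
# Print every symlink under the directory whose target does not exist.
# "test -e" follows the link, so it fails for a dangling symlink.
find_broken_symlinks() {
  find "$1" -type l ! -exec test -e {} \; -print
}
```

Inside the cross-compile container this could be pointed at the library directory used above, for example `find_broken_symlinks /usr/aarch64-linux-gnu/lib`.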
| Root Cause | Fix Location |
|---|---|
| Library missing from linker flags | native/linux/build.cake or externals/skia/third_party/BUILD.gn |
| Library missing from cross-compile sysroot | scripts/Docker/debian/clang-cross/*/Dockerfile |
| Indirect dependency (A→B→C missing) | Fix B's linkage or add C explicitly |
Symptom: undefined symbol: uuid_generate_random on ARM64 only
Investigation:
- x64: libfontconfig.so.1 in DT_NEEDED
- ARM64: no libfontconfig.so.1 in DT_NEEDED
- Ninja file: -lfontconfig for BOTH builds

Root cause: Cross-compile Docker only had libfontconfig1-dev, which provides a broken symlink. The actual shared library is in libfontconfig1 (runtime package).
Fix: Download both -dev (headers) AND runtime (actual .so) packages in the Dockerfile.
When testing different SkiaSharp NuGet versions on WASM, native .wasm binaries are cached in bin/obj/_framework and are version-specific (tied to Emscripten version). Changing the NuGet version reference (e.g., via sed) without cleaning these directories leaves stale native files, producing false positive/negative results. Always use fresh project directories per version or clean bin/, obj/, and _framework before rebuilding.
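The cleaning step can be scripted so it is never skipped between versions. This is a sketch under the assumption that the build outputs live in bin/, obj/, and any directory named _framework inside the project; the function name clean_build_outputs is hypothetical.

```shell
# usage: clean_build_outputs <project-dir>
# Remove build outputs so no stale native files survive a version change.
clean_build_outputs() {
  rm -rf "$1/bin" "$1/obj"
  # Remove any remaining _framework directory, wherever it sits in the project.
  find "$1" -type d -name _framework -prune -exec rm -rf {} +
}
```

Run it before every rebuild that follows a NuGet version change, for example `clean_build_outputs ./MyWasmApp`, where the project path is a placeholder.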
| Do | Don't |
|---|---|
| Establish baseline first | Jump to fixing immediately |
| Track changes and effects | Lose track of what changed when |
| Trace conditional code completely | Skim for keywords |
| Use platform differences as clues | Ignore success patterns |
| Make one change at a time | Batch multiple changes |
| Verify claims with evidence | State assumptions as facts |
| Explain why errors are safe to ignore | Dismiss errors without explanation |
| Revert when fixes make things worse | Pile more fixes on top |