packages/shared-skills/skills/ast-grep/references/pitfalls.md
This is the failure-mode field guide. The scripts/ast_grep_helper.py validate subcommand mechanically checks for the items in §1 before calling sg; the rest are lower-frequency but still common.
ast-grep does not interpret regex inside patterns. The following all fail:
| Bad | Why | Use instead |
|---|---|---|
foo|bar | | is regex alternation. ast-grep does not alternate. | Two separate calls, OR any: [pattern: foo, pattern: bar] in a YAML rule, OR rg -e foo -e bar. |
foo.*bar | .* is a regex wildcard. | foo($$$) bar if the gap is a list of nodes; otherwise switch to rg. |
\w+, \d+, \s | Regex character classes. | $VAR to capture any identifier. For digits-only, use kind: number_literal. |
[a-z]+ | Regex character class. | No AST equivalent — switch to rg. |
^foo$ | Regex anchors. | Anchor by AST: use kind: program > expression_statement or use inside/not has. |
Why this happens: LLMs default to regex thinking. The mental switch is "ast-grep patterns are code, not strings."
When you genuinely need regex, use the regex rule field in YAML (matches node text with Rust regex):
rule:
all:
- kind: identifier
- regex: '^[A-Z][a-z]+$' # CamelCase identifiers only
Note: regex matches the whole node text — no partial matches. Combine with kind or pattern for performance.
Patterns must be valid code that the parser accepts as a complete node. Common mistakes:
# JS/TS
function foo ❌ no params, no body
function $NAME($$$) { $$$ } ✅
async function $NAME ❌
async function $NAME($$$) { $$$ } ✅
# Python
def foo: ❌ trailing colon makes it a statement
def $FN($$$) ✅
class Foo: ❌
class $C($$$) ✅
# Go
func foo ❌
func $NAME($$$) { $$$ } ✅
# Rust
fn foo ❌
fn $NAME($$$) -> $RET { $$$ } ✅
fn $NAME($$$) { $$$ } ✅ (-> () inferred)
# Java
public void foo ❌
public void $NAME($$$) { $$$ } ✅
If a pattern returns 0 matches and looks correct, run sg run -p '<pattern>' --lang <lang> --debug-query=ast --stdin <<< 'echo' and see what the parser thinks the pattern is. If it returns an ERROR node, the pattern is malformed.
A class field initializer a = 123 also parses as an assignment expression. If you want only field definitions, you must disambiguate:
# WRONG — pattern parses as assignment_expression, not field_definition
pattern: a = 123
kind: field_definition
# CORRECT — use pattern object with context + selector
pattern:
context: 'class C { a = 123 }'
selector: field_definition
kind and pattern are independent constraints, not modifiers of each other. ast-grep does not change how it parses based on kind.
| ambiguityA bare | in a pattern is interpreted as bitwise-or in most languages, not alternation. So:
pattern: foo | bar # parses as: foo bitwise-or'd with bar
…matches expressions like x | y, not "either foo or bar". To get alternation, use any:
rule:
any:
- pattern: foo
- pattern: bar
In TypeScript union types (A | B), | is part of the type syntax — pattern: A | B correctly parses as a union type and matches that.
// Pattern: $X = $X
// Captures only when both sides are TEXTUALLY identical.
// Matches:
a = a
foo.bar = foo.bar
// Does NOT match:
a = b
let x = compute() // because $X needs to bind once and re-use
If you actually want two independent captures, name them differently: $X = $Y.
$$$ is greedy then commits$$$ does not backtrack. It captures as much as possible, then commits. If your pattern needs a non-greedy match, structure it differently:
// You want "match foo($X), where $X is any single arg"
// BAD: foo($$$X) // matches foo(a), foo(a, b), foo(a, b, c) - too broad
// GOOD: foo($X) // matches only single-arg calls
// You want "match foo() with at least one arg"
// BAD: foo($$$X) // also matches foo()
// GOOD: foo($X, $$$REST) // forces at least one arg
kind names depend on tree-sitter grammarkind: function_declaration works for JavaScript, but Python uses function_definition, Rust uses function_item, Go uses function_declaration (same as JS by coincidence). To find the right name, parse a known-good file:
sg run -p '$_' --lang python --debug-query=cst path/to/example.py | grep -i function
Or open https://ast-grep.github.io/playground.html and click on a node to see its kind.
inside / has defaults to stopBy: neighborinside:
kind: function_declaration # only checks the IMMEDIATE parent
If you want "anywhere inside a function (any depth)":
inside:
kind: function_declaration
stopBy: end # walks up to the file root
Same for has (descendants):
has:
kind: return_statement
stopBy: end # walks down the whole subtree
Without stopBy: end, has only matches direct children.
--update-all when --json is setThis is the single biggest gotcha when scripting ast-grep. If you run:
sg run -p 'foo()' -r 'bar()' --json=compact --update-all .
…you get the JSON output but no files are mutated. ast-grep silently drops --update-all when --json is on. To both preview and apply, run two passes:
# Pass 1: preview as JSON
sg run -p 'foo()' -r 'bar()' --json=compact .
# Pass 2: actually apply
sg run -p 'foo()' -r 'bar()' --update-all .
scripts/ast_grep_helper.py replace does this automatically when --apply is set.
all and any evaluate against one target node at a time:
# WRONG — wants "node has BOTH a number child AND a string child"
has:
all:
- kind: number # impossible: one node cannot be both at once
- kind: string
# CORRECT
all:
- has: { kind: number }
- has: { kind: string }
Lift relational rules out of composites when the relation is "the surrounding node has X children matching Y."
When a rule object has multiple fields:
rule:
pattern: $X = compute()
has: { kind: number }
…ast-grep evaluates them as an implicit all, but the order in which metavariables are captured is not guaranteed. If your transform or fix depends on capture order, use an explicit all array:
rule:
all:
- pattern: function $F() { $$$ }
- has: { pattern: $F() } # $F captured by pattern first; here we just check
regex without kind is slowregex alone scans every node text in the file. On large repos this is noticeably slow. Always combine:
# Slow
rule:
regex: '^TODO'
# Fast
rule:
all:
- kind: comment
- regex: '^//\s*TODO'
ast-grep is a structural matcher. It does NOT know:
foo references point to the same variable.For those questions, use a real type-aware tool: TypeScript LSP, Pyright, Semgrep with type inference, CodeQL, etc.
ast-grep is great when the syntactic shape is what you care about: "find every call to eval(...)", "find every as any", "find every empty catch block." It is weak for "find every variable that's never used."
When a pattern returns 0 matches and you can't see why:
Or locally:
sg run -p '<pattern>' --lang <lang> --debug-query=ast --stdin <<< '<sample-code>'
stderr shows the parsed pattern; stdout shows the JSON match result. If the pattern shows up as ERROR (XXX), it doesn't parse.
references/patterns.md — meta-variables, strictness, naming rules.references/recipes.md — known-good patterns by language.references/cli.md — --debug-query, --strictness, --update-all.