docs/wiki/Getting-Started-C#-Syntax-Analysis.md
Today, the Visual Basic and C# compilers are black boxes - text goes in and bytes come out - with no transparency into the intermediate phases of the compilation pipeline. With the .NET Compiler Platform (formerly known as "Roslyn"), tools and developers can leverage the exact same data structures and algorithms the compiler uses to analyze and understand code with confidence that that information is accurate and complete.
In this walkthrough we'll explore the Syntax API. The Syntax API exposes the parsers, the syntax trees themselves, and utilities for reasoning about and constructing them.
The Syntax API exposes the syntax trees the compilers use to understand Visual Basic and C# programs. They are produced by the same parser that runs when a project is built or a developer hits F5. The syntax trees have full-fidelity with the language; every bit of information in a code file is represented in the tree, including things like comments or whitespace. Writing a syntax tree to text will reproduce the exact original text that was parsed. The syntax trees are also immutable; once created a syntax tree can never be changed. This means consumers of the trees can analyze the trees on multiple threads, without locks or other concurrency measures, with the security that the data will never change under.
The four primary building blocks of syntax trees are:
Trivia, tokens, and nodes are composed hierarchically to form a tree that completely represents everything in a fragment of Visual Basic or C# code. For example, were you to examine the following C# source file using the Syntax Visualizer (In Visual Studio, choose View -> Other Windows -> Syntax Visualizer) it tree view would look like this:
SyntaxNode: Blue | SyntaxToken: Green | SyntaxTrivia: Red
By navigating this tree structure you can find any statement, expression, token, or bit of whitespace in a code file!
The following steps use Edit and Continue to demonstrate how to parse C# source text and find a parameter declaration contained in the source.
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
SyntaxTree tree = CSharpSyntaxTree.ParseText(
@"using System;
using System.Collections;
using System.Linq;
using System.Text;
namespace HelloWorld
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(""Hello, World!"");
}
}
}");
var root = (CompilationUnitSyntax)tree.GetRoot();
var firstMember = root.Members[0];
var helloWorldDeclaration = (NamespaceDeclarationSyntax)firstMember;
var programDeclaration = (ClassDeclarationSyntax)helloWorldDeclaration.Members[0];
Execute this statement.
Locate the Main declaration in the programDeclaration.Members collection and store it in a new variable:
var mainDeclaration = (MethodDeclarationSyntax)programDeclaration.Members[0];
var argsParameter = mainDeclaration.ParameterList.Parameters[0];
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
namespace GettingStartedCS
{
class Program
{
static void Main(string[] args)
{
SyntaxTree tree = CSharpSyntaxTree.ParseText(
@"using System;
using System.Collections;
using System.Linq;
using System.Text;
namespace HelloWorld
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(""Hello, World!"");
}
}
}");
var root = (CompilationUnitSyntax)tree.GetRoot();
var firstMember = root.Members[0];
var helloWorldDeclaration = (NamespaceDeclarationSyntax)firstMember;
var programDeclaration = (ClassDeclarationSyntax)helloWorldDeclaration.Members[0];
var mainDeclaration = (MethodDeclarationSyntax)programDeclaration.Members[0];
var argsParameter = mainDeclaration.ParameterList.Parameters[0];
}
}
}
In addition to traversing trees using the properties of the SyntaxNode derived classes you can also explore the syntax tree using the query methods defined on SyntaxNode. These methods should be immediately familiar to anyone familiar with XPath. You can use these methods with LINQ to quickly find things in a tree.
var firstParameters = from methodDeclaration in root.DescendantNodes()
.OfType<MethodDeclarationSyntax>()
where methodDeclaration.Identifier.ValueText == "Main"
select methodDeclaration.ParameterList.Parameters.First();
var argsParameter2 = firstParameters.Single();
Start debugging the program.
Open the Immediate Window.
Often you'll want to find all nodes of a specific type in a syntax tree, for example, every property declaration in a file. By extending the CSharpSyntaxWalker class and overriding the VisitPropertyDeclaration method you can process every property declaration in a syntax tree without knowing its structure beforehand. CSharpSyntaxWalker is a specific kind of SyntaxVisitor which recursively visits a node and each of its children.
This example shows how to implement a CSharpSyntaxWalker which examines an entire syntax tree and collects any using directives it finds which aren't importing a System namespace.
Create a new C# Stand-Alone Code Analysis Tool project; name it "UsingCollectorCS".
Add the following using directives to your Program.cs file:
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
SyntaxTree tree = CSharpSyntaxTree.ParseText(
@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
namespace TopLevel
{
using Microsoft;
using System.ComponentModel;
namespace Child1
{
using Microsoft.Win32;
using System.Runtime.InteropServices;
class Foo { }
}
namespace Child2
{
using System.CodeDom;
using Microsoft.CSharp;
class Bar { }
}
}");
var root = (CompilationUnitSyntax)tree.GetRoot();
Note that this source text contains using directives scattered across four different locations: the file-level, in the top-level namespace, and in the two nested namespaces.
Add a new class file to the project.
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
class UsingCollector : CSharpSyntaxWalker
public readonly List<UsingDirectiveSyntax> Usings = new List<UsingDirectiveSyntax>();
public override void VisitUsingDirective(UsingDirectiveSyntax node)
{
}
if (node.Name.ToString() != "System" &&
!node.Name.ToString().StartsWith("System."))
{
this.Usings.Add(node);
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
namespace UsingCollectorCS
{
class UsingCollector : CSharpSyntaxWalker
{
public readonly List<UsingDirectiveSyntax> Usings = new List<UsingDirectiveSyntax>();
public override void VisitUsingDirective(UsingDirectiveSyntax node)
{
if (node.Name.ToString() != "System" &&
!node.Name.ToString().StartsWith("System."))
{
this.Usings.Add(node);
}
}
}
}
Return to the Program.cs file.
Add the following code to the end of the Main method to create an instance of the UsingCollector, use that instance to visit the root of the parsed tree, and iterate over the UsingDirectiveSyntax nodes collected and print their names to the Console:
var collector = new UsingCollector();
collector.Visit(root);
foreach (var directive in collector.Usings)
{
Console.WriteLine(directive.Name);
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
namespace UsingCollectorCS
{
class Program
{
static void Main(string[] args)
{
SyntaxTree tree = CSharpSyntaxTree.ParseText(
@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
namespace TopLevel
{
using Microsoft;
using System.ComponentModel;
namespace Child1
{
using Microsoft.Win32;
using System.Runtime.InteropServices;
class Foo { }
}
namespace Child2
{
using System.CodeDom;
using Microsoft.CSharp;
class Bar { }
}
}");
var root = (CompilationUnitSyntax)tree.GetRoot();
var collector = new UsingCollector();
collector.Visit(root);
foreach (var directive in collector.Usings)
{
Console.WriteLine(directive.Name);
}
}
}
}
Microsoft.CodeAnalysis
Microsoft.CodeAnalysis.CSharp
Microsoft
Microsoft.Win32
Microsoft.CSharp
Press any key to continue . . .
Observe that the walker has located all non-System namespace using directives in all four places.
Congratulations! You've just used the Syntax API to locate specific kinds of C# statements and declarations in C# source code.