entity-framework/core/querying/single-split-queries.md
When working against relational databases, EF loads related entities by introducing JOINs into a single query. While JOINs are quite standard when using SQL, they can create significant performance issues if used improperly. This page describes these performance issues, and shows an alternative way to load related entities which works around them.
Let's examine the following LINQ query and its translated SQL equivalent:
var blogs = await ctx.Blogs
.Include(b => b.Posts)
.Include(b => b.Contributors)
.ToListAsync();
SELECT [b].[Id], [b].[Name], [p].[Id], [p].[BlogId], [p].[Title], [c].[Id], [c].[BlogId], [c].[FirstName], [c].[LastName]
FROM [Blogs] AS [b]
LEFT JOIN [Posts] AS [p] ON [b].[Id] = [p].[BlogId]
LEFT JOIN [Contributors] AS [c] ON [b].[Id] = [c].[BlogId]
ORDER BY [b].[Id], [p].[Id]
In this example, since both Posts and Contributors are collection navigations of Blog - they're at the same level - relational databases return a cross product: each row from Posts is joined with each row from Contributors. This means that if a given blog has 10 posts and 10 contributors, the database returns 100 rows for that single blog. This phenomenon - sometimes called cartesian explosion - can cause huge amounts of data to unintentionally get transferred to the client, especially as more sibling JOINs are added to the query; this can be a major performance issue in database applications.
Note that cartesian explosion does not occur when the two JOINs aren't at the same level:
var blogs = await ctx.Blogs
.Include(b => b.Posts)
.ThenInclude(p => p.Comments)
.ToListAsync();
SELECT [b].[Id], [b].[Name], [t].[Id], [t].[BlogId], [t].[Title], [t].[Id0], [t].[Content], [t].[PostId]
FROM [Blogs] AS [b]
LEFT JOIN [Posts] AS [p] ON [b].[Id] = [p].[BlogId]
LEFT JOIN [Comment] AS [c] ON [p].[Id] = [c].[PostId]
ORDER BY [b].[Id], [t].[Id]
In this query, Comments is a collection navigation of Post, unlike Contributors in the previous query, which was a collection navigation of Blog. In this case, a single row is returned for each comment that a blog has (through its posts), and a cross product does not occur.
JOINs can create another type of performance issue. Let's examine the following query, which only loads a single collection navigation:
var blogs = await ctx.Blogs
.Include(b => b.Posts)
.ToListAsync();
SELECT [b].[Id], [b].[Name], [b].[HugeColumn], [p].[Id], [p].[BlogId], [p].[Title]
FROM [Blogs] AS [b]
LEFT JOIN [Posts] AS [p] ON [b].[Id] = [p].[BlogId]
ORDER BY [b].[Id]
Examining at the projected columns, each row returned by this query contains properties from both the Blogs and Posts tables; this means that the blog properties are duplicated for each post that the blog has. While this is usually normal and causes no issues, if the Blogs table happens to have a very big column (e.g. binary data, or a huge text), that column would get duplicated and sent back to the client multiple times. This can significantly increase network traffic and adversely affect your application's performance.
If you don't actually need the huge column, it's easy to simply not query for it:
var blogs = await ctx.Blogs
.Select(b => new
{
b.Id,
b.Name,
b.Posts
})
.ToListAsync();
By using a projection to explicitly choose which columns you want, you can omit big columns and improve performance; note that this is a good idea regardless of data duplication, so consider doing it even when not loading a collection navigation. However, since this projects the blog to an anonymous type, the blog isn't tracked by EF and changes to it can't be saved back as usual.
It's worth noting that unlike cartesian explosion, the data duplication caused by JOINs isn't typically significant, as the duplicated data size is negligible; this typically is something to worry about only if you have big columns in your principal table.
To work around the performance issues described above, EF allows you to specify that a given LINQ query should be split into multiple SQL queries. Instead of JOINs, split queries generate an additional SQL query for each included collection navigation:
[!code-csharpMain]
It will produce the following SQL:
SELECT [b].[BlogId], [b].[OwnerId], [b].[Rating], [b].[Url]
FROM [Blogs] AS [b]
ORDER BY [b].[BlogId]
SELECT [p].[PostId], [p].[AuthorId], [p].[BlogId], [p].[Content], [p].[Rating], [p].[Title], [b].[BlogId]
FROM [Blogs] AS [b]
INNER JOIN [Posts] AS [p] ON [b].[BlogId] = [p].[BlogId]
ORDER BY [b].[BlogId]
[!WARNING] When using split queries with Skip/Take on EF versions prior to 10, pay special attention to making your query ordering fully unique; not doing so could cause incorrect data to be returned. For example, if results are ordered only by date, but there can be multiple results with the same date, then each one of the split queries could each get different results from the database. Ordering by both date and ID (or any other unique property or combination of properties) makes the ordering fully unique and avoids this problem. Note that relational databases do not apply any ordering by default, even on the primary key.
[!NOTE] One-to-one related entities are always loaded via JOINs in the same query, as it has no performance impact.
You can also configure split queries as the default for your application's context:
[!code-csharpMain]
When split queries are configured as the default, it's still possible to configure specific queries to execute as single queries:
[!code-csharpMain]
EF Core uses single query mode by default in the absence of any configuration. Since it may cause performance issues, EF Core generates a warning whenever following conditions are met:
AsSingleQuery/AsSplitQuery operator on the query.To turn off the warning, configure query splitting mode globally or at the query level to an appropriate value.
While split query avoids the performance issues associated with JOINs and cartesian explosion, it also has some drawbacks:
Unfortunately, there isn't one strategy for loading related entities that fits all scenarios. Carefully consider the advantages and disadvantages of single and split queries to select the one that fits your needs.