Cartesian Explosion and Split Query in Entity Framework Core


Cartesian explosion and split query in entity framework core

In this tutorial, we will understand what cartesian explosion is and how to solve that problem. Let’s look at this query:

var people = await ctx.People
   .Include(p => p.Emails)
   .Include(p => p.Addresses)
   .ToListAsync();

It translates to:

 SELECT "p"."Id", "p"."FirstName", "p"."LastName", "e"."Id", "e"."PersonEmail", "e"."PersonId", "a"."Id", "a"."PersonAddress", "a"."PersonId"
      FROM "People" AS "p"
      LEFT JOIN "Emails" AS "e" ON "p"."Id" = "e"."PersonId"
      LEFT JOIN "Addresses" AS "a" ON "p"."Id" = "a"."PersonId"
      ORDER BY "p"."Id", "e"."Id"
  • Person joins with Email and Address. Both joins are at the same level.
  • A person can have multiple emails and adressess.

Let’s say a person with id = 1 have 10 emails and 10 addresses. The query returns 1x10x10 = 100 rows for 1 person. It is just for one person, how much they can be for 100 people. This problem is known as cartesian explosion

surprised cat [Image credit: pngegg.com]

📢 Cartesian explosion does not occurs if joins are in multiple leveles. Let’s say Person->Order->OrderItems. In this situation, Person joins with Order and Order joins with OrderItems. Ef core Query would be something like ctx.People.Include(p=>p.Orders).ThenInclude(o=>o.OrderItems).

What is the solution then? [Image generated by Gemini]

EF Core 5.0 introduced a feature called SplitQuery.

var people = await ctx.People
   .Include(p => p.Emails)
   .Include(p => p.Addresses)
   .AsSplitQuery()
   .ToListAsync();

Which will translate into:

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (33ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT "p"."Id", "p"."FirstName", "p"."LastName"
      FROM "People" AS "p"
      ORDER BY "p"."Id"

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (1ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT "e"."Id", "e"."PersonEmail", "e"."PersonId", "p"."Id"
      FROM "People" AS "p"
      INNER JOIN "Emails" AS "e" ON "p"."Id" = "e"."PersonId"
      ORDER BY "p"."Id"

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (0ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT "a"."Id", "a"."PersonAddress", "a"."PersonId", "p"."Id"
      FROM "People" AS "p"
      INNER JOIN "Addresses" AS "a" ON "p"."Id" = "a"."PersonId"
      ORDER BY "p"."Id"

It will more helpful if person has 10 email and 10 addresses. Query with join return 1 x 10 x 10 = 100 rows while split query return 1 + 10 + 10 = 21 rows. Which is quite an improvement.