Case 7: New CE changes distinct value count estimation
For queries where many-to-many join operations are not involved, distinct value count estimation differences between the legacy and new CE are minimal. For example, the following query joins two tables and groups using one column from each table:
USE [AdventureWorksDW2012];
GO
-- Enable "Include Actual Execution Plan"
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = On; -- CardinalityEstimationModelVersion 70
GO
SELECT [f].[ProductKey], [d].[DayNumberOfYear]
FROM dbo.[FactInternetSales] AS [f]
INNER JOIN dbo.[DimDate] AS [d]
ON [f].[OrderDateKey] = [d].[DateKey]
WHERE [f].[SalesTerritoryKey] = 8
GROUP BY [f].[ProductKey], [d].[DayNumberOfYear]; -- Estimated number of rows is 3811.19 for Distinct Sort operator
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = Off; -- CardinalityEstimationModelVersion 130
GO
SELECT [f].[ProductKey], [d].[DayNumberOfYear]
FROM dbo.[FactInternetSales] AS [f]
INNER JOIN dbo.[DimDate] AS [d]
ON [f].[OrderDateKey] = [d].[DateKey]
WHERE [f].[SalesTerritoryKey] = 8
GROUP BY [f].[ProductKey], [d].[DayNumberOfYear]; -- Estimated number of rows is 5287.73 for Distinct Sort operator
There were minimal estimation differences for a Sort (Distinct Sort) operator, and the plan shape was identical for both the legacy and new CE.
Actual Rows: 4723
New CE Estimated Rows: 5287.73
Legacy CE Estimated Rows: 3811.19
If a join condition amplifies the cardinality of either side of the join using a many-to-many join operation, the legacy CE may provide inaccurate estimates. The new CE establishes common-sense boundaries based on the chosen distinct values and the participating join or filter predicates defined in a query. The new CE uses “ambient cardinality”, which is the cardinality of the smallest set of joins that contains the GROUP BY or DISTINCT columns. This effectively reduces the distinct count when the overall join cardinality itself is large.
The following example extends the previous example by adding an additional INNER JOIN to the FactProductInventory table (estimates are for the Distinct Sort operator).
USE [AdventureWorksDW2012];
GO
-- Enable "Include Actual Execution Plan"
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = On; -- CardinalityEstimationModelVersion 70
GO
SELECT [f].[ProductKey], [d].[DayNumberOfYear]
FROM dbo.[FactInternetSales] AS [f]
INNER JOIN dbo.[DimDate] AS [d]
ON [f].[OrderDateKey] = [d].[DateKey]
INNER JOIN dbo.[FactProductInventory] AS fi
ON [fi].[DateKey] = [d].[DateKey]
WHERE [f].[SalesTerritoryKey] = 8
GROUP BY [f].[ProductKey], [d].[DayNumberOfYear]; -- Estimated number of rows is 21944.5 for Distinct Sort operator
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = Off; -- CardinalityEstimationModelVersion 130
GO
SELECT [f].[ProductKey], [d].[DayNumberOfYear]
FROM dbo.[FactInternetSales] AS [f]
INNER JOIN dbo.[DimDate] AS [d]
ON [f].[OrderDateKey] = [d].[DateKey]
INNER JOIN dbo.[FactProductInventory] AS fi
ON [fi].[DateKey] = [d].[DateKey]
WHERE [f].[SalesTerritoryKey] = 8
GROUP BY [f].[ProductKey], [d].[DayNumberOfYear]; -- Estimated number of rows is 5252.62 for Distinct Sort operator
The new CE estimate is pretty much same as the previous one based on the smallest join that contains the GROUP BY columns. However, the legacy CE significantly overestimates the number of rows because of the presence of a many-to-many join to the FactProductInventory table.
Actual Rows: 4723
New CE Estimated Rows: 5252.62
Legacy CE Estimated Rows: 21944.5
Comments