top of page
Writer's pictureGeorge Lin

How CE Change Should Impact SQL Queries By Examples Part 8: Distinct Value Count

Updated: Nov 16, 2021


Case 7: New CE changes distinct value count estimation


For queries where many-to-many join operations are not involved, distinct value count estimation differences between the legacy and new CE are minimal. For example, the following query joins two tables and groups using one column from each table:

USE [AdventureWorksDW2012];
GO
 
-- Enable "Include Actual Execution Plan"
 
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = On;  -- CardinalityEstimationModelVersion 70
GO
 
SELECT  [f].[ProductKey], [d].[DayNumberOfYear]
FROM    dbo.[FactInternetSales] AS [f]
INNER JOIN dbo.[DimDate] AS [d]
    ON [f].[OrderDateKey] = [d].[DateKey]
WHERE   [f].[SalesTerritoryKey] = 8
GROUP BY [f].[ProductKey], [d].[DayNumberOfYear];  -- Estimated number of rows is 3811.19 for Distinct Sort operator
 
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = Off;  -- CardinalityEstimationModelVersion 130
GO
 
SELECT  [f].[ProductKey], [d].[DayNumberOfYear]
FROM    dbo.[FactInternetSales] AS [f]
INNER JOIN dbo.[DimDate] AS [d]
    ON [f].[OrderDateKey] = [d].[DateKey]
WHERE   [f].[SalesTerritoryKey] = 8
GROUP BY [f].[ProductKey], [d].[DayNumberOfYear]; -- Estimated number of rows is 5287.73 for Distinct Sort operator

There were minimal estimation differences for a Sort (Distinct Sort) operator, and the plan shape was identical for both the legacy and new CE.

Actual Rows: 4723

New CE Estimated Rows: 5287.73

Legacy CE Estimated Rows: 3811.19

If a join condition amplifies the cardinality of either side of the join using a many-to-many join operation, the legacy CE may provide inaccurate estimates. The new CE establishes common-sense boundaries based on the chosen distinct values and the participating join or filter predicates defined in a query. The new CE uses “ambient cardinality”, which is the cardinality of the smallest set of joins that contains the GROUP BY or DISTINCT columns. This effectively reduces the distinct count when the overall join cardinality itself is large.

The following example extends the previous example by adding an additional INNER JOIN to the FactProductInventory table (estimates are for the Distinct Sort operator).

USE [AdventureWorksDW2012];
GO
 
-- Enable "Include Actual Execution Plan"
 
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = On;  -- CardinalityEstimationModelVersion 70
GO
 
SELECT [f].[ProductKey], [d].[DayNumberOfYear]
FROM    dbo.[FactInternetSales] AS [f]
INNER JOIN dbo.[DimDate] AS [d]
    ON [f].[OrderDateKey] = [d].[DateKey]
INNER JOIN dbo.[FactProductInventory] AS fi
       ON  [fi].[DateKey] = [d].[DateKey]
WHERE [f].[SalesTerritoryKey] = 8
GROUP BY [f].[ProductKey], [d].[DayNumberOfYear];  -- Estimated number of rows is 21944.5 for Distinct Sort operator
 
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = Off;  -- CardinalityEstimationModelVersion 130
GO
 
SELECT [f].[ProductKey], [d].[DayNumberOfYear]
FROM    dbo.[FactInternetSales] AS [f]
INNER JOIN dbo.[DimDate] AS [d]
    ON [f].[OrderDateKey] = [d].[DateKey]
INNER JOIN dbo.[FactProductInventory] AS fi
       ON  [fi].[DateKey] = [d].[DateKey]
WHERE [f].[SalesTerritoryKey] = 8
GROUP BY [f].[ProductKey], [d].[DayNumberOfYear]; -- Estimated number of rows is 5252.62 for Distinct Sort operator

The new CE estimate is pretty much same as the previous one based on the smallest join that contains the GROUP BY columns. However, the legacy CE significantly overestimates the number of rows because of the presence of a many-to-many join to the FactProductInventory table.

Actual Rows: 4723

New CE Estimated Rows: 5252.62

Legacy CE Estimated Rows: 21944.5


0 comments

Recent Posts

See All

Azure SQL How To - SQL Managed Instance

Task 1: Copy encryption certificate on SQL Server to SQL MI On the source SQL Server machine, in a query window, run script below to...

Comments


bottom of page