Case 3: New CE calculates simple join differently
For joins with a single equality or inequality predicate, the legacy CE joins the histograms on the join columns by aligning the two histograms step-by-step using linear interpolation. This method could result in inconsistent cardinality estimates. Therefore, the new CE now uses a simpler join estimate algorithm that aligns histograms using only minimum and maximum histogram boundaries.
In the query example below, the following join estimates are nearly identical in the new and legacy CE:
USE [master]
GO
ALTER DATABASE [AdventureWorksDW2012] SET COMPATIBILITY_LEVEL = 130 --SQL Server 2016
GO
USE [AdventureWorksDW2012]
GO
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = On; -- CardinalityEstimationModelVersion 70
GO
SELECT [e1].[EmployeeKey],
[e1].[ParentEmployeeKey],
[e2].[EmployeeKey],
[e2].[ParentEmployeeKey],
[e2].[StartDate],
[e2].[EndDate]
FROM dbo.[DimEmployee] AS [e1]
INNER JOIN dbo.[DimEmployee] AS [e2]
ON [e1].[StartDate] < [e2].[StartDate];
-- Run the same query after enabling the new CE
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = Off; -- CardinalityEstimationModelVersion 130
GO
SELECT [e1].[EmployeeKey],
[e1].[ParentEmployeeKey],
[e2].[EmployeeKey],
[e2].[ParentEmployeeKey],
[e2].[StartDate],
[e2].[EndDate]
FROM dbo.[DimEmployee] AS [e1]
INNER JOIN dbo.[DimEmployee] AS [e2]
ON [e1].[StartDate] < [e2].[StartDate];
Actual Rows: 43450
New CE Estimated Rows: 43807.5
Legacy CE Estimated Rows: 43481
While the estimates vary slightly using the new CE, the Query Optimizer selects the identical plan shape in both cases.
Although potentially less consistent, the legacy CE may produce slightly better simple-join condition estimates because of the step-by-step histogram alignment. The new CE uses a coarse alignment. However, the difference in estimates may be small enough that it will be less likely to cause a plan quality issue.
The following is an example where the join estimate changes result in a different query execution plan shape. (This example assumes that parallelism is available and enabled for the SQL Server instance). The following query uses a simple-join condition in the context of the new CE.
USE [AdventureWorksDW2012]
GO
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = Off; -- CardinalityEstimationModelVersion 130
GO
SELECT [fs].[ProductKey], [fs].[OrderDateKey], [fs].[DueDateKey],
[fs].[ShipDateKey], [fc].[DateKey],
[fc].[AverageRate], [fc].[EndOfDayRate], [fc].[Date]
FROM dbo.[FactResellerSales] AS [fs]
INNER JOIN dbo.[FactCurrencyRate] AS [fc]
ON [fs].[CurrencyKey] = [fc].[CurrencyKey]
Using the new CE, the Query Optimizer estimated rows = 58,953,400, well Actual rows = 70,470,090 and it selects a serial query execution plan:
In contrast, the legacy cardinality estimate for the join is 70,470,100, which is very close to the actual row count of 70,470,090. The query plan shape also differs from the new CE; it uses a Merge Join physical algorithm and parallelism operators.
USE [AdventureWorksDW2012]
GO
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = On; -- CardinalityEstimationModelVersion 70
GO
SELECT [fs].[ProductKey], [fs].[OrderDateKey], [fs].[DueDateKey],
[fs].[ShipDateKey], [fc].[DateKey],
[fc].[AverageRate], [fc].[EndOfDayRate], [fc].[Date]
FROM dbo.[FactResellerSales] AS [fs]
INNER JOIN dbo.[FactCurrencyRate] AS [fc]
ON [fs].[CurrencyKey] = [fc].[CurrencyKey]
Even though the query execution duration is almost same(new CE is 00:14:28, legacy CE is 00:14:33), in this specific case, the estimate is worse for the new CE because the FactResellerSales table’s CurrencyKey column has a jagged data distribution. For example, the key with the value of 100 has a higher occurrence than other values as showed below.
USE [AdventureWorksDW2012]
GO
select CurrencyKey,count(*) from dbo.FactResellerSales group by CurrencyKey
USE [AdventureWorksDW2012]
GO
SELECT [s].[object_id], [s].[name], [s].[auto_created] FROM sys.[stats] AS s
INNER JOIN sys.[stats_columns] AS [sc] ON [s].[stats_id] = [sc].[stats_id] AND [s].[object_id] = [sc].[object_id]
WHERE [s].[object_id] = OBJECT_ID('dbo.FactResellerSales') AND COL_NAME([s].[object_id], [sc].[column_id]) = 'CurrencyKey';
GO
DBCC SHOW_STATISTICS('dbo.FactResellerSales', _WA_Sys_00000008_2A4B4B5E);
With the new CE, jagged distributions are evened out by coarse alignment of the joined table histograms. Because of the coarse alignment of minimum and maximum histogram steps, we get a less accurate join estimate in this example. In contrast, the step-by-step join alignment algorithm the legacy CE uses includes the exact frequency of each key value. This leads to an almost perfect join estimate.
Comments