Case 2: New CE understands maximum value might be higher than when statistics were last gathered
The “ascending key problem” arises when query predicates reference newly inserted data that fall out of the range of a statistic object histogram. The following scenario illustrates the different behavior between the legacy and new CE:
USE [AdventureWorks2012]
GO
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = On; -- CardinalityEstimationModelVersion 70
GO
-- First, verify there is no statistic for the column OrderDate in Sales.[SalesOrderHeader] table
SELECT [s].[object_id], [s].[name], [s].[auto_created] FROM sys.[stats] AS s
INNER JOIN sys.[stats_columns] AS [sc] ON [s].[stats_id] = [sc].[stats_id] AND [s].[object_id] = [sc].[object_id]
WHERE [s].[object_id] = OBJECT_ID('Sales.SalesOrderHeader') AND COL_NAME([s].[object_id], [sc].[column_id]) = 'OrderDate';
GO
-- Run this query below and this triggers automatic creation of statistics
SELECT [SalesOrderID], [OrderDate] FROM Sales.[SalesOrderHeader] WHERE [OrderDate] = '2005-07-01 00:00:00.000';
GO
-- Verify that there are now automatic stats generated for the OrderDate column
SELECT [s].[object_id], [s].[name], [s].[auto_created] FROM sys.[stats] AS s
INNER JOIN sys.[stats_columns] AS [sc] ON [s].[stats_id] = [sc].[stats_id] AND [s].[object_id] = [sc].[object_id]
WHERE [s].[object_id] = OBJECT_ID('Sales.SalesOrderHeader') AND COL_NAME([s].[object_id], [sc].[column_id]) = 'OrderDate';
GO
Checking the histogram for this statistics object shows that the highest step RANGE_HI_KEY value is 2014-06-30 00:00:00.000
DBCC SHOW_STATISTICS('Sales.SalesOrderHeader', _WA_Sys_00000003_4B7734FF);
The following statement inserts 50 new rows into the Sales.SalesOrderHeader table using a more recent value for the OrderDate column value:
USE [AdventureWorks2012]
GO
INSERT INTO Sales.[SalesOrderHeader] ( [RevisionNumber], [OrderDate],
[DueDate], [ShipDate], [Status],
[OnlineOrderFlag],
[PurchaseOrderNumber],
[AccountNumber], [CustomerID],
[SalesPersonID], [TerritoryID],
[BillToAddressID], [ShipToAddressID],
[ShipMethodID], [CreditCardID],
[CreditCardApprovalCode],
[CurrencyRateID], [SubTotal],
[TaxAmt], [Freight], [Comment] )
VALUES ( 3, '2021-02-02 00:00:00.000', '5/1/2021', '4/1/2021', 5, 0, 'SO43659', 'PO522145787',29825, 279, 5, 985, 985, 5, 21, 'Vi84182', NULL, 250.00,
25.00, 10.00, '' );
GO 50 -- INSERT 50 rows, representing very recent data, with a current OrderDate value
50 rows inserted in this example were not enough to cross the change threshold value for automatic statistics updates. Therefore, the existing statistics object histogram is unchanged when the following query executes against more recent data:
USE [AdventureWorks2012]
GO
SELECT [SalesOrderID], [OrderDate] FROM Sales.[SalesOrderHeader] WHERE [OrderDate] = '2021-02-02 00:00:00.000'
GO
The query execution plan for the legacy CE shows an estimate of 1 row versus the actual 50 returned rows. The reference to 2021-02-02 00:00:00.000 falls out of the range of the histogram.
The new CE assumes that the queried values do exist in the dataset even if the value falls out of the range of the histogram. The new CE in this example uses an average frequency that is calculated by multiplying the table cardinality by the density. the screenshot below shows the statistic header and density vector of the OrderDate statistics object. Based on this recent statistics update, the table row count is now 31,465. Multiplying 31,465 by the “All density” value of 0.0008896797 results in our value of 27.9938 (rounded up). it's not 100% accurate but much better than 1.
Commentaires