Posting the details of this situation so that others (including Microsoft)
are aware...
We had a simple query such as this on a multi-processor SQL Server:
SELECT count(*) FROM [Northwind].[dbo].[Orders] WHERE ShipRegion IS NULL
That would generate various severe errors (3624, 5180, 823), some of which
at first appeared to be hardware related. The errors were not generated if
the query were run with OPTION (MAXDOP 1). In the end a DBCC REINDEX of a
particular index on the table involved in the query fixed the problem,
although DBCC checks never indicated any problem with indexes.
Symptoms:
--
1. A stack dump indicating a failed assertion in file recbase.cpp
2. One of the following before the stack dump:
A. Error 3624 (retail assertion) of severity 20 (fatal error in current
process) that has follow-up message indicating the cause of the error, i.e.,
just "Error: 3624, Severity: 20, State: 1".
B. Error 5180 (could not open FCB for invalid file) of severity 22 (fatal
error, table integrity suspect) such as the following: "Error: 5180,
Severity: 22, State: 1 <next log entry> Could not open FCB for invalid file
ID 768 in database 'Northwind' ".
C. Error 823 (I/O error) of severity 24 (hardware error) such as the
following: "Error: 823, Severity: 24, State: 2 <new log entry> I/O error
38(Reached the end of the file.) detected during read at offset
0x00002000600000 in file 'C:\Program Files\Microsoft SQL
Server\MSSQL\Northwind.MDF' ".
3. The offending command that generated the initial failed assertion
continues to throw one of three errors listed above on subsequent
executions.
4. There are no hardware-related events in the Windows/NT Event Log.
5. The offending command uses parallelism.
6. When the offending command is run with OPTION (MAXDOP 1) to force serial
execution, it does not throw an error.
7. The following commands do not detect any integrity errors: DBCC
CHECKDB, DBCC CHECKCATALOG, DBCC CHECKTABLE (table only or table plus index
arguments), and DBCC CHECKFILEGROUP.
Investigation Notes:
--
1. Issue came up with the following statement sql statement
SELECT count(*) FROM [Northwind].[dbo].[Orders] WHERE ShipRegion IS NULL
2. Adding OPTION (MAXDOP 1) brought back results
3. Adding WITH (NOLOCK) had no effect.
4. Individual column names were substituted for the * in the SELECT clause.
The results were mixed: some executions brought back results and others
returned one of the errors listed in the Symptoms above. Checking the
execution plans for all the variants using the MAXDOP option (so that the
actual plan could be retrieved, not just the planned one) revealed that the
failing ones all used a particular nonclustered index while the successful
ones used the clustered index.
5. When running the DBCC CHECK commands listed in the symptoms to try to
rectify the issue, the filegroup and indid of the problem index where used
where possible (in addition to the more generic executions with just a table
name). No DBCC commands revealed any integrity errors, though, regardless
of whether they used the table name or the table name plus the
filegroup/index.
Resolution:
--
DBCC REINDEX of the problematic index resolved the issue without
serialization of the query plan (MAXDOP 1). Once working, the actual
execution plan was checked to ensure it used parallelism; it did.
Subsequent tests with OPTION (MAXDOP 1) continued to work, as did the
inclusion/omission of locking hints like (NOLOCK).Hi Frank,
Thans for sharing your experience with MSDN Newsgroup!
Sincerely yours,
Mingqing Cheng
Microsoft Online Support
---
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
Please reply to newsgroups only, many thanks!
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment