SQL Server 2008: deleting a large amount of data

1, Written up front - it's not easy to say I love you

To prepare for upgrading the databases to SQL Server 2008 R2, an existing PC was used for testing. The databases were restored from production (three databases at an exaggerated 100 GB+ in total), yet the machine had only 4 GB of memory and had to play both DB Server and Web Server. You can imagine this machine's tragic fate: the moment MS SQL Server started, memory usage soared to 99%. Nothing for it but a memory upgrade. After swapping in two 8 GB sticks (16 GB in total), the result was the same: memory was wiped out within seconds (CPU utilization, meanwhile, hovered around 0%). Since it is a PC with only two memory slots, and the largest single stick on the market is 16 GB (1K+ RMB), even buying those would still leave memory short (a humble PC just can't take the hurt). There seemed to be no other way out: delete data!!!

DELETE the data - easy to say. Isn't it just a DELETE? Sure, but if I do it that way, I figure XXX will get to "know what Shanghai looks like at 4 a.m." (kidding; who let me be XXX's programmer - brother is definitely better than you in this respect), and I figure it would burst the database (insufficient disk space, as the generated log files would be too large).

2, The battlefield - looking for him in the crowd

To explain the difficulties and problems I ran into, some necessary tests and explanations are in order; they are also part of exploring how to solve the problem. After all, the fundamental question is how to operate on data better and faster, which ultimately comes down to an optimized combination of DELETE, UPDATE, INSERT, TRUNCATE, DROP, and so on. Our goal is to find the best and fastest approach. To make testing easy, a test table tmp_employee is prepared:

--Create table tmp_employee
CREATE TABLE [dbo].[tmp_employee] (
    [EmployeeNo] INT PRIMARY KEY,
    [EmployeeName] [nvarchar](50) NULL,
    [CreateUser] [nvarchar](50) NULL,
    [CreateDatetime] [datetime] NULL
);

1. Data insertion PK

1.1 Loop insertion, execution time 38026 ms
--Loop insertion
SET STATISTICS TIME ON;
DECLARE @Index INT = 1;
DECLARE @Timer DATETIME = GETDATE();
WHILE @Index <= 100000
BEGIN
    INSERT [dbo].[tmp_employee](EmployeeNo, EmployeeName, CreateUser, CreateDatetime)
    VALUES(@Index, 'Employee_' + CAST(@Index AS CHAR(6)), 'system', GETDATE());
    SET @Index = @Index + 1;
END
SELECT DATEDIFF(MS, @Timer, GETDATE()) AS [execution time(millisecond)];
SET STATISTICS TIME OFF;
1.2 Transactional loop insertion, execution time 6640 ms
--Transactional loop insertion
BEGIN TRAN;
SET STATISTICS TIME ON;
DECLARE @Index INT = 1;
DECLARE @Timer DATETIME = GETDATE();
WHILE @Index <= 100000
BEGIN
    INSERT [dbo].[tmp_employee](EmployeeNo, EmployeeName, CreateUser, CreateDatetime)
    VALUES(@Index, 'Employee_' + CAST(@Index AS CHAR(6)), 'system', GETDATE());
    SET @Index = @Index + 1;
END
SELECT DATEDIFF(MS, @Timer, GETDATE()) AS [execution time(millisecond)];
SET STATISTICS TIME OFF;
COMMIT;
1.3 Batch insert, execution time 220 ms
--Batch insert
SET STATISTICS TIME ON;
DECLARE @Timer DATETIME = GETDATE();
INSERT [dbo].[tmp_employee](EmployeeNo, EmployeeName, CreateUser, CreateDatetime)
SELECT TOP(100000)
    EmployeeNo = ROW_NUMBER() OVER (ORDER BY C1.[OBJECT_ID]),
    'Employee_',
    'system',
    GETDATE()
FROM SYS.COLUMNS AS C1 CROSS JOIN SYS.COLUMNS AS C2
ORDER BY C1.[OBJECT_ID];
SELECT DATEDIFF(MS, @Timer, GETDATE()) AS [execution time(millisecond)];
SET STATISTICS TIME OFF;
1.4 CTE insertion, execution time 220 ms
--CTE insertion
SET STATISTICS TIME ON;
DECLARE @Timer DATETIME = GETDATE();
;WITH CTE(EmployeeNo, EmployeeName, CreateUser, CreateDatetime) AS(
    SELECT TOP(100000)
        EmployeeNo = ROW_NUMBER() OVER (ORDER BY C1.[OBJECT_ID]),
        'Employee_',
        'system',
        GETDATE()
    FROM SYS.COLUMNS AS C1 CROSS JOIN SYS.COLUMNS AS C2
    ORDER BY C1.[OBJECT_ID]
)
INSERT [dbo].[tmp_employee]
SELECT EmployeeNo, EmployeeName, CreateUser, CreateDatetime FROM CTE;
SELECT DATEDIFF(MS, @Timer, GETDATE()) AS [execution time(millisecond)];
SET STATISTICS TIME OFF;

Summary:

  • Judged by execution time, CTE and batch insert are equally efficient and the fastest; transactional loop insertion comes next; single-row loop insertion is the slowest;
  • Single-row loop insertion is slowest because every INSERT generates its own log write; wrapping the loop in one transaction greatly reduces the number of log flushes; batch insertion logs the work as a single statement. The CTE approach is likewise a single set-based statement and, used properly, is just as fast.

2. Data deletion PK

2.1 Plain DELETE, execution time 1240 ms
--Plain DELETE
SET STATISTICS TIME ON;
DECLARE @Timer DATETIME = GETDATE();
DELETE FROM [dbo].[tmp_employee];
SELECT DATEDIFF(MS, @Timer, GETDATE()) AS [execution time(millisecond)];
SET STATISTICS TIME OFF;
2.2 Batch deletion, execution time 106 ms
--Batch deletion
SET STATISTICS TIME ON;
DECLARE @Timer DATETIME = GETDATE();
DECLARE @rc INT = 1;
SET ROWCOUNT 100000;
WHILE @rc > 0
BEGIN
    BEGIN TRAN;
    DELETE FROM [dbo].[tmp_employee];
    SET @rc = @@ROWCOUNT; -- capture before COMMIT resets @@ROWCOUNT
    COMMIT;
END
SET ROWCOUNT 0;
SELECT DATEDIFF(MS, @Timer, GETDATE()) AS [execution time(millisecond)];
SET STATISTICS TIME OFF;
2.3 TRUNCATE, execution time 0 ms
--TRUNCATE
SET STATISTICS TIME ON;
DECLARE @Timer DATETIME = GETDATE();
TRUNCATE TABLE [dbo].[tmp_employee];
SELECT DATEDIFF(MS, @Timer, GETDATE()) AS [execution time(millisecond)];
SET STATISTICS TIME OFF;

Summary:

  • TRUNCATE is astonishingly fast: clearing 100,000 rows is no strain at all; batch deletion comes second, and plain DELETE is last and far too slow;
  • TRUNCATE is fast because it is a DDL operation that logs only a handful of page-deallocation records, whereas an ordinary DELETE not only logs every row but also locks the records (a diagnostic sketch of this difference follows).
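A rough way to see this logging difference for yourself, as a diagnostic sketch only: fn_dblog is an undocumented function, so run this in a scratch database using the SIMPLE recovery model, where CHECKPOINT lets the engine discard the inactive log.

CHECKPOINT;  -- start with a near-empty active log
DELETE FROM [dbo].[tmp_employee];
SELECT COUNT(*) AS log_records_delete FROM fn_dblog(NULL, NULL);   -- roughly one record per deleted row, plus overhead
-- repopulate tmp_employee, then compare:
CHECKPOINT;
TRUNCATE TABLE [dbo].[tmp_employee];
SELECT COUNT(*) AS log_records_truncate FROM fn_dblog(NULL, NULL); -- only a handful of deallocation records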

3, Sharpening the knife - holding the pipa, half covering the face

From section 2 above we know that the fastest ways to insert and delete are batch insert and TRUNCATE respectively, so to delete big data we combine the two. The central idea: first store the data to be kept in a new table, then TRUNCATE the original table, and finally batch-insert the data back. Of course, the concrete implementation can be adapted as needed.

1. Keep the required data in a new table -> TRUNCATE the original table -> insert the kept data back into the original table

The script looks like this:

SELECT * INTO #keep FROM Original WHERE CreateDate > '2011-12-31';
TRUNCATE TABLE Original;
INSERT Original SELECT * FROM #keep;

The first statement stores all the data to be kept in the table #keep (#keep does not need to be created by hand; SELECT INTO creates it on the fly and copies the structure of Original). PS: if you only want to create the table structure without copying any data, the corresponding script is as follows:

SELECT * INTO #keep FROM Original WHERE 1 = 2

The second statement clears the whole table and generates a negligible amount of log; the third statement restores the kept data.

Several notes:

  • Instead of SELECT INTO, you could create #keep from a hand-written script (or by copying an existing table), but then there is no way to obtain, via SQL script alone, a creation script fully consistent with the original table (basic columns, properties, indexes, constraints, etc.), and with many tables to handle you would go crazy;
  • Since the first option falls short, what about creating a whole new database? Existing scripts can be used, and the generated database will be essentially identical, but I'd advise against it: first, it means working across databases; second, you must provision enough disk space.

2. Create a new table structure -> batch-insert the data to be retained -> DROP the original table -> rename the new table to the original table's name

CREATE TABLE #keep (xxx) -- use the method mentioned above (the existing table's creation script), though full consistency cannot be guaranteed

INSERT #keep SELECT * FROM Original WHERE <keep condition>

DROP TABLE Original

EXEC sp_rename '#keep', 'Original'

This approach is essentially identical to the first; it simply saves the last step, restoring the data. But it is a little more troublesome, because you must create a table structure exactly consistent with the original, including basic columns, properties, constraints, and so on; a fuller sketch follows.
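A minimal end-to-end sketch of this approach, under some assumptions: sp_rename cannot move a temporary table out of tempdb into a user schema, so a permanent staging table is used instead of #keep, and the column list and filter below are hypothetical.

-- 1. Create the new table with the original table's full structure (from its scripted definition)
CREATE TABLE dbo.Original_New
(
    Id INT PRIMARY KEY,        -- hypothetical columns; mirror the real table's
    CreateDate DATETIME NULL   -- columns, constraints, and indexes here
);
-- 2. Batch-insert the data to keep (hypothetical condition)
INSERT dbo.Original_New (Id, CreateDate)
SELECT Id, CreateDate FROM dbo.Original WHERE CreateDate > '2011-12-31';
-- 3. Drop the original table and rename the new one to take its place
DROP TABLE dbo.Original;
EXEC sp_rename 'dbo.Original_New', 'Original';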

4, Data shrinking - the autumn wind sweeps away the leaves

After the data is deleted, you will find that the space occupied by the database files has not changed. That is when we turn to the powerful shrink feature. The script is below; the running time is unpredictable and depends on the size of your database, anywhere from seconds to tens of minutes.

DBCC SHRINKDATABASE(DB_NAME)
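DBCC SHRINKDATABASE also accepts an optional target percentage of free space to leave in the files after shrinking; for example (the database name is a placeholder):

DBCC SHRINKDATABASE(YourDatabase, 10);  -- shrink, leaving 10% free space in each file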

source: https://www.cnblogs.com/panchunting/archive/2013/04/27/SQL_Tech_001.html

=====================================================================================

In SQL Server, how do you delete data from large tables quickly? Before answering, we must clarify the context and the actual, specific requirements; different scenarios call for different approaches.

1: Delete all data in the whole table

If the whole table is to be emptied, the scenario is very simple: TRUNCATE TABLE is certainly the fastest. By contrast, DELETE is a bad strategy here.

2: Delete part of data in large table

Scenario 1 is very simple, but in many real business scenarios TRUNCATE cannot be used: often only part of a table's data is deleted, or data is deleted after being archived. Suppose the table we face is TEST and we need to delete part of its data. First, take stock of the table's row count and the number of rows to be deleted, using the following approach:

  • Check the amount of data in the table and the amount to be deleted, then compute the deletion ratio:

sp_spaceused 'dbo.TEST';

SELECT COUNT(*) AS DELETE_RCD FROM dbo.TEST WHERE <delete condition>;
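For example, a small sketch to compute the ratio (the CreateDate filter below is a hypothetical delete condition):

DECLARE @total BIGINT, @to_delete BIGINT;
SELECT @total = COUNT_BIG(*) FROM dbo.TEST;
SELECT @to_delete = COUNT_BIG(*) FROM dbo.TEST WHERE CreateDate < '2019-01-01'; -- <delete condition>
SELECT CAST(@to_delete * 100.0 / NULLIF(@total, 0) AS DECIMAL(5, 2)) AS delete_pct;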

2.1 Deleting most of the data in a large table. How much counts as "most" is not easy to quantify, so let's pin it at 60%: if more than 60% of the data is to be deleted, use the following method (a sketch of the steps follows the list):

1: New table TEST_TMP

2: Transfer data to be retained to TEST_TMP

3: Rename the original table TEST to TEST_OLD, and rename TEST_TMP to TEST

4: Check related triggers and constraints, and rename them

5: After verifying that the operation is correct, TRUNCATE the original table (TEST_OLD) and then DROP it, or keep it around for a while to be on the safe side.
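A minimal sketch of steps 1 through 3, assuming TEST has a CreateDate column and a hypothetical retention filter; note that SELECT INTO copies only the basic column definitions, so step 4 (indexes, constraints, triggers) still has to be done from the original table's scripts:

-- steps 1 and 2: create TEST_TMP and move the rows to keep into it
SELECT * INTO dbo.TEST_TMP FROM dbo.TEST WHERE CreateDate >= '2019-01-01';
-- step 3: swap the names
EXEC sp_rename 'dbo.TEST', 'TEST_OLD';
EXEC sp_rename 'dbo.TEST_TMP', 'TEST';
-- step 5, after verification:
-- TRUNCATE TABLE dbo.TEST_OLD;
-- DROP TABLE dbo.TEST_OLD;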

Note: as for where this 60% proportion comes from: it is an empirical value backed by simple tests, not a precise figure validated by probability statistics.

We should also consider the business situation. If applications access this table continuously, this method is actually troublesome, since it raises data consistency, business interruption, and other issues. But if the application's access is light, or there is a window with no access at all, this method works perfectly well.

2.2 Deleting part of the data in a large table, when the proportion does not exceed 60%

1: First drop or disable the irrelevant indexes ("irrelevant" meaning indexes not used by the execution plan, i.e. useless to the current DELETE statement), as sketched below. DELETE is a DML operation, and the indexes of a large table are generally very large themselves; a mass DELETE has to maintain every one of them, causing a large amount of extra IO.
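As a hedged sketch (table and index names are hypothetical): disabling a nonclustered index keeps its definition in metadata, so it can be rebuilt afterwards without a saved CREATE script; do not disable the clustered index, which would make the table inaccessible.

ALTER INDEX IX_TEST_SomeColumn ON dbo.TEST DISABLE;  -- index is no longer maintained by DML
-- ... run the batched DELETE here ...
ALTER INDEX IX_TEST_SomeColumn ON dbo.TEST REBUILD;  -- re-enables and rebuilds the index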

2: Delete in small batches (batch deletion is much faster than deleting everything at once). Do not delete a huge number of records in a single statement: it widens the lock granularity, holds locks for a long time, may cause blocking, and can seriously affect the business; meanwhile the transaction log balloons, execution time stretches out, and performance becomes very poor.

When deleting in batches, how many records per batch makes the SQL most efficient? There is really no formula. In my own comparative tests, deleting 10,000 or 100,000 rows at a time, I found no particular rule (a "rule" you discover may produce different results in another case; it depends on the environment and is sometimes just an empirical value). In general, 10,000 is used. In practice, I suggest running a few experiments and then choosing a suitable value.

Case 1:

-- Batch delete: remove up to @row_count rows per iteration until no rows match
DECLARE @delete_rows INT;
DECLARE @delete_sum_rows INT = 0;
DECLARE @row_count INT = 100000;
WHILE 1 = 1
BEGIN
    DELETE TOP ( @row_count )
    FROM dbo.[EmployeeDayData]
    WHERE WorkDate < CONVERT(DATETIME, '2012-01-01 00:00:00', 120);
    SELECT @delete_rows = @@ROWCOUNT;
    SET @delete_sum_rows += @delete_rows;
    IF @delete_rows = 0
        BREAK;
END;
SELECT @delete_sum_rows;

Case 2:

-- Batch delete with an explicit transaction per batch
DECLARE @r INT;
DECLARE @Delete_ROWS BIGINT;
SET @r = 1;
SET @Delete_ROWS = 0;
WHILE @r > 0
BEGIN
    BEGIN TRANSACTION;
    DELETE TOP (10000) -- batch size; adjust as needed
    YourSQLDba..YdYarnMatch
    WHERE Remark = 'Not entered today' AND Operation_Date < CONVERT(datetime, '2019-05-30', 120);
    SET @r = @@ROWCOUNT; -- capture before COMMIT resets @@ROWCOUNT
    SET @Delete_ROWS += @r;
    COMMIT TRANSACTION;
    PRINT(@Delete_ROWS);
END

The table has the following two indexes

USE [YourSQLDba]
GO
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[YdYarnMatch]') AND name = N'IX_YdYarnMatch_N2')
DROP INDEX [IX_YdYarnMatch_N2] ON [dbo].[YdYarnMatch] WITH ( ONLINE = OFF )
GO
USE [YourSQLDba]
GO
CREATE NONCLUSTERED INDEX [IX_YdYarnMatch_N2] ON [dbo].[YdYarnMatch]
(
[Job_No] ASC,
[GK_No] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
USE [YourSQLDba]
GO
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[YdYarnMatch]') AND name = N'IX_YdYarnMatch_N1')
DROP INDEX [IX_YdYarnMatch_N1] ON [dbo].[YdYarnMatch] WITH ( ONLINE = OFF )
GO
USE [YourSQLDba]
GO
CREATE NONCLUSTERED INDEX [IX_YdYarnMatch_N1] ON [dbo].[YdYarnMatch]
(
[Operation_Date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO

Key point: practice shows that creating a suitable new index keeps the batched DELETE's execution plan from scanning the whole table and greatly speeds up deletion. I tested and verified this in this case: after adding the appropriate index so that the DELETE statement uses an Index Seek, deletion efficiency indeed improved dramatically.

With index IX_YdYarnMatch_N2 dropped and index IX_YdYarnMatch_N1 kept, the SQL execution plan still performed a full table scan, and the DELETE ran very slowly.

After dropping index IX_YdYarnMatch_N1 and recreating it as follows, the execution plan switched to an Index Seek, which greatly improved deletion efficiency.

CREATE NONCLUSTERED INDEX [IX_YdYarnMatch_N1] ON [dbo].[YdYarnMatch]
(
    [Operation_Date] ASC,
    [Remark] ASC
)

Note: the index name is the same here, but the indexed columns are different.

So the right way is to:

1: First drop or disable the irrelevant indexes (those useless to the current DELETE statement), generating their creation SQL beforehand so the indexes can be recreated after the data deletion. Note that this must not affect the application during the operation window; otherwise, reconsider.

2: Check and test the execution plan of the current SQL, and see whether a suitable index can be created to speed up the DELETE operation, as in the example above.

3: Delete the records in a batched loop.

4: In an Oracle database, some table-level settings can reduce the log generated by the corresponding DML operations. SQL Server has no such feature, so instead pay attention to, and if necessary adjust, transaction log backups in good time; a sketch follows.
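For instance, under the FULL recovery model, a log backup between delete batches lets log space be reused instead of growing without bound (the database name and path are placeholders):

BACKUP LOG YourDatabase TO DISK = N'D:\Backup\YourDatabase_log.trn';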

If we can set the database's recovery model to SIMPLE, we can avoid the extra IO overhead of log backups. However, many production environments cannot switch a user database's recovery model. Where policy does allow it, the switch looks like the sketch below.
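A hedged sketch (placeholder database name); note that leaving the FULL recovery model breaks the log backup chain, so take a full or differential backup after switching back:

ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;
-- ... perform the batched deletes ...
ALTER DATABASE YourDatabase SET RECOVERY FULL;
-- then take a full (or differential) backup to restart the log backup chain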

In fact, the ways to quickly delete data from a large table in SQL Server boil down to one thing: delete in batches, committing batch by batch; everything else is auxiliary. If you want to run some detailed tests yourself, see https://sqlperformance.com/2013/03/io-subsystem/chunk-deletes


source: https://www.cnblogs.com/kerrycode/p/12448322.html
