Performance comparison of several batch inserts in MyBatis

There are three main methods for batch data processing:
1. Execute a single insert statement repeatedly
2. foreach splicing SQL
3. Batch processing

1, Preliminary preparation

The demo is based on Spring Boot + MySQL, with Lombok used to omit getters/setters. See pom.xml for details.

1.1 table structure

The `id` column uses the database's auto-increment.

DROP TABLE IF EXISTS `user_info_batch`;
CREATE TABLE `user_info_batch` (
                           `id` bigint(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key id',
                           `user_name` varchar(100) NOT NULL COMMENT 'title of account',
                           `pass_word` varchar(100) NOT NULL COMMENT 'Login password',
                           `nick_name` varchar(30) NOT NULL COMMENT 'nickname',
                           `mobile` varchar(30) NOT NULL COMMENT 'cell-phone number',
                           `email` varchar(100) DEFAULT NULL COMMENT 'e-mail address',
                           `gmt_create` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
                           `gmt_update` timestamp NULL DEFAULT NULL COMMENT 'Update time',
                           PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT 'Mybatis Batch';

1.2 project configuration file

If you look carefully, you may notice that the database URL ends with rewriteBatchedStatements=true. What is it for? Don't worry, I'll explain later.

Database configuration
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
MyBatis configuration
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity

1.3 entity class

@Data
@Accessors(chain = true)
public class UserInfoBatchDO implements Serializable {
    private Long id;
    
    private String userName;

    private String passWord;

    private String nickName;

    private String mobile;

    private String email;

    private LocalDateTime gmtCreate;

    private LocalDateTime gmtUpdate;
}

1.4 UserInfoBatchMapper

public interface UserInfoBatchMapper {
    /** Single insert
     * @param info
     * @return
     */
    int insert(UserInfoBatchDO info);

    /**
     * foreach insert
     * @param list
     * @return
     */
    int batchInsert(List<UserInfoBatchDO> list);
}

1.5 UserInfoBatchMapper.xml

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="cn.van.mybatis.batch.mapper.UserInfoBatchMapper">


  <insert id="insert" parameterType="cn.van.mybatis.batch.entity.UserInfoBatchDO">
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values (#{userName,jdbcType=VARCHAR}, #{passWord,jdbcType=VARCHAR},#{nickName,jdbcType=VARCHAR}, #{mobile,jdbcType=VARCHAR}, #{email,jdbcType=VARCHAR}, #{gmtCreate,jdbcType=TIMESTAMP}, #{gmtUpdate,jdbcType=TIMESTAMP})
  </insert>

  <insert id="batchInsert">
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values
    <foreach collection="list" item="item" separator=",">
      (#{item.userName,jdbcType=VARCHAR}, #{item.passWord,jdbcType=VARCHAR}, #{item.nickName,jdbcType=VARCHAR}, #{item.mobile,jdbcType=VARCHAR}, #{item.email,jdbcType=VARCHAR}, #{item.gmtCreate,jdbcType=TIMESTAMP}, #{item.gmtUpdate,jdbcType=TIMESTAMP})
    </foreach>
  </insert>
</mapper>

1.6 preliminary data

To make testing easier, several lists are declared up front and populated before each test.

private List<UserInfoBatchDO> list = new ArrayList<>();
private List<UserInfoBatchDO> lessList = new ArrayList<>();
private List<UserInfoBatchDO> lageList = new ArrayList<>();
private List<UserInfoBatchDO> warmList = new ArrayList<>();
// Counting tool
private StopWatch sw = new StopWatch();

A common helper method assembles the test data.

private List<UserInfoBatchDO> assemblyData(int count){
    List<UserInfoBatchDO> list = new ArrayList<>();
    UserInfoBatchDO userInfoDO;
    for (int i = 0;i < count;i++){
        userInfoDO = new UserInfoBatchDO()
                .setUserName("Van")
                .setNickName("Dust blog")
                .setMobile("17098705205")
                .setPassWord("password")
                .setGmtUpdate(LocalDateTime.now());
        list.add(userInfoDO);
    }
    return list;
}

Populating the lists before each test

@Before
public void assemblyData() {
    list = assemblyData(200000);
    lessList = assemblyData(2000);
    lageList = assemblyData(1000000);
    warmList = assemblyData(5);
}

2, Execute a single insert statement repeatedly

A lazy programmer might do this: simply wrap the original single-row insert statement in a for loop.

2.1 corresponding mapper interface

int insert(UserInfoBatchDO info);

2.2 test method

Because this approach is so slow, the data set is reduced to 2,000 rows.

@Test
public void insert() {
    log.info("[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[end of warm-up]");
    sw.start("Execute a single insert statement repeatedly");
    // It's too slow to insert 20w here, so I only inserted 2000
    for (UserInfoBatchDO userInfoBatchDO : lessList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    sw.stop();
    log.info("all cost info:{}",sw.prettyPrint());
}

2.3 execution time

  • First run
-----------------------------------------
ms     %     Task name
-----------------------------------------
59887  100%  Execute a single insert statement repeatedly
  • Second run
-----------------------------------------
ms     %     Task name
-----------------------------------------
64853  100%  Execute a single insert statement repeatedly
  • Third run
-----------------------------------------
ms     %     Task name
-----------------------------------------
58235  100%  Execute a single insert statement repeatedly

In this way, inserting 2,000 rows took an average of 60,991 ms over three runs.

3, foreach splicing SQL

3.1 corresponding mapper interface

int batchInsert(List<UserInfoBatchDO> list);

3.2 test method

This method and the next are both tested with 200,000 (20w) rows.

@Test
public void batchInsert() {
    log.info("[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[end of warm-up]");
    sw.start("foreach Splicing sql");
    userInfoBatchMapper.batchInsert(list);
    sw.stop();
    log.info("all cost info:{}",sw.prettyPrint());
}

3.3 execution time

  • First run
-----------------------------------------
ms     %     Task name
-----------------------------------------
18835  100%  foreach Splicing sql
  • Second run
-----------------------------------------
ms     %     Task name
-----------------------------------------
17895  100%  foreach Splicing sql
  • Third run
-----------------------------------------
ms     %     Task name
-----------------------------------------
19827  100%  foreach Splicing sql

In this way, inserting 200,000 rows took an average of 18,852 ms over three runs.

4, Batch processing

This approach reuses the mapper and XML from section 2.1.

4.1 The rewriteBatchedStatements parameter

When I first started testing, I found that the MyBatis batch commit did not take effect: rows were actually still being inserted one at a time, and it was far slower than even the foreach SQL-splicing approach, which made no sense.

Later I found that batch execution requires an extra parameter in the connection URL: rewriteBatchedStatements=true

About the rewriteBatchedStatements parameter

To get high-performance batch inserts, the rewriteBatchedStatements parameter must be added to the MySQL JDBC connection URL, and the driver must be version 5.1.13 or later. By default, the MySQL JDBC driver ignores executeBatch(): it breaks up the group of SQL statements we expect to run as a batch and sends them to the MySQL server one by one, so the "batch" insert is really a series of single-row inserts, which directly causes the poor performance. Only when rewriteBatchedStatements is set to true will the driver actually execute the SQL in batches. The option applies to INSERT, UPDATE, and DELETE.
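As a rough illustration (the rewritten statement is internal to MySQL Connector/J; this sketch is my own and trims the columns for brevity), the difference looks roughly like this:

```sql
-- With rewriteBatchedStatements=false (the default), executeBatch() still
-- sends one single-row INSERT per round trip:
INSERT INTO user_info_batch (user_name, pass_word) VALUES ('Van', 'password');
INSERT INTO user_info_batch (user_name, pass_word) VALUES ('Van', 'password');

-- With rewriteBatchedStatements=true, the driver rewrites the batch into
-- one multi-row INSERT, drastically reducing round trips:
INSERT INTO user_info_batch (user_name, pass_word)
VALUES ('Van', 'password'), ('Van', 'password');
```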

4.2 batch preparation

Manually inject the SqlSessionFactory:
@Resource
private SqlSessionFactory sqlSessionFactory;
Test code:
@Test
public void processInsert() {
    log.info("[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[end of warm-up]");
    sw.start("Batch execution insert");
    // Open a batch-mode session
    SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
    try {
        UserInfoBatchMapper mapper = session.getMapper(UserInfoBatchMapper.class);
        for (int i = 0, length = list.size(); i < length; i++) {
            mapper.insert(list.get(i));
            // Commit every 20,000 rows to prevent memory overflow
            if (i % 20000 == 19999) {
                session.commit();
                session.clearCache();
            }
        }
        session.commit();
        session.clearCache();
    } finally {
        // Close the session to avoid leaking the connection
        session.close();
    }
    sw.stop();
    log.info("all cost info:{}", sw.prettyPrint());
}

4.3 execution time

  • First run
-----------------------------------------
ms     %     Task name
-----------------------------------------
09346  100%  Batch execution insert
  • Second run
-----------------------------------------
ms     %     Task name
-----------------------------------------
08890  100%  Batch execution insert
  • Third run
-----------------------------------------
ms     %     Task name
-----------------------------------------
09042  100%  Batch execution insert

In this way, inserting 200,000 rows took an average of 9,092 ms over three runs.

4.4 What if the data is larger?

When I expanded the data set to 1,000,000 (100w) rows, the foreach SQL-splicing approach could no longer complete the insert, so only the batch-processing time could be measured.

For this test, simply replace list with lageList in the test code from 4.2.
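One common workaround for the foreach failure at this scale (my own suggestion, not from the original article) is to split the large list into smaller chunks and call batchInsert once per chunk, keeping each generated statement below MySQL's max_allowed_packet limit. A minimal sketch, with the chunk size chosen arbitrarily:

```java
import java.util.ArrayList;
import java.util.List;

public class ListChunker {
    // Split a list into consecutive sub-lists of at most chunkSize elements,
    // so each sub-list can be passed to batchInsert() separately.
    public static <T> List<List<T>> chunk(List<T> source, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < source.size(); i += chunkSize) {
            chunks.add(new ArrayList<>(
                    source.subList(i, Math.min(i + chunkSize, source.size()))));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) rows.add(i);
        // 2,500 rows in chunks of 1,000 -> sizes 1000, 1000, 500
        List<List<Integer>> chunks = chunk(rows, 1000);
        System.out.println(chunks.size() + " chunks, last has " + chunks.get(2).size());
    }
}
```

Each chunk would then be inserted with a separate `userInfoBatchMapper.batchInsert(chunk)` call, committing between chunks.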

  • First run
-----------------------------------------
ms     %     Task name
-----------------------------------------
32419  100%  Batch execution insert
  • Second run
-----------------------------------------
ms     %     Task name
-----------------------------------------
31935  100%  Batch execution insert
  • Third run
-----------------------------------------
ms     %     Task name
-----------------------------------------
33048  100%  Batch execution insert

In this way, inserting 1,000,000 rows took an average of 32,467 ms over three runs.

5, Summary

Batch insert method              Data volume   Average time (3 runs)
--------------------------------------------------------------------
Looped single-row insert         2,000         60,991 ms
foreach spliced SQL              200,000       18,852 ms
Batch processing                 200,000       9,092 ms
Batch processing                 1,000,000     32,467 ms
  1. Looping over single-row inserts takes very little code and is acceptable when the data volume is small, but it is far too slow for large data sets and should not be used for them;

  2. foreach SQL splicing requires writing long stretches of XML and SQL, which is error-prone. Its performance is fair, but it still cannot cope with very large data sets, so it is not recommended;

  3. Batch execution is the recommended approach when inserting a large amount of data, and it is also convenient to use.

[Sample code for this article]

Author: Van fan
Link: https://juejin.cn/post/7007608714093920286
Source: Juejin (稀土掘金)

Tags: Mybatis

Posted on Tue, 23 Nov 2021 13:27:38 -0500 by pbarney