Start a classification task using Swin Transformer in MMClassification

Swin Transformer recently won the best paper award at ICCV 2021. As a general-purpose backbone, it has achieved state-of-the-art results in downstream tasks such as classification, detection and segmentation. MMClassification (MMCls) is an open-source image classification toolbox and part of the OpenMMLab project. This article shows how to use Swin Transformer to start a classification task in MMCls. The accompanying code can be downloaded from Google Drive.


Related tutorial documents: Welcome to the Chinese tutorial of MMClassification! - MMClassification 0.16.0 documentation


1. MMClassification installation

2. Data set preparation

3. Use MMCls to fine tune the model

Preparing to modify the configuration file


Test the model

Visualization results


1. MMClassification installation

Before using MMClassification, we need to configure the environment. The steps are as follows:

For installing Python, CUDA, PyTorch and other prerequisites, refer to their official guides or online tutorials. After installation:

Check nvcc version

nvcc -V

Check gcc version

gcc --version

Check torch version

pip list | grep "torch"

Install MMCV:

MMCV is the foundational library of the OpenMMLab codebase. Pre-built wheel packages are available for Linux, so you can install it directly with pip. The command format is as follows:

pip install mmcv -f{CUDA_v}/{Torch_v}/index.html

Make sure the PyTorch and CUDA versions in the command match your environment, otherwise the installation may fail.

The previous steps reported CUDA 11.1 and PyTorch 1.9.0 in this environment, so we need to select the corresponding MMCV build.
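As a small illustration of how the `{CUDA_v}` and `{Torch_v}` placeholders are typically filled, the sketch below turns the version numbers reported above into tags. The tag formats (`cu111`, `torch1.9.0`) and the helper name `mmcv_index_tags` are assumptions made for illustration; check the MMCV installation guide for the exact values your combination needs.

```python
def mmcv_index_tags(cuda_version, torch_version):
    """Build the {CUDA_v} and {Torch_v} tags from plain version strings.

    Hypothetical helper: assumes tags look like 'cu111' and 'torch1.9.0'.
    """
    # '11.1' -> 'cu111': drop the dot and prefix with 'cu'.
    cuda_tag = "cu" + cuda_version.replace(".", "")
    # '1.9.0+cu111' -> 'torch1.9.0': drop any local build suffix after '+'.
    torch_tag = "torch" + torch_version.split("+")[0]
    return cuda_tag, torch_tag


cuda_tag, torch_tag = mmcv_index_tags("11.1", "1.9.0+cu111")
print(f"{cuda_tag}/{torch_tag}/index.html")
```

The printed fragment corresponds to the `{CUDA_v}/{Torch_v}/index.html` part of the pip command format shown earlier.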

Alternatively, you can install the full version, mmcv-full, which includes all features and a rich set of CUDA operators out of the box. The full version may take longer to build.

pip install mmcv -f
# pip install mmcv-full -f

To install MMCls:

python install: installs a fixed copy of the package. If you modify the code, you must reinstall it for the changes to take effect.
python develop: installs in developer mode. Code changes take effect without reinstalling, which suits a package that is modified frequently.

git clone
cd mmclassification
python develop    # Install in developer mode
# python install  # Install in normal mode

2. Data set preparation

Cat dog classification dataset

We use a cat vs. dog classification dataset as the example.

# Download the classification dataset file in the directory $mmclassification.
wget -O
mkdir data
unzip -q -d ./data/

After downloading and decompressing, the file structure under the "Cats and Dogs Dataset" folder is as follows:

├── classes.txt
├── test.txt
├── val.txt
├── training_set
│   ├── training_set
│   │   ├── cats
│   │   │   ├── cat.1.jpg
│   │   │   ├── cat.2.jpg
│   │   │   ├── ...
│   │   ├── dogs
│   │   │   ├── dog.2.jpg
│   │   │   ├── dog.3.jpg
│   │   │   ├── ...
├── val_set
│   ├── val_set
│   │   ├── cats
│   │   │   ├── cat.3.jpg
│   │   │   ├── cat.5.jpg
│   │   │   ├── ...
│   │   ├── dogs
│   │   │   ├── dog.1.jpg
│   │   │   ├── dog.6.jpg
│   │   │   ├── ...
├── test_set
│   ├── test_set
│   │   ├── cats
│   │   │   ├── cat.4001.jpg
│   │   │   ├── cat.4002.jpg
│   │   │   ├── ...
│   │   ├── dogs
│   │   │   ├── dog.4001.jpg
│   │   │   ├── dog.4002.jpg
│   │   │   ├── ...

You can use the shell command `tree data/cats_dogs_dataset` to view the file structure.
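If `tree` is not available, a short Python script can give a similar overview. The sketch below counts the `.jpg` files in each class folder of a split, assuming the nested layout shown above; the helper name `count_images` is made up for this example.

```python
from collections import Counter
from pathlib import Path


def count_images(split_dir):
    """Count .jpg files per class sub-folder (e.g. cats/, dogs/)."""
    counts = Counter()
    for image_path in Path(split_dir).glob("*/*.jpg"):
        counts[image_path.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    root = Path("data/cats_dogs_dataset")
    for split in ("training_set", "val_set", "test_set"):
        # Each split folder is nested once, e.g. training_set/training_set.
        print(split, count_images(root / split / split))
```

A split whose counts come back empty usually means the archive was extracted to a different location than the one the script assumes.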

Support for new datasets

MMClassification requires that images and labels be placed in directories at the same level. There are two ways to support custom datasets.

The simplest way is to convert the dataset to an existing dataset format (such as ImageNet). The other is to create a new dataset class. Details can be found in the corresponding file.

To keep this tutorial simple, we have organized the cat and dog classification dataset according to the ImageNet dataset format.

The standard files include:

1. Category list. Each line represents a category. The first line, cats, is labeled 0, and the second line, dogs, is labeled 1.


2. Training / validation / test labels.
Each line contains a file name and its corresponding label.

    cats/cat.3769.jpg 0
    cats/cat.882.jpg 0
    dogs/dog.3881.jpg 1
    dogs/dog.3377.jpg 1
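To make the label format concrete, here is a minimal sketch that parses annotation lines like the ones above and maps each numeric label back to its category name. The function name `parse_annotations` is illustrative and not part of MMCls; the class order follows the category list described in item 1.

```python
def parse_annotations(lines, class_names):
    """Turn 'relative/path.jpg label' lines into (path, class_name) pairs."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space so paths containing spaces still work.
        path, label = line.rsplit(" ", 1)
        samples.append((path, class_names[int(label)]))
    return samples


class_names = ["cats", "dogs"]  # line order of classes.txt: cats = 0, dogs = 1
annotation_lines = [
    "cats/cat.3769.jpg 0",
    "cats/cat.882.jpg 0",
    "dogs/dog.3881.jpg 1",
    "dogs/dog.3377.jpg 1",
]
for path, name in parse_annotations(annotation_lines, class_names):
    print(path, "->", name)
```

A label outside the range of the category list raises an `IndexError`, which is a quick way to catch a mismatch between the label files and the category list.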

3. Use MMCls to fine tune the model

The steps for fine-tuning the model from the command line are as follows:

1. Prepare a custom dataset
2. Adapt the dataset to MMCls requirements
3. Modify the configuration file (a .py script)
4. Use the command line tool to fine-tune the model

Steps 1 and 2 are covered above. The next two steps are described below.

Preparing to modify the configuration file

To reuse the common parts of different configuration files, MMCls supports inheriting from multiple configuration files. For example, to fine-tune Swin Transformer Tiny, the new configuration file can inherit "configs/_base_/models/swin_transformer/tiny_224.py" to create the basic structure of the model, "configs/_base_/datasets/imagenet_bs64_swin_224.py" to use the previously defined dataset, and "configs/_base_/schedules/imagenet_bs1024_adamw_swin.py" to define the learning rate policy. In order to run with the configured learning rate policy, you also need to inherit “configs/_base_/”.
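The effect of inheritance can be pictured as a recursive dictionary merge: the child configuration only states what differs, and everything else comes from the base files. The sketch below is a simplified illustration of that idea in plain Python, not the actual MMCV config loader; `merge_config` and the base values are hypothetical.

```python
import copy


def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`.

    Simplified illustration only; the real config loader handles more cases.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dicts are merged key by key instead of being replaced.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base values, chosen only to show the mechanism.
base = dict(
    model=dict(head=dict(type="LinearClsHead", num_classes=1000)),
    optimizer=dict(type="AdamW", lr=0.001),
)
# The child config only states what differs from the base.
child = dict(model=dict(head=dict(num_classes=2)), optimizer=dict(lr=0.00025))

merged = merge_config(base, child)
print(merged["model"]["head"])
print(merged["optimizer"])
```

This is why the fine-tuning configuration in this article can stay short: keys that are not overridden keep the values defined in the inherited files.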

The beginning of the configuration file should look like this:

_base_ = [
    '../_base_/models/swin_transformer/', '../_base_/datasets/',

First, modify the model configuration. The new configuration file needs to adjust num_classes in the model head to match the number of categories in the classification problem. Except for the last linear layer, the weights of the pretrained model are generally reused.

model = dict(
        init_cfg = dict(
        topk = (1, )
        dict(type='BatchMixup', alpha=0.8, num_classes=2, prob=0.5),
        dict(type='BatchCutMix', alpha=1.0, num_classes=2, prob=0.5)
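The fragment above appears to have lost several lines. One way the pieces could fit together is sketched below, assuming the usual MMCls layout in which `init_cfg` belongs to the backbone, `num_classes` and `topk` to the head, and the batch augmentations to `train_cfg`. The checkpoint path is a placeholder, and `type='Pretrained'` with `prefix='backbone'` is an assumption about how the pretrained weights are loaded; neither comes from the original fragment.

```python
model = dict(
    backbone=dict(
        # Assumed: load only the backbone weights from a pretrained model.
        init_cfg=dict(
            type='Pretrained',
            # Placeholder: replace with the path or URL of a Swin-Tiny checkpoint.
            checkpoint='path/to/swin_tiny_checkpoint.pth',
            prefix='backbone',
        )
    ),
    head=dict(
        num_classes=2,  # cats and dogs
        topk=(1, ),     # with two classes, only top-1 accuracy is informative
    ),
    train_cfg=dict(
        augments=[
            dict(type='BatchMixup', alpha=0.8, num_classes=2, prob=0.5),
            dict(type='BatchCutMix', alpha=1.0, num_classes=2, prob=0.5),
        ]
    ),
)
```

Note that `num_classes` appears three times, in the head and in both augmentations, and all three must agree with the dataset.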

The second is the data configuration. Adjust samples_per_gpu according to the memory of your GPU, and specify the paths of the dataset. The model is evaluated once per epoch.

img_norm_cfg = dict(
    mean=[124.508, 116.050, 106.438],
    std=[58.577, 57.310, 57.437],
)

data = dict(
    # Batch size on each GPU; set it (and the number of workers) according to your hardware
    samples_per_gpu = 32,
    # Specify the training set path
    train = dict(
        data_prefix = 'data/cats_dogs_dataset/training_set/training_set',
        classes = 'data/cats_dogs_dataset/classes.txt'
    ),
    # Specify the validation set path
    val = dict(
        data_prefix = 'data/cats_dogs_dataset/val_set/val_set',
        ann_file = 'data/cats_dogs_dataset/val.txt',
        classes = 'data/cats_dogs_dataset/classes.txt'
    ),
    # Specify the test set path
    test = dict(
        data_prefix = 'data/cats_dogs_dataset/test_set/test_set',
        ann_file = 'data/cats_dogs_dataset/test.txt',
        classes = 'data/cats_dogs_dataset/classes.txt'
    )
)
# Modify the evaluation metric settings
evaluation = dict(interval=1, metric='accuracy', metric_options={'topk': (1, )})

The third is the learning rate strategy. The fine-tuning strategy differs considerably from the default one: fine-tuning generally needs a smaller learning rate and fewer training epochs.

optimizer = dict(lr=0.00025)

# learning policy
lr_config = dict(

runner = dict(max_epochs=2)

Finally, the runtime configuration. Use the default configuration directly.

Save the inheritance list and the modifications above in "configs/swin_transformer/swin-tiny_cats-dogs.py":

_base_ = [
    '../_base_/models/swin_transformer/', '../_base_/datasets/',

model = dict(....)  # Copy the contents of the above code box

img_norm_cfg = dict(...)
data = dict(...)
evaluation = dict(...)

optimizer = dict(...)
lr_config = dict(...)
runner  = dict(...)

To view the complete configuration:

python ./tools/misc/ ./configs/swin_transformer/


We use tools/ to fine-tune the model:

python tools/ ${CONFIG_FILE} [optional arguments]

To specify where the files produced during training are stored, add the parameter --work-dir ${YOUR_WORK_DIR}.

Adding the parameter --seed ${SEED} sets the random seed so that results are reproducible. The parameter --deterministic additionally enables the deterministic option of cuDNN, which further improves reproducibility but may reduce training speed.
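The principle behind the seed option can be shown with the standard library alone. The sketch below is only an illustration of seeding, under the assumption that the training script seeds its random number generators in a comparable way; it does not reproduce what the training tool does internally, and `sample_shuffle` is a made-up name.

```python
import random


def sample_shuffle(seed):
    """Shuffle the same list with a dedicated, seeded generator."""
    rng = random.Random(seed)
    order = list(range(10))
    rng.shuffle(order)
    return order


# The same seed always reproduces the same order.
print(sample_shuffle(0) == sample_shuffle(0))
```

In training, the same idea applies to data shuffling, augmentation and weight initialization, which is why a fixed seed makes runs comparable.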

The training code of this example is as follows:

python tools/ \
  configs/swin_transformer/ \
  --work-dir work_dirs/swin-tiny_cats-dogs \
  --seed 0 \

Test the model

Use tools/ to test the model:

python tools/ ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] [--out ${RESULT_FILE}]

Here are some optional parameters to configure:

--metrics: the evaluation metric, which depends on the dataset, for example accuracy.

--metric-options: custom options for the evaluation process, for example topk=1.

--out: the file name of the output results. If not specified, the results are not saved. Supported formats include json, pkl, and yml.
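To clarify what the accuracy metric with a `topk` option measures, here is a minimal pure-Python sketch. It is an illustration of the definition, not the MMCls implementation, and the scores are invented for the example.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores."""
    correct = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted by descending score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return 100.0 * correct / len(labels)


# Hypothetical scores for four images; columns are [cats, dogs].
scores = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
labels = [0, 0, 1, 1]
print(topk_accuracy(scores, labels, k=1))
```

With only two classes, top-2 accuracy is always 100%, which is why this article restricts the evaluation to topk=1.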

The test code of this example is as follows:

python tools/ ./configs/swin_transformer/ work_dirs/swin-tiny_cats-dogs/latest.pth --metrics=accuracy --metric-options=topk=1

Visualization results

Use the following command to run inference on a single image and visualize the result.

python demo/ ${Image_Path} ${Config_Path} ${Checkpoint_Path} --device {cuda or cpu}

The command used in this article is as follows:

python demo/ ./data/cats_dogs_dataset/training_set/training_set/cats/cat.1.jpg ./configs/swin_transformer/ work_dirs/


For the code and the full workflow, see Google Drive:

Related links:

Author's academic lecture: Researcher Hu Han: Swin Transformer and five reasons to embrace Transformer | Institute of Automation academic lecture series (bilibili). The talk introduces a new visual backbone network, Swin Transformer. Compared with the ViT network, which Google designed mainly for image classification, Swin Transformer is broadly effective across visual tasks, including image classification, detection and segmentation. The talk also traces how the advantages of Transformers in vision were gradually uncovered over the past four years and describes five reasons for embracing Transformers, so that the audience can gain an overall understanding of the application of Transformers in vision.

Paper address:

MMClassification :  GitHub - open-mmlab/mmclassification: OpenMMLab Image Classification Toolbox and Benchmark

MMClassification documentation: Welcome to the Chinese tutorial of MMClassification! - MMClassification 0.16.0 documentation

Tags: Pytorch Deep Learning CV Transformer

Posted on Tue, 19 Oct 2021 22:08:38 -0400 by weekender