Mask R-CNN implementation notes - data processing


In deep learning, a dataset generally refers to the data used for network training. It has two parts: the inputs and the ground truth. In visual deep learning, the inputs are images, and the outputs are classification results, predicted boxes, and segmentation results.
A dataset is usually divided into three parts: a training set, a validation set, and a test set. The training set is used to train the network; the validation set is used during training to check the training effect, so that the best model can be saved; the test set is used to evaluate the final model.
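A common way to produce the three subsets is to shuffle the list of image files and cut it at fixed ratios. This is a minimal sketch (the function name and the 8:1:1 ratios are just illustrative defaults):

```python
import random

def split_dataset(filenames, train_ratio=0.8, valid_ratio=0.1, seed=0):
    """Shuffle a list of image file names and cut it into train/valid/test."""
    files = list(filenames)
    random.Random(seed).shuffle(files)  # deterministic shuffle for reproducibility
    n_train = int(len(files) * train_ratio)
    n_valid = int(len(files) * valid_ratio)
    train = files[:n_train]
    valid = files[n_train:n_train + n_valid]
    test = files[n_train + n_valid:]
    return train, valid, test

# with 100 images and an 8:1:1 split, this yields 80/10/10 files
train, valid, test = split_dataset(["img_%03d.png" % i for i in range(100)])
```

Fixing the seed keeps the split reproducible between runs, so the same images always stay in the test set.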

1, labelme annotation

Mask R-CNN takes images as input and outputs labels, boxes and masks. Normally the label, box and mask would all be annotated, but in fact the box can be derived from the mask, so only the mask and label need to be annotated.
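Deriving the box from the mask is simple: take the minimum and maximum pixel coordinates of the instance's binary mask. A small sketch (the helper name is illustrative):

```python
import numpy as np

def mask_to_box(mask):
    """Derive the [xmin, ymin, xmax, ymax] box from one binary instance mask."""
    ys, xs = np.where(mask)  # row/column indices of the mask pixels
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

# toy 5x5 mask with one rectangular object covering rows 1-2, columns 2-3
m = np.zeros((5, 5), dtype=bool)
m[1:3, 2:4] = True
box = mask_to_box(m)  # [2, 1, 3, 2]
```

The same min/max trick appears again in the dataset class at the end of this post.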
1. Install labelme
Create a new conda environment named labelme in Anaconda and install labelme:

# python3
conda create --name=labelme python=3.6
source activate labelme
# install pyqt
pip install pyqt5  # pyqt5 can be installed via pip on python3
# install labelme
pip install labelme

2. Annotation
In cmd or an Anaconda prompt, run activate labelme and then labelme. When labelme opens, use Create Polygons to outline and label each object to be detected.

After labeling the images one by one, a json file is generated for each image.
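The generated file stores, among other things, the image path and one entry per polygon under "shapes". A minimal sketch of that structure, written and read back with the standard json module (keys abbreviated; a real labelme file also stores the base64-encoded imageData and the image size):

```python
import json

# minimal stand-in for one labelme annotation file (illustrative content)
sample = {
    "imagePath": "cat_01.png",
    "shapes": [
        {"label": "cat", "shape_type": "polygon",
         "points": [[10, 10], [60, 12], [55, 70], [8, 65]]},
    ],
}

with open("cat_01.json", "w") as f:
    json.dump(sample, f)

# reading it back: every polygon you drew is one entry in "shapes"
with open("cat_01.json") as f:
    data = json.load(f)
for shape in data["shapes"]:
    print(shape["label"], len(shape["points"]), "points")
```

The conversion script below walks over exactly these "shapes" entries to build the masks.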

2, Further processing of the data

1. Data augmentation

If you have little data, it is best to augment it to avoid overfitting and similar problems. You can add noise, flip the images, and so on.
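The flip and noise operations mentioned above can be sketched with plain NumPy (the function name and noise scale are illustrative):

```python
import numpy as np

def augment(img, seed=0):
    """Return two simple augmented variants: a horizontal flip and a noisy copy."""
    flipped = img[:, ::-1].copy()  # reverse the columns = horizontal flip
    rng = np.random.default_rng(seed)
    # add Gaussian noise, then clip back into the valid 0-255 pixel range
    noisy = np.clip(img.astype(np.float64) + rng.normal(0, 10, img.shape),
                    0, 255).astype(np.uint8)
    return flipped, noisy

img = np.arange(12, dtype=np.uint8).reshape(3, 4)
flipped, noisy = augment(img)
```

Note that for segmentation data, geometric transforms such as the flip must also be applied to the mask, while noise only touches the image.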


2. Batch conversion to a dataset
Batch-converting the json files with labelme's json_to_dataset produces four files per image, placed in one folder.
You can also copy json_to_dataset.py into your own project and modify it: point the input path directly at the folder of json files you annotated, and change the output path so that the generated dataset is written into your project folder. The modifications are as follows:

import argparse
import base64
import json
import os
import os.path as osp

import imgviz
import PIL.Image

from labelme.logger import logger
from labelme import utils

def main():

    # input path
    json_file = "./json_files"

    # batch-processing change 1: collect all files with a .json suffix in the directory
    path = []
    file_name = []
    for root, dirs, files in os.walk(json_file):  # walk over all files
        for file in files:  # traverse every file name
            if os.path.splitext(file)[1] == '.json':  # keep only the .json suffix
                file_name.append(file.split('.')[0])   # keep the base name for the output files
                path.append(os.path.join(root, file))  # splice the absolute path into the list
    print('Total number of files:', len(path))

    # batch-processing change 2: put the conversion into a loop
    for i in range(len(path)):
        with open(path[i]) as f:
            data = json.load(f)
        imageData = data.get("imageData")

        if not imageData:
            # imagePath in the json is relative to the json file's own directory
            imagePath = os.path.join(os.path.dirname(path[i]), data["imagePath"])
            with open(imagePath, "rb") as f:
                imageData = f.read()
                imageData = base64.b64encode(imageData).decode("utf-8")
        img = utils.img_b64_to_arr(imageData)

        label_name_to_value = {"_background_": 0}
        for shape in sorted(data["shapes"], key=lambda x: x["label"]):
            label_name = shape["label"]
            if label_name in label_name_to_value:
                label_value = label_name_to_value[label_name]
            else:
                label_value = len(label_name_to_value)
                label_name_to_value[label_name] = label_value
        lbl, _ = utils.shapes_to_label(
            img.shape, data["shapes"], label_name_to_value
        )

        label_names = [None] * (max(label_name_to_value.values()) + 1)
        for name, value in label_name_to_value.items():
            label_names[value] = name

        lbl_viz = imgviz.label2rgb(
            label=lbl, img=imgviz.asgray(img), label_names=label_names, loc="rb"
        )

        # output paths
        if not os.path.exists("train_dataset"):
            os.mkdir("train_dataset")
        mask_path = "train_dataset/mask"
        if not os.path.exists(mask_path):
            os.mkdir(mask_path)
        img_path = "train_dataset/imgs"
        if not os.path.exists(img_path):
            os.mkdir(img_path)
        class_path = "train_dataset/classes"
        if not os.path.exists(class_path):
            os.mkdir(class_path)
        label_viz_path = "train_dataset/label_viz"
        if not os.path.exists(label_viz_path):
            os.mkdir(label_viz_path)

        # change 3: the four filenames below are built from the source name,
        # so each output file is named after the json it came from
        PIL.Image.fromarray(img).save(osp.join(img_path, file_name[i] + ".png"))
        utils.lblsave(osp.join(mask_path, file_name[i] + "_label.png"), lbl)
        PIL.Image.fromarray(lbl_viz).save(osp.join(label_viz_path, file_name[i] + "_label_viz.png"))

        with open(osp.join(class_path, file_name[i] + "_label_names.txt"), "w") as f:
            for lbl_name in label_names:
                f.write(lbl_name + "\n")
        logger.info("Saved {0} files".format(i + 1))

if __name__ == "__main__":
    main()

My folder structure:

This completes the preparation of the dataset. The following covers the functions used in actual training.

3, Actual training dataset processing

Define your own dataset class to read and parse the processed train_dataset.
For a map-style dataset, you need to define the __init__() and __getitem__() methods; the dataset is then loaded with torch.utils.data.DataLoader(), which splits it into batches for training.

import json
import os

import numpy as np
import torch
from PIL import Image
from labelme import utils

class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "images"))))
        self.jsons = list(sorted(os.listdir(os.path.join(root, "jsons"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "images", self.imgs[idx])
        json_path = os.path.join(self.root, "jsons", self.jsons[idx])
        img = Image.open(img_path).convert("RGB")

        # rebuild the instance mask from the labelme json, using the same
        # shapes_to_label helper as in the conversion script above; the second
        # return value numbers the instances, but note that shapes sharing a
        # label need distinct group_ids (or labels) to count as separate instances
        with open(json_path) as f:
            data = json.load(f)
        label_name_to_value = {"_background_": 0}
        for shape in data["shapes"]:
            if shape["label"] not in label_name_to_value:
                label_name_to_value[shape["label"]] = len(label_name_to_value)
        _, mask = utils.shapes_to_label(
            np.array(img).shape, data["shapes"], label_name_to_value
        )

        mask = np.array(mask)
        # instances are encoded as different ids in the mask
        obj_ids = np.unique(mask)  # background is 0, the first instance is 1, the second is 2, and so on
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        """box It is obtained through mask, so you can directly mask Change to grab description, which can also be obtained from grab description boxes bar"""
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
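When handing this dataset to torch.utils.data.DataLoader, the targets in a batch hold different numbers of objects, so they cannot be stacked into a single tensor; the usual trick is a collate_fn that keeps the samples as tuples. A minimal sketch with toy stand-in data (pass the function as DataLoader(dataset, batch_size=2, collate_fn=collate_fn)):

```python
def collate_fn(batch):
    """Transpose a list of (img, target) pairs into (imgs, targets) tuples,
    so targets with different object counts are never stacked together."""
    return tuple(zip(*batch))

# toy batch: two samples whose targets hold different numbers of objects
batch = [
    ("img_a", {"labels": [1]}),
    ("img_b", {"labels": [1, 1]}),
]
imgs, targets = collate_fn(batch)
# imgs is ('img_a', 'img_b'); targets is a tuple of the two target dicts
```

With real data, imgs is a tuple of image tensors and targets a tuple of dicts with the boxes/labels/masks built in __getitem__ above.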

Tags: Python Computer Vision Deep Learning

Posted on Wed, 03 Nov 2021 20:27:25 -0400 by Allenport