Preface
In deep learning, a dataset generally refers to the data used for network training. A dataset consists of two parts: the inputs and the ground truth. In visual deep learning, the inputs are pictures, and the outputs are classification results, bounding boxes and segmentation results.
A dataset is generally divided into three parts: the train dataset, the valid dataset and the test dataset. The training set is used for network training; the validation set is used during training to check the training effect, and the best model is saved according to it; the test set is used to evaluate the final training result.
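As an illustration, a common way to split is to shuffle the list of samples and slice it by ratio. A minimal sketch (the 8:1:1 ratio and the function name split_dataset are illustrative, not part of the original workflow):

import random

def split_dataset(items, train=0.8, valid=0.1, seed=0):
    # shuffle a copy so the caller's list is untouched
    items = items[:]
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_valid = int(len(items) * valid)
    # remaining samples form the test set
    return items[:n_train], items[n_train:n_train + n_valid], items[n_train + n_valid:]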
1, labelme annotation
Mask R-CNN takes pictures as input and outputs labels, boxes and masks. Normally the label, box and mask would all need to be annotated, but in fact the box can be derived from the mask, so only the mask and label need to be annotated.
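To see why the box is redundant: an object's bounding box is just the extreme coordinates of its mask pixels. A minimal sketch (the helper name box_from_mask is mine, not part of labelme or Mask R-CNN):

import numpy as np

def box_from_mask(mask):
    # mask: binary array of shape (H, W); returns [xmin, ymin, xmax, ymax]
    ys, xs = np.where(mask)
    return [xs.min(), ys.min(), xs.max(), ys.max()]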
1. Install labelme
Create a new environment named labelme in Anaconda and install labelme.
# python3
conda create --name=labelme python=3.6
source activate labelme
# install pyqt
pip install pyqt5  # pyqt5 can be installed via pip on python3
# install labelme
pip install labelme
2. Labeling
In cmd or the Anaconda prompt, run activate labelme and then labelme. Once labelme opens, use Create Polygons to segment and label the objects to be detected.
After labeling the pictures one by one, a json file is generated for each of them.
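Each generated json stores the image path and the labeled polygons. As a quick sanity check (the file name example.json is a placeholder), you can inspect one like this:

import json

with open("example.json") as f:
    data = json.load(f)

print(data["imagePath"])           # name of the labeled image
for shape in data["shapes"]:       # one entry per drawn polygon
    print(shape["label"], shape["shape_type"], len(shape["points"]))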
2, Further processing of data
1. Data augmentation
If you have little data, you had better augment it to avoid overfitting and similar problems. Operations such as adding noise and flipping can be used.
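For segmentation data, geometric operations such as flips must be applied to the image and its mask together, while noise is applied to the image only. A minimal sketch with numpy and PIL (the function name, flip probability and noise scale are illustrative):

import numpy as np
from PIL import Image

def augment(img, mask, seed=0):
    rng = np.random.default_rng(seed)
    img = np.array(img, dtype=np.float32)
    mask = np.array(mask)
    if rng.random() < 0.5:                 # random horizontal flip of both image and mask
        img = img[:, ::-1].copy()
        mask = mask[:, ::-1].copy()
    img += rng.normal(0, 5.0, img.shape)   # additive Gaussian noise on the image only
    img = np.clip(img, 0, 255).astype(np.uint8)
    return Image.fromarray(img), Image.fromarray(mask)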
2. json_to_dataset
Convert the json files into datasets in batch. Running json_to_dataset on each json generates four files, placed in one folder.
You can also copy labelme's json_to_dataset.py file into your own project and modify it: point the input path directly at the folder of json files you labeled, and if you also modify the output path, the generated dataset is written into the project folder. The modifications are as follows:
import base64
import json
import os
import os.path as osp

import imgviz
import PIL.Image

from labelme.logger import logger
from labelme import utils


def main():
    # input path
    json_file = "./json_files"

    # Change 1 for batch processing: collect all files with the .json suffix in the directory
    path = []
    file_name = []
    for root, dirs, files in os.walk(json_file):  # get all files
        for file in files:  # traverse all file names
            if os.path.splitext(file)[1] == '.json':  # keep only the .json suffix
                file_name.append(file.split('.')[0])  # file name without extension
                path.append(os.path.join(root, file))  # splice absolute paths into the list
    print('Total number of files:', len(path))

    # Change 2 for batch processing: process each json file in a loop
    for i in range(len(path)):
        data = json.load(open(path[i]))
        imageData = data.get("imageData")
        if not imageData:
            imagePath = os.path.join(os.path.dirname(json_file), data["imagePath"])
            with open(imagePath, "rb") as f:
                imageData = f.read()
                imageData = base64.b64encode(imageData).decode("utf-8")
        img = utils.img_b64_to_arr(imageData)

        label_name_to_value = {"_background_": 0}
        for shape in sorted(data["shapes"], key=lambda x: x["label"]):
            label_name = shape["label"]
            if label_name in label_name_to_value:
                label_value = label_name_to_value[label_name]
            else:
                label_value = len(label_name_to_value)
                label_name_to_value[label_name] = label_value
        lbl, _ = utils.shapes_to_label(
            img.shape, data["shapes"], label_name_to_value
        )

        label_names = [None] * (max(label_name_to_value.values()) + 1)
        for name, value in label_name_to_value.items():
            label_names[value] = name
        lbl_viz = imgviz.label2rgb(
            label=lbl, img=imgviz.asgray(img), label_names=label_names, loc="rb"
        )

        # output paths
        if not os.path.exists("train_dataset"):
            os.mkdir("train_dataset")
        mask_path = "train_dataset/mask"
        if not os.path.exists(mask_path):
            os.mkdir(mask_path)
        img_path = "train_dataset/imgs"
        if not os.path.exists(img_path):
            os.mkdir(img_path)
        class_path = "train_dataset/classes"
        if not os.path.exists(class_path):
            os.mkdir(class_path)
        label_viz_path = "train_dataset/label_viz"
        if not os.path.exists(label_viz_path):
            os.mkdir(label_viz_path)

        # Change 3: save the four outputs, named after the original json file
        PIL.Image.fromarray(img).save(osp.join(img_path, file_name[i] + ".png"))
        utils.lblsave(osp.join(mask_path, file_name[i] + "_label.png"), lbl)
        PIL.Image.fromarray(lbl_viz).save(
            osp.join(label_viz_path, file_name[i] + "_label_viz.png")
        )
        with open(osp.join(class_path, file_name[i] + "label_names.txt"), "w") as f:
            for lbl_name in label_names:
                f.write(lbl_name + "\n")

        logger.info("Saved {0} files".format(i + 1))


if __name__ == "__main__":
    main()
My folder structure:
This completes the preparation of the dataset. The following is the dataset class used in actual training.
3, Dataset processing for actual training
Define your own dataset class to read and parse the processed train_dataset.
This is a map-style dataset: it needs to define the __init__() and __getitem__() methods, and is then loaded with torch.utils.data.DataLoader(), which splits it into batches for training (a usage sketch follows the class below).
import os

import numpy as np
import torch
from PIL import Image


class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "images"))))
        self.jsons = list(sorted(os.listdir(os.path.join(root, "jsons"))))

    def __getitem__(self, idx):
        # load images and masks; despite the folder name, "jsons" is expected
        # to hold the label mask images generated above, not raw json files
        img_path = os.path.join(self.root, "images", self.imgs[idx])
        json_path = os.path.join(self.root, "jsons", self.jsons[idx])
        img = Image.open(img_path).convert("RGB")
        mask = Image.open(json_path)
        mask = np.array(mask)
        # instances are encoded as different colors:
        # the background is 0, the first instance is 1, the second is 2, and so on
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask;
        # the boxes are derived from the masks, so each binary mask's
        # bounding rectangle can be read off directly
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
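A minimal usage sketch for loading the dataset: detection targets are dicts of varying sizes, so the default collate function cannot stack them, and a tuple-returning collate_fn is commonly used. The root path "train_dataset", the batch size and transforms=None are illustrative; in real training the transforms should at least convert the image to a tensor:

import torch

def collate_fn(batch):
    # keep each (img, target) pair intact instead of stacking
    return tuple(zip(*batch))

dataset = PennFudanDataset(root="train_dataset", transforms=None)
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=2, shuffle=True, collate_fn=collate_fn
)

for imgs, targets in data_loader:
    pass  # Mask R-CNN's forward pass takes lists of images and targets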