Image semantic segmentation and object detection

Among the pretrained object detection models provided by PyTorch (torchvision) is the R-CNN family of networks, which offers easy-to-call interfaces for object detection and human keypoint detection. These detection networks all require the same input preprocessing: the pixel values of each image are scaled to the range 0 to 1, and the input images should not be too small. The available pretrained network models are as follows:

 

Network class                          Description
detection.fasterrcnn_resnet50_fpn      Faster R-CNN network model with a ResNet-50-FPN backbone
detection.maskrcnn_resnet50_fpn        Mask R-CNN network model with a ResNet-50-FPN backbone
detection.keypointrcnn_resnet50_fpn    Keypoint R-CNN network model with a ResNet-50-FPN backbone

These networks are all trained on the COCO 2017 dataset. The following shows how to use a pretrained network for image object detection and human keypoint detection. First, import the relevant libraries and modules:

import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
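
As a side note, the 0-to-1 scaling that transforms.ToTensor() performs can be sketched with plain NumPy, assuming a hypothetical 8-bit RGB image stored as an H x W x 3 uint8 array:

```python
import numpy as np

# A hypothetical 4 x 4 RGB image as an 8-bit array (values 0-255).
img_uint8 = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# ToTensor() divides by 255 and moves channels first: H x W x C -> C x H x W.
img_chw = np.transpose(img_uint8.astype(np.float32) / 255.0, (2, 0, 1))

print(img_chw.shape)                                  # (3, 4, 4)
print(img_chw.min() >= 0.0 and img_chw.max() <= 1.0)  # True
```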

For image object detection, the pretrained Faster R-CNN model with a ResNet-50-FPN backbone is used. This network was also trained on the COCO dataset; import the trained network as follows:

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

Next, read a photo from the folder, convert it into a tensor with pixel values between 0 and 1, and then use the imported model to make a prediction:

image = Image.open("data/chap10/2012_004308.jpg")
transform_d = transforms.Compose([transforms.ToTensor()])
image_t = transform_d(image)
pred = model([image_t])
# print(pred)

The prediction output pred contains, for each input image, a dictionary whose main entries are the bounding boxes ('boxes' coordinates), class labels ('labels') and confidence scores ('scores') of the detected targets. Next, visualize the detected targets to inspect the specific results of the detection.
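
The structure of the prediction can be illustrated with a mock result; the values below are hypothetical (the real model returns torch tensors rather than plain lists):

```python
# Mock prediction with the same layout torchvision's detection models return:
# one dict per input image (values here are made up, not real model output).
pred_mock = [{
    "boxes":  [[34.0, 12.0, 210.0, 180.0],    # [x_min, y_min, x_max, y_max]
               [150.0, 40.0, 300.0, 200.0]],
    "labels": [1, 18],                        # indices into the COCO name list
    "scores": [0.98, 0.42],                   # confidences, sorted descending
}]

for box, label, score in zip(pred_mock[0]["boxes"],
                             pred_mock[0]["labels"],
                             pred_mock[0]["scores"]):
    print(label, score, box)
```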

First, define COCO_INSTANCE_CATEGORY_NAMES, the list of label names corresponding to each category index:

COCO_INSTANCE_CATEGORY_NAMES=[
    '__background__','person','bicycle','car','motorcycle',
    'airplane','bus','train','truck','boat','traffic light',
    'fire hydrant','N/A','stop sign','parking meter','bench',
    'bird','cat','dog','horse','sheep','cow','elephant',
    'bear','zebra','giraffe','N/A','backpack','umbrella','N/A',
    'N/A','handbag','tie','suitcase','frisbee','skis','snowboard',
    'sports ball','kite','baseball bat','baseball glove','skateboard',
    'surfboard','tennis racket','bottle','N/A','wine glass',
    'cup','fork','knife','spoon','bowl','banana','apple',
    'sandwich','orange','broccoli','carrot','hot dog','pizza',
    'donut','cake','chair','couch','potted plant','bed','N/A',
    'dining table','N/A','N/A','toilet','N/A','tv','laptop',
    'mouse','remote','keyboard','cell phone','microwave','oven',
    'toaster','sink','refrigerator','N/A','book','clock',
    'vase','scissors','teddy bear','hair drier','toothbrush'
]
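
The label ids returned by the model index directly into this list. A minimal sketch, using a short excerpt of the list above and hypothetical label ids:

```python
# Excerpt of the COCO category list above; index 0 is the background class.
categories = ['__background__', 'person', 'bicycle', 'car', 'motorcycle']

label_ids = [1, 3]                 # hypothetical ids as the model would return
names = [categories[i] for i in label_ids]
print(names)                       # ['person', 'car']
```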

Before visualizing the prediction results, the valid detections must be extracted from the output: the location (bounding box), category and score of each target. Targets with a score greater than 0.5 are then treated as valid detections and drawn on the image:

pred_class = [COCO_INSTANCE_CATEGORY_NAMES[ii] for ii in list(pred[0]['labels'].numpy())]
pred_score = list(pred[0]['scores'].detach().numpy())
pred_boxes = [[ii[0], ii[1], ii[2], ii[3]] for ii in list(pred[0]['boxes'].detach().numpy())]
pred_index = [pred_score.index(x) for x in pred_score if x > 0.5]
draw = ImageDraw.Draw(image)
for index in pred_index:
    box = pred_boxes[index]
    draw.rectangle(box, outline="red")
    texts = pred_class[index] + ":" + str(np.round(pred_score[index], 2))
    draw.text((box[0], box[1]), texts, fill="red")
image.show()

When visualizing the image, the program above uses ImageDraw.Draw(image) to draw elements at the corresponding positions on the original image: draw.rectangle() adds a rectangular box, and draw.text() adds text at the specified position. The result shows each detected target outlined in red with its class name and score.
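
The filter-and-draw step can be exercised in isolation on a synthetic canvas; the detections below (names, scores and boxes) are made up for illustration:

```python
from PIL import Image, ImageDraw

# Hypothetical detections: (class name, score, box) triples.
detections = [("person", 0.98, (10, 10, 60, 90)),
              ("dog",    0.35, (40, 50, 95, 95))]   # below threshold, skipped

canvas = Image.new("RGB", (100, 100), "white")
draw = ImageDraw.Draw(canvas)
for name, score, box in detections:
    if score > 0.5:                  # same 0.5 cut-off as in the program above
        draw.rectangle(box, outline="red")
        draw.text((box[0], box[1]), f"{name}:{score:.2f}", fill="red")

print(canvas.getpixel((10, 50)))     # (255, 0, 0): left edge of the kept box
```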

 

Tags: Python neural networks Pytorch Deep Learning Object Detection

Posted on Sat, 20 Nov 2021 16:34:29 -0500 by iloveny