PyTorch dataset processing for object detection and classification

Preface

Both classification and object detection require dataset processing. Datasets come in two forms: one saves the label information in a text file, and the other consists only of pictures sorted into folders, as shown in the figure below. This step is also key to learning Fast R-CNN.

[Figure: photos divided into training and validation folders, with one subfolder per category]

One folder holds pictures of cats, the other pictures of dogs. This is our own dataset. Official datasets are organized on the same principle: CIFAR10, for example, has 10 categories, each with many photos of one kind of object. Normally, we load an official dataset as follows:

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # Data augmentation for small datasets: random horizontal flip
    transforms.RandomGrayscale(),  # Convert the image to grayscale with a certain probability
    transforms.ToTensor(),  # Convert the loaded PIL image to a Tensor and scale pixels from [0, 255] to [0.0, 1.0]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])  # Then map each channel to [-1.0, 1.0]

trainset = torchvision.datasets.CIFAR10('data/cifar-10', train=True,
                                        transform=transform)
testset = torchvision.datasets.CIFAR10('data/cifar-10', train=False,
                                       transform=transform)

# Wrap each dataset in a DataLoader (batch_size is left at its default here)
trainloader = torch.utils.data.DataLoader(trainset,
                                          shuffle=True,
                                          num_workers=0)
testloader = torch.utils.data.DataLoader(testset,
                                         shuffle=True,
                                         num_workers=0)


The key is the form of trainset and trainloader. trainset holds all the photos. trainloader is an iterable, used as for i, data in enumerate(trainloader):, which makes the data of each photo available for training. The core point is that every item pairs a photo with its label, so the pairs stay intact even when the training data is shuffled. For classification, take the photos above as an example: the labels are 0 and 1, and they are generated automatically. You then write a dictionary that maps 0 and 1 to your own categories. Here, 0 and 1 stand for cat and dog.
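To make the pairing of photos and labels concrete, here is a minimal sketch that uses generated tensors in place of real photos. The tensor shapes, the batch size of 4, and the names fake_images, fake_labels and idx_to_class are assumptions for illustration and are not part of the original data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 8 fake "photos" (3 x 32 x 32). Each photo is filled with its
# own label value, so a mismatch between photo and label would be visible.
fake_labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
fake_images = fake_labels.float().view(8, 1, 1, 1).repeat(1, 3, 32, 32)

dataset = TensorDataset(fake_images, fake_labels)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Hypothetical dictionary that maps the numeric labels back to category names.
idx_to_class = {0: 'cat', 1: 'dog'}

aligned = True
seen_labels = []
for i, data in enumerate(loader):
    images, labels = data  # one batch of photos and the matching labels
    # Check that every photo in the batch still carries its own label.
    aligned = aligned and bool((images[:, 0, 0, 0] == labels.float()).all())
    seen_labels.extend(labels.tolist())
    print(i, tuple(images.shape), [idx_to_class[l] for l in labels.tolist()])
```

The order of the labels changes from run to run because of shuffle=True, but each photo always arrives together with its own label.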

Saving the dataset labels in a csv file

In object detection, the annotations are given as xml files. We still need to extract the box coordinates and the class information from each xml file, and I prefer to save that information in a csv file. First, here is how to make a csv file.


import os
import pandas as pd

PATH = 'G:/trainshibie/55/val/'
rows = []
i = 1  # Running count of the photos seen so far
for (path, dirnames, filenames) in os.walk(PATH):
    for filename in filenames:
        file_path = os.path.join(path, filename)
        if i < 11:
            value = (file_path, 0)  # First 10 photos: cat
            rows.append(value)
        else:
            value = (file_path, 1)  # Remaining photos: dog
            rows.append(value)
        i = i + 1
column_name = ['path', 'label']
df = pd.DataFrame(rows, columns=column_name)
print(df)
df.to_csv('G:/trainshibie/55/ee.csv', index=False)


This step is simple. All we need to do is save the absolute path and the label of each photo. My if condition relies on there being only 10 photos per category: the first 10 are cats (label=0) and the last 10 are dogs (label=1). It also assumes that os.walk visits the cat folder before the dog folder.
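Counting photos works for a fixed folder, but it breaks as soon as the number of photos per category changes. One possible alternative is to take the label from the name of the folder that contains each photo. The sketch below assumes one subfolder per category, uses the standard csv module instead of pandas, and the names build_label_rows, write_label_csv and class_to_label are hypothetical.

```python
import csv
import os


def build_label_rows(root, class_to_label):
    """Return sorted (path, label) pairs, taking the label from the parent folder name."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        folder = os.path.basename(dirpath)
        if folder not in class_to_label:
            continue  # Skip the root itself and any folder that is not a known category
        for filename in filenames:
            rows.append((os.path.join(dirpath, filename), class_to_label[folder]))
    return sorted(rows)


def write_label_csv(rows, csv_path):
    """Write the rows under a 'path', 'label' header."""
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['path', 'label'])
        writer.writerows(rows)
```

For the cat and dog folders, a call such as build_label_rows('G:/trainshibie/55/val/', {'cat': 0, 'dog': 1}) would give every photo the label of its folder, regardless of how many photos each folder holds. The folder names cat and dog are assumed here.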

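import numpy as np
import pandas as pd

data = pd.read_csv('G:/trainshibie/55/ee.csv')  # The csv file with 'path' and 'label' columns
path = []
c = data.shape[0]  # Number of rows, one per photo
label = np.zeros(c, dtype=np.int32)
for index, row in data.iterrows():
    path.append(row['path'])
    label[index] = row['label']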


The one thing to remember here is label=np.zeros(c, dtype=np.int32), which preallocates the label array. The rest is straightforward.

import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder

a = 'G:/trainshibie/55/val/'

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),  # Convert image to Tensor and normalize to [0, 1]
    transforms.Normalize(mean=[.5, .5, .5], std=[.5, .5, .5])  # Map each channel to [-1, 1]
])

dataset = ImageFolder(a, transform=transform)


Basically, the data is processed into a dataset that contains all the photos and is then wrapped in a trainloader iterator. With this layout the labels come from the folders rather than from a csv file, so no separate label handling is needed. For object detection and semantic segmentation, the information has to be extracted from xml files and saved to a csv file.
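ImageFolder assigns the labels by sorting the subfolder names and numbering them from 0, which is why cat becomes 0 and dog becomes 1. The sketch below imitates that rule with the standard library only. find_class_indices is a hypothetical helper written for this illustration, not a torchvision function, and the folder layout is created in a temporary directory.

```python
import os
import tempfile


def find_class_indices(root):
    """Map each subfolder name under root to an index, in sorted order."""
    classes = sorted(
        entry.name for entry in os.scandir(root) if entry.is_dir()
    )
    return {name: index for index, name in enumerate(classes)}


# Build a throwaway folder layout to try it on.
demo_root = tempfile.mkdtemp()
for name in ('dog', 'cat'):
    os.makedirs(os.path.join(demo_root, name))

class_to_idx = find_class_indices(demo_root)
# Invert the mapping to turn a predicted index back into a category name.
idx_to_class = {index: name for name, index in class_to_idx.items()}
print(class_to_idx)
```

On a real ImageFolder object the same mapping is available as dataset.class_to_idx.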

Dataset processing with the label information saved in csv

Earlier I covered how to generate and read a csv file. This step uses the data in the csv file to build an iterator. The main task is to write your own dataset class.

from PIL import Image
import pandas as pd
import numpy as np
import torchvision.transforms as transforms
from torch.utils.data import Dataset

root = 'G:/trainshibie/55/val/'


# Define the format of the read file
def default_loader(path):
    return Image.open(path).convert('RGB')


class Mydataset(Dataset):
    def __init__(self, csv, transform=None, target_transform=None, loader=default_loader):
        super(Mydataset, self).__init__()
        self.data = pd.read_csv(csv)  # Table with 'path' and 'label' columns
        self.path = []  # Save the path of each photo
        self.num = int(self.data.shape[0])  # How many photos are there
        self.label = np.zeros(self.num, dtype=np.int32)
        for index, row in self.data.iterrows():
            self.path.append(row['path'])
            self.label[index] = row['label']  # Read out all the data
        self.transform = transform
        self.target_transform = target_transform
        self.loader = loader

    def __getitem__(self, index):
        img = self.loader(self.path[index])  # Load the photo from disk
        labels = int(self.label[index])  # Plain int, so batches collate to int64 labels
        if self.transform is not None:
            img = self.transform(img)  # Convert to tensor type
        return img, labels

    def __len__(self):
        return len(self.data)


val_data = Mydataset(csv='G:/trainshibie/55/ee.csv', transform=transforms.ToTensor())
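Once a dataset class exists, it is usually wrapped in a DataLoader to get batches. The sketch below shows that step with a tiny in-memory dataset so that it runs without the photos on disk. ToyDataset, its random tensors, and the batch size of 2 are assumptions for illustration; in practice Mydataset would take its place.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Stand-in for a csv-backed dataset: keeps (image, label) pairs in memory."""

    def __init__(self, num_items=6):
        self.images = torch.rand(num_items, 3, 8, 8)  # Fake RGB photos
        self.labels = [i % 2 for i in range(num_items)]  # 0 = cat, 1 = dog

    def __getitem__(self, index):
        # Same contract as a csv-backed dataset: return one (image, label) pair.
        return self.images[index], self.labels[index]

    def __len__(self):
        return len(self.labels)


toy_data = ToyDataset()
toy_loader = DataLoader(toy_data, batch_size=2, shuffle=False)

batches = list(toy_loader)
first_images, first_labels = batches[0]
print(tuple(first_images.shape), first_labels)
```

The DataLoader only needs __getitem__ and __len__, which is why any class that implements both can be dropped in.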



Reading xml data into a csv file

The first step is to learn how to use ET (xml.etree.ElementTree), in particular to read xml files.

import xml.etree.ElementTree as ET

tree = ET.parse('G:/picture/label_cat/000005.xml')
root = tree.getroot()
xml_list = []
for member in root.findall('object'):
    value = (root.find('filename').text,
             int(root.find('size')[0].text),  # width
             int(root.find('size')[1].text),  # height
             member[0].text,                  # class name
             int(member[4][0].text),          # xmin
             int(member[4][1].text),          # ymin
             int(member[4][2].text),          # xmax
             int(member[4][3].text))          # ymax
    xml_list.append(value)
print(xml_list)


This is easy to follow: each object yields the file name, the image width and height, the class name, and the four box coordinates.
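Positional indexes such as member[4][0] depend on the order of the child elements in each annotation. One possible alternative is to look the fields up by tag name. The sketch below assumes a Pascal VOC style layout (size/width, object/name, object/bndbox/xmin and so on); the sample annotation and the helper name parse_voc_objects are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in a Pascal VOC style layout.
SAMPLE_XML = """
<annotation>
  <filename>000005.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_objects(root):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax) tuple per object."""
    filename = root.findtext('filename')
    width = int(root.findtext('size/width'))
    height = int(root.findtext('size/height'))
    rows = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')  # Found by tag name, wherever it sits in the object
        rows.append((
            filename, width, height, obj.findtext('name'),
            int(box.findtext('xmin')), int(box.findtext('ymin')),
            int(box.findtext('xmax')), int(box.findtext('ymax')),
        ))
    return rows


rows = parse_voc_objects(ET.fromstring(SAMPLE_XML.strip()))
print(rows)
```

Lookups by name keep working if an annotation tool writes the child elements in a different order, at the cost of assuming the tag names.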

import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        print(xml_file)
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            try:
                value = (root.find('filename').text,
                         int(root.find('size')[0].text),
                         int(root.find('size')[1].text),
                         member[0].text,
                         int(member[4][0].text),
                         int(member[4][1].text),
                         int(member[4][2].text),
                         int(member[4][3].text)
                         )
            except ValueError:
                # Fallback when member[4][k] is not a number:
                # read the box from the children of member[4][1] instead
                value = (root.find('filename').text,
                         int(root.find('size')[0].text),
                         int(root.find('size')[1].text),
                         member[0].text,
                         int(member[4][1][0].text),
                         int(member[4][1][1].text),
                         int(member[4][1][2].text),
                         int(member[4][1][3].text)
                         )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    # Absolute path to the folder of xml files, so no os.path.join with the cwd is needed
    image_path = 'E://pytorch/Annotations'
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('E://pytorch/labes.csv', index=False)  # Path written as 'E://'
    print('finish')


main()



This script can be used as is. Just change the paths to point at your own xml files.

Reading the object detection csv file

import numpy as np
import pandas as pd

data = pd.read_csv('E://pytorch/labes.csv')  # The csv file written by xml_to_csv
c = data.shape[0]  # Number of annotated objects
boxes = np.zeros((c, 4), dtype=np.uint16)
gt_classes = np.zeros(c, dtype=np.int32)
_classes = ('__background__',  # always index 0
            'aeroplane', 'bicycle', 'bird', 'boat',
            'bottle', 'bus', 'car', 'cat', 'chair',
            'cow', 'diningtable', 'dog', 'horse',
            'motorbike', 'person', 'pottedplant',
            'sheep', 'sofa', 'train', 'tvmonitor')
_class_to_ind = dict(zip(_classes, range(len(_classes))))
_ind_to_class = dict(zip(range(len(_classes)), _classes))

for index, row in data.iterrows():
    boxes[index, :] = [row['xmin'], row['ymin'], row['xmax'], row['ymax']]
    gt_classes[index] = _class_to_ind[row['class']]
print(boxes[1])
print(gt_classes[1])
print(_ind_to_class[gt_classes[1]])
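The csv file holds one row per object, so an image with several objects appears on several rows. For training a detector it is often convenient to group the boxes by image. The sketch below does that with the standard csv module on a small inline sample; the sample rows and the function name group_boxes_by_image are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the same column layout as the detection csv file.
SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax
000005.jpg,500,375,chair,263,211,324,339
000005.jpg,500,375,chair,165,264,253,372
000007.jpg,500,333,car,141,50,500,330
"""


def group_boxes_by_image(csv_file):
    """Return {filename: [(class, xmin, ymin, xmax, ymax), ...]}."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        box = tuple(int(row[key]) for key in ('xmin', 'ymin', 'xmax', 'ymax'))
        grouped[row['filename']].append((row['class'],) + box)
    return dict(grouped)


boxes_by_image = group_boxes_by_image(io.StringIO(SAMPLE_CSV))
for filename, objects in boxes_by_image.items():
    print(filename, objects)
```

Passing an open file handle instead of the io.StringIO object would apply the same grouping to a csv file on disk.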



Once you have worked through all of the steps above, you will have covered most of what dataset processing requires and can handle many kinds of data.

Tags: xml

Posted on Sun, 08 Mar 2020 05:42:17 -0400 by fubowl