Preface:
On a bright and breezy day, thunder and lightning crossed paths... okay, never mind the weather report ~ 😊
This semester I have an artificial intelligence course, and I ended up building a little model of my own.
While chatting with a senior researcher, he asked me to run the models for a comparative experiment and write a paper with him.
I thought: in machine learning I'm a complete beginner, so why not try! And so began a long road of learning ~
What can you learn by reading this article?
- Relive the journey of setting up the environment 🦌~
- Learn what each parameter and each folder 📁 means~
- Run basic training on the major baseline models and find a sense of accomplishment 🚀~
- A must-read summary of lessons learned ✈️~
Setting up the PyTorch environment
Before I started, I was a dazed little chick 🐤 feeling completely lost. What? What?~
I just remember how painful it was when I first bit the bullet: digging up blogs that were N years old, doing whatever they said, then reverting it all.
I was trying to get an SSD model running and only really got going once the environment was ready.
Half an hour per step, patching source code, chasing down errors one by one... it hurts 😖
Back to the topic!
Environment setup guide: https://tangshusen.me/Dive-into-DL-PyTorch/#/
👆 The site above is a translated open-source deep learning tutorial. I browsed its repository first...
Go straight to section 2.1 in the left sidebar for environment setup; it links to a couple of well-known articles that walk you through everything by hand~
On Mac the steps have drifted a bit by now (2021-11), i.e. the latest versions differ slightly from the guide, but Google will sort it out.
On Windows with a graphics card, you need to download CUDA and configure the environment, and the download speed can be miserable.
Better to switch to a mirror source: download the wheel from a mirror and install it locally, which is the fastest way ~
When you can print 🖨️ the torch version, the setup has succeeded.
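A quick sanity check (my own two-liner, not from the tutorial):

```python
import torch

print(torch.__version__)           # the installed PyTorch version
print(torch.cuda.is_available())   # True means CUDA is wired up correctly
```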
Make your own dataset
Getting the labeling tool on Mac is easy: Google labelImg directly and the website comes right up.
Combined with brew it installs quickly; I haven't tried it on Windows yet.
As for how to use it, there are plenty of tutorials online, so I won't repeat them ~ 😊
Getting started with YoloV4
While I was at a loss, a senior of mine recommended a blogger.
Search 🔍 on Bilibili: Bubbliiiing, then look through their pytorch videos.
There you'll find hands-on videos for training on your own dataset.
The blogger's teaching approach is:
briefly introduce the training ideas of the network, then how each layer works,
and finally how to train your own dataset.
I watched the YoloV4 instructional video from start to finish.
Add your own class
List the labels 🏷️ from your own dataset
and add them to /model_data/voc_classes.txt.
Same format as the existing file: one class per line.
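For example, using the two shoe classes that show up later in the Faster R-CNN section, the whole file would just be:

```text
Dark shoes
Light shoes
```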
Make your own dataset
Here you can download the dataset shared by the blogger (extraction code: uack).
You can also make your own, but if you do, keep an eye on the quantity.
The directory structure follows the standard VOC dataset layout, under VOC2007.
In my case, my senior had already cleaned the data (well, not all of it).
I took it and directly deleted the files under ImageSets/Main,
because the blogger's voc_annotation.py automatically does a random 9:1 split
and then generates 2007_val.txt and 2007_train.txt in the root directory.
So if you move the dataset onto your own machine, remember to re-run it and regenerate those files, because the paths will have changed.
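The reason regeneration matters: each line of 2007_train.txt carries an absolute image path plus its boxes, so it breaks as soon as the files move. Roughly like this (my sketch of the format; the path is made up, check what your repo actually emits):

```text
/home/me/yolov4-pytorch/VOCdevkit/VOC2007/JPEGImages/000001.jpg 48,240,195,371,0 8,12,352,498,1
```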
Get the pre-trained weights and prepare for training
Be careful ⚠️: everything below is done in an environment with a graphics card.
In the corresponding repository, pick the matching pre-trained weights, download ⏬ them, and upload ⏫ them under /model_data/.
Start editing train.py and get to know it better:
First, configure the class file and the model path.
Next comes configuring the parameters by hand (see the sketch after the lists below):
batch_size is the number of samples used in one training step.
lr is the learning rate.
num_workers is the number of data-loading threads.
- batch_size combinations:
  - If your hardware is well configured: Freeze_batch_size: 16; Unfreeze_batch_size: 8
  - More modest machines: Freeze_batch_size: 8; Unfreeze_batch_size: 4
  - Or: Freeze_batch_size: 4; Unfreeze_batch_size: 2
- Learning-rate combinations:
  - Freeze_lr: 1e-3 (0.001); Unfreeze_lr: 1e-4
  - Freeze_lr: 1e-4 (0.0001); Unfreeze_lr: 1e-5
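Putting those together, a hedged sketch of the relevant knobs in the blogger's train.py (names follow the repo's convention; double-check against your copy):

```python
# two-stage training: backbone frozen first, then unfrozen
Freeze_batch_size   = 8      # batch size while the backbone is frozen
Unfreeze_batch_size = 4      # smaller, since unfreezing needs more memory
Freeze_lr           = 1e-3   # learning rate for the frozen stage
Unfreeze_lr         = 1e-4   # lower lr for full-network fine-tuning
Freeze_Epoch        = 50     # epochs for the frozen stage
UnFreeze_Epoch      = 100    # total epochs including the unfrozen stage
num_workers         = 4      # dataloader worker threads
```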
This is where the art of alchemy (hyperparameter tuning) comes in.
- First: UnFreeze_Epoch=100; Freeze_Epoch=50
means it will run 100 epochs in total, so make sure your disk space doesn't explode (checkpoints pile up under logs/).
- Second: the loss will gradually shrink and stabilize within a range.
If the mAP you get at that point isn't satisfying,
there are several options:
  - add more iterations: UnFreeze_Epoch=150 may bring the loss down further
  - tune lr and batch_size
- Last:
when your loss goes NaN, stop and re-tune the parameters.
How to tune it well is part experience, part luck; my recommendation: search GitHub and read write-ups.
If you run into a problem 🤨, try:
- checking the uploader's list of frequently asked questions
- Googling the error message ⚠️
- opening an Issue on GitHub and asking
One last note on multi-GPU CUDA errors, the "no graphics card specified" error ❌:
```python
# Add these lines at the top of train.py
# the numbers are the cards you want to use
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
os.environ["CUDA_VISIBLE_DEVICES"] = "1,4,7"
```
At this point your train.py should run, and you can sit back.
Test to get the mAP
Once the 100 iterations have finished, you can start testing.
All your training weights are under logs/~
Edit yolo.py and get_map.py; you only need to change a few paths to run them.
The results are written to the map_out folder; open map_out/result/mAP.png and you're done.
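Concretely, that means pointing model_path at a checkpoint under logs/, something like this (a sketch; the checkpoint filename is hypothetical, yours will follow the ep{epoch}-loss{...} pattern shown later):

```python
# in yolo.py's _defaults dict
_defaults = {
    "model_path":   'logs/ep100-loss2.512-val_loss3.163.pth',
    "classes_path": 'model_data/voc_classes.txt',
    # ... the rest of the defaults stay unchanged
}
```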
But is testing a single checkpoint realistic?
My first approach was to test every tenth epoch: 10, 20, 30...
Then I noticed the mAP only starts rising after the loss stabilizes.
So you can skip the first ten or so checkpoints and keep roughly the last 20 for testing.
Whichever looks promising, test it!
Image validation~
I didn't use this much ~ just to see how it performs.
You're still happy when you see boxes appear 😄
Next up: faster and SSD
faster-rcnn and SSD follow exactly the same workflow:
- faster-rcnn-pytorch: https://github.com/bubbliiiing/faster-rcnn-pytorch
- ssd-pytorch: https://github.com/bubbliiiing/ssd-pytorch
Advanced faster
When I trained Faster R-CNN with the blogger's repo above, the results weren't ideal,
so I decided to change libraries.
I tried the library with the most GitHub stars ✨.
Repository: https://github.com/jwyang/faster-rcnn.pytorch
Let me just walk through the steps:
Deploy the base environment
```bash
# Clone
$ git clone https://github.com/jwyang/faster-rcnn.pytorch.git
$ cd faster-rcnn.pytorch
# Switch to the pytorch-1.0 branch
$ git checkout pytorch-1.0
# You should already have Anaconda; it covers almost every dependency
$ mkdir data
$ cd data && mkdir pretrained_model
$ cd ..
# Build the extensions
$ cd lib
$ python setup.py build develop
```
Download the corresponding weights
and put them under data/pretrained_model/.
Just grab the dataset from before,
except for the 2007_val.txt and 2007_train.txt that appeared in the root directory; leave those out.
Move everything else under data/ and rename it VOCdevkit2007.
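So the layout ends up as the standard VOC tree (my sketch of the usual structure, not copied from the repo's docs):

```text
data/
├── pretrained_model/         # downloaded backbone weights
└── VOCdevkit2007/
    └── VOC2007/
        ├── Annotations/      # xml label files
        ├── ImageSets/Main/   # train/val/test split lists
        └── JPEGImages/       # the images
```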
Adjust the classes and mind the case
Pay special attention ⚠️ to upper/lower case here.
Use exactly the class names from your labels!!! Otherwise the test mAP will be all 0.
Edit lib/datasets/pascal_voc.py and replace the classes with your own:
```python
self._classes = ('__background__',  # always index 0
                 'Dark shoes', 'Light shoes')
```
If a KeyError ❌ (missing key) shows up when you start running:
look at the error stack and you'll find a .lower() call; just delete the .lower() in pascal_voc.py.
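For reference, the culprit looks roughly like this (my paraphrase of the repo's annotation-loading code, so verify against your checkout):

```python
# lib/datasets/pascal_voc.py
# before: lowercases the label, so 'Dark shoes' no longer matches
cls = self._class_to_ind[obj.find('name').text.lower().strip()]
# after: keep the original case from the xml
cls = self._class_to_ind[obj.find('name').text.strip()]
```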
Training dataset
```bash
$ CUDA_VISIBLE_DEVICES=0 python trainval_net.py --dataset pascal_voc --net vgg16 \
    --epochs 30 --bs 8 --nw 4 --lr 1e-3 --lr_decay_step 6 \
    --use_tfb --mGPUs --cuda
```
- CUDA_VISIBLE_DEVICES: which GPU(s) to use
- --dataset pascal_voc: use the Pascal VOC dataset
- --net vgg16: use vgg16 as the feature-extraction network (res101 is also available)
- --epochs 30: each epoch passes over all the training images; 30 epochs in total
- --bs 8: batch size, see above
- --nw 4: num_workers=4, four threads for data loading
- --lr: initial learning rate
- --lr_decay_step: decay the learning rate every this many epochs
- --use_tfb: record and visualize with tensorboardX; optional, omit if unwanted
- --mGPUs: multi-GPU training; optional, omit if unwanted
- --cuda: use CUDA
Training produces checkpoints under models/. If you want to retrain from scratch,
remember to delete the logs and the cache (the cache lives under data/).
mAP for test datasets
```bash
$ python test_net.py --dataset pascal_voc --net res101 \
    --checksession 1 --checkepoch 30 --checkpoint 610 --cuda
```
- How do I choose checksession 1 --checkepoch 30 --checkpoint 610?
Look under models/res101/pascal_voc: a file named, for example, faster_rcnn_1_30_610.pth tells you session 1, epoch 30, checkpoint 610.
This repo won't generate a result png, so watch the console output for the mAP.
Flops
When I messaged the Bilibili blogger privately ~ he said to just search for how to do it.
So I wrote a version for each model based on my own understanding, using the ptflops package (pip install ptflops).
SSD - Pytorch
```python
from nets.ssd import SSD300
from ptflops import get_model_complexity_info


def get_classes(classes_path):
    # read the class names and their count
    with open(classes_path, encoding='utf-8') as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names, len(class_names)


if __name__ == "__main__":
    classes_path = 'model_data/voc_classes.txt'
    class_names, num_classes = get_classes(classes_path)
    # create the corresponding model (+1 for the background class)
    myNet = SSD300(num_classes + 1, 'vgg')
    # set the input shape to match the image size you run the network at
    flops, params = get_model_complexity_info(myNet, (3, 416, 416),
                                              as_strings=True,
                                              print_per_layer_stat=True)
    print("Flops: {}".format(flops))
    print("Params: " + params)
```
FLOPs-yolov4
```python
from nets.yolo import YoloBody
from ptflops import get_model_complexity_info


def get_classes(classes_path):
    # read the class names and their count
    with open(classes_path, encoding='utf-8') as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names, len(class_names)


if __name__ == "__main__":
    classes_path = 'model_data/voc_clothes_classes.txt'
    class_names, num_classes = get_classes(classes_path)
    # create the corresponding model
    anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    myNet = YoloBody(anchors_mask, num_classes)
    # set the input shape to match the image size you run the network at
    flops, params = get_model_complexity_info(myNet, (3, 416, 416),
                                              as_strings=True,
                                              print_per_layer_stat=True)
    print("Flops: {}".format(flops))
    print("Params: " + params)
```
Faster-rcnn
```python
from nets.frcnn import FasterRCNN
from ptflops import get_model_complexity_info


def get_classes(classes_path):
    # read the class names and their count
    with open(classes_path, encoding='utf-8') as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names, len(class_names)


if __name__ == "__main__":
    classes_path = 'model_data/voc_classes.txt'
    class_names, num_classes = get_classes(classes_path)
    # create the corresponding model
    myNet = FasterRCNN(num_classes, "predict",
                       anchor_scales=[8, 16, 32], backbone='vgg')
    flops, params = get_model_complexity_info(myNet, (3, 416, 416),
                                              as_strings=True,
                                              print_per_layer_stat=True)
    print("Flops: {}".format(flops))
    print("Params: " + params)
```
Summary of experience
Alchemy (model tuning) is tedious!
Search more, experiment more!
Basic Linux commands
Keep training sessions alive with the screen command, and keep an eye on the graphics cards with watch -n 0.5 nvidia-smi 😁
Running get_map by hand is tedious, so I hacked together my own little script
First, weed out the checkpoints that aren't worth testing 🙅, to avoid wasting time ⌚~
- First, wrap the body of get_map.py into a main function that accepts model_path:

```python
def main(model_path):
    ####  # the original body of get_map.py goes here, indented
```
- Then append this at the bottom of get_map.py (it loops over every checkpoint):

```python
import os  # if not already imported at the top

if __name__ == "__main__":
    Path = os.path.abspath(os.path.abspath(os.path.dirname(__file__)))
    logs_path = Path + '/logs'
    files = os.listdir(logs_path)
    want = []
    for each in files:
        if '.pth' in each:
            want.append(each)
    # want = ['ep144-loss2.422-val_loss3.563.pth']  # test a single checkpoint
    for each in want:
        path = 'logs/' + each
        with open(Path + '/log.txt', "a+", encoding='utf-8') as f:
            f.write(each + '*****')
        main(path)
```
- Change yolo.py so that instead of defaulting to self.model_path it accepts the path as an argument:

```python
#---------------------------------------------------#
#   Initialize YOLO
#---------------------------------------------------#
def __init__(self, model_path, **kwargs):  # manually accept model_path
    self.__dict__.update(self._defaults)
    for name, value in kwargs.items():
        setattr(self, name, value)
    #---------------------------------------------------#
    #   Get the number of classes and prior boxes
    #---------------------------------------------------#
    self.class_names, self.num_classes = get_classes(self.classes_path)
    self.anchors, self.num_anchors = get_anchors(self.anchors_path)
    self.bbox_util = DecodeBox(self.anchors, self.num_classes,
                               (self.input_shape[0], self.input_shape[1]),
                               self.anchors_mask)
    #---------------------------------------------------#
    #   Give each class's boxes a different color
    #---------------------------------------------------#
    hsv_tuples = [(x / self.num_classes, 1., 1.) for x in range(self.num_classes)]
    self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
                           self.colors))
    self.generate(model_path)  # pass model_path through

#---------------------------------------------------#
#   Generate the model
#---------------------------------------------------#
def generate(self, model_path):  # was self.model_path; now a plain argument
    #---------------------------------------------------#
    #   Build the yolo model and load its weights
    #---------------------------------------------------#
    self.net = YoloBody(self.anchors_mask, self.num_classes)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    self.net.load_state_dict(torch.load(model_path, map_location=device))  # no self. prefix
    self.net = self.net.eval()
    print('{} model, anchors, and classes loaded.'.format(model_path))  # no self. prefix
    if self.cuda:
        self.net = nn.DataParallel(self.net)
        self.net = self.net.cuda()
```
- Change utils/utils_map.py, starting around line 638:

```python
if show_animation:
    cv2.destroyAllWindows()

results_file.write("\n# mAP of all classes\n")
mAP = sum_AP / n_classes
text = "mAP = {0:.2f}%".format(mAP * 100)
# append the result to log.txt in the project root
PATH = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
with open(PATH + '/log.txt', "a+", encoding='utf-8') as f:
    f.write(text + '\n')
results_file.write(text + "\n")
print(text)
```
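With these three edits in place, one run of get_map.py sweeps every .pth under logs/ and appends each checkpoint's mAP to log.txt, so you can kick it off, walk away, and read the results later.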
A suspiciously large jump in mAP 🔝
It's just because the dataset is too small! Combined with the random split,
some of your training images end up in your test set, so of course the score is high.
Fix it by modifying voc_annotation.py, starting at line 63:
```python
for xml in temp_xml:
    if xml.endswith(".xml"):
        total_xml.append(xml)

total_xml.sort()          # fix the file order
num = len(total_xml)
list = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
# trainval = random.sample(list, tv)
trainval = list[:tv]      # deterministic slice instead of random sampling
# train = random.sample(trainval, tr)
train = trainval[:tr]
```
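The design point: sort() plus slicing makes the split deterministic, so the same images always land in train vs. val across machines and reruns, and nothing from your held-out set leaks into training.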
Last
Time to go back and fill in the fundamentals. Tired ~ 🥱
Practicing first and then studying the theory makes it feel less abstract.
That's all for now!