Dataset: supports indexing so that individual samples can be fetched.
DataLoader: mainly used to fetch mini-batches.
The previous sections loaded the data directly from the file and then fed all of it to the model at once. The trade-offs are:
- Feeding all the data in at once is called (full) batch. It maximizes the advantage of vectorized (parallel) computation and improves speed.
- Using only one sample at a time, stochastic gradient descent helps overcome the saddle-point problem, but training takes too long.
- The common mini-batch approach balances training time against performance requirements.
Mini-batch loop:
A nested loop: the outer loop controls the number of passes, and the inner loop processes one mini-batch at a time. Each pass of the outer loop runs through all the mini-batches once, as in the sketch below.
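A minimal sketch of this nested loop, using hypothetical toy tensors and a manual slice per mini-batch (the DataLoader introduced below automates exactly this):

```python
import torch

# Toy data: 100 samples with 8 features each (illustrative only)
x = torch.randn(100, 8)
y = torch.randn(100, 1)
batch_size = 20

for epoch in range(5):                              # outer loop: epochs
    for start in range(0, x.shape[0], batch_size):  # inner loop: one mini-batch per step
        x_batch = x[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        # feedforward, loss, backward, and update would go here
```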
Some terminology:
- Epoch: one forward pass and one back-propagation pass over all the samples is called an epoch.
- Batch_Size: the number of samples used in each training step.
- Iteration: how many batches Batch_Size divides the samples into.
For example, with 10,000 samples and a Batch_Size of 1,000, the Iteration count is 10.
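The same arithmetic in code (numbers taken from the example above):

```python
num_samples = 10000
batch_size = 1000                         # Batch_Size
iterations = num_samples // batch_size
print(iterations)                         # 10 iterations per epoch
```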
Shuffle the data, then group it into several batches. After that, each batch can be fetched per iteration, and the samples within the batch can be traversed.
Using the Dataset and DataLoader classes:
Dataset is an abstract class and cannot be instantiated; it can only be inherited and used through subclasses.
DataLoader can be instantiated directly.
The magic method __getitem__ implements subscript indexing (dataset[index]).
The magic method __len__ returns the length, i.e. the number of data samples. A minimal subclass sketch follows.
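A minimal sketch of a Dataset subclass with both magic methods, using hypothetical in-memory tensors rather than a real file:

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    def __init__(self):
        # Illustrative data: 10 samples, 8 features, 1 label each
        self.x = torch.randn(10, 8)
        self.y = torch.randn(10, 1)

    def __getitem__(self, index):   # enables dataset[index]
        return self.x[index], self.y[index]

    def __len__(self):              # enables len(dataset)
        return self.x.shape[0]

dataset = ToyDataset()
print(len(dataset))   # 10
print(dataset[0])     # (features, label) pair
```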
Parameters in DataLoader:
dataset=dataset passes in a Dataset object.
batch_size=32 specifies the batch size.
shuffle=True scrambles the sample order.
num_workers=2 specifies how many worker processes are used to read data and assemble mini-batches (multiprocessing, not multithreading); see the sketch after this list.
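Putting those parameters together (dataset here is the ToyDataset sketch above; the values are illustrative):

```python
from torch.utils.data import DataLoader

train_loader = DataLoader(dataset=dataset,
                          batch_size=32,
                          shuffle=True,
                          num_workers=2)
```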
PyTorch 0.4 on Windows may report an error from the multiprocessing system call when num_workers > 0. Solution: put the two-level training loop inside the main function (under the if __name__ == '__main__': guard), as sketched below.
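A sketch of the workaround, with train_loader as above and the loop body elided: wrapping the two-level loop in the main guard lets worker processes be spawned safely on Windows.

```python
if __name__ == '__main__':
    for epoch in range(100):                        # outer loop: epochs
        for i, data in enumerate(train_loader):     # inner loop: mini-batches
            inputs, labels = data
            ...  # feedforward, loss, backward, update
```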
Overall implementation of the Dataset:
filepath is the file path.
xy is the loaded matrix; its shape attribute is a (rows, columns) tuple, so taking [0] gives self.len, the number of rows in xy, i.e. the number of data samples.
Construct a dataset object by passing it the path.
Finally, create the DataLoader.
Changes in the later training loop:
The outer epoch loop runs through all the data 100 times.
The inner loop iterates over the DataLoader object created earlier; enumerate is used just to see which iteration we are on.
Each step takes the x[i], y[i] data out of train_loader and puts it into inputs and labels. The loader automatically converts x and y to Tensor type, so inputs and labels are tensors.
The rest is unchanged: computing the loss, zeroing the gradients, back propagation, updating the weights.
The data is still the diabetes dataset used previously.
Full code:
```python
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import numpy as np
import time

# Data preparation
filepath = './diabetes.csv'

class DiabetesDataset(Dataset):
    def __init__(self, filepath):
        xy = np.loadtxt(filepath, delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]                       # number of samples
        self.x_data = torch.from_numpy(xy[:, :-1])   # all columns but the last
        self.y_data = torch.from_numpy(xy[:, [-1]])  # last column as labels

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

dataset = DiabetesDataset(filepath)
train_loader = DataLoader(dataset=dataset,
                          batch_size=32,
                          shuffle=True,
                          num_workers=0)

# Model definition
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # Dimensional changes: 8 -> 6 -> 4 -> 1
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 4)
        self.linear3 = torch.nn.Linear(4, 1)
        # Activation
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        # Pass through the layers of the network
        x = self.sigmoid(self.linear1(x))
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        return x

time_ll = []
loss_ll = []
model = Model()
# Loss function
criterion = torch.nn.BCELoss(reduction='mean')
# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

if __name__ == '__main__':
    start_time = time.time()
    for epoch in range(10):
        for i, data in enumerate(train_loader):
            x, y = data
            # Feedforward
            y_pre = model(x)
            # Loss
            loss = criterion(y_pre, y)
            loss_ll.append(loss.item())
            # Zero the gradients
            optimizer.zero_grad()
            # Back propagation
            loss.backward()
            # Update the weights
            optimizer.step()
            # print("currently training batch " + str(i))
    end_time = time.time()
    print("Program running time:", end_time - start_time)
    print("Magnitude of the loss:", loss_ll[-1])
```
After switching to mini-batch processing, you can see multiple CPU cores working.
In the earlier training, where all the data was thrown into the function at once, only a few CPU cores were busy.
Varying num_workers repeatedly shows that the time is shortest at 0, i.e. when only the main process is used: about 1 second for 10 epochs. With num_workers set to 1 the time rises to about 7 seconds, and with 2 it is about 8 seconds. (For a dataset this small, the overhead of spawning worker processes outweighs any gain from parallel reading.)
With three workers, Windows reports a paging-file error; this can be fixed by increasing the virtual-memory allocation of the drive PyCharm is installed on.
After that change, trying num_workers from 0 to 8 showed that each additional worker process adds roughly one second of runtime. A sketch of this kind of timing comparison follows.
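A sketch of the timing comparison described above, reusing the dataset from the full code (absolute times depend on the machine):

```python
import time
from torch.utils.data import DataLoader

if __name__ == '__main__':
    for workers in range(0, 9):
        loader = DataLoader(dataset=dataset, batch_size=32,
                            shuffle=True, num_workers=workers)
        start = time.time()
        for inputs, labels in loader:
            pass                      # measure loading only, no training
        print(workers, time.time() - start)
```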
How num_workers works:
- num_workers subprocesses (workers) are spawned.
- Each worker obtains from the main process the ids of the samples it needs to collect. The order of the ids comes from the sampler (or from shuffling). Each worker then starts collecting a batch of data. (This is why increasing num_workers increases memory consumption: each worker needs to cache a batch of data.)
- Once the first worker has finished collecting its batch, it blocks, waiting for the main process to take the batch away before collecting the next one.
- When the main process has consumed that batch, it collects the second batch from the second worker, and so on.
- After the main process has consumed the last worker's batch, it goes back to the first worker for its second batch. If the first worker has not finished collecting it yet, the main process blocks there. (This is why, when data loading is slow, the main process stalls once every num_workers batches.)
So:
- If memory is limited, too large a num_workers can easily lead to memory overflow.
- You can judge whether it is worth increasing num_workers further by watching for a long wait after every num_workers batches. If there is no obvious stall, reading speed is already saturated and there is no need to increase it; if there is, increasing num_workers can relieve the wait.
- If the performance bottleneck is I/O, raising num_workers beyond the core count (up to about CPU cores × 2) can still speed things up. If the bottleneck is CPU computation, however, increasing num_workers reduces performance. (Most CPUs support 2 hardware threads per core; beyond that limit each process has to be time-sliced by the operating system, which takes longer.)
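As a reference point for the cores × 2 rule of thumb, the logical CPU count can be queried directly:

```python
import os

# Number of logical CPUs (physical cores x hardware threads per core)
print(os.cpu_count())
```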