I am keeping this as a record of my study, to push myself to keep working on the task, study hard, and keep these notes updated.
train.py
I have roughly worked through this file, so the details are omitted here.
model_concat_upsa.py
Here are some specific operations. The annoying part is the large number of names and their respective dimensions.
l0_xyz_f1 = point_cloud[:, :num_point, 0:3]     # dim: b n 3
l0_points_f1 = point_cloud[:, :num_point, 3:]   # dim: b n channel
l0_xyz_f2 = point_cloud[:, num_point:, 0:3]     # dim: b n 3
l0_points_f2 = point_cloud[:, num_point:, 3:]   # dim: b n channel
The code above is built from the rawest form of the data (what this so-called raw data is still needs to be looked into). l0_xyz_f1 is the coordinates of the point cloud in the first frame at layer zero, and its dimension is b n 3.
Looking into what I call the "raw data":
point_cloud is the first argument passed to the model. In train.py it comes from batch_data, which is produced by:
batch_data, batch_label, batch_mask = get_batch(TRAIN_DATASET, train_idxs, start_idx, end_idx)
A key step in forming batch_data:
batch_data[i, :NUM_POINT, :3] = pc1[shuffle_idx]
batch_data[i, :NUM_POINT, 3:] = color1[shuffle_idx]
batch_data[i, NUM_POINT:, :3] = pc2[shuffle_idx]
batch_data[i, NUM_POINT:, 3:] = color2[shuffle_idx]
Compare this with the slicing shown earlier: in each sample the first NUM_POINT rows hold frame 1 and the remaining rows hold frame 2; within a row, columns 0:3 are the XYZ coordinates and columns 3: are the color channels. That is why the model can recover l0_xyz_f1, l0_points_f1, l0_xyz_f2 and l0_points_f2 by splitting point_cloud at num_point.
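The packing in get_batch and the slicing in the model can be checked against each other with a small NumPy sketch. The sizes and the random data below are made up for illustration; only the indexing pattern is taken from the code above.

```python
import numpy as np

# Hypothetical sizes, chosen only to keep the example small.
batch_size, num_point, channel = 2, 4, 3

rng = np.random.default_rng(0)
pc1 = rng.normal(size=(num_point, 3))             # frame 1 coordinates
pc2 = rng.normal(size=(num_point, 3))             # frame 2 coordinates
color1 = rng.uniform(size=(num_point, channel))   # frame 1 colors
color2 = rng.uniform(size=(num_point, channel))   # frame 2 colors

# Pack both frames into one array, following the indexing used in get_batch.
batch_data = np.zeros((batch_size, num_point * 2, 3 + channel))
shuffle_idx = rng.permutation(num_point)
for i in range(batch_size):
    batch_data[i, :num_point, :3] = pc1[shuffle_idx]
    batch_data[i, :num_point, 3:] = color1[shuffle_idx]
    batch_data[i, num_point:, :3] = pc2[shuffle_idx]
    batch_data[i, num_point:, 3:] = color2[shuffle_idx]

# Split it again, following the slicing used in the model.
point_cloud = batch_data
l0_xyz_f1 = point_cloud[:, :num_point, 0:3]
l0_points_f1 = point_cloud[:, :num_point, 3:]
l0_xyz_f2 = point_cloud[:, num_point:, 0:3]
l0_points_f2 = point_cloud[:, num_point:, 3:]

print(l0_xyz_f1.shape, l0_points_f1.shape)  # (2, 4, 3) (2, 4, 3)
```

The slices are views into batch_data, so no data is copied when the two frames are separated.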
Several radii are defined to describe how large the neighborhood around a point is, as mentioned in the original paper. However, it is not yet clear to me which layer uses which neighborhood size.
RADIUS1 = 0.5
RADIUS2 = 1.0
RADIUS3 = 2.0
RADIUS4 = 4.0
Let's start processing the data layer by layer.
# Frame 1, Layer 1
# npoint is the number of points kept by farthest point sampling
l1_xyz_f1, l1_points_f1, l1_indices_f1 = pointnet_sa_module(l0_xyz_f1, l0_points_f1, npoint=1024, radius=RADIUS1, nsample=16, mlp=[32, 32, 64], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer1')
end_points['l1_indices_f1'] = l1_indices_f1
This gives the first layer of the first frame:
Dimension of l1_xyz_f1: b * npoint (1024) * 3;
Dimension of l1_points_f1: b * npoint (1024) * mlp[-1] (64);
Dimension of l1_indices_f1: batch_size, npoint, nsample (16).
# Frame 1, Layer 2
l2_xyz_f1, l2_points_f1, l2_indices_f1 = pointnet_sa_module(l1_xyz_f1, l1_points_f1, npoint=256, radius=RADIUS2, nsample=16, mlp=[64, 64, 128], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer2')
end_points['l2_indices_f1'] = l2_indices_f1
This gives the second layer of the first frame:
Dimension of l2_xyz_f1: b * npoint (256) * 3;
Dimension of l2_points_f1: b * npoint (256) * mlp[-1] (128);
Dimension of l2_indices_f1: batch_size, npoint, nsample (16).
Similarly, the same operation is performed on the second frame, and the results are listed as follows:
Dimension of l1_xyz_f2: b * npoint (1024) * 3;
Dimension of l1_points_f2: b * npoint (1024) * mlp[-1] (64);
Dimension of l1_indices_f2: batch_size, npoint, nsample (16).
Dimension of l2_xyz_f2: b * npoint (256) * 3;
Dimension of l2_points_f2: b * npoint (256) * mlp[-1] (128);
Dimension of l2_indices_f2: batch_size, npoint, nsample (16).
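To keep track of these dimensions, a tiny helper can compute the three output shapes of one set abstraction layer from its arguments. The function name sa_output_shapes and the batch size of 8 are my own choices for illustration; the shape rules are the ones listed above.

```python
def sa_output_shapes(batch_size, npoint, nsample, mlp):
    """Return the shapes of (new_xyz, new_points, idx) for one SA layer."""
    return (
        (batch_size, npoint, 3),         # sampled coordinates
        (batch_size, npoint, mlp[-1]),   # features after the last MLP layer
        (batch_size, npoint, nsample),   # neighbor indices per sampled point
    )


# Layer 1 and layer 2 settings from the calls above; batch size 8 is assumed.
layer1 = sa_output_shapes(8, npoint=1024, nsample=16, mlp=[32, 32, 64])
layer2 = sa_output_shapes(8, npoint=256, nsample=16, mlp=[64, 64, 128])
print(layer1)  # ((8, 1024, 3), (8, 1024, 64), (8, 1024, 16))
print(layer2)  # ((8, 256, 3), (8, 256, 128), (8, 256, 16))
```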
Next comes the flow embedding.
pointnet_util.py
pointnet_sa_module
pointnet_sa_module appears to be the layer used for down-sampling. Of the three parts described in the paper (up-sampling, flow embedding and down-sampling), this one is probably responsible for learning features of the original point set.
Even before reading the code carefully, it is clear that this function changes the number of points in the input point set: the algorithm uses farthest point sampling, and npoint is the number of output points that this sampling produces.
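To make farthest point sampling concrete, here is a minimal NumPy sketch for a single point cloud. It is not the CUDA/TensorFlow op used in the repository; the function name fps_numpy and the choice of starting from point 0 are assumptions made for this example.

```python
import numpy as np


def fps_numpy(npoint, xyz):
    """Greedy farthest point sampling on one cloud.

    xyz: (n, 3) array. Returns the indices of npoint selected points.
    """
    n = xyz.shape[0]
    selected = np.zeros(npoint, dtype=np.int64)
    # Squared distance from every point to its nearest already-selected point.
    min_dist = np.full(n, np.inf)
    current = 0  # assumption: start from the first point
    for i in range(npoint):
        selected[i] = current
        dist = np.sum((xyz - xyz[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        # The next sample is the point farthest from everything selected so far.
        current = int(np.argmax(min_dist))
    return selected


pts = np.array([[0.0, 0, 0], [1.0, 0, 0], [10.0, 0, 0], [0.5, 0, 0]])
print(fps_numpy(3, pts))  # [0 2 1]
```

Whatever the size of the input, the output always has exactly npoint entries, which is why each layer reduces the point count to a fixed number.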
This layer mainly goes through the following steps:
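(1) Sampling and grouping: this splits the whole point cloud into local groups. For each group, PointNet can then be used to extract that region's features separately.
(2) Point feature embedding: an MLP (implemented as [1,1] convolutions) is applied to the points of each group.
(3) Pooling in local regions: max pooling over the points of each group.
The code is explained in detail below.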
def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, group_all, is_training, bn_decay, scope, bn=True, pooling='max', knn=False, use_xyz=True, use_nchw=False):
    ''' PointNet Set Abstraction (SA) Module
        Input:
            xyz: (batch_size, ndataset, 3) TF tensor
            points: (batch_size, ndataset, channel) TF tensor
            npoint: int32 -- #points sampled in farthest point sampling
            radius: float32 -- search radius in local region
            nsample: int32 -- how many points in each local region
            mlp: list of int32 -- output size for MLP on each point
            mlp2: list of int32 -- output size for MLP on each region
            group_all: bool -- group all points into one PC if set true, OVERRIDE
                npoint, radius and nsample settings
            use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
            use_nchw: bool, if True, use NCHW data format for conv2d, which is usually faster than NHWC format
        Return:
            new_xyz: (batch_size, npoint, 3) TF tensor
            new_points: (batch_size, npoint, mlp[-1] or mlp2[-1]) TF tensor
            idx: (batch_size, npoint, nsample) int32 -- indices for local regions
    '''
(Thanks to the authors for these comments.)
Part I Sample and Grouping
# Sample and Grouping
if group_all:
    nsample = xyz.get_shape()[1].value
    new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz)
else:
    new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, knn, use_xyz)
Everywhere the authors call this function, group_all is False, that is, the points are never treated as one single group.
So the sample_and_group function is what gets used here:
knn and use_xyz keep their default values, that is, knn=False and use_xyz=True.
This is an important function of sampling & grouping.
def sample_and_group(npoint, radius, nsample, xyz, points, knn=False, use_xyz=True):
    '''
    Input:
        npoint: int32
        radius: float32
        nsample: int32
        xyz: (batch_size, ndataset, 3) TF tensor
        points: (batch_size, ndataset, channel) TF tensor, if None will just use xyz as points
        knn: bool, if True use kNN instead of radius search
        use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
    Output:
        new_xyz: (batch_size, npoint, 3) TF tensor
        new_points: (batch_size, npoint, nsample, 3+channel) TF tensor
        idx: (batch_size, npoint, nsample) TF tensor, indices of local points as in ndataset points
        grouped_xyz: (batch_size, npoint, nsample, 3) TF tensor, normalized point XYZs
            (subtracted by seed point XYZ) in local regions
    '''
    new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz))  # (batch_size, npoint, 3)
    if knn:
        # Not used here, so I did not look into it
        _, idx = knn_point(nsample, xyz, new_xyz)
    else:
        # idx: [B, npoint, nsample]
        idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
    grouped_xyz = group_point(xyz, idx)  # (batch_size, npoint, nsample, 3)
    grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1])  # translation normalization
    if points is not None:
        grouped_points = group_point(points, idx)  # (batch_size, npoint, nsample, channel)
        if use_xyz:
            new_points = tf.concat([grouped_xyz, grouped_points], axis=-1)  # (batch_size, npoint, nsample, 3+channel)
        else:
            new_points = grouped_points
    else:
        new_points = grouped_xyz
    return new_xyz, new_points, idx, grouped_xyz
First, the gather_point function is used to obtain new_xyz, the points chosen by farthest point sampling.
Then the query_ball_point function obtains idx, the indices of the nsample points sampled inside each spherical neighborhood (centered on new_xyz) of each sample. (A detailed description is at the end of this section.)
idx: [B, npoint, nsample] holds the indices of the nsample points in each of the npoint spherical regions. Then group_point(xyz, idx) returns grouped_xyz, that is, the coordinates of the nsample points of each of the npoint groups in each batch, with shape (batch_size, npoint, nsample, 3).
The next step, commented as translation normalization, is the following line of code. Its purpose is to subtract the center point (new_xyz, obtained by farthest point sampling) from grouped_xyz.
grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1])  # translation normalization
# new_xyz: (b, npoint, 3) -> (b, npoint, 1, 3) -> (b, npoint, nsample, 3)
Finally, if the points carry feature channels in addition to their coordinates, the grouped features are concatenated with the normalized coordinates; otherwise the normalized coordinates are returned on their own. Here use_xyz is True, so the 3 XYZ coordinates are concatenated with the channel-dimensional local point features.
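The data flow of sample_and_group can be imitated in NumPy to check the shapes. This is only a sketch: random indices stand in for farthest point sampling and the ball query, fancy indexing stands in for gather_point and group_point, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
b, n, channel, npoint, nsample = 2, 10, 5, 3, 4   # hypothetical sizes
xyz = rng.normal(size=(b, n, 3))
points = rng.normal(size=(b, n, channel))

# Stand-ins for farthest_point_sample and query_ball_point (random indices).
center_idx = np.stack([rng.choice(n, npoint, replace=False) for _ in range(b)])  # (b, npoint)
idx = rng.integers(0, n, size=(b, npoint, nsample))                             # (b, npoint, nsample)

batch = np.arange(b)
new_xyz = xyz[batch[:, None], center_idx]              # like gather_point: (b, npoint, 3)
grouped_xyz = xyz[batch[:, None, None], idx]           # like group_point: (b, npoint, nsample, 3)
grouped_xyz = grouped_xyz - new_xyz[:, :, None, :]     # translation normalization
grouped_points = points[batch[:, None, None], idx]     # (b, npoint, nsample, channel)

# use_xyz=True: concatenate along the last axis to get 3 + channel features.
new_points = np.concatenate([grouped_xyz, grouped_points], axis=-1)
print(new_points.shape)  # (2, 3, 4, 8)
```

Concatenating on the last axis is what produces the 3+channel size in the docstring; any other axis would either fail or mix up points and features.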

About tf.tile and tf.expand_dims
tf.expand_dims adds a dimension to a tensor. The first parameter is the tensor to process, and the second is the position (axis) at which the new dimension is inserted.
tf.tile replicates the data in a tensor a given number of times along each axis. The number of dimensions of the output stays the same; only the sizes of the axes grow.
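NumPy has functions with the same names and the same behavior for this purpose, so the shape changes can be tried out without TensorFlow. The sketch below uses np.expand_dims and np.tile on made-up sizes and also shows that broadcasting gives the same result as the explicit tile.

```python
import numpy as np

b, npoint, nsample = 2, 3, 4   # hypothetical sizes
new_xyz = np.arange(b * npoint * 3, dtype=np.float64).reshape(b, npoint, 3)

expanded = np.expand_dims(new_xyz, 2)           # (b, npoint, 3) -> (b, npoint, 1, 3)
tiled = np.tile(expanded, [1, 1, nsample, 1])   # -> (b, npoint, nsample, 3)
print(expanded.shape, tiled.shape)              # (2, 3, 1, 3) (2, 3, 4, 3)

# Every one of the nsample copies equals the original center point.
same_copies = all(np.array_equal(tiled[:, :, k, :], new_xyz) for k in range(nsample))

# Subtracting the tiled centers equals subtracting with broadcasting.
grouped_xyz = np.ones((b, npoint, nsample, 3))
diff_tiled = grouped_xyz - tiled
diff_broadcast = grouped_xyz - expanded
```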

About query_ball_point
For the query_ball_point function I referred to an introduction written by xd (thanks; I found it very detailed). Roughly:
In this layer, ball query is used to generate N' local regions. According to the paper there are two variables: the number of points K in each region and the radius of the ball. The radius should be the dominant one: points are searched for inside a ball of the given radius, with K as the upper limit on how many are kept. Both the radius of the ball and the number of points per region are specified by hand.
The query_ball_point function finds points inside a spherical neighborhood. Among the inputs, radius is the radius of the neighborhood, nsample is the number of points to sample in each neighborhood, new_xyz holds the centers of the S neighborhoods (obtained from the earlier farthest point sampling), and xyz is the whole point cloud. The output is the index array of shape [B, S, nsample], the points sampled in each neighborhood of each sample. The detailed analysis is in the remarks.
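The description above can be sketched for a single point cloud in NumPy. The real query_ball_point is a compiled op; the function name ball_query_numpy, the strict "<" comparison and the rule of padding short neighborhoods by repeating the first point found are assumptions of this sketch, not something confirmed by the code shown here.

```python
import numpy as np


def ball_query_numpy(radius, nsample, xyz, new_xyz):
    """xyz: (n, 3) all points, new_xyz: (s, 3) ball centers.

    Returns idx (s, nsample) and pts_cnt (s,), the number of points
    actually found inside each ball (capped at nsample).
    """
    s = new_xyz.shape[0]
    idx = np.zeros((s, nsample), dtype=np.int64)
    pts_cnt = np.zeros(s, dtype=np.int64)
    for j in range(s):
        dist = np.linalg.norm(xyz - new_xyz[j], axis=1)
        inside = np.flatnonzero(dist < radius)[:nsample]   # radius decides, nsample caps
        pts_cnt[j] = len(inside)
        if len(inside) > 0:
            idx[j, :len(inside)] = inside
            idx[j, len(inside):] = inside[0]   # assumed padding: repeat the first hit
    return idx, pts_cnt


xyz = np.array([[0.0, 0, 0], [0.1, 0, 0], [5.0, 0, 0], [0.2, 0, 0]])
new_xyz = xyz[[0, 2]]   # two centers taken from the cloud itself
idx, pts_cnt = ball_query_numpy(0.5, 3, xyz, new_xyz)
print(idx.tolist())      # [[0, 1, 3], [2, 2, 2]]
print(pts_cnt.tolist())  # [3, 1]
```

Because the centers come from the cloud itself, every ball contains at least its own center, so the output always has a full nsample column per center.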
Part II Point Feature Embedding
# Point Feature Embedding
if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])
for i, num_out_channel in enumerate(mlp):
    new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],
                                padding='VALID', stride=[1,1],
                                bn=bn, is_training=is_training,
                                scope='conv%d'%(i), bn_decay=bn_decay,
                                data_format=data_format)
if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])
use_nchw defaults to False; the authors never pass it in, so the default is used.
mlp is a list of three values that is passed in on every call. Taking frame 1, layer 1 as an example, mlp=[32, 32, 64].
tf_util.conv2d is used to update new_points (which combines the coordinates and channels of the points sampled in each neighborhood).
Related functions worth looking up: tf.nn.conv2d, batch_norm_for_conv2d and _variable_with_weight_decay.
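A convolution with a [1,1] kernel applies the same weights to every point independently, so it behaves like a small fully connected layer shared across points. The NumPy sketch below illustrates only that idea: the random weights are placeholders, bias and batch normalization are left out, and the ReLU activation is an assumption about tf_util.conv2d's default.

```python
import numpy as np

rng = np.random.default_rng(0)
b, npoint, nsample, c_in = 2, 3, 4, 6   # hypothetical sizes; c_in = 3 + channel
mlp = [32, 32, 64]                      # frame 1, layer 1 setting

new_points = rng.normal(size=(b, npoint, nsample, c_in))
for num_out_channel in mlp:
    # Placeholder weights; in the real layer these are learned.
    weights = rng.normal(size=(new_points.shape[-1], num_out_channel)) * 0.1
    # Matrix product on the last axis only: each point is transformed on its own.
    new_points = np.maximum(new_points @ weights, 0.0)   # linear map + ReLU (assumed)

print(new_points.shape)  # (2, 3, 4, 64)
```

Only the last axis changes size, ending at mlp[-1]; the batch, npoint and nsample axes are untouched.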
Part III Pooling in Local Regions
Max pooling is used:
if pooling=='max':
    new_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
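The effect of the max pooling on the shapes can be seen in a small NumPy example (NumPy spells the argument keepdims). The final squeeze is not part of the excerpt above; it is included here on the assumption that the size-1 axis is removed afterwards to reach the (batch_size, npoint, mlp[-1]) shape given in the docstring.

```python
import numpy as np

# (batch=1, npoint=2, nsample=3, channels=2), values chosen to be easy to follow.
new_points = np.arange(12).reshape(1, 2, 3, 2)

pooled = new_points.max(axis=2, keepdims=True)   # max over the nsample axis
squeezed = np.squeeze(pooled, axis=2)            # assumed follow-up step

print(pooled.shape)       # (1, 2, 1, 2)
print(squeezed.tolist())  # [[[4, 5], [10, 11]]]
```

Each group of nsample points collapses to one feature vector, the channel-wise maximum over the group.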
flying_things_dataset.py
This is a dataset class that deals with various paths, so I found it a little difficult.
The code in this part is actually fairly short and easy to follow. The main problem is that I lack a lot of background knowledge, so I cover that first. (The main text comes after the background knowledge.)
Background knowledge
glob
glob is a file-handling module in the Python standard library. It can be used to find files that match a pattern.
The main function of the glob module is glob.glob. It returns a list of all matching file paths. It takes one argument, the path pattern to match (the string can be an absolute or a relative path). A pattern such as *.txt only matches files in the given directory, not files in its subfolders.
For example:
glob.glob(r'c:\*.txt')
Gets all .txt files in the root directory of drive C.
glob.glob(r'E:\pic\*\*.jpg')
Gets all .jpg files in the immediate subdirectories of E:\pic.
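The same matching rule can be tried in a temporary directory, using a pattern like the TRAIN*.npz one from the dataset class. The file names below are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a few empty files, including one inside a subfolder.
    for name in ['TRAIN_a.npz', 'TRAIN_b.npz', 'TEST_a.npz']:
        open(os.path.join(root, name), 'w').close()
    os.mkdir(os.path.join(root, 'sub'))
    open(os.path.join(root, 'sub', 'TRAIN_c.npz'), 'w').close()

    # Only files directly inside root match; the one in sub/ does not.
    train_files = sorted(glob.glob(os.path.join(root, 'TRAIN*.npz')))
    train_names = [os.path.basename(p) for p in train_files]

print(train_names)  # ['TRAIN_a.npz', 'TRAIN_b.npz']
```

glob.glob returns paths in arbitrary order, which is why the result is sorted before printing.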
Reading and writing .npz files
First, the .npy file: it is NumPy's own binary format. An array saved this way is stored in raw, uncompressed binary form in a file with the extension .npy.
An .npz file is a zip archive that stores multiple arrays in a single file (np.savez leaves them uncompressed; np.savez_compressed compresses them).
For .npz, the main functions used are:
np.savez(): save multiple arrays to one file
np.load(): read the file and return a dictionary-like object
Reference: NumPy .npy and .npz files
import numpy as np

# Save multiple arrays to disk
a = np.arange(5)
b = np.arange(6)
c = np.arange(7)
np.savez('test', a, b, c_array=c)  # c_array is the name given to array c

# Read the arrays back
data = np.load('test.npz')  # similar to a dictionary {'arr_0': a, 'arr_1': b, 'c_array': c}
print('arr_0 : ', data['arr_0'])
print('arr_1 : ', data['arr_1'])
print('c_array : ', data['c_array'])

Output:
arr_0 :  [0 1 2 3 4]
arr_1 :  [0 1 2 3 4 5]
c_array :  [0 1 2 3 4 5 6]
The shape attribute in NumPy
shape is an attribute (not a method) of a NumPy array; it holds the size of the array as a tuple.
a.shape: the sizes of all dimensions of a
a.shape[i]: the size of dimension i of a
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(a)
print(a.shape)
print(a.shape[0])

Output:
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
(4, 2)
4
np.random.choice
numpy.random.choice(a, size=None, replace=True, p=None)
Randomly draws elements from a and forms an array of the specified size. a must be one-dimensional (an ndarray or other array-like), or an int n, in which case the draw is made from np.arange(n).
replace: True means the same element can be drawn more than once (sampling with replacement); False means each element can be drawn at most once.
p: an array matching a that gives the probability of drawing each element of a. By default, every element is equally likely.
cr: np.random.choice
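A short experiment shows the difference the replace flag makes. The sizes and the fixed seed are arbitrary choices for the example.

```python
import numpy as np

np.random.seed(0)      # fixed seed only so that reruns behave the same
n1, npoints = 10, 4    # hypothetical sizes

# Passing an int draws from np.arange(n1); replace=False gives distinct indices.
sample_idx = np.random.choice(n1, npoints, replace=False)

# With replace=True, drawing 20 values from only 10 candidates must repeat some.
with_repeats = np.random.choice(n1, 20, replace=True)

# Asking for more distinct values than exist raises an error.
try:
    np.random.choice(n1, n1 + 1, replace=False)
    too_many_failed = False
except ValueError:
    too_many_failed = True

print(len(set(sample_idx)) == npoints)  # True
print(too_many_failed)                  # True
```

This means that sampling npoints indices with replace=False only works when the array has at least npoints entries.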
Why normalize RGB of color
When RGB images are fed into a neural network, the pixel values are usually divided by 255 so that they fall between 0 and 1.
References: Why should images be normalized in deep learning?
Grayscale data representation (why divide by 255)
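As a quick illustration of the division by 255, here is a tiny NumPy example with made-up 8-bit color values.

```python
import numpy as np

# Two made-up RGB colors stored as 8-bit integers (0..255).
color = np.array([[0, 128, 255], [64, 64, 64]], dtype=np.uint8)

# True division converts to floating point and maps the range to [0, 1].
color_norm = color / 255

print(color_norm.dtype)                     # float64
print(color_norm.min(), color_norm.max())   # 0.0 1.0
```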
Main text
With this background, the meaning of this part of the code is fairly clear. npoints is the number of points kept from the original data, root is the directory where the data is stored, datapath is the list of all TRAIN*.npz or TEST*.npz files found in that directory, and datapath[index] is the path of the npz file at position index.
import os
import os.path
import json
import numpy as np
import sys
import pickle
import glob

class SceneflowDataset():
    def __init__(self, root='data_preprocessing/data_processed_maxcut_35_both_mask_20k_2k', npoints=2048, train=True):
        self.npoints = npoints
        self.train = train
        self.root = root
        if self.train:
            self.datapath = glob.glob(os.path.join(self.root, 'TRAIN*.npz'))  # read file
        else:
            self.datapath = glob.glob(os.path.join(self.root, 'TEST*.npz'))
        self.cache = {}
        self.cache_size = 30000

        ###### deal with one bad datapoint with nan value
        self.datapath = [d for d in self.datapath if 'TRAIN_C_0140_left_00060' not in d]  # ???
        ######

    def __getitem__(self, index):
        if index in self.cache:
            pos1, pos2, color1, color2, flow, mask1 = self.cache[index]
        else:
            fn = self.datapath[index]
            # 'rb': open the file in binary mode, read-only; the file pointer starts at the beginning of the file
            with open(fn, 'rb') as fp:
                data = np.load(fp)
                pos1 = data['points1']
                pos2 = data['points2']
                # Here, the RGB values are normalized
                color1 = data['color1'] / 255
                color2 = data['color2'] / 255
                flow = data['flow']
                mask1 = data['valid_mask1']

            if len(self.cache) < self.cache_size:
                self.cache[index] = (pos1, pos2, color1, color2, flow, mask1)
                # So the value of cache[index] is the tuple above:
                # cache = {index: (pos1, pos2, color1, color2, flow, mask1)}

        # If this is training data
        if self.train:
            # n1 is the size of the first dimension of pos1
            n1 = pos1.shape[0]
            # Randomly select npoints indices from n1 without replacement
            sample_idx1 = np.random.choice(n1, self.npoints, replace=False)
            n2 = pos2.shape[0]
            sample_idx2 = np.random.choice(n2, self.npoints, replace=False)

            # A new set of data after sampling
            pos1_ = np.copy(pos1[sample_idx1, :])
            pos2_ = np.copy(pos2[sample_idx2, :])
            color1_ = np.copy(color1[sample_idx1, :])
            color2_ = np.copy(color2[sample_idx2, :])
            flow_ = np.copy(flow[sample_idx1, :])
            mask1_ = np.copy(mask1[sample_idx1])
        # If it is not training data, directly take the first npoints
        else:
            pos1_ = np.copy(pos1[:self.npoints, :])
            pos2_ = np.copy(pos2[:self.npoints, :])
            color1_ = np.copy(color1[:self.npoints, :])
            color2_ = np.copy(color2[:self.npoints, :])
            flow_ = np.copy(flow[:self.npoints, :])
            mask1_ = np.copy(mask1[:self.npoints])

        return pos1_, pos2_, color1_, color2_, flow_, mask1_

    def __len__(self):
        return len(self.datapath)