# Python Point Cloud Series: Reproducing the PointNet Paper


# Introduction to PointNet

## Network structure

### Rotation transformation matrix

A $3 \times 3$ matrix is used to rotate each input point cloud. The entries of this matrix are not fixed: they are predicted by a small sub-network (T-Net) whose weights are learned during training, so the network learns to rotate each object to be classified into a pose that suits the later layers. It can also be regarded as a feature transformation layer, one that happens to have a physical interpretation as a geometric rotation.
Concretely, the input is split into two branches: one branch feeds the sub-network, whose output is a $3 \times 3$ matrix; the other branch is the raw point cloud, which is multiplied by this matrix directly to rotate it.
The main body of the paper says little about this; the details are in the supplementary material.
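The multiplication step can be sketched in NumPy. This is only an illustration: `apply_transform` and `rot_z` are hypothetical names, and the hand-written 90-degree rotation stands in for whatever matrix the sub-network would actually predict. Points are stored as rows, so the matrix is applied on the right, the same layout the source code later uses with `torch.bmm(x, trans)`.

```python
import numpy as np


def apply_transform(points, matrix):
    """Multiply every point (a row of `points`, shape [N, 3]) by a 3x3 matrix."""
    return points @ matrix


# Hypothetical stand-in for the predicted matrix: a 90-degree rotation about the z axis
# (written for row vectors, hence the transposed look).
theta = np.pi / 2
rot_z = np.array([
    [np.cos(theta), np.sin(theta), 0.0],
    [-np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])

points = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
])
aligned = apply_transform(points, rot_z)
print(np.round(aligned, 6))
```

Because a rotation is orthogonal, the distance of every point from the origin is the same before and after the multiplication; only the pose changes.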

In addition, inspired by the low-dimensional (3-D) case, the authors apply the same kind of transformation to the intermediate high-dimensional features: for example, the 64-dimensional features are multiplied by a $64 \times 64$ matrix, which acts as a rotation in the high-dimensional space. To keep this transformation matrix close to an orthogonal matrix, the authors add a regularization term on the matrix to the loss.
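In the paper's notation, with $A$ the $64 \times 64$ feature alignment matrix predicted by the network and $I$ the identity matrix, the regularization term is:

$$
L_{reg} = \lVert I - A A^{T} \rVert_F^{2}
$$

The term is zero exactly when $A A^{T} = I$, that is, when $A$ is orthogonal, and it grows as $A$ moves away from orthogonality.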

Some points in this part are still unclear to me:
1. Why is a second alignment needed in the high-dimensional feature space when the data has already been aligned at the input (is this justified only by experiment?)
2. Many articles say the transformation turns the object to face the front, but I still don't understand how this "front" is defined (possibly related to orthogonal matrices?). This part may also require some linear algebra.
3. Why does a regularization term need to be added to the 64-dimensional transformation matrix to push it toward an orthogonal matrix?
   From other blogs I learned that rotation matrices are a special case of orthogonal matrices.
4. Why is no regularization term added to the 3-D transformation matrix to ensure its orthogonality, as in the high-dimensional case?

An orthogonal transformation of a Euclidean space V can only be one of the following:
(1) a rotation
(2) a reflection
(3) a combination of rotation and reflection (i.e. an improper rotation)
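The three cases can be told apart numerically: every orthogonal matrix $Q$ satisfies $Q Q^{T} = I$, and its determinant is $+1$ for a rotation and $-1$ once a reflection is involved. A minimal 2-D sketch, in which the matrices and the helper `is_orthogonal` are chosen purely for illustration:

```python
import numpy as np

theta = np.pi / 6
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta), np.cos(theta)],
])
reflection = np.array([[1.0, 0.0], [0.0, -1.0]])  # mirror across the x axis
improper = rotation @ reflection                  # rotation combined with a reflection


def is_orthogonal(q):
    """True if q @ q.T equals the identity up to floating-point error."""
    return np.allclose(q @ q.T, np.eye(q.shape[0]))


dets = {
    name: np.linalg.det(m)
    for name, m in [("rotation", rotation), ("reflection", reflection), ("improper", improper)]
}
for name, det in dets.items():
    print(name, round(det, 6))
```

This also shows why an orthogonality constraint alone does not force a pure rotation: a matrix with determinant $-1$ passes the same test.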

# Source Parsing (Detailed)

## Source directory structure

The directory structure after unzipping is shown below. The core PointNet code is in the `models` folder.

The `models` folder contains the original PointNet code together with the code for classification, segmentation, and detection; pointnet.py holds the most basic model code,
as follows.

## PointNet.py

```python
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.utils.data
import numpy as np
import torch.nn.functional as F
from torch.autograd import Variable  # used below to build the identity tensors


# STN: Spatial Transformer Network.
# STN3d predicts a 3x3 transformation matrix for the input point cloud.
class STN3d(nn.Module):
    def __init__(self, channel):
        super(STN3d, self).__init__()
        self.conv1 = torch.nn.Conv1d(channel, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 9)
        self.relu = nn.ReLU()

        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)

    def forward(self, x):
        batchsize = x.size()[0]  # dimension 0 is the batch size
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = torch.max(x, 2, keepdim=True)[0]  # max over the point axis
        x = x.view(-1, 1024)  # reshape to [B, 1024]; -1 lets PyTorch infer the row count

        x = F.relu(self.bn4(self.fc1(x)))
        x = F.relu(self.bn5(self.fc2(x)))
        x = self.fc3(x)

        # 3x3 identity matrix, flattened to one row of 9 values and repeated for each sample
        iden = Variable(torch.from_numpy(np.array([1, 0, 0, 0, 1, 0, 0, 0, 1]).astype(np.float32))).view(1, 9).repeat(
            batchsize, 1)
        if x.is_cuda:
            iden = iden.cuda()
        # Adding the identity means the network only has to learn an offset from the
        # identity transform; the paper initializes the output matrix as the identity.
        x = x + iden
        x = x.view(-1, 3, 3)  # reshape to [B, 3, 3]
        return x
```
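The effect of adding the identity can be checked in isolation. The NumPy sketch below assumes the idealized case where the last layer outputs exactly zero (`raw_output` is a made-up stand-in for the `fc3` output; PyTorch's default initialization would give small nonzero values instead). In that case the predicted transform is exactly the identity and the point cloud passes through unchanged.

```python
import numpy as np

batch_size = 2

# Hypothetical fc3 output: all zeros, an idealized starting point.
raw_output = np.zeros((batch_size, 9), dtype=np.float32)

# Flattened 3x3 identity, one row per sample, as in the forward pass above.
iden = np.tile(np.eye(3, dtype=np.float32).reshape(1, 9), (batch_size, 1))
transform = (raw_output + iden).reshape(-1, 3, 3)  # [B, 3, 3]

rng = np.random.default_rng(0)
points = rng.normal(size=(batch_size, 5, 3)).astype(np.float32)  # [B, N, 3]
aligned = points @ transform  # batched matrix product, the NumPy analogue of torch.bmm

print(np.allclose(aligned, points))
```

Any nonzero output then acts as a learned offset from "do nothing", which is a gentler starting point than an arbitrary matrix.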

```python
# STNkd is the k-dimensional counterpart of STN3d: it predicts a k x k matrix
# (k = 64 by default) for aligning intermediate features.
class STNkd(nn.Module):
    def __init__(self, k=64):
        super(STNkd, self).__init__()
        self.conv1 = torch.nn.Conv1d(k, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, k * k)
        self.relu = nn.ReLU()

        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)

        self.k = k

    def forward(self, x):
        batchsize = x.size()[0]
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = torch.max(x, 2, keepdim=True)[0]
        x = x.view(-1, 1024)

        x = F.relu(self.bn4(self.fc1(x)))
        x = F.relu(self.bn5(self.fc2(x)))
        x = self.fc3(x)

        # k x k identity matrix, flattened and repeated for each sample
        iden = Variable(torch.from_numpy(np.eye(self.k).flatten().astype(np.float32))).view(1, self.k * self.k).repeat(
            batchsize, 1)
        if x.is_cuda:
            iden = iden.cuda()
        x = x + iden
        x = x.view(-1, self.k, self.k)  # reshape to [B, k, k]
        return x
```
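Both transform networks reduce the per-point features with a maximum over the point axis (`torch.max(x, 2, keepdim=True)[0]`). The reason this suits point clouds is that a maximum does not depend on the order of the points. A small NumPy sketch of that property, with random data and arbitrary shapes standing in for real features:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 1024, 16))  # [B, C, N]: 16 points, 1024 channels each

# Maximum over the point axis, the NumPy counterpart of torch.max(x, 2)[0].
global_feat = features.max(axis=2)  # [B, 1024]

# Shuffle the points and pool again.
perm = rng.permutation(features.shape[2])
shuffled = features[:, :, perm]
global_feat_shuffled = shuffled.max(axis=2)

print(np.array_equal(global_feat, global_feat_shuffled))
```

The pooled vector is identical for any ordering of the same points, so the network's output cannot depend on how the cloud happens to be stored.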

```python
# High-dimensional mapping network: it maps every single point to a high-dimensional
# space so that the subsequent max pooling does not discard too much information.
class PointNetEncoder(nn.Module):
    def __init__(self, global_feat=True, feature_transform=False, channel=3):
        super(PointNetEncoder, self).__init__()
        self.stn = STN3d(channel)  # network that predicts the 3x3 input transform
        self.conv1 = torch.nn.Conv1d(channel, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.global_feat = global_feat  # flag: return only the global feature
        self.feature_transform = feature_transform  # flag: also align the 64-dimensional features
        if self.feature_transform:
            self.fstn = STNkd(k=64)  # network that predicts the 64x64 feature transform

    def forward(self, x):
        # B: batch size; D: dimension of each point (3 for x, y, z); N: number of points (e.g. 1024).
        # For example, 24 samples per batch, each holding 1024 points with 3 coordinates.
        B, D, N = x.size()  # [24, 3, 1024]
        trans = self.stn(x)  # predicted 3x3 transformation matrix
        x = x.transpose(2, 1)  # swap axes 1 and 2 -> [24, 1024, 3]
        if D > 3:  # points may carry extra per-point features beyond (x, y, z)
            # keep the coordinates in x and all remaining channels in feature
            feature = x[:, :, 3:]
            x = x[:, :, :3]
        x = torch.bmm(x, trans)  # apply the transform to the 3-D coordinates
        if D > 3:
            x = torch.cat([x, feature], dim=2)
        x = x.transpose(2, 1)  # swap back to [B, D, N], the layout Conv1d expects
        x = F.relu(self.bn1(self.conv1(x)))  # first conv + batch norm + ReLU -> 64 channels
        # -------- (2020/1/18) --------
        # Second stage: optional alignment of the 64-dimensional features
        if self.feature_transform:
            trans_feat = self.fstn(x)  # predicted 64x64 feature transformation matrix
            x = x.transpose(2, 1)  # [B, N, 64]
            x = torch.bmm(x, trans_feat)  # apply the transform to the features
            x = x.transpose(2, 1)  # back to [B, 64, N]
        else:
            trans_feat = None
        pointfeat = x  # per-point (local) features after alignment
        x = F.relu(self.bn2(self.conv2(x)))  # second conv -> 128 channels
        x = self.bn3(self.conv3(x))  # third conv -> 1024 channels
        x = torch.max(x, 2, keepdim=True)[0]  # max pooling over points; [0] holds the values, [1] the indices
        x = x.view(-1, 1024)  # reshape to [B, 1024]; -1 lets PyTorch infer the row count
        if self.global_feat:  # classification: only the global feature is needed
            return x, trans, trans_feat  # global feature, 3x3 transform, 64x64 transform (or None)
        else:
            # Add a trailing axis and repeat the global feature once per point so that
            # it can be concatenated with the per-point features.
            x = x.view(-1, 1024, 1).repeat(1, 1, N)
            # Segmentation: concatenate global and local features -> [B, 1088, N]
            return torch.cat([x, pointfeat], 1), trans, trans_feat
```
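The shapes in the segmentation branch are easy to lose track of, so here is a NumPy sketch of just that last step. The arrays are filled with placeholder values and the sizes `B` and `N` are arbitrary; only the shape bookkeeping mirrors the encoder.

```python
import numpy as np

B, N = 2, 8  # arbitrary batch size and number of points

pointfeat = np.ones((B, 64, N), dtype=np.float32)  # per-point features
global_feat = np.arange(B * 1024, dtype=np.float32).reshape(B, 1024)  # pooled feature

# Add a trailing axis and copy the global feature once for every point.
expanded = np.repeat(global_feat[:, :, None], N, axis=2)  # [B, 1024, N]

# Stack global and per-point features along the channel axis.
combined = np.concatenate([expanded, pointfeat], axis=1)  # [B, 1088, N]

print(expanded.shape, combined.shape)
```

Every point ends up with the same 1024 global channels followed by its own 64 local channels, which is what lets a per-point classifier use context from the whole object.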

```python
# Regularization term for the high-dimensional feature transform: multiply the
# transformation matrix by its transpose, subtract the identity matrix, and use the
# size of what remains as a loss, which pushes the matrix toward orthogonality.
def feature_transform_reguliarzer(trans):
    d = trans.size()[1]  # matrix dimension
    I = torch.eye(d)[None, :, :]  # identity matrix of the same size, with a leading batch axis
    if trans.is_cuda:  # move to the GPU if the input lives there
        I = I.cuda()
    # For an orthogonal matrix A, A * A' = I (A' is the transpose of A), so A * A' - I vanishes.
    # torch.norm over dim=(1, 2) is the Frobenius (L2) norm of each difference matrix, so no
    # separate absolute value is needed; the loss is the mean of these norms over the batch.
    # The identity must be subtracted after the product: A * (A' - I) = A * A' - A,
    # which does not vanish for an orthogonal A.
    loss = torch.mean(torch.norm(torch.bmm(trans, trans.transpose(2, 1)) - I, dim=(1, 2)))
    return loss
```
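As a NumPy check of the idea behind this regularizer (a sketch with hand-picked matrices, not the PyTorch function itself; `orthogonality_penalty` and `misplaced_penalty` are names invented here), the code below compares the two places the identity can be subtracted. Only $A A^{T} - I$ is zero for an orthogonal matrix.

```python
import numpy as np


def orthogonality_penalty(a):
    """Frobenius norm of A @ A.T - I for one square matrix."""
    return np.linalg.norm(a @ a.T - np.eye(a.shape[0]))


def misplaced_penalty(a):
    """Frobenius norm of A @ (A.T - I), which expands to A @ A.T - A."""
    return np.linalg.norm(a @ (a.T - np.eye(a.shape[0])))


theta = np.pi / 4
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
scaled = 2.0 * rotation  # not orthogonal: scaled @ scaled.T equals 4 * I

print(orthogonality_penalty(rotation))  # close to zero
print(orthogonality_penalty(scaled))    # clearly positive
print(misplaced_penalty(rotation))      # positive even though the matrix is orthogonal
```

For the rotation, the second form measures how far the matrix is from the identity rather than how far it is from being orthogonal, so it would penalize perfectly valid rotations.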


# Other

The more I study the source code and the paper, the clearer it becomes that PointNet is built from one basic unit: each (x, y, z) point is encoded into a redundant 1024-dimensional feature, the pooled feature is decoded down to 256 dimensions, and fully connected layers suited to the target are attached at the end. T-Net, for example, needs a $3 \times 3$ matrix, so it ends in a fully connected layer with 9 outputs; a classifier instead ends in a fully connected layer with one output per class.
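The shape flow of this basic unit can be traced with a NumPy sketch. Everything here is a simplification under stated assumptions: the weights are random rather than trained, biases and batch normalization are left out, `shared_layer` is a hypothetical helper, and the 40-class head is only an example size.

```python
import numpy as np

rng = np.random.default_rng(0)


def shared_layer(x, out_dim):
    """Apply one random linear map followed by ReLU to every row of x.

    For x of shape [N, in_dim] this treats each point independently, which is
    what a Conv1d with kernel size 1 does (bias and batch norm omitted here).
    """
    w = rng.normal(scale=0.1, size=(x.shape[1], out_dim))
    return np.maximum(x @ w, 0.0)


points = rng.normal(size=(1024, 3))  # one cloud: 1024 points with (x, y, z)

# Encode: 3 -> 64 -> 128 -> 1024 channels per point, then pool over the points.
x = shared_layer(points, 64)
x = shared_layer(x, 128)
x = shared_layer(x, 1024)          # [1024 points, 1024 channels]
global_feat = x.max(axis=0)        # [1024]

# Decode: 1024 -> 512 -> 256.
h = shared_layer(global_feat[None, :], 512)
h = shared_layer(h, 256)           # [1, 256]

# Task-specific heads attached to the same 256-dimensional vector.
tnet_head = h @ rng.normal(size=(256, 9))   # 9 outputs for a 3x3 matrix
cls_head = h @ rng.normal(size=(256, 40))   # hypothetical 40-class classifier
transform = tnet_head.reshape(3, 3)

print(global_feat.shape, h.shape, transform.shape, cls_head.shape)
```

Only the last line of each head differs, which is why the T-Net classes and the encoder in the source look so similar.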