Understanding of FlowNet3D paper code

I am keeping this as a record of my study, to push myself to keep making progress on the task, study hard, and keep these notes updated.


The overall flow has been roughly sorted out, so it is omitted here.


Here are some of the specific operations. The annoying part is the large number of names and their respective dimensions.

    l0_xyz_f1 = point_cloud[:, :num_point, 0:3]  # dim: b n 3
    l0_points_f1 = point_cloud[:, :num_point, 3:]  # dim: b n channel
    l0_xyz_f2 = point_cloud[:, num_point:, 0:3]  # dim: b n 3
    l0_points_f2 = point_cloud[:, num_point:, 3:]  # dim: b n channel

The code above is built directly from the raw input data (what exactly the raw data is gets examined next). l0_xyz_f1 holds the coordinates of the point cloud in the first frame at layer zero, and its dimension is b n 3.

A closer look at what I call "raw data":
point_cloud is the first parameter passed in. In train.py it comes from batch_data, which is produced by the following call:

batch_data, batch_label, batch_mask = get_batch(TRAIN_DATASET, train_idxs, start_idx, end_idx)

A key step in forming batch_data:

    batch_data[i, :NUM_POINT, :3] = pc1[shuffle_idx]
    batch_data[i, :NUM_POINT, 3:] = color1[shuffle_idx]
    batch_data[i, NUM_POINT:, :3] = pc2[shuffle_idx]
    batch_data[i, NUM_POINT:, 3:] = color2[shuffle_idx]
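
As a quick check of this layout, here is a minimal NumPy sketch (not the authors' code) that packs two frames into one array in the same way and then slices them back out. The sizes and the random data are assumptions for illustration, and the shuffling step is left out.

```python
import numpy as np

# Assumed sizes, for illustration only
batch_size, num_point = 2, 4

# Two synthetic frames: xyz coordinates plus RGB colors per point
pc1 = np.random.rand(batch_size, num_point, 3)
pc2 = np.random.rand(batch_size, num_point, 3)
color1 = np.random.rand(batch_size, num_point, 3)
color2 = np.random.rand(batch_size, num_point, 3)

# Pack: frame 1 fills the first num_point rows, frame 2 the remaining rows
batch_data = np.zeros((batch_size, num_point * 2, 6))
batch_data[:, :num_point, :3] = pc1
batch_data[:, :num_point, 3:] = color1
batch_data[:, num_point:, :3] = pc2
batch_data[:, num_point:, 3:] = color2

# Unpack with the same slicing the model applies to point_cloud
l0_xyz_f1 = batch_data[:, :num_point, 0:3]
l0_points_f1 = batch_data[:, :num_point, 3:]
l0_xyz_f2 = batch_data[:, num_point:, 0:3]
l0_points_f2 = batch_data[:, num_point:, 3:]

print(l0_xyz_f1.shape, l0_points_f1.shape)
```

So the first num_point rows along axis 1 are frame 1 and the rest are frame 2, with xyz in columns 0..2 and color in the remaining columns.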

Several radii are defined to describe how large the neighborhood around a point is, as mentioned in the original paper. The definitions alone do not say which layer uses which radius; the calls further down show that layer 1 uses RADIUS1 and layer 2 uses RADIUS2.

    RADIUS1 = 0.5
    RADIUS2 = 1.0
    RADIUS3 = 2.0
    RADIUS4 = 4.0

Let's start processing the data layer by layer.

    # Frame 1, Layer 1
    # npoint is the number of points kept by farthest point sampling
    l1_xyz_f1, l1_points_f1, l1_indices_f1 = pointnet_sa_module(
        l0_xyz_f1, l0_points_f1, npoint=1024, radius=RADIUS1, nsample=16,
        mlp=[32, 32, 64], mlp2=None, group_all=False,
        is_training=is_training, bn_decay=bn_decay, scope='layer1')
    end_points['l1_indices_f1'] = l1_indices_f1

This gives the first layer of the first frame:
Dimension of l1_xyz_f1: b * npoint(1024) * 3;
Dimension of l1_points_f1: b * npoint(1024) * mlp[-1] (64);
Dimension of l1_indices_f1: batch_size, npoint, nsample(16).

    # Frame 1, Layer 2
    l2_xyz_f1, l2_points_f1, l2_indices_f1 = pointnet_sa_module(
        l1_xyz_f1, l1_points_f1, npoint=256, radius=RADIUS2, nsample=16,
        mlp=[64, 64, 128], mlp2=None, group_all=False,
        is_training=is_training, bn_decay=bn_decay, scope='layer2')
    end_points['l2_indices_f1'] = l2_indices_f1

This gives the second layer of the first frame:
Dimension of l2_xyz_f1: b * npoint(256) * 3;
Dimension of l2_points_f1: b * npoint(256) * mlp[-1] (128);
Dimension of l2_indices_f1: batch_size, npoint, nsample(16).

Similarly, the same operations are performed on the second frame. The results are:

Dimension of l1_xyz_f2: b * npoint(1024) * 3;
Dimension of l1_points_f2: b * npoint(1024) * mlp[-1] (64);
Dimension of l1_indices_f2: batch_size, npoint, nsample(16).

Dimension of l2_xyz_f2: b * npoint(256) * 3;
Dimension of l2_points_f2: b * npoint(256) * mlp[-1] (128);
Dimension of l2_indices_f2: batch_size, npoint, nsample(16).
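To keep track of these dimensions, a small helper can compute them from the call arguments. This is only a bookkeeping sketch: the function name sa_output_shapes and the batch size of 8 are assumptions, and nothing here runs the actual network.

```python
def sa_output_shapes(batch_size, npoint, nsample, mlp):
    """Return the shapes of (new_xyz, new_points, idx) for one set abstraction layer."""
    new_xyz = (batch_size, npoint, 3)            # sampled center coordinates
    new_points = (batch_size, npoint, mlp[-1])   # one feature vector per center
    idx = (batch_size, npoint, nsample)          # neighbor indices per center
    return new_xyz, new_points, idx


b = 8  # assumed batch size
layer1 = sa_output_shapes(b, npoint=1024, nsample=16, mlp=[32, 32, 64])
layer2 = sa_output_shapes(b, npoint=256, nsample=16, mlp=[64, 64, 128])
print(layer1)
print(layer2)
```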

After that comes the flow embedding.




pointnet_sa_module appears to be the layer used for downsampling. The three parts described in the paper (downsampling, flow embedding and upsampling) are probably all there for feature learning on the original point set.

Even before reading the code carefully, it is clear that this module changes the number of points in the input point set, because the algorithm uses farthest point sampling and npoint is the number of output points that sampling produces.
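For intuition, farthest point sampling can be sketched in NumPy as below. The repository uses a compiled farthest_point_sample op, so this is only an illustrative version for a single cloud; the function name farthest_point_sample_np and the choice of index 0 as the starting point are assumptions.

```python
import numpy as np


def farthest_point_sample_np(xyz, npoint):
    """Greedy farthest point sampling on one cloud xyz of shape (n, 3).

    Returns the indices of npoint selected points.
    """
    n = xyz.shape[0]
    selected = np.zeros(npoint, dtype=np.int64)
    # Squared distance from every point to its nearest already-selected point
    min_dist = np.full(n, np.inf)
    current = 0  # assumed starting point
    for i in range(npoint):
        selected[i] = current
        dist = np.sum((xyz - xyz[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        # Next pick: the point farthest from everything selected so far
        current = int(np.argmax(min_dist))
    return selected


cloud = np.random.rand(100, 3)
fps_idx = farthest_point_sample_np(cloud, 10)
new_xyz = cloud[fps_idx]  # plays the role of gather_point
print(new_xyz.shape)
```

The output always has exactly npoint points, which is why the point count shrinks from layer to layer (for example 1024, then 256).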

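This layer mainly goes through the following steps:
(1) Sampling and grouping: splits the whole point cloud into local groups. For each group, PointNet can then be applied separately to extract a global feature of that local region.
(2) Point feature embedding: a shared MLP (the mlp argument) is applied to every point in every group.
(3) Pooling in local regions: max pooling over the points of each group.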

The code is explained in detail below.

def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, group_all, is_training, bn_decay, scope, bn=True, pooling='max', knn=False, use_xyz=True, use_nchw=False):
    ''' PointNet Set Abstraction (SA) Module
        Input:
            xyz: (batch_size, ndataset, 3) TF tensor
            points: (batch_size, ndataset, channel) TF tensor
            npoint: int32 -- #points sampled in farthest point sampling
            radius: float32 -- search radius in local region
            nsample: int32 -- how many points in each local region
            mlp: list of int32 -- output size for MLP on each point
            mlp2: list of int32 -- output size for MLP on each region
            group_all: bool -- group all points into one PC if set true, OVERRIDE
                npoint, radius and nsample settings
            use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
            use_nchw: bool, if True, use NCHW data format for conv2d, which is usually faster than NHWC format
        Return:
            new_xyz: (batch_size, npoint, 3) TF tensor
            new_points: (batch_size, npoint, mlp[-1] or mlp2[-1]) TF tensor
            idx: (batch_size, npoint, nsample) int32 -- indices for local regions
    '''

(Thanks to the authors for these comments.)

Part I Sample and Grouping

        # Sample and Grouping
        if group_all:
            nsample = xyz.get_shape()[1].value
            new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz)
        else:
            new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, knn, use_xyz)

In every call the authors make, group_all is False, i.e. the points are not all treated as a single group.
So the sample_and_group method is the one used here.
knn and use_xyz keep their default values, i.e. knn = False and use_xyz = True.
sample_and_group is the key function for sampling and grouping:

def sample_and_group(npoint, radius, nsample, xyz, points, knn=False, use_xyz=True):
    '''
    Input:
        npoint: int32
        radius: float32
        nsample: int32
        xyz: (batch_size, ndataset, 3) TF tensor
        points: (batch_size, ndataset, channel) TF tensor, if None will just use xyz as points
        knn: bool, if True use kNN instead of radius search
        use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
    Output:
        new_xyz: (batch_size, npoint, 3) TF tensor
        new_points: (batch_size, npoint, nsample, 3+channel) TF tensor
        idx: (batch_size, npoint, nsample) TF tensor, indices of local points as in ndataset points
        grouped_xyz: (batch_size, npoint, nsample, 3) TF tensor, normalized point XYZs
            (subtracted by seed point XYZ) in local regions
    '''

    new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz)) # (batch_size, npoint, 3)
    if knn:  # knn is False in every call here, so this branch can be skipped
        _,idx = knn_point(nsample, xyz, new_xyz)
    else:
        # idx: [B, npoint, nsample]
        idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
    grouped_xyz = group_point(xyz, idx) # (batch_size, npoint, nsample, 3)
    grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization
    if points is not None:
        grouped_points = group_point(points, idx) # (batch_size, npoint, nsample, channel)
        if use_xyz:
            new_points = tf.concat([grouped_xyz, grouped_points], axis=-1) # (batch_size, npoint, nsample, 3+channel)
        else:
            new_points = grouped_points
    else:
        new_points = grouped_xyz

    return new_xyz, new_points, idx, grouped_xyz

  • First, gather_point picks out new_xyz, the points selected by farthest point sampling.
  • Then query_ball_point returns idx, the indices of the nsample points sampled inside each ball neighborhood (centered on the points of new_xyz) for every sample in the batch. (A detailed description is at the end of this section.)
    idx: [B, npoint, nsample] holds the indices of the nsample points in each of the npoint ball regions.
  • Then group_point(xyz, idx) returns grouped_xyz, i.e. the coordinates of each of the nsample points in each of the npoint regions of each batch element: (batch_size, npoint, nsample, 3).
  • The next step is what the comment calls translation normalization, which is the line of code below. It subtracts the center point (new_xyz, obtained by farthest point sampling) from grouped_xyz.
grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization
#new_xyz:(b, npoint, 3) --> (b, npoint, 1, 3) --> (b, npoint, nsample, 3)
  • Finally, if the points carry features (points is not None), the grouped features are returned, concatenated with the local coordinates when use_xyz is True; otherwise only the local coordinates are returned. Here use_xyz is True, so the 3D local coordinates are concatenated with the channel-dimensional local point features.
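The grouping, translation normalization and concatenation steps above can be sketched in NumPy as follows. The sizes are assumptions, and random indices stand in for the results of farthest point sampling and ball query, so only the shapes and the subtraction are meaningful here. np.expand_dims and np.tile are used as the NumPy counterparts of the two TF calls.

```python
import numpy as np

b, n, npoint, nsample, channel = 2, 32, 4, 8, 3  # assumed sizes
xyz = np.random.rand(b, n, 3)
points = np.random.rand(b, n, channel)

# Stand-ins for farthest point sampling and ball query: random indices
center_idx = np.random.randint(0, n, size=(b, npoint))
idx = np.random.randint(0, n, size=(b, npoint, nsample))

batch = np.arange(b)
new_xyz = xyz[batch[:, None], center_idx]            # (b, npoint, 3)
grouped_xyz = xyz[batch[:, None, None], idx]         # (b, npoint, nsample, 3)
grouped_points = points[batch[:, None, None], idx]   # (b, npoint, nsample, channel)

# Translation normalization, written like the TF line: expand, then tile
tiled = np.tile(np.expand_dims(new_xyz, 2), [1, 1, nsample, 1])
grouped_xyz_local = grouped_xyz - tiled

# use_xyz=True: concatenate local coordinates with the point features
new_points = np.concatenate([grouped_xyz_local, grouped_points], axis=-1)
print(tiled.shape, new_points.shape)
```

After the subtraction every neighbor is expressed relative to its own ball center, so the same learned weights can be shared across all regions.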


References

  • About tf.tile and tf.expand_dims
    tf.expand_dims adds a dimension to a tensor. The first parameter is the tensor to process, and the second parameter is the position at which the new dimension is inserted.
    tf.tile replicates the data in the tensor a given number of times along each axis. The number of dimensions of the output stays the same; only the sizes along the tiled axes grow.

  • About query_ball_point
    For the query_ball_point function, refer to the following (thanks to xd for the introduction, which I found very detailed):
         In this layer, the ball query method is used to generate N' local regions. According to the paper there are two variables: the number of points K in each region and the radius of the ball. The radius is the dominant one: points are searched for inside a ball of the given radius, with K as the upper limit on how many are kept. Both the radius and the number of points per region are specified by hand.
         The query_ball_point function finds the points inside each ball neighborhood. In the input, radius is the radius of the ball neighborhood, nsample is the number of points to sample in each neighborhood, new_xyz holds the centers of the S ball neighborhoods (obtained from the farthest point sampling above), and xyz is the whole point cloud. The output is the index tensor of shape [B, S, nsample], giving the sampled points of each ball neighborhood of each sample. The detailed analysis is in the remarks.
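A NumPy sketch of a ball query for a single cloud is given below. The repository uses a compiled op, so this is not the real implementation. Padding the unused slots with the first neighbor found is my understanding of how the original kernel behaves and should be treated as an assumption, as should the function name query_ball_point_np.

```python
import numpy as np


def query_ball_point_np(radius, nsample, xyz, new_xyz):
    """Ball query for one cloud.

    xyz: (n, 3) all points; new_xyz: (s, 3) ball centers.
    Returns idx of shape (s, nsample) and pts_cnt of shape (s,), the number
    of distinct neighbors found per ball (capped at nsample).
    """
    s = new_xyz.shape[0]
    idx = np.zeros((s, nsample), dtype=np.int64)
    pts_cnt = np.zeros(s, dtype=np.int64)
    for j in range(s):
        dist = np.linalg.norm(xyz - new_xyz[j], axis=1)
        inside = np.nonzero(dist < radius)[0][:nsample]  # keep at most nsample
        pts_cnt[j] = len(inside)
        if len(inside) > 0:
            idx[j, :] = inside[0]            # pad every slot with the first hit
            idx[j, :len(inside)] = inside    # then fill in the real neighbors
    return idx, pts_cnt


xyz = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0]])
new_xyz = xyz[[0, 2]]  # two centers taken from the cloud itself
idx, pts_cnt = query_ball_point_np(radius=0.5, nsample=4, xyz=xyz, new_xyz=new_xyz)
print(idx)      # [[0 1 0 0], [2 2 2 2]]
print(pts_cnt)  # [2 1]
```

The first ball contains two points and the second only its own center, yet both rows of idx have exactly nsample entries, which keeps the tensor shape fixed at [S, nsample].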


Part II Point Feature Embedding

        # Point Feature Embedding
        if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])
        for i, num_out_channel in enumerate(mlp):
            new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],
                                        padding='VALID', stride=[1,1],
                                        bn=bn, is_training=is_training,
                                        scope='conv%d'%(i), bn_decay=bn_decay,
                                        data_format=data_format)  # data_format is set earlier in the function from use_nchw
        if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])

The default value of use_nchw is False. The authors never pass it in, so the default is used.
mlp is a list of three integers and is passed in on every call. Taking frame 1, layer 1 as an example, mlp=[32, 32, 64].
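A conv2d with a [1,1] kernel applies the same linear map to every point of every group, so it can be sketched as a matrix product over the last axis. The NumPy sketch below only illustrates the shapes: the random weights stand in for learned kernels, batch normalization is left out, and the ReLU after each layer is an assumption about the default activation in tf_util.conv2d.

```python
import numpy as np

rng = np.random.default_rng(0)
b, npoint, nsample, in_ch = 2, 4, 8, 6   # assumed sizes (3 xyz + 3 feature channels)
mlp = [32, 32, 64]

new_points = rng.standard_normal((b, npoint, nsample, in_ch))
for num_out_channel in mlp:
    # Random weights stand in for the learned 1x1 kernel of this layer
    w = rng.standard_normal((new_points.shape[-1], num_out_channel))
    bias = np.zeros(num_out_channel)
    # The same weights are applied to every point of every group, then ReLU
    new_points = np.maximum(new_points @ w + bias, 0.0)

print(new_points.shape)
```

Only the last axis changes, from 6 to 32 to 32 to 64; the npoint and nsample axes are untouched until the pooling step.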

Part III Pooling in Local Regions

Max pooling is used:

        if pooling=='max':
            new_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
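
In NumPy terms the pooling reduces over the nsample axis, as sketched below with assumed sizes. The final squeeze, which drops the size-1 axis so that the result matches the documented (batch_size, npoint, mlp[-1]) output shape, is not shown in the excerpt above and is an assumption here.

```python
import numpy as np

b, npoint, nsample, ch = 2, 4, 8, 64  # assumed sizes
new_points = np.random.rand(b, npoint, nsample, ch)

# Max over the nsample axis, keeping that axis with size 1
pooled = np.max(new_points, axis=2, keepdims=True)   # (b, npoint, 1, ch)
# Dropping the size-1 axis leaves one feature vector per sampled point
features = np.squeeze(pooled, axis=2)                # (b, npoint, ch)
print(pooled.shape, features.shape)
```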


Next is a dataset class that deals with various file paths, which I found a little difficult.
The code of this part is actually fairly short and easy to understand. The main difficulty is that I lacked a lot of background knowledge, so I go through that first. (The main text comes after the background knowledge.)

Background knowledge


The glob module is a file-handling module that ships with Python. It can be used to find files that match a pattern.
The main function of the module is glob.glob, which returns a list of all matching file paths. It takes one parameter, the path pattern to match (an absolute or a relative path). Only files in the directory named by the pattern are returned, not files in its subfolders.

For example, it can be used to get all txt files under drive C, or to obtain all jpg files in a specified directory.
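Here is a small self-contained sketch of glob.glob, using a temporary directory so that it runs anywhere. The file names are made up for illustration; the TRAIN*.npz pattern is the one the dataset class uses later.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a few empty files, one of them inside a subfolder
    os.makedirs(os.path.join(root, 'sub'))
    names = ['TRAIN_a.npz', 'TRAIN_b.npz', 'TEST_a.npz',
             os.path.join('sub', 'TRAIN_c.npz')]
    for name in names:
        open(os.path.join(root, name), 'w').close()

    # Match only the TRAIN files that sit directly inside root
    train_files = sorted(glob.glob(os.path.join(root, 'TRAIN*.npz')))
    train_names = [os.path.basename(p) for p in train_files]

print(train_names)  # ['TRAIN_a.npz', 'TRAIN_b.npz']
```

The file inside sub is not returned, since the pattern only covers root itself. The list is sorted here because glob.glob does not guarantee an order.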

Reading and writing .npz files

First, the .npy file: this is NumPy's own binary format. An array saved this way is stored, uncompressed and in raw binary form, in a file with the extension .npy.
An .npz file is an archive that can hold multiple arrays in the same file.
For .npz, the main functions used are:

  • np.savez() - save multiple data to one file
  • np.load() - read the file and return an object similar to a dictionary

Reference: numpy - .npy and .npz files

import numpy as np

# Save multiple arrays to disk
a = np.arange(5)
b = np.arange(6)
c = np.arange(7)
np.savez('test', a, b, c_array=c)  # c_array is the name given to array c
# Read array
data = np.load('test.npz')  # Similar to the dictionary {'arr_0': a, 'arr_1': b, 'c_array': c}
print('arr_0 : ', data['arr_0'])
print('arr_1 : ', data['arr_1'])
print('c_array : ', data['c_array'])

arr_0 :  [0 1 2 3 4]
arr_1 :  [0 1 2 3 4 5]
c_array :  [0 1 2 3 4 5 6]

The shape attribute in numpy

The shape attribute of a numpy array returns the size of the array as a tuple.

  • a.shape returns the sizes of all dimensions of a
  • a.shape[i] returns the size of dimension i of a

import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(a)
print(a.shape)

[[1 2]
 [3 4]
 [5 6]
 [7 8]]
(4, 2)



numpy.random.choice(a, size=None, replace=True, p=None)
Randomly draws elements from a and forms an array of the specified size. a must be a one-dimensional array or an int; if it is an int n, the draw is made from np.arange(n).
replace: True means the same element can be drawn more than once, False means it cannot.
p: an array matching a, giving the probability of drawing each element of a. By default every element is equally likely.
Reference: np.random.choice
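A short sketch of how this function is used for point sampling follows, with assumed sizes. It also shows the failure case that matters for the dataset class: with replace=False, asking for more samples than there are elements raises a ValueError.

```python
import numpy as np

n1, npoints = 10, 4  # assumed sizes
# Passing an int draws from np.arange(n1); replace=False gives distinct indices
sample_idx = np.random.choice(n1, npoints, replace=False)
print(sample_idx)

# Asking for 5 distinct values out of only 3 is impossible
try:
    np.random.choice(3, 5, replace=False)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

So each stored point cloud must contain at least npoints points for the training branch to work.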

Why normalize the RGB values of color

When RGB images are fed into a neural network, the pixel values are usually divided by 255 so that they fall between 0 and 1.
References: Why should images be normalized in deep learning?
            Grayscale data representation (why divide by 255)


With this background, the meaning of this part of the code is fairly clear. npoints is the number of points kept from the original data, root is the directory where the data folder is stored, datapath is the list of all TRAIN*.npz or TEST*.npz files found in that folder, and datapath[index] is the path of the npz file at position index.

import os
import os.path
import json
import numpy as np
import sys
import pickle
import glob

class SceneflowDataset():
    def __init__(self, root='data_preprocessing/data_processed_maxcut_35_both_mask_20k_2k', npoints=2048, train=True):
        self.npoints = npoints
        self.train = train
        self.root = root
        if self.train:
            self.datapath = glob.glob(os.path.join(self.root, 'TRAIN*.npz'))  # read file
        else:
            self.datapath = glob.glob(os.path.join(self.root, 'TEST*.npz'))
        self.cache = {}
        self.cache_size = 30000

        ###### deal with one bad datapoint with nan value
        self.datapath = [d for d in self.datapath if 'TRAIN_C_0140_left_0006-0' not in d]  # ???

    def __getitem__(self, index):
        if index in self.cache:
            pos1, pos2, color1, color2, flow, mask1 = self.cache[index]
        else:
            fn = self.datapath[index]
            # 'rb': open a file in binary format for read-only. The file pointer will be placed at the beginning of the file
            with open(fn, 'rb') as fp:
                data = np.load(fp)
                pos1 = data['points1']
                pos2 = data['points2']
                # Here, the RGB value is normalized
                color1 = data['color1'] / 255
                color2 = data['color2'] / 255
                flow = data['flow']
                mask1 = data['valid_mask1']

            if len(self.cache) < self.cache_size:
                self.cache[index] = (pos1, pos2, color1, color2, flow, mask1)
                # Therefore, the value of cache[index] is the above tuple
                # cache = {index: (pos1, pos2, color1, color2, flow, mask1)}

        # If this one is training data
        if self.train:
            # n1 is the size of the first dimension of pos1, i.e. the number of points
            n1 = pos1.shape[0]
            # Randomly select npoints indices out of n1, without replacement
            sample_idx1 = np.random.choice(n1, self.npoints, replace=False)
            n2 = pos2.shape[0]
            sample_idx2 = np.random.choice(n2, self.npoints, replace=False)
            # A new set of data after sampling
            pos1_ = np.copy(pos1[sample_idx1, :])
            pos2_ = np.copy(pos2[sample_idx2, :])
            color1_ = np.copy(color1[sample_idx1, :])
            color2_ = np.copy(color2[sample_idx2, :])
            flow_ = np.copy(flow[sample_idx1, :])
            mask1_ = np.copy(mask1[sample_idx1])
        else:
            # If this is not training data, take the first npoints directly
            pos1_ = np.copy(pos1[:self.npoints, :])
            pos2_ = np.copy(pos2[:self.npoints, :])
            color1_ = np.copy(color1[:self.npoints, :])
            color2_ = np.copy(color2[:self.npoints, :])
            flow_ = np.copy(flow[:self.npoints, :])
            mask1_ = np.copy(mask1[:self.npoints])

        return pos1_, pos2_, color1_, color2_, flow_, mask1_

    def __len__(self):
        return len(self.datapath)
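
To try the loading logic without the real dataset, the sketch below writes one synthetic .npz file with the same keys the class reads and then repeats the main steps by hand. The file name, array sizes and random contents are assumptions for illustration only.

```python
import glob
import os
import tempfile

import numpy as np

npoints = 4  # assumed, much smaller than the real default of 2048

with tempfile.TemporaryDirectory() as root:
    # Write one synthetic sample with the same keys the class reads
    n = 10
    np.savez(os.path.join(root, 'TRAIN_demo.npz'),
             points1=np.random.rand(n, 3), points2=np.random.rand(n, 3),
             color1=np.random.randint(0, 256, (n, 3)),
             color2=np.random.randint(0, 256, (n, 3)),
             flow=np.random.rand(n, 3), valid_mask1=np.ones(n, dtype=bool))

    datapath = glob.glob(os.path.join(root, 'TRAIN*.npz'))
    with open(datapath[0], 'rb') as fp:
        data = np.load(fp)
        pos1 = data['points1']
        color1 = data['color1'] / 255   # RGB normalized to [0, 1]
        flow = data['flow']

# Training-style sampling: flow is indexed with the same indices as pos1
sample_idx1 = np.random.choice(pos1.shape[0], npoints, replace=False)
pos1_ = np.copy(pos1[sample_idx1, :])
flow_ = np.copy(flow[sample_idx1, :])
print(pos1_.shape, flow_.shape, color1.max() <= 1.0)
```

Note that in the class flow and mask1 are indexed with sample_idx1, the indices of the first frame, because the flow is defined per point of frame 1.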

Tags: Python Deep Learning 3d

Posted on Sun, 05 Dec 2021 11:43:03 -0500 by dheeraj