This post contains a lot of details.
https://github.com/ParikaGoel/KinectFusion
This repository implements real-time reconstruction of the environment from a Kinect camera.
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ismar2011.pdf
The paper describing the implementation.
An RGB-D dataset shared by the Technical University of Munich (TUM), Germany.
Here's what the format looks like:
1. rgb.txt and depth.txt record the collection time of each file and the corresponding file name.
2. The rgb/ and depth/ directories store the captured PNG image files. Color images are 8-bit, three-channel; depth maps are 16-bit, single-channel images. The file name is the capture timestamp.
3. groundtruth.txt contains the camera pose captured by the external motion capture system, in the format (time, tx, ty, tz, qx, qy, qz, qw) (see the parsing sketch after this list).
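Here is a minimal sketch of parsing one such groundtruth.txt line (the sample values are made up; real files also start with '#' comment lines):

```python
# Minimal sketch: parse one line of groundtruth.txt into its fields.
# The sample line below is made up; real files contain many such lines.
line = "1305031102.1758 1.3405 0.6266 1.6575 0.6574 0.6126 -0.2949 -0.3248"
fields = [float(x) for x in line.split()]
timestamp = fields[0]                # seconds since the Unix epoch
tx, ty, tz = fields[1:4]             # translation of the optical center
qx, qy, qz, qw = fields[4:8]         # orientation as a unit quaternion
print(timestamp, (tx, ty, tz), (qx, qy, qz, qw))
```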
https://vision.in.tum.de/data/datasets/rgbd-dataset/download
This link is the download location for all datasets.
I downloaded one of the smallest datasets to have a look. After unzipping, the contents include the ground-truth trajectory, the depth maps, and the RGB images.
There is also ROS bag data; I think Intel's tools can play it back.
Download
The ROS bag format is a logging format used to store ROS messages in files. Files in this format are called bags and have a .bag extension. Bags are recorded, played back, and generally manipulated with the rosbag and rqt_bag tools.
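For reference, a small sketch of inspecting such a bag with the ROS1 rosbag Python API (requires a ROS installation; the file name and topic name here are only examples, so list the real topics first):

```python
# Sketch: read messages from a .bag file with the ROS1 rosbag Python API.
import rosbag

with rosbag.Bag('rgbd_dataset_freiburg1_xyz.bag') as bag:
    # list the topics actually present in the bag
    print(bag.get_type_and_topic_info().topics.keys())
    # the topic name below is an assumption; adjust it to one of the listed topics
    for topic, msg, t in bag.read_messages(topics=['/camera/rgb/image_color']):
        print(topic, t.to_sec())   # each msg is an image message with a timestamp
        break
```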
Unzipped file
- Color images are stored as 640x480 8-bit RGB images in PNG format.
- Depth maps are stored as 640x480 16-bit monochrome images in PNG format.
- Color and depth images have been pre-registered using PrimeSense's OpenNI driver, where the pixels in the color and depth images correspond to 1:1.
- The depth images are scaled by a factor of 5000, i.e. a pixel value of 5000 in the depth image corresponds to a distance of 1 meter from the camera, 10000 to 2 meters, and so on (a small conversion sketch follows this list). A pixel value of 0 indicates a missing value/no data.
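A minimal conversion sketch under those conventions (the file name is just an example):

```python
# Minimal sketch: convert a 16-bit depth PNG into meters.
# Assumes the convention above: value / 5000 = distance in meters, value 0 = missing data.
import numpy as np
from PIL import Image

depth_png = np.asarray(Image.open("depth/1305031102.160407.png"), dtype=np.float32)
depth_m = depth_png / 5000.0        # 5000 -> 1 m, 10000 -> 2 m, ...
valid = depth_png > 0               # 0 means no measurement
print(depth_m[valid].min(), depth_m[valid].max())
```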
So the data has already been processed.
The ground-truth trajectory is provided as a text file containing the translation and orientation of the camera in a fixed coordinate frame.
- Each line in the text file contains one pose.
- The format of each line is 'timestamp tx ty tz qx qy qz qw'.
- The timestamp (a floating-point number) gives the number of seconds since the Unix epoch.
- tx ty tz (three floating-point numbers) give the position of the color camera's optical center relative to the world origin defined by the motion capture system.
- qx qy qz qw (four floating-point numbers) give the orientation of the color camera's optical center relative to the world origin defined by the motion capture system, in the form of a unit quaternion (a sketch of turning such a pose into a 4x4 matrix follows this list).
- The file may contain comments, which must begin with '#'.
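As an illustration, a hedged sketch of turning one such pose into a 4x4 homogeneous transform, using the standard unit-quaternion-to-rotation-matrix formula (this is not part of the dataset tools, and the sample numbers are placeholders):

```python
import numpy as np

def pose_to_matrix(tx, ty, tz, qx, qy, qz, qw):
    # build a 4x4 camera-to-world transform from translation + unit quaternion
    x, y, z, w = qx, qy, qz, qw
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [tx, ty, tz]
    return T

print(pose_to_matrix(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0))  # identity pose
```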
The conversion from 2D images to 3D point clouds is described below. Note that the focal length (fx/fy), optical center (cx/cy), distortion parameters (d0-d4), and depth correction factor differ from camera to camera.
The following Python code shows how to calculate 3D points based on pixel coordinates and depth values:
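A minimal sketch of that calculation, assuming the pinhole model (no distortion correction) with the intrinsics quoted later in this post (fx = fy = 525.0, cx = 319.5, cy = 239.5) and the 1/5000 depth scale; the file name is just an example:

```python
# Back-project a 16-bit depth PNG into 3D points with the pinhole model.
import numpy as np
from PIL import Image

fx, fy = 525.0, 525.0
cx, cy = 319.5, 239.5
scaling_factor = 5000.0            # 16-bit PNG depth: value / 5000 = meters

depth = np.asarray(Image.open("depth.png"), dtype=np.float32)
points = []
for v in range(depth.shape[0]):
    for u in range(depth.shape[1]):
        Z = depth[v, u] / scaling_factor
        if Z == 0:                 # 0 = missing measurement
            continue
        X = (u - cx) * Z / fx
        Y = (v - cy) * Z / fy
        points.append((X, Y, Z))
print(len(points), "points")
```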
https://dev.intelrealsense.com/docs
Documentation for Intel RealSense.
https://github.com/IntelRealSense/librealsense/releases
Download the playback tools here.
It's quite large.
My R200 did not respond when plugged in, and the bag file could not be read.
ROS bag error
But the tool does recognize its own camera, so RealSense cameras are also driven as UVC devices.
https://www.mrpt.org/tutorials/programming/maths-and-geometry/2d_3d_geometry/
Feeling somewhat stuck, I found this link for learning the relevant 2D/3D geometry. Of course, there are no shortcuts to learning it.
https://svncvpr.in.tum.de/cvpr-ros-pkg/trunk/rgbd_benchmark/rgbd_benchmark_tools/src/rgbd_benchmark_tools/
This address contains some Python scripts for working with the RGB-D data.
The Kinect provides color and depth images asynchronously. This means that the timestamps of the color images generally do not coincide with the timestamps of the depth images, so we need some way to associate color images with depth images.
```python
import argparse


def read_file_list(filename):
    """
    Reads a trajectory from a text file.

    File format:
    The file format is "stamp d1 d2 d3 ...", where stamp denotes the time stamp
    (to be matched) and "d1 d2 d3.." is arbitrary data (e.g., a 3D position and
    3D orientation) associated to this timestamp.

    Input:
    filename -- file name

    Output:
    dict -- dictionary of (stamp, data) tuples
    """
    with open(filename) as file:
        data = file.read()
    lines = data.replace(",", " ").replace("\t", " ").split("\n")
    parsed = [[v.strip() for v in line.split(" ") if v.strip() != ""]
              for line in lines if len(line) > 0 and line[0] != "#"]
    parsed = [(float(l[0]), l[1:]) for l in parsed if len(l) > 1]
    return dict(parsed)


def associate(first_list, second_list, offset, max_difference):
    """
    Associate two dictionaries of (stamp, data). As the time stamps never match
    exactly, we aim to find the closest match for every input tuple.

    Input:
    first_list -- first dictionary of (stamp, data) tuples
    second_list -- second dictionary of (stamp, data) tuples
    offset -- time offset between both dictionaries (e.g., to model the delay
              between the sensors)
    max_difference -- search radius for candidate generation

    Output:
    matches -- list of matched timestamp pairs (stamp1, stamp2)
    """
    first_keys = list(first_list.keys())
    second_keys = list(second_list.keys())
    potential_matches = [(abs(a - (b + offset)), a, b)
                         for a in first_keys
                         for b in second_keys
                         if abs(a - (b + offset)) < max_difference]
    potential_matches.sort()
    matches = []
    for diff, a, b in potential_matches:
        if a in first_keys and b in second_keys:
            first_keys.remove(a)
            second_keys.remove(b)
            matches.append((a, b))
    matches.sort()
    return matches


if __name__ == '__main__':
    # parse command line
    parser = argparse.ArgumentParser(description='''
    This script takes two data files with timestamps and associates them ''')
    parser.add_argument('first_file', help='first text file (format: timestamp data)')
    parser.add_argument('second_file', help='second text file (format: timestamp data)')
    parser.add_argument('--first_only',
                        help='only output associated lines from first file',
                        action='store_true')
    parser.add_argument('--offset',
                        help='time offset added to the timestamps of the second file (default: 0.0)',
                        default=0.0)
    parser.add_argument('--max_difference',
                        help='maximally allowed time difference for matching entries (default: 0.02)',
                        default=0.02)
    args = parser.parse_args()

    first_list = read_file_list(args.first_file)
    second_list = read_file_list(args.second_file)
    matches = associate(first_list, second_list,
                        float(args.offset), float(args.max_difference))

    if args.first_only:
        for a, b in matches:
            print("%f %s" % (a, " ".join(first_list[a])))
    else:
        for a, b in matches:
            print("%f %s %f %s" % (a, " ".join(first_list[a]),
                                   b - float(args.offset), " ".join(second_list[b])))
```
Registration Code
The depth image is already registered with the color image, so the pixels in the depth image correspond one-to-one with the pixels in the color image. Therefore, generating colored point clouds is easy.
How to use the scripts
It's really just one function.
All of the above can be obtained from:
https://svncvpr.in.tum.de/cvpr-ros-pkg/trunk/rgbd_benchmark/rgbd_benchmark_tools/src/rgbd_benchmark_tools/
https://vision.in.tum.de/data/datasets/rgbd-dataset/tools#evaluation
This is the documentation for the evaluation tools.
What's more, after the RGB and depth images are captured, there is a fusion step that combines them and writes more information into a single representation.
The fused results are then evaluated according to the benchmark's rules, but that is not covered here; after that you can move on to higher-level things such as displaying the point clouds.
Anyway, I did some digging; below is how a point cloud is generated. Note that one RGB image plus one depth map produces one point cloud.
In terms of formats: PNG + PNG = PLY.
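A hedged sketch of the PLY-writing step; it assumes a `points` list of (X, Y, Z, R, G, B) tuples, e.g. produced by combining the back-projection sketch above with the registered color image:

```python
# Sketch: dump colored 3D points to an ASCII PLY file.
# `points` is assumed to be a list of (X, Y, Z, R, G, B) tuples.
def write_ply(points, filename="out.ply"):
    with open(filename, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write("element vertex %d\n" % len(points))
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for x, y, z, r, g, b in points:
            f.write("%f %f %f %d %d %d\n" % (x, y, z, r, g, b))

write_ply([(0.0, 0.0, 1.0, 255, 0, 0)])  # one red point, 1 m in front of the camera
```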
Last of all, I'll add some notes on the format requirements, as preparation before running the experiment.
Because the RGB and depth images correspond one-to-one, their resolutions are the same. RGB-D image = ordinary 3-channel RGB color image + depth map.
A depth map is an image channel that contains information about the distance from the viewpoint to the surfaces of objects in the scene. The channel itself is similar to a grayscale image, and each pixel value is the actual distance to the object measured by the sensor.
Classification:
- Proportional to distance from the camera: surfaces closer to the camera are darker; surfaces farther away are lighter.
- Relative to distance from a nominal (focal) plane: surfaces near the focal plane are darker; surfaces away from the focal plane are lighter.
Actually, this figure comes from a book.
A depth map is obtained with a depth camera. Combined with a black-and-white image it corresponds to two channels (gray value and depth value); combined with a color image it gives the four RGB-D channels.
Image depth refers to the number of bits used to store each pixel and is also used to measure the color resolution of an image.
Image depth determines the possible number of colors per pixel of a color image, or the possible number of gray levels per pixel of a grayscale image. It determines the maximum number of colors that can appear in a color image, or the maximum gray level in a grayscale image. For example, if a monochrome image has 8 bits per pixel, the maximum number of gray levels is 2 to the 8th power, i.e. 256. If the R, G, and B channels of a color image use 4, 4, and 2 bits per pixel respectively, the maximum number of colors is 2 to the power of 4+4+2, i.e. 1024; in other words, the pixel depth is 10 bits and each pixel can be one of 1024 colors.
For example:
A picture with a size of 1024×768 and a depth of 16 bits has a data volume of 1.5 MB.
It is calculated as follows:
1024 × 768 × 16 bit = (1024 × 768 × 16) / 8 Byte = 1,572,864 Byte = 1536 KB = 1.5 MB
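The same arithmetic as a quick Python sanity check:

```python
# Sanity check of the 1024x768, 16-bit example above.
bits = 1024 * 768 * 16
print(bits / 8, "bytes")             # 1572864.0
print(bits / 8 / 1024, "KB")         # 1536.0
print(bits / 8 / 1024 / 1024, "MB")  # 1.5
```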
This is done in a single function.
"""""""is a Python documentation describing inputs and outputs
The images are opened with the Image module of the PIL library.
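A minimal example (the file names are placeholders):

```python
# Opening the RGB and depth PNGs with PIL.
from PIL import Image

rgb = Image.open("rgb.png")      # 8-bit, 3-channel color image
depth = Image.open("depth.png")  # 16-bit, single-channel depth image
print(rgb.mode, rgb.size)        # e.g. RGB (640, 480)
print(depth.mode, depth.size)    # e.g. I or I;16 (640, 480)
```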
This can be seen as preparation for processing: make sure the resolutions are consistent. First rename the files, then write a script to verify. In my case they were all the same.
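A sketch of such a check, assuming the extracted rgb/ and depth/ directories and the 640x480 resolution from above:

```python
# Sketch: verify that every RGB and depth PNG has the same resolution.
import os
from PIL import Image

def check_sizes(directory, expected=(640, 480)):
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".png"):
            continue
        size = Image.open(os.path.join(directory, name)).size
        if size != expected:
            print("mismatch:", name, size)
    print(directory, "checked")

check_sizes("rgb")
check_sizes("depth")
```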
Then use a list to hold our point cloud data.
```python
focalLength = 525.0
centerX = 319.5
centerY = 239.5
scalingFactor = 5000.0
```
These are the four parameters:
- The depth image is scaled by a factor of 5000, that is, the pixel value of 5000 in the depth image corresponds to a distance of 1 meter from the camera, 10,000 to 2 meters, and so on.
- A pixel value of 0 indicates a missing value/no data.
- For the 16-bit PNGs, the scaling factor is 5000.
This is the key point.
Given a point's pixel coordinates (u, v) and its corresponding depth value, its 3D coordinates (x, y, z) can be computed from the camera parameters with the following formulas: z = depth (converted to meters), x = (u - cx) * z / fx, y = (v - cy) * z / fy.
```python
def depth2mi(depthValue):
    # convert a raw depth value (in millimeters) to meters
    return depthValue * 0.001


def depth2xyz(u, v, depthValue):
    # intrinsics of this particular camera
    fx = 361.1027
    fy = 361.8266
    cx = 258.4545
    cy = 212.1282

    depth = depth2mi(depthValue)
    z = float(depth)
    x = float((u - cx) * z) / fx
    y = float((v - cy) * z) / fy
    result = [x, y, z]
    return result
```
```python
import numpy as np

# depth map stored in a numpy array
arr = np.array(depth)

# depth value at pixel coordinate (u, v) of the depth map aligned to the color image
depthValue = float(arr[v, u])
coordinate = depth2xyz(u, v, depthValue)
```
The above is another implementation.