Depth Image Data (TUM Dataset) from Obi Medium Optical Camera

There are too many details in this article.

This library is a real-time environment rebuild from a Kinect camera

Paper on implementation.

RGBD dataset shared by the Technical University of Munich, Germany.

Here's what the format looks like:

1. rgb.txt and depth.txt record the collection time of each file and the corresponding file name.

2. The rgb/depth/directory stores the collected png format image files. Color image is eight-bit, three-channel, deep

The degree map is a 16-bit single-channel image. The file name is the collection time.

3. groundtruth.txt is the camera position captured by the external motion capture system in the format

(time, t x , t y , t z , q x , q y , q z , q w ),

This link is the download location for all datasets.

Download a minimal look


This is the content

This is what happens when you unzip

Ground Track

Depth map

Diagram of RGB

ROSbang data, I think Intel's tools can fight


ROS package format, a logging format used to store ROS messages in files. Files in this format are called packages and have a.Bag extension. Packages are rosbag and rqt_ Tool recording, playback, and general operations in the bag package.

Unzipped file

  • Color images are stored in PNG format as 640x4808-bit RGB images.
  • Depth maps are stored as 640x480 16-bit monochrome images in PNG format.
  • Color and depth images have been pre-registered using PrimeSense's OpenNI driver, where the pixels in the color and depth images correspond to 1:1.
  • The depth image is scaled by a factor of 5000, that is, the pixel value of 5000 in the depth image corresponds to a distance of 1 meter from the camera, 10,000 to 2 meters, and so on. A pixel value of 0 indicates a missing value/no data.

Has been processed

Provide the real track as a text file containing the camera's translation and direction in a fixed coordinate system

  • Each line in the text file contains a posture.
  • The format of each line is'timestamp tx ty tz qx qy qz qw'
  • The time stamp (floating point number) gives the number of seconds since the Unix era.
  • TX ty TZ (three floating point numbers) gives the position of the optical center of a color camera relative to the world origin defined by the motion capture system.
  • qx qy qz qw (four floating point numbers) gives the direction of the optical center of a color camera relative to the world origin defined by the motion capture system in the form of unit quaternions.
  • The file may contain comments that must begin with'#'.

The conversion from 2D images to 3D point clouds is described below. Note that the focal length (fx/fy), optical center (cx/cy), distortion parameter (d0-d4), and depth correction factor are different for each camera.

The following Python code shows how to calculate 3D points based on pixel coordinates and depth values:

Notes in Chinese


Documentation for Intelrealsense

Download playback tools here

So big

My R200 plugged in did not respond and bang could not read it

ROS Package Error

But it recognizes its own camera, so RealSense's camera is also UVC-powered

Beyond melancholy, find a link to learn mathematics:

No cause-and-effect learning, of course

Inside this address are some PY scripts for handling RGBD.

Kinect provides color and depth images in an asynchronous manner. This means that the time stamp set from the color image does not intersect with the time stamp from the depth image. Therefore, we need some way to associate color images with depth images.

import argparse
import sys
import os
import numpy

def read_file_list(filename):
    Reads a trajectory from a text file. 

    File format:
    The file format is "stamp d1 d2 d3 ...", where stamp denotes the time stamp (to be matched)
    and "d1 d2 d3.." is arbitary data (e.g., a 3D position and 3D orientation) associated to this timestamp. 

    filename -- File name

    dict -- dictionary of (stamp,data) tuples

    file = open(filename)
    data =
    lines = data.replace(",", " ").replace("\t", " ").split("\n")
    list = [[v.strip() for v in line.split(" ") if v.strip() != ""]
            for line in lines if len(line) > 0 and line[0] != "#"]
    list = [(float(l[0]), l[1:]) for l in list if len(l) > 1]
    return dict(list)

def associate(first_list, second_list, offset, max_difference):
    Associate two dictionaries of (stamp,data). As the time stamps never match exactly, we aim 
    to find the closest match for every input tuple.

    first_list -- first dictionary of (stamp,data) tuples
    second_list -- second dictionary of (stamp,data) tuples
    offset -- time offset between both dictionaries (e.g., to model the delay between the sensors)
    max_difference -- search radius for candidate generation

    matches -- list of matched tuples ((stamp1,data1),(stamp2,data2))

    first_keys = first_list.keys()
    second_keys = second_list.keys()
    potential_matches = [(abs(a - (b + offset)), a, b)
                         for a in first_keys
                         for b in second_keys
                         if abs(a - (b + offset)) < max_difference]
    matches = []
    for diff, a, b in potential_matches:
        if a in first_keys and b in second_keys:
            matches.append((a, b))

    return matches

if __name__ == '__main__':

    # parse command line
    parser = argparse.ArgumentParser(description='''
    This script takes two data files with timestamps and associates them   
        'first_file', help='first text file (format: timestamp data)')
        'second_file', help='second text file (format: timestamp data)')
        '--first_only', help='only output associated lines from first file', action='store_true')
        '--offset', help='time offset added to the timestamps of the second file (default: 0.0)', default=0.0)
        '--max_difference', help='maximally allowed time difference for matching entries (default: 0.02)', default=0.02)
    args = parser.parse_args()

    first_list = read_file_list(args.first_file)
    second_list = read_file_list(args.second_file)

    matches = associate(first_list, second_list, float(
        args.offset), float(args.max_difference))

    if args.first_only:
        for a, b in matches:
            print("%f %s" % (a, " ".join(first_list[a])))
        for a, b in matches:
            print("%f %s %f %s" % (a, " ".join(
                first_list[a]), b-float(args.offset), " ".join(second_list[b])))

Registration Code

The depth image is already registered with the color image, so the pixels in the depth image correspond one to one with the pixels in the color image. Therefore, generating color point clouds is easy.

How to use scripts

Just one function

All of the above can be found from:


These are the documents:

What's more, after the RGB and depth images are captured, one step of fusion is like assembling and writing more information into one image.

Then the merged results are evaluated according to the rules, but not here. Then you can do what the high-level point clouds show.

Anyway, I'm doing some research. Here's how the interpolation algorithm works to generate point clouds. Note that an rgb and a depth map generate a point cloud.

Format above: png+png=PLY format

Last of all, I'll supplement the format requirements

Prepare for the experiment before you do it

Because RGB and Depth are one-to-one correspondences, the resolution is the same. Depth Image = Normal RGB 3-channel color image + Depth Map.

An image channel that contains information about the distance from the surface of the object in a view point scene. The channel itself is similar to a grayscale image, and each pixel value is the actual distance from the object measured by the sensor.


  • Proportional to camera distance: Closer surfaces are darker; Other surfaces are lighter
  • Depending on the distance of the nominal plane: the surface near the focal plane is darker; Lighter surface away from focal plane

Actually, the picture comes from a Book

depth map is obtained with depth camera, which corresponds to two channels of black and white image, gray value and depth value, and four channels of RGBD for color image.

Image depth refers to the number of bits used to store each pixel and is also used to measure the color resolution of an image.

Image depth determines the possible number of colors per pixel of a color image, or the possible number of gray levels per pixel of a gray-scale image. It determines the maximum number of colors that can appear in a color image or the maximum gray level in a gray image. For example, if a monochrome image has 8 bits per pixel, the maximum number of gray scales is 2 to the eighth power, or 256. A color image with RGB three channels has 4,4,2 pixel bits. The maximum number of colors is 2 to 4+4+2 power, or 1024, which means the depth of the pixel is 10 bits, and each pixel can be one of 1024 colors.

For example:

_A picture with a size of 1024*768 and a depth of 16 has a data volume of 1.5M.

_is calculated as follows:

  1024×768×16bit = 
  (1024×768×16)/8 Byte = 
  [(1024×768×16)/8]/1024 KB = 
  1536 KB = 
  {[(1024×768×16)/8]/1024}/1024 MB 
  = 1.5 MB

A function is done

"""""""is a Python documentation describing inputs and outputs

Here's how to open using Image

Method of PIL Library

This can be seen as preparation for processing. Make sure the resolution is consistent. I'll write a script to verify.

Change name here first

Write a script to verify

All the same

Then use a list to put our point cloud data

focalLength = 525.0
centerX = 319.5
centerY = 239.5
scalingFactor = 5000.0

4 parameters

  • The depth image is scaled by a factor of 5000, that is, the pixel value of 5000 in the depth image corresponds to a distance of 1 meter from the camera, 10,000 to 2 meters, and so on.
  • A pixel value of 0 indicates a missing value/no data.
  • 16bit png is 5000

This sentence is our essence

A point's pixel coordinate (u, v) and its corresponding depth value depth can be calculated from the camera parameters and the following formulas (x, y, z) to be

def depth2mi(depthValue):

    return depthValue * 0.001

def depth2xyz(u, v, depthValue):
    fx = 361.1027
    fy = 361.8266
    cx = 258.4545
    cy = 212.1282

    depth = depth2mi(depthValue)

    z = float(depth)
    x = float((u - cx) * z) / fx
    y = float((v - cy) * z) / fy

    result = [x, y, z]
    return result
#Depth map stored in np array
arr = np.array(depth)
#Depth value of a point's pixel coordinate (u, v) on the aligned depth map on a color map
depthValue =float(arr3[v, u])
coordinate = depth2xyz(u, v, depthValue)

The above is another implementation.

Posted on Wed, 01 Dec 2021 18:25:39 -0500 by leung