I recently read a paper on late fusion of 2D and 3D detections on KITTI, "CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection". The author open-sourced the code on GitHub, so I wanted to reproduce the results. I ended up spending a few days building the environment and ran into many bugs. I am recording them here so that others who hit the same problems can avoid the detours, and so that I have a reference when I rebuild the environment in the future.
The biggest problem by far is the spconv library. The other libraries cause only minor trouble; this one is the real pit. I also happened to hit the unluckiest combination of circumstances and took more detours than most people would (the reasons are explained later).
At first I saw that spconv had been updated to 2.x, which was announced as 50% faster than 1.x and no longer needs to be compiled by hand, so I installed it right away and tried to use it in CLOCs. It turned out that spconv 2.x removed the spconv.utils modules, and CLOCs uses them in many places. I was too lazy to change them one by one and gave up. spconv is in fact used by many KITTI detection projects. I remembered that OpenPCDet, a framework I had successfully reproduced before, also uses it, so I switched to that existing environment. Running CLOCs there produced the following error:
Traceback (most recent call last):
  File "./pytorch/train.py", line 918, in <module>
    fire.Fire()
  File "/home/feng/anaconda3/envs/OpenPCDet/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/feng/anaconda3/envs/OpenPCDet/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/home/feng/anaconda3/envs/OpenPCDet/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "./pytorch/train.py", line 656, in evaluate
    for example in iter(eval_dataloader):
  File "/home/feng/anaconda3/envs/OpenPCDet/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/feng/anaconda3/envs/OpenPCDet/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/feng/anaconda3/envs/OpenPCDet/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/feng/CLOCs/second/pytorch/builder/input_reader_builder.py", line 18, in __getitem__
    return self._dataset[idx]
  File "/home/feng/CLOCs/second/data/dataset.py", line 70, in __getitem__
    prep_func=self._prep_func)
  File "/home/feng/CLOCs/second/data/preprocess.py", line 363, in _read_and_prep_v9
    example = prep_func(input_dict=input_dict)
  File "/home/feng/CLOCs/second/data/preprocess.py", line 225, in prep_pointcloud
    points, max_voxels)
  File "/home/feng/anaconda3/envs/OpenPCDet/lib/python3.6/site-packages/spconv/utils/__init__.py", line 173, in generate
    or self._max_voxels, self._full_mean)
  File "/home/feng/anaconda3/envs/OpenPCDet/lib/python3.6/site-packages/spconv/utils/__init__.py", line 69, in points_to_voxel
    assert block_filtering is False
AssertionError
Well, well. What is "assert block_filtering is False"? I had never seen it before. I went to the SECOND-1.5 repository to see whether anyone else had hit this bug. The author of SECOND says that with spconv v1.0 (commit 8da6f96) this problem does not occur. The author of CLOCs also writes in the readme to use spconv v1.0 (commit 8da6f96). This was the first time I had seen a project require not only a specific library version but a specific commit ID as well. I don't know what is so magical about this particular spconv commit. One more remark: the open-source 3D detector in CLOCs is SECOND-1.5, so the environment requirements of the two are basically the same. If you run into problems while building CLOCs, check SECOND-1.5 for similar issues.
The next step is to compile spconv v1.0 (commit 8da6f96). Let me introduce my environment first:
System: Ubuntu 20.04 on WSL2 (i.e. the Linux subsystem under Windows)
CUDA version: 11.4
cuDNN version: 8.2.4
It was exactly this combination of WSL, Ubuntu 20.04 and CUDA 11 that nearly killed me. Normally you would not run into so many problems at the same time.
I will not go over the installation of conda, CUDA and cuDNN again. First, create a new environment with conda:
conda create -n CLOCs python=3.6 pytorch=1.1 torchvision
The CLOCs after -n is the name of your environment and can be anything you like. The rest are the packages that conda installs. The Python version and the PyTorch version need to be pinned because spconv v1.0 (commit 8da6f96) requires them. conda automatically finds the torchvision that matches this PyTorch version, so you don't need to specify it manually (torchvision=0.3 was installed for me). If you install torchvision with pip instead, you need to find the matching version yourself. Otherwise pip installs the latest torchvision, and in order to do so it uninstalls your pytorch=1.1 and replaces it with the pytorch=1.10 that matches the latest torchvision.
After installation, enter the following command to activate the environment. Note that CLOCs here is the environment name you chose above.
conda activate CLOCs
After activating the environment, first test whether the CUDA build of PyTorch is installed. Enter the following in sequence:
python
import torch
torch.cuda.is_available()
exit()
If the output is True, PyTorch can call CUDA normally. If it is False, you may need to specify the CUDA version when creating the conda environment, such as
conda create -n CLOCs python=3.6 pytorch=1.1 torchvision cudatoolkit=9.2
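The interactive check above can also be wrapped in a small script, which is handy when the environment gets rebuilt several times. This is only a sketch: the function name describe_torch is my own, and it simply reports what the installed PyTorch says about itself (or that PyTorch is missing).

```python
import importlib


def describe_torch():
    """Return a small report about the installed PyTorch, if any."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {
            "installed": False,
            "version": None,
            "cuda_build": None,
            "cuda_available": False,
        }
    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If cuda_build is None, the CPU-only build was installed, which is the case where pinning cudatoolkit as in the command above may help.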
After successfully installing PyTorch 1.1.0, we can start compiling spconv (and stepping into the pits). First clone spconv and switch to the specified commit:
git clone https://github.com/traveller59/spconv.git --recursive
cd spconv/
git checkout 8da6f96
But don't rush to compile yet. A third-party library, pybind11, is missing, so we need to clone it ourselves. spconv v1.0 (commit 8da6f96) also pins a specific pybind11 commit. I don't know whether the commit actually matters, but it is safer to switch to it. Enter the following commands one by one:
cd third_party/
git clone https://github.com/pybind/pybind11.git
cd pybind11/
git checkout 085a294
Next, enter the following code in turn to return to the spconv root directory and start compiling
cd ../..
python setup.py bdist_wheel
If you are on CUDA 9.x, this should run without errors. If you are on CUDA 10.x and get an error, you may need to run the following commands to delete the current environment and recreate it with conda, specifying a lower cudatoolkit version (this worked on an Ubuntu machine I borrowed).
conda remove -n CLOCs --all
conda create -n CLOCs python=3.6 pytorch=1.1 torchvision cudatoolkit=9.2
If you are on CUDA 11, you can also install CUDA 9.x or CUDA 10.x alongside it. There are many online tutorials on switching between multiple CUDA versions, and you don't need to uninstall your current CUDA 11.
But if you are a chosen one on Ubuntu 20.04 like me, congratulations. To install CUDA 10.x or CUDA 9.x you first need to install Ubuntu 16.04, because the gcc shipped with Ubuntu 20.04 is too new: CUDA 10 and CUDA 9 do not accept it and fail to install. Even the oldest gcc available on Ubuntu 20.04 is not supported. I even tried using the Ubuntu 16.04 package sources to install a very old gcc on Ubuntu 20.04 for CUDA 9.x, but that failed as well.
If you are also on WSL2, I can tell you with great excitement that installing Ubuntu 16.04 is not an option either. Microsoft officially announced that because the five-year long-term support period of Ubuntu 16.04 is over, they have removed Ubuntu 16.04 from the app store (are you sure you are not just too lazy to maintain it???). But Lu Xun was right: "Only those who are forced into a desperate situation will go crazy looking for a way out" (Lu Xun said it when I said it).
Faced with a WSL2 on which CUDA 10 and CUDA 9 cannot be installed, I had no choice but to work through a large number of errors myself. Fortunately they were not difficult. After all, even a noob like me could solve them.
First of all, we can see that the first error is
"The CUDA compiler is not able to compile a simple test program". You must be joking: how could my CUDA 11 fail to compile even a simple test? Sure enough, reading further back in the log, although it is not flagged as an error, the real problem is
that the CUDA compiler cannot be found. Since you can't find it, I'll tell you where it is. Add the following at line 48 of spconv/setup.py:
'-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc',
Add as shown in the figure
The log also shows that the path it searched is /usr/bin/nvcc, while the actual nvcc path is /usr/local/cuda/bin/nvcc. I don't know whether this is because different CUDA versions install to different paths. Rerun:
python setup.py bdist_wheel
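Before hard-coding a path into setup.py, it can be worth confirming where nvcc actually lives on your machine. The sketch below is my own helper, not part of spconv: find_nvcc and cmake_cuda_flag are hypothetical names, and the two candidate paths are simply the ones mentioned above.

```python
import os
import shutil


def find_nvcc(candidates=("/usr/local/cuda/bin/nvcc", "/usr/bin/nvcc")):
    """Return the first nvcc found on PATH or among candidates, else None."""
    on_path = shutil.which("nvcc")
    if on_path:
        return on_path
    for path in candidates:
        # Accept only existing files that are marked executable.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def cmake_cuda_flag(nvcc_path):
    """Build the CMake argument that points CMake at a specific nvcc."""
    return "-DCMAKE_CUDA_COMPILER=" + nvcc_path


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found; check your CUDA installation")
    else:
        print(cmake_cuda_flag(nvcc))
```

The printed string is the line to add to the CMake arguments in setup.py, wrapped in quotes and followed by a comma as shown earlier.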
Sure enough, no sooner had one problem been settled than another arose, but this time the problem is very similar. The first underlined line in the figure shows that nvcc is now recognized normally, while the second underlined line shows that cuDNN is not found. I had run into this before when checking my cuDNN version: newer cuDNN releases no longer define the version macros in cudnn.h and keep them in cudnn_version.h instead.
Once the cause is known, the fix is easy. Modify the wrong path (the third red line in the figure): on line 137, change cudnn.h to cudnn_version.h.
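To see which header actually carries the version on your machine, a small parser like the one below may help. It is a sketch under assumptions: read_cudnn_version is my own name, it relies on the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros, and /usr/local/cuda/include is only a guess at where the headers are installed.

```python
import os
import re


def read_cudnn_version(include_dir):
    """Return (major, minor, patch) parsed from the cuDNN headers, or None."""
    # Newer cuDNN keeps the version macros in cudnn_version.h, older
    # releases keep them in cudnn.h, so try both in that order.
    for name in ("cudnn_version.h", "cudnn.h"):
        path = os.path.join(include_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="ignore") as handle:
            text = handle.read()
        parts = []
        for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
            match = re.search(r"#define\s+" + macro + r"\s+(\d+)", text)
            if match is None:
                break  # this header does not define the full version
            parts.append(int(match.group(1)))
        if len(parts) == 3:
            return tuple(parts)
    return None


if __name__ == "__main__":
    # Assumed default location; adjust to your own CUDA install.
    print(read_cudnn_version("/usr/local/cuda/include"))
```

If this returns a version only when cudnn_version.h is present, you are in the same situation as described above and the build script needs the same change.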
It compiled, time for a happy dance. After several days of reinstalling CUDA and chasing related bugs, it finally succeeded. Oh wait, it's not over yet. I went to bed excited after the first successful compilation, only to find the next day that spconv could not be used. After a long investigation I realized that I had forgotten to install the wheel after compiling:
cd ./dist
python -m pip install spconv-1.0-cp36-cp36m-linux_x86_64.whl
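To avoid repeating my mistake, you can check whether the wheel actually landed in the active environment. A minimal sketch using only the standard library; is_installed is my own helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("spconv installed:", is_installed("spconv"))
```

It is probably best to run this from outside the cloned spconv directory, since a local spconv folder could otherwise be picked up instead of the installed package.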
That is the end of the spconv installation. To reproduce CLOCs, just follow the author's readme, except that WSL2 users need to pay attention when installing numba.
If you encounter the following error when running CLOCs
Traceback (most recent call last):
  File "./pytorch/train.py", line 16, in <module>
    from second.builder import target_assigner_builder, voxel_builder
  File "/home/feng/CLOCs/second/builder/target_assigner_builder.py", line 3, in <module>
    from second.core.target_assigner import TargetAssigner
  File "/home/feng/CLOCs/second/core/target_assigner.py", line 1, in <module>
    from second.core import box_np_ops
  File "/home/feng/CLOCs/second/core/box_np_ops.py", line 5, in <module>
    from second.core.non_max_suppression.nms_gpu import rotate_iou_gpu_eval
  File "/home/feng/CLOCs/second/core/non_max_suppression/__init__.py", line 2, in <module>
    from second.core.non_max_suppression.nms_gpu import (nms_gpu, rotate_iou_gpu,
  File "/home/feng/CLOCs/second/core/non_max_suppression/nms_gpu.py", line 24, in <module>
    @cuda.jit('(int64, float32, float32[:, :], uint64[:])')
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/decorators.py", line 95, in kernel_jit
    return Dispatcher(func, [func_or_sig], targetoptions=targetoptions)
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/compiler.py", line 899, in __init__
    self.compile(sigs[0])
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/compiler.py", line 1102, in compile
    kernel.bind()
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/compiler.py", line 590, in bind
    self._func.get()
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/compiler.py", line 433, in get
    cuctx = get_context()
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/cudadrv/devices.py", line 212, in get_context
    return _runtime.get_or_create_context(devnum)
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/cudadrv/devices.py", line 138, in get_or_create_context
    return self._get_or_create_context_uncached(devnum)
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/cudadrv/devices.py", line 151, in _get_or_create_context_uncached
    with driver.get_active_context() as ac:
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 393, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 280, in __getattr__
    self.initialize()
  File "/home/feng/anaconda3/envs/CLOCs/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 240, in initialize
    raise CudaSupportError("Error at driver init: \n%s:" % e)
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
[100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
you need to add the following line to ~/.bashrc:
export NUMBA_CUDA_DRIVER=/usr/lib/wsl/lib/libcuda.so.1
It took me a long time to dig this out of a conversation between two experts. It is only needed on WSL. As they explain it, numba looks for the CUDA driver in the default location, while on WSL the driver library lives somewhere else because the driver itself is installed on the Windows side.
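After editing ~/.bashrc (and opening a new shell), you can confirm that the variable is set and points at a real file before launching CLOCs. This is a sketch of my own: check_numba_driver is a hypothetical helper, and the default path is just the one from the export line above.

```python
import os


def check_numba_driver(environ=None, default="/usr/lib/wsl/lib/libcuda.so.1"):
    """Report whether NUMBA_CUDA_DRIVER is set and points at an existing file."""
    environ = os.environ if environ is None else environ
    configured = environ.get("NUMBA_CUDA_DRIVER")
    # Fall back to the WSL path from this post when the variable is unset.
    path = configured or default
    return {
        "configured": configured is not None,
        "path": path,
        "exists": os.path.isfile(path),
    }


if __name__ == "__main__":
    for key, value in check_numba_driver().items():
        print(f"{key}: {value}")
```

If configured is False but exists is True, the library is there and only the export is missing from the current shell.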