Ubuntu 16.04: uninstall the NVIDIA driver and CUDA, then upgrade both

Environment

Ubuntu 16.04
GPU: GTX 1650
Kernel: 4.15.0-142-generic
Old NVIDIA driver: 430
Old CUDA version: 10.1

Steps

1. Uninstall the original NVIDIA driver and CUDA 10.0

Uninstall the NVIDIA driver:

sudo apt-get remove --purge 'nvidia*'

Uninstall CUDA:

sudo apt-get remove cuda
sudo apt autoremove
sudo apt-get remove 'cuda*'

Then change the terminal's working directory to /usr/local/ (the default installation path for CUDA) and delete what is left there:

cd /usr/local/
dir  # You should see a "cuda" or "cuda-xxx" folder, or both
sudo rm -r cuda-10.0  # use the folder name that dir actually listed
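
Before moving on, it can help to confirm that nothing from the old installation is left behind. The following is only a sketch under the assumption that CUDA lives under the default /usr/local prefix; the function name list_cuda_dirs is made up for illustration and is not part of any NVIDIA tool.

```shell
# list_cuda_dirs is a hypothetical helper: it prints every "cuda" or
# "cuda-*" entry directly under the given prefix (default: /usr/local).
list_cuda_dirs() {
    local prefix="${1:-/usr/local}"
    find "$prefix" -mindepth 1 -maxdepth 1 \
        \( -name 'cuda' -o -name 'cuda-*' \) 2>/dev/null | LC_ALL=C sort
}

# Leftover CUDA directories (no output means none are left).
list_cuda_dirs /usr/local

# Leftover NVIDIA/CUDA packages still known to dpkg, if dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    dpkg --list | grep -Ei 'nvidia|cuda' || echo "no nvidia/cuda packages found"
fi
```

If either check still prints something, repeat the matching removal command before installing the new version.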

2. Install the new NVIDIA driver and CUDA

Do not install the NVIDIA driver separately! The cuda package installed below already brings a matching driver with it.
Installing from the command line is very simple.
Go to the official website, select the version you want, and follow the commands it gives you:

https://developer.nvidia.com/cuda-11.1.1-download-archive

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda-repo-ubuntu1604-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1604-11-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

3. Problems encountered

1. nvcc -V does not show the version

gedit ~/.bashrc
Add the following lines at the end:
export PATH=/usr/local/cuda-11.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64:$LD_LIBRARY_PATH
Then run source ~/.bashrc (or open a new terminal) so that the change takes effect.
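
The same change can be made without an editor. This is a minimal sketch, not an official procedure: add_cuda_paths is a hypothetical helper name, and it assumes that a line mentioning the CUDA bin directory means the file is already configured.

```shell
# add_cuda_paths is a hypothetical helper: it appends the two export lines
# to the given rc file unless that file already mentions the CUDA bin dir.
add_cuda_paths() {
    local cuda_home="$1" rc_file="$2"
    if grep -qF "$cuda_home/bin" "$rc_file" 2>/dev/null; then
        return 0   # already configured, do not add a duplicate
    fi
    {
        echo "export PATH=$cuda_home/bin:\$PATH"
        echo "export LD_LIBRARY_PATH=$cuda_home/lib64:\$LD_LIBRARY_PATH"
    } >> "$rc_file"
}

# Usage (commented out so nothing is changed by accident):
# add_cuda_paths /usr/local/cuda-11.1 ~/.bashrc
# . ~/.bashrc
# nvcc -V
```

Running the helper twice leaves the file unchanged the second time, so it is safe to call again after a reinstall.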

2. nvidia-smi fails with "Failed to initialize NVML: Driver/library version mismatch"

Cause: the version of the NVIDIA kernel module that is currently loaded does not match the version of the driver libraries installed on the system.
Check the version of the loaded kernel module:

cat /proc/driver/nvidia/version

The output showed version 430, that is, the old NVIDIA driver that was installed before the upgrade.
(This post was written after the environment had been set up, so there is no screenshot here.)

List all NVIDIA packages currently installed:

dpkg --list | grep nvidia
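
The two checks above can be combined into a rough comparison of the loaded module against the module installed on disk. This is a sketch under assumptions: extract_driver_version is a hypothetical helper, the version is assumed to be the first dotted number on the "NVRM version" line, and modinfo is assumed to know the module under the name nvidia (packaged drivers may use a different module name).

```shell
# extract_driver_version is a hypothetical helper: it reads the contents of
# /proc/driver/nvidia/version on stdin and prints the first dotted version
# number found on the "NVRM version" line.
extract_driver_version() {
    grep 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

if [ -r /proc/driver/nvidia/version ]; then
    loaded=$(extract_driver_version < /proc/driver/nvidia/version)
    # modinfo reads the module file on disk, not the one in memory.
    on_disk=$(modinfo -F version nvidia 2>/dev/null)
    echo "loaded kernel module:     $loaded"
    echo "module installed on disk: ${on_disk:-unknown}"
    if [ -n "$on_disk" ] && [ "$loaded" != "$on_disk" ]; then
        echo "mismatch: an older module is still loaded"
    fi
else
    echo "no NVIDIA kernel module is loaded"
fi
```

When the two versions differ, the old module is still in memory, which is the situation the solutions below try to resolve.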

Solution 1: Reboot (did not work for me)
After rebooting, if the system hangs while booting into Ubuntu, press Ctrl+Alt+F1 to switch to the text console and uninstall the NVIDIA driver before going back to the desktop:

sudo apt-get remove --purge 'nvidia*'

Once you are back in the system, reinstall the nvidia-455 driver; otherwise the mismatch error is reported again.

Solution 2: Unload the old driver modules that are still loaded (temporary: after a reboot the Driver/library version mismatch error comes back)

# Unload the loaded NVIDIA kernel modules
sudo rmmod nvidia_drm
sudo rmmod nvidia_uvm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
# Execute nvidia-smi again
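
Not every machine has all four modules loaded, and rmmod fails on a module that is absent. As a hedged sketch, the helper below (nvidia_unload_order is a made-up name) prints only the modules that are actually loaded, in the same order as above; it assumes the /proc/modules format where each line starts with the module name followed by a space.

```shell
# nvidia_unload_order is a hypothetical helper: given a file in
# /proc/modules format, it prints the NVIDIA modules that are loaded,
# keeping the order used above (nvidia itself goes last).
nvidia_unload_order() {
    local modules_file="${1:-/proc/modules}"
    local m
    for m in nvidia_drm nvidia_uvm nvidia_modeset nvidia; do
        if grep -q "^$m " "$modules_file" 2>/dev/null; then
            echo "$m"
        fi
    done
}

# Dry run: print the commands instead of executing them.
for m in $(nvidia_unload_order); do
    echo "sudo rmmod $m"
done
```

Printing the commands first makes it easy to review them before running them by hand.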

If unloading a module fails with an error, find the processes that are still using the NVIDIA devices:

sudo lsof /dev/nvidia*


Then kill the processes it lists:

sudo kill 37667  # 37667 was the PID on my machine; use the PID that lsof reports
If more than one process is using the NVIDIA devices, pass all of their PIDs to kill.
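
When several processes are involved, collecting the PIDs by hand gets tedious. lsof has a -t option that prints only PIDs, so the list can be fed straight into kill. The sketch below is an illustration only: unique_pids is a hypothetical helper that drops non-numeric lines and duplicates defensively, and the actual kill is left commented out.

```shell
# unique_pids is a hypothetical helper: it reads one PID per line on stdin,
# ignores anything that is not a plain number, and prints each PID once
# in numeric order.
unique_pids() {
    grep -E '^[0-9]+$' | sort -n -u
}

# Usage (commented out so that nothing is killed by accident):
# pids=$(sudo lsof -t /dev/nvidia* | unique_pids)
# echo "would kill: $pids"
# sudo kill $pids
```

Reviewing the echoed list first is a cheap safeguard, since the display server itself is usually among these processes.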

Solution 3: Uninstall the current NVIDIA driver and install the version reported by cat /proc/driver/nvidia/version instead (no use to me, because upgrading CUDA requires upgrading the NVIDIA driver as well. If you have no requirement on the driver version, you can take this route)

Solution 4: Uninstall the NVIDIA driver and run sudo ubuntu-drivers devices to list the drivers this computer supports

Then run sudo ubuntu-drivers autoinstall to install the recommended graphics driver automatically.
Conclusion: no use; the same error still appears once the driver is loaded.

Solution 5: After carrying out the steps of Solution 2, also run sudo apt-get upgrade (this fixed it!)
But be careful!
If during the upgrade you are asked to choose between two versions of a configuration file, just accept the default (press N):

Configuration file 'xxxx'
 ==> File on system created by you or by a script.
 ==> File also in package provided by package maintainer.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
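
The prompt above can also be answered in advance. dpkg has a --force-confold option that keeps the currently installed configuration file, which is the same choice as pressing N. The wrapper below is only a sketch: upgrade_keep_conf and the DRY_RUN variable are names invented for this example.

```shell
# upgrade_keep_conf is a hypothetical wrapper. With DRY_RUN=1 (the default
# here) it only prints the command; set DRY_RUN=0 to actually run it.
upgrade_keep_conf() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sudo apt-get -o Dpkg::Options::=--force-confold upgrade"
    else
        sudo apt-get -o Dpkg::Options::=--force-confold upgrade
    fi
}

upgrade_keep_conf
```

Keeping the dry run as the default means the command can be inspected before any package is touched.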

3. ata1.00: failed command: READ FPDMA QUEUED

Ubuntu printed this error when booting in both normal mode and recovery mode, and the time needed to boot went up from about 10 s to 30 s.

ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/28:70:28:19:89/00:00:6c:01:00/40 tag 14 ncq 20480 in

In the end, I pressed F2 to enter the BIOS and restored all settings to their defaults (the Exit page has a Restore option); after that, Ubuntu booted normally.

Tags: Ubuntu

Posted on Mon, 04 Oct 2021 14:58:32 -0400 by jase35750