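Problem description:
nvidia-smi shows output, so the graphics driver is installed, and the CUDA version reported by nvcc is 9.0 as expected (not 9.1). I couldn't find the cause, so I decided to reinstall everything.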
sudo tee /proc/acpi/bbswitch <<
nvidia-smi
Output:
Tue May 28 22:21:07 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P0    N/A /  N/A |      0MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
nvcc --version
Output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
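If you check the toolkit version often, the release number can be pulled out of the `nvcc --version` text programmatically. This is a minimal sketch; `parse_nvcc_release` is a hypothetical helper name, and the sample string is simply the output shown above.

```python
import re

# Sample copied from the `nvcc --version` output above.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176"""


def parse_nvcc_release(text):
    """Return (release, full_version) from `nvcc --version` output, or None."""
    # The last line has the form "Cuda compilation tools, release X.Y, VX.Y.Z".
    match = re.search(r"release (\d+\.\d+), V(\d+\.\d+\.\d+)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_nvcc_release(NVCC_OUTPUT))  # ('9.0', '9.0.176')
```

On a real machine you would pass it the stdout of `nvcc --version` (for example via `subprocess`) instead of the hard-coded sample.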
lspci | grep -i nvidia
Output:
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 950M] (rev a2)
Check whether PyTorch can use CUDA:
python -c 'import torch; print(torch.cuda.is_available())'
Output:
False
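The one-liner only says True or False. A slightly longer check, sketched below under the assumption that PyTorch may or may not be installed, also prints which CUDA version the installed PyTorch build was compiled against (`torch.version.cuda`, which is None for CPU-only builds); a mismatch there is a common reason for False. `torch_cuda_report` is a hypothetical helper name.

```python
def torch_cuda_report():
    """Collect the facts that usually explain torch.cuda.is_available() == False."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
        report["cudnn_version"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```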
Uninstall CUDA
sudo /usr/local/cuda-9.0/bin/uninstall_cuda_9.0.pl
# After this only the cuDNN files are left; they can be deleted completely as well:
sudo rm -rf /usr/local/cuda-9.0/
Uninstall the NVIDIA driver and Bumblebee
sudo apt-get remove --purge nvidia-cuda-dev nvidia-cuda-toolkit nvidia-nsight nvidia-visual-profiler
sudo apt autoremove --purge bumblebee-nvidia nvidia-driver nvidia-settings
Install the graphics driver and Bumblebee
sudo apt-get install nvidia-smi
sudo apt-get install bumblebee-nvidia nvidia-driver nvidia-settings
Install programs for testing the graphics driver (mesa-utils provides glxinfo and glxgears):
sudo apt-get install mesa-utils
Show information about the NVIDIA card:
optirun glxinfo|grep NVIDIA
Run the test program:
optirun glxgears -info
The NVIDIA driver is now used successfully; the relevant output:
GL_RENDERER = GeForce GTX 950M/PCIe/SSE2
GL_VERSION = 4.6.0 NVIDIA 390.67
GL_VENDOR = NVIDIA Corporation
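What matters in this output is the vendor and renderer: on a Bumblebee laptop, running glxgears without optirun would typically report the integrated GPU instead. A minimal sketch that parses the `KEY = value` lines and checks the vendor; the helper names are hypothetical and the sample is the output above.

```python
GLXGEARS_INFO = """GL_RENDERER = GeForce GTX 950M/PCIe/SSE2
GL_VERSION = 4.6.0 NVIDIA 390.67
GL_VENDOR = NVIDIA Corporation"""


def parse_gl_info(text):
    """Turn 'GL_KEY = value' lines into a dict; other lines are ignored."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep and key.startswith("GL_"):
            info[key.strip()] = value.strip()
    return info


def rendered_by_nvidia(info):
    # Assumption: without optirun, an Optimus laptop usually reports the integrated GPU here.
    return "NVIDIA" in info.get("GL_VENDOR", "")


if __name__ == "__main__":
    info = parse_gl_info(GLXGEARS_INFO)
    print(info["GL_RENDERER"])       # GeForce GTX 950M/PCIe/SSE2
    print(rendered_by_nvidia(info))  # True
```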
Install CUDA
Download the runfile, then run it:
sudo ./cuda_9.0.176_384.81_linux.run
During installation, answer no to this prompt only (so the 390.67 driver installed above stays in place):
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?(y)es/(n)o/(q)uit: n
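Declining the bundled driver is safe here because the driver already installed (390.67) is newer than the 384.81 driver the runfile ships with, which to my knowledge is also the minimum Linux driver NVIDIA lists for CUDA 9.0. A minimal sketch of that comparison, assuming 384.81 as the requirement; the helper names are hypothetical, and the header line is taken from the nvidia-smi output above.

```python
import re

# Assumed minimum: the driver version bundled in cuda_9.0.176_384.81_linux.run.
CUDA90_MIN_DRIVER = "384.81"

NVIDIA_SMI_HEADER = "| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |"


def driver_version(smi_text):
    """Extract the 'Driver Version: X.Y' field from nvidia-smi's table header."""
    match = re.search(r"Driver Version:\s*([\d.]+)", smi_text)
    return match.group(1) if match else None


def version_tuple(version):
    # "390.67" -> (390, 67); compare numerically, not as strings.
    return tuple(int(part) for part in version.split("."))


def driver_is_new_enough(installed, required=CUDA90_MIN_DRIVER):
    return version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    installed = driver_version(NVIDIA_SMI_HEADER)
    print(installed, driver_is_new_enough(installed))  # 390.67 True
```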
Download and install cuDNN
Log in and download the version that matches your CUDA release. I chose
cudnn-9.0-linux-x64-v7.5.0.56
(cuDNN 7.5.0 for CUDA 9.0).
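Copy the extra cuDNN libraries and headers into the matching CUDA directories (the paths are relative to the extracted cuDNN archive):
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/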
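To confirm which cuDNN version actually ended up in /usr/local/cuda, you can read the version macros from the copied header. This sketch assumes the cuDNN 7.x layout, where `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` are defined in cudnn.h (later releases moved them to a separate header); the sample excerpt and the helper name are illustrative.

```python
import re

# Excerpt in the shape of cuDNN 7.x's cudnn.h (values match v7.5.0).
CUDNN_H_EXCERPT = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
"""


def cudnn_version_from_header(text):
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL #defines; return 'X.Y.Z' or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % name, text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    print(cudnn_version_from_header(CUDNN_H_EXCERPT))  # 7.5.0
    # On the machine itself (path assumed from the cp commands above):
    # with open("/usr/local/cuda/include/cudnn.h") as f:
    #     print(cudnn_version_from_header(f.read()))
```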
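Then check the environment variables and turn on the NVIDIA card:
# Check LD_LIBRARY_PATH and PATH
sudo vim ~/.bashrc
# Use Bumblebee (bbswitch) to turn on the NVIDIA card
sudo tee /proc/acpi/bbswitch<<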
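Both checks can be scripted. The sketch below assumes the usual entries from NVIDIA's Linux install guide, adapted to the /usr/local/cuda symlink used above (`/usr/local/cuda/bin` on PATH, `/usr/local/cuda/lib64` on LD_LIBRARY_PATH), and assumes that reading /proc/acpi/bbswitch returns a line like `0000:01:00.0 ON`, as described in the bbswitch README. The helper names are hypothetical.

```python
import os

# Assumed directories for a default /usr/local/cuda install; adjust if yours differ.
EXPECTED = {
    "PATH": "/usr/local/cuda/bin",
    "LD_LIBRARY_PATH": "/usr/local/cuda/lib64",
}


def missing_cuda_paths(env=None):
    """Return the variables whose value does not contain the expected CUDA dir."""
    env = os.environ if env is None else env
    missing = []
    for var, directory in EXPECTED.items():
        # Linux search paths are colon-separated.
        entries = env.get(var, "").split(":")
        if directory not in entries and directory + "/" not in entries:
            missing.append(var)
    return missing


def bbswitch_state(status_line):
    """Parse a /proc/acpi/bbswitch line such as '0000:01:00.0 ON' -> 'ON'."""
    fields = status_line.split()
    return fields[-1] if fields else None


if __name__ == "__main__":
    print("missing:", missing_cuda_paths())
    try:
        with open("/proc/acpi/bbswitch") as f:
            print("dGPU:", bbswitch_state(f.read()))
    except OSError:
        print("bbswitch module not loaded")
```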
Check again whether PyTorch can use CUDA:
python -c "import torch;print(torch.cuda.is_available())"
Output:
True
Check whether TensorFlow can use the GPU:
python3 -c "import tensorflow as tf;print(tf.test.is_gpu_available());print(tf.test.gpu_device_name())"
Output:
2019-05-28 22:52:25.862539: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-28 22:52:26.319239: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-28 22:52:26.319674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.92GiB
2019-05-28 22:52:26.319696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
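The lines that matter in this log are "Found device 0 with properties" and "Adding visible gpu devices: 0". A small sketch that extracts them, assuming the TensorFlow 1.x log format shown above (other versions word these lines differently); `summarize_tf_gpu_log` is a hypothetical helper name.

```python
import re

# Lines copied from the TensorFlow 1.x log above.
TF_LOG = """2019-05-28 22:52:26.319674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.92GiB
2019-05-28 22:52:26.319696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0"""


def summarize_tf_gpu_log(log):
    """Pull the device name, compute capability and visible GPU ids out of the log."""
    summary = {}
    m = re.search(r"name: (.+?) major: (\d+) minor: (\d+)", log)
    if m:
        summary["name"] = m.group(1)
        summary["compute_capability"] = f"{m.group(2)}.{m.group(3)}"
    m = re.search(r"Adding visible gpu devices: ([\d, ]+)", log)
    if m:
        summary["visible"] = [int(i) for i in m.group(1).split(",") if i.strip()]
    return summary


if __name__ == "__main__":
    print(summarize_tf_gpu_log(TF_LOG))
    # {'name': 'GeForce GTX 950M', 'compute_capability': '5.0', 'visible': [0]}
```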
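Everything works now. It hardly gets more thorough than this: a full uninstall and reinstall, with both the uninstall and the install steps written out.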