How to make Tensorflow work on RTX 20XX series GPUs

How to get Tensorflow to work on RTX 2080, RTX 2080TI, and RTX 2070 GPUs by compiling it from source.

You've just gotten your hands on one of the new RTX cards, and you want to run your neural networks, and marvel at how much faster the new card is compared to the old one.

But when you run your script, Tensorflow spits out a nasty error saying it can't communicate with your card. Or if you've managed to install CUDA 9.0, maybe it segfaults even though you're 100% sure the script works on the old card.


Then you begin to scour the developer forums for a solution. Eventually you find that:

  1. RTX GPUs don't support CUDA 9.0; only CUDA 10.0
  2. Tensorflow CUDA 10.0 support only comes with the next stable release in January 2019

What now, then? It's a catch-22. Maybe you could install the nightly builds, but they tend to be highly unstable, or not work at all (which was the case when I tried it).

Well, not quite.

While the prebuilt Tensorflow r1.12 packages don't support CUDA 10.0, Tensorflow r1.12 does work with CUDA 10.0; you just need to compile it from source.

Compiling Tensorflow from source

In order to install Tensorflow with CUDA 10.0 support, we need to

  1. Install CUDA and cuDNN
  2. Install Bazel
  3. Configure and build Tensorflow, then install the custom-built .whl package with pip

1. Installing CUDA and cuDNN

Nvidia has a quite detailed, but also rather dense guide for installing CUDA and cuDNN.

To make it a bit easier, I'll go through the process for Ubuntu 16.04.

First, make sure that the graphics card is properly installed and recognized by the system by running:

lspci | grep "NVIDIA"

You should see something like

01:00.0 VGA compatible controller: NVIDIA Corporation Device 1e87 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f8 (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1ad8 (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad9 (rev a1)
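If you want to pull out just the hexadecimal PCI device ID (handy for looking the card up in Nvidia's device tables), a small sed filter does the job. The snippet below runs on the sample line from the output above rather than a live system, so the ID shown is illustrative; on a real machine you would pipe `lspci | grep "NVIDIA"` into the same filter.

```shell
# Extract the hexadecimal PCI device ID from an lspci line.
# The sample line is copied from the output above.
sample='01:00.0 VGA compatible controller: NVIDIA Corporation Device 1e87 (rev a1)'
printf '%s\n' "$sample" | sed -n 's/.*Device \([0-9a-f]*\).*/\1/p'   # prints "1e87"
```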

Next, make sure that you have the latest driver installed. Go to Nvidia's driver search page, select your graphics card and operating system, and download the latest driver.

At the time of writing, the latest Linux driver is 415.25.

Once the driver has been downloaded, you can install it by running:

sudo sh NVIDIA-Linux-x86_64-415.25.run

and following the prompts.

You can verify that the driver has been correctly installed by running:

nvidia-smi

which should print a table listing your GPU, the driver version, and current memory usage.

Note: you may have to reboot the computer before it works.

Once the drivers have been installed, it's time to install CUDA 10.0.

For this, you need to download the CUDA runfile from NVIDIA's developer portal. Once you've done that, you can install it by running:

sudo sh cuda_10.0.130_410.48_linux.run

and following the command prompts.

Make sure to not install the drivers suggested by the CUDA installer.

Once it has finished installing, we need to make sure that CUDA is in your environment variables by executing:

export PATH=$PATH:/usr/local/cuda-10.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/lib64

echo 'export PATH=$PATH:/usr/local/cuda-10.0/bin' >> ~/.bash_profile
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/lib64' >> ~/.bash_profile
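A missing or clobbered PATH entry is a common source of "nvcc not found" errors, so a quick sanity check like the one below can save some head-scratching. It only inspects the shell's own variables, so it is safe to run anywhere.

```shell
# Check whether the CUDA bin directory is already on PATH before
# (re-)appending it; this avoids stacking duplicate entries.
cuda_bin=/usr/local/cuda-10.0/bin
case ":$PATH:" in
  *":$cuda_bin:"*) echo "CUDA bin already on PATH" ;;
  *) export PATH="$PATH:$cuda_bin"; echo "CUDA bin appended to PATH" ;;
esac
```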

You can verify that it works by running:

nvcc -V

If it's working properly, you should see something like

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
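If you want to check the CUDA version programmatically (say, in a setup script), you can parse the release number out of that output. The snippet below uses the sample text above as input, so it runs even without nvcc installed; on a real system you would pipe `nvcc -V` into the same filter.

```shell
# Pull the CUDA release number out of `nvcc -V`-style output.
# On a live system: version=$(nvcc -V | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
sample='Cuda compilation tools, release 10.0, V10.0.130'
version=$(printf '%s\n' "$sample" | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
echo "$version"   # prints "10.0"
```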

Now, we need to install cuDNN 7.3.

For this step, you need to create a free NVIDIA developer account.

Once you've done that, you need to download the Runtime and Developer library for Ubuntu 16.04 from the download page. Select:

cuDNN Runtime Library for Ubuntu16.04 (Deb)

cuDNN Developer Library for Ubuntu16.04 (Deb)

Once you've downloaded those, you can install the .deb files by using dpkg.

sudo dpkg -i libcudnn7_7.3.0.29-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.3.0.29-1+cuda10.0_amd64.deb

To check that you've installed it correctly, you can run the following script, written by Sherlock on Stack Overflow.

function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
check libcudnn 

2. Installing Bazel

This can be done by adding Bazel's apt repository and signing key, and then installing the package:

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install bazel
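Tensorflow's configure step is picky about the Bazel version; r1.12 expects roughly Bazel 0.15 or newer (treat that exact number as an assumption and check the configure script in your checkout if in doubt). A portable way to compare version strings is sort -V:

```shell
# Compare an installed Bazel version against a required minimum using
# sort -V (version sort). The "have" value is hard-coded for illustration;
# on a real system take it from the output of `bazel version`.
min=0.15.0
have=0.18.1
lowest=$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n1)
if [ "$lowest" = "$min" ]; then
  echo "Bazel $have is new enough"
else
  echo "Bazel $have is too old; need >= $min"
fi
```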

3. Building Tensorflow

We will now build our custom Tensorflow distribution from source. If you're using another operating system, you can refer to the Tensorflow guide.

Clone the Tensorflow Github repository:

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r1.12

Next, we need to configure our Tensorflow build tool by running:

./configure

I selected:

XLA JIT support: Yes
CUDA support: Yes
CUDA SDK version: 10.0
NCCL version: 1.3

For the rest, I answered no where possible and otherwise accepted the default option.
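If you'd rather not answer the prompts interactively, the configure script can also pick up its answers from environment variables. The variable names below are what Tensorflow's configure script checked around r1.12, but treat the exact set as an assumption and fall back to the interactive prompts if anything is rejected.

```shell
# Pre-answer ./configure via environment variables (assumed names;
# verify against configure.py in your checkout).
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.0
export TF_CUDNN_VERSION=7
export TF_ENABLE_XLA=1
./configure
```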

Finally, to build Tensorflow, first run Bazel:

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

and then package the result into a .whl:

./bazel-bin/tensorflow/tools/pip_package/build_pip_package ./tensorflow_pkg

This will likely take quite some time, so grab some popcorn and watch a movie. (If somebody asks, tell them you're working, and that Kasper says it's a very important step in installing Tensorflow.)

When the movie is finished, and Bazel is finished building Tensorflow, you can run the following in your virtual environment to install Tensorflow with CUDA support.

pip install ./tensorflow_pkg/tensorflow-1.12.0-cp36-cp36m-linux_x86_64.whl

And you're finished.

Now, all there's left is to enjoy your fresh Tensorflow installation.

You can verify everything works by running the following script.

from tensorflow.python.client import device_lib

def get_devices():
    return [x.name for x in device_lib.list_local_devices()]

print(get_devices())

And you can learn more about Tensorflow by reading the Introduction to Tensorflow essay.