Building an AMD Deep Learning Machine: Part 2

Software Stack

Operating System

I used Ubuntu 19.04 partially because I wanted to try out the April release of Ubuntu and I knew that the newer kernels were more compatible with Vega (the amdgpu driver is merged into the kernel after 4.19 which reduces installation headaches) and the Ryzen CPU.

A note about docker

If you are not a fan of docker, for security or whatever reason, I don't advise that you use Ubuntu 19.04. This release has only python 3.7 and there are, at the moment, a few issues with running rocm 2.3 with python 3.7. This doesn't seem to be a problem on python 3.5 or 3.6 and with older versions of ROCm. However, the performance improvement with the newer version of ROCm is pretty substantial so I would use a version of Ubuntu where you can downgrade your python version instead.

Initial Software Stack

First, the debian repository has to be added

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list

Then, the appropriate packages are installed.

sudo apt update
sudo apt install rocm-libs miopen-hip cxlactivitylogger
sudo apt install rocm-dev

Because we are using the amdgpu drivers in kernel 5.0 that ships with Ubuntu 19.04, we need to add the following udev rule.

echo 'SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"' | sudo tee /etc/udev/rules.d/70-kfd.rules

Docker install

I added the following line into my ~/.bash_rc file to allow for quick launching of the container:

alias drun='sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/dockerx:/dockerx'

To launch the container, you then simply run drun rocm/tensorflow to drop into your container. The first time you run this, it will pull the images from dockerhub. After that, it will use the cached image.