Building an AMD Deep Learning Machine: Part 3

Benchmarking

The biggest question is: how does this perform? Sure, the stack is open source, it uses a driver already integrated into the kernel, and it can run TensorFlow, PyTorch, and Caffe, but how well does it do all that?

Some results from Lambda Labs are provided below for comparison with the Vega 56.

| Model         | Vega 56 (images/sec) | 1080 Ti (images/sec) |
|---------------|----------------------|----------------------|
| ResNet-50     | 145.19               | 203.99               |
| Inception v3  | 67.08                | 130.2                |
| VGG16         | 80.57                | 133.16               |
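
For anyone who wants to sanity-check numbers like these on their own card, here is a rough sketch of a throughput measurement in Keras. This is my own illustration, not the script Lambda Labs used for their numbers; synthetic data keeps the disk and input pipeline out of the measurement.

```python
# Rough, self-contained throughput measurement (images/sec) for ResNet-50.
# This is a sketch for ballpark numbers, not a rigorous benchmark.
import time
import numpy as np
import tensorflow as tf

batch_size = 64  # lower this if the card runs out of memory
model = tf.keras.applications.ResNet50(weights=None)
model.compile(optimizer="sgd", loss="categorical_crossentropy")

# Synthetic data keeps the input pipeline out of the measurement.
images = np.random.rand(batch_size, 224, 224, 3).astype("float32")
labels = tf.keras.utils.to_categorical(
    np.random.randint(0, 1000, batch_size), 1000)

# Warm up once so graph construction and memory allocation aren't timed.
model.train_on_batch(images, labels)

steps = 50
start = time.time()
for _ in range(steps):
    model.train_on_batch(images, labels)
elapsed = time.time() - start
print(f"{steps * batch_size / elapsed:.1f} images/sec")
```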

The Vega GPU is about half as fast as the 1080 Ti on its worst performing model (Inception v3). The gap is closest on ResNet-50, and that result was actually achieved by turning on ROCm Fusion. The fusion pass seems to alter the computation graph to combine multiple operations into a single convolution where possible.

To enable this, run `export TF_ROCM_FUSION_ENABLE=1` inside the Docker container before starting a TensorFlow workload. Perhaps the other models would have been closer to the 1080 Ti with this setting. Unfortunately, I was not able to do very rigorous testing, since I was building this machine for someone else. I would like to try out ROCm Fusion further, as well as [undervolting and overclocking the card](https://github.com/RadeonOpenCompute/ROCm/issues/463). Undervolting should reduce heat and fan noise, allowing the card to maintain higher boost frequencies.
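
If you would rather set the flag from Python instead of the shell, a minimal sketch looks like this (assuming the ROCm TensorFlow Docker image, where this variable is respected); setting it before TensorFlow is imported is the safe ordering:

```python
# Minimal sketch: enable ROCm Fusion from inside Python.
# Assumes the ROCm TensorFlow Docker image; set the variable before
# importing TensorFlow so the setting applies to this process.
import os
os.environ["TF_ROCM_FUSION_ENABLE"] = "1"

import tensorflow as tf

# Sanity check that the Vega GPU is visible before launching a workload.
from tensorflow.python.client import device_lib
print([d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"])
```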

Conclusion

While this build wasn't for me, I would certainly build one myself if I had an extra thousand dollars to spend on it. After tax and everything, the entire build came to $996.17. Still, the price-to-performance of the Vega GPU is actually pretty decent: I got the Vega 56 for $320 after tax, and the cursory benchmark results above show it getting anywhere from 50-75% of the performance of a 1080 Ti at less than 40% of the price (most new 1080 Tis I see are around $850 at the moment).
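
To make the value argument concrete, here is the arithmetic using the throughput numbers above and the prices mentioned in this post; the prices are snapshots from when I bought the card, so treat the ratios as rough:

```python
# Performance-per-dollar using the benchmark numbers above and the prices
# mentioned in this post ($320 Vega 56, ~$850 for a new 1080 Ti).
results = {  # images/sec: (Vega 56, 1080 Ti)
    "ResNet-50":    (145.19, 203.99),
    "Inception v3": (67.08, 130.2),
    "VGG16":        (80.57, 133.16),
}
prices = {"Vega 56": 320, "1080 Ti": 850}

for model, (vega, ti) in results.items():
    rel_perf = vega / ti
    vega_value = vega / prices["Vega 56"]   # images/sec per dollar
    ti_value = ti / prices["1080 Ti"]
    print(f"{model}: {rel_perf:.0%} of 1080 Ti performance, "
          f"{vega_value / ti_value:.1f}x the throughput per dollar")
```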

In the future, it would be better to compare the cost/performance of the Vega against a lower-tier Nvidia GPU like the GTX 1070.