Install pytorch-rocm on bare metal opensuse Tumbleweed
These instructions are adopted from section 4 of this page.
pytorch commit 9d1138afec26a4fe0be74187e4064076f8d45de7 worked for some stuff but pieces of allennlp are incompatible because this is pytorch v1.7.0a0+9d1138a.
I tried with 1.5 and 1.5.1 several times but it wasn't working on opensuse. I could have sworn it worked on ubuntu though.
Add rocm repos
Using the files provided by AMD for SLES SP1 per the official instructions.
sudo zypper install dkms
sudo zypper clean
sudo zypper addrepo --no-gpgcheck http://repo.radeon.com/rocm/zyp/zypper/ rocm
sudo zypper ref
sudo zypper install rocm-dkms
sudo reboot
Modify /etc/modprobe.d/10-unsupported-modules.conf
to have
allow_unsupported_modules 1
Then run the following, though it's probably not strictly necessary on tumbleweed
sudo modprobe amdgpu
Add your user to the video group
usermod -a -G video <username>
Verify everything is working by examining the output of rocminfo
and make sure your gpu is listed.
Create a virtual environment
virtualenv -p python3 ~/venvs/torch
source ~/venvs/torch/bin/activate
Install pytorch prerequisites
sudo zypper in glog-devel python3-pip libopenblas-devel libprotobuf-devel libnuma-devel libpthread-stubs0-devel libopencv-devel git gcc cmake make lmdb-devel libleveldb1 snappy-devel hiredis-devel
sudo zypper in rocm-dev rocm-libs miopen-hip hipsparse rocthrust hipcub rccl roctracer-dev
Fix issues with cmake files for rocm
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocsparse/lib/cmake/rocsparse/rocsparse-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocfft/lib/cmake/rocfft/rocfft-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/miopen/lib/cmake/miopen/miopen-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rccl/lib/cmake/rccl/rccl-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/hipsparse/lib/cmake/hipsparse/hipsparse-config.cmake
Clone the repo
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git checkout v1.5.0
git submodule update --init --recursive
Build
This process will take a while (2-3 hours)
export RCCL_DIR="/opt/rocm/rccl/lib/cmake"
python tools/amd_build/build_amd.py
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 MAX_JOBS=4 python setup.py install
Install allennlp
pip install allennlp
pip uninstall torch
# rebuild torch (very fast this time)
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 MAX_JOBS=4 python setup.py install
Verify installation
Make sure you get a non-zero value (should correspond to the number of GPUs.
python
>>import torch
>>torch.cuda.device_count()
No Comments