ROCm originally stood for Radeon Open Compute platform. It is AMD's software stack for GPU programming. Sometimes a GPU is integrated with the CPU on the same chip; the combination is called an APU (Accelerated Processing Unit). Sadly, AMD no longer seems to provide support for lower-end GPUs and APUs, so for those one is stuck with the older versions.
Installation
The older versions are available here. Download the appropriate deb file and run:
sudo dpkg -i amdgpu-install_5.4.50403-1_all.deb
sudo amdgpu-install --list-usecase
sudo amdgpu-install --usecase=hiplibsdk,rocm,dkms,graphics
sudo reboot
rocm-smi
sudo rocminfo
After installing, add your user to the video and render groups:
sudo usermod -a -G video <username>
sudo usermod -a -G render <username>
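The group changes take effect only after you log out and back in. To confirm the membership from Python (a minimal sketch using the standard grp and pwd modules):

```python
import grp
import os
import pwd

# Resolve the current user and collect every group they belong to,
# including the primary group from the password database.
user = pwd.getpwuid(os.getuid()).pw_name
groups = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
groups.add(grp.getgrgid(pwd.getpwnam(user).pw_gid).gr_name)

print('video' in groups, 'render' in groups)
```

Both values should print True once the group changes have taken effect.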
You can use radeontop to view the GPU utilization.
sudo apt-get install -y radeontop
radeontop -c
Check the ROCm version with:
apt show rocm-libs -a
The maintainers are listed as: ROCm Libs Support <rocm-libs.support@amd.com>.
Only after installing ROCm should PyTorch be installed.
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/rocm5.4.2
Test it
It's named CUDA here, but the call is really checking whether PyTorch can detect the APU/GPU via ROCm; the ROCm build of PyTorch exposes its devices through the familiar torch.cuda API.
import torch.cuda
print(f'CUDA available? : {torch.cuda.is_available()}')
If successful, the output should look like this:
CUDA available? : True
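Beyond the boolean check, the version string tells you whether you actually got the ROCm build: the ROCm wheels carry a +rocmX.Y.Z suffix, and torch.version.hip is set only on those builds (a small sketch, assuming the ROCm wheel installed above):

```python
import torch

# ROCm wheels report versions like '2.0.1+rocm5.4.2';
# torch.version.hip is None on CUDA-only or CPU-only builds.
print('torch version:', torch.__version__)
print('HIP version  :', torch.version.hip)
if torch.cuda.is_available():
    print('Device       :', torch.cuda.get_device_name(0))
```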
Also try this program:
import torch, grp, pwd, os, subprocess

devices = []
try:
    print("\n\nChecking ROCM support...")
    result = subprocess.run(['rocminfo'], stdout=subprocess.PIPE)
    cmd_str = result.stdout.decode('utf-8')
    # rocminfo lists each device under an 'Agent N' heading.
    cmd_split = cmd_str.split('Agent ')
    for part in cmd_split:
        item_single = part[0:1]
        item_double = part[0:2]
        if item_single.isnumeric() or item_double.isnumeric():
            new_split = cmd_str.split('Agent ' + item_double)
            device = new_split[1].split('Marketing Name:')[0].replace(' Name: ', '').replace('\n', '').replace(' ', '').split('Uuid:')[0].split('*******')[1]
            devices.append(device)
    if len(devices) > 0:
        print('GOOD: ROCM devices found: ', len(devices))
    else:
        print('BAD: No ROCM devices found.')

    print("Checking PyTorch...")
    # Sanity-check that a 5x3 random tensor really has that shape.
    x = torch.rand(5, 3)
    has_torch = False
    len_x = len(x)
    if len_x == 5:
        has_torch = True
        for i in x:
            if len(i) == 3:
                has_torch = True
            else:
                has_torch = False
    if has_torch:
        print('GOOD: PyTorch is working fine.')
    else:
        print('BAD: PyTorch is NOT working.')

    print("Checking user groups...")
    user = os.getlogin()
    groups = [g.gr_name for g in grp.getgrall() if user in g.gr_mem]
    gid = pwd.getpwnam(user).pw_gid
    groups.append(grp.getgrgid(gid).gr_name)
    if 'render' in groups and 'video' in groups:
        print('GOOD: The user', user, 'is in RENDER and VIDEO groups.')
    else:
        print('BAD: The user', user, 'is NOT in the RENDER and VIDEO groups. These are needed for PyTorch to use HIP resources.')

    if torch.cuda.is_available():
        print("GOOD: PyTorch ROCM support found.")
        t = torch.tensor([5, 5, 5], dtype=torch.int64, device='cuda')
        print('Testing PyTorch ROCM support...')
        if str(t) == "tensor([5, 5, 5], device='cuda:0')":
            print('Everything fine! You can run PyTorch code inside of: ')
            for device in devices:
                print('---> ', device)
    else:
        print("BAD: PyTorch ROCM support NOT found.")
except Exception:
    print('Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.')
The output should look like this:
Checking ROCM support...
GOOD: ROCM devices found: 2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user nav is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of:
---> AMD Ryzen 5 5600G with Radeon Graphics
---> gfx90c
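The split-chain used to extract agent names in the script above is fairly brittle. A more readable alternative (a sketch, assuming rocminfo is on the PATH and prints its usual per-agent Name: fields) is:

```python
import re
import subprocess

def rocm_agent_names():
    """Return the 'Name:' field of every agent listed by rocminfo."""
    try:
        out = subprocess.run(['rocminfo'], capture_output=True, text=True).stdout
    except OSError:
        # rocminfo is not installed or not executable.
        return []
    # Each agent block contains a line like "  Name:        gfx90c".
    # The anchor at line start avoids matching "Marketing Name:".
    return re.findall(r'^\s*Name:\s+(\S+)\s*$', out, flags=re.MULTILINE)

print(rocm_agent_names())
```

On the machine above this would list both the CPU and GPU agents (e.g. the gfx90c of the Ryzen 5600G's integrated graphics); on a machine without ROCm it simply prints an empty list.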
That's it, and all the best with your GPU programming. For another approach to programming GPUs, take a look at the Mojo language.
