Ubuntu
Ubuntu is a GNU/Linux distribution downstream from Debian with proprietary bits added. It is used in many Top500 clusters.
It is used by tinygrad on the tinybox.
Documentation
Ubuntu server docs.
Install
Get Ubuntu and install.
The upstream tinybox runs Ubuntu 22.04 LTS. Run that, perhaps.
Write the image to a USB drive. Make sure the target device is correct, since dd overwrites it:
sudo dd if=ubuntu-22.04.3-live-server-amd64.iso of=/dev/sdXX bs=16M status=progress oflag=sync
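Before writing the image, it may be worth checking it against the SHA256SUMS file that Ubuntu publishes next to its ISOs. A minimal sketch, assuming SHA256SUMS was downloaded into the same directory as the image; verify_iso is a hypothetical helper name, not part of any tool:

```shell
# verify_iso is a hypothetical helper, not part of Ubuntu or coreutils.
# Assumes a SHA256SUMS file sits in the same directory as the ISO.
verify_iso() {
    iso=$1
    (
        cd "$(dirname "$iso")" || exit 1
        # Pick the checksum line for this image only, then let sha256sum
        # recompute and compare. Fails if no line matches or the hash differs.
        grep -F -- "$(basename "$iso")" SHA256SUMS | sha256sum --check -
    )
}
```

Running verify_iso ubuntu-22.04.3-live-server-amd64.iso returns success only when the hash matches, so it can gate the dd step.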
Configuration
Set up the system, perhaps as follows:
Set up SSH keys.
Packages
Update and install new packages from Ubuntu repos.
# Use IPv4 for apt
echo 'Acquire::ForceIPv4 "true";' | sudo tee /etc/apt/apt.conf.d/99force-ipv4
# Set up apt-cache
echo 'Acquire::http::Proxy "http://192.168.1.1:3142";' | sudo tee /etc/apt/apt.conf.d/90cache
sudo sed -i -e 's/https:/http:/g' /etc/apt/sources.list.d/*.list
sudo apt update
sudo apt dist-upgrade
sudo apt install bc bison build-essential ccache cmake-curses-gui colordiff \
cpufrequtils devscripts dpkg-dev equivs flex gfortran git haveged host \
libbz2-dev libdrm-dev libedit-dev libegl1-mesa-dev libelf-dev libffi-dev \
libhdf5-openmpi-dev liblzma-dev libncurses-dev libnuma-dev \
libopenmpi-dev libpomp2-dev libsqlite3-dev libssl-dev libsystemd-dev \
libudev-dev libxml2-dev libxml2-utils libz3-dev libzstd-dev lshw \
lzma-dev mesa-common-dev net-tools ninja-build nlohmann-json3-dev \
ntpsec-ntpdate nvme-cli ocl-icd-opencl-dev openmpi-bin pahole pkg-config \
portaudio19-dev python3-argcomplete python3-pip python3-pygments \
python3-venv python3-virtualenv python3-yaml quilt rsync rsyslog sshfs \
sudo swig traceroute vim xxd python3-sphinx git-lfs hwdata \
lua5.3 liblua5.3-dev libmpfr-dev libmsgpack-dev libfmt-dev \
environment-modules python3-numpy pybind11-dev libopengl-dev zip zsh \
hpcc gawk googletest libdw-dev libgtest-dev libsigsegv2 \
libbabeltrace-dev libbabeltrace1 libbison-dev libncurses5-dev \
libtext-unidecode-perl tex-common texinfo ucx-utils libucx-dev \
librdmacm-dev libhdf4-dev libnetcdff-dev libnetcdf-c++4-dev \
libnetcdf-dev libnetcdf-mpi-dev libnetcdf-pnetcdf-dev libpnetcdf-dev \
netcdf-bin libadios-bin libadios-dev libadios-openmpi-dev csh
OS Configuration
Operating system configuration.
# Lazy sudo
sudo sed -i -e 's/%sudo\tALL=(ALL:ALL) ALL/%sudo ALL=(ALL) NOPASSWD: ALL/g' /etc/sudoers
After all packages are installed, add the user to groups:
sudo adduser debian audio
sudo adduser debian dialout
sudo adduser debian kvm
sudo adduser debian render
sudo adduser debian video
# Disable various startup packages
systemctl disable XXX
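The repeated adduser calls above could be folded into a loop. This is only a sketch: add_groups and the DRY_RUN variable are hypothetical names introduced here, and the user name debian is taken from the commands above.

```shell
# add_groups is a hypothetical helper; DRY_RUN is an assumed variable name.
# With DRY_RUN set to a non-empty value, the commands are printed, not run.
add_groups() {
    user=$1
    shift
    for group in "$@"; do
        # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
        ${DRY_RUN:+echo} sudo adduser "$user" "$group"
    done
}
```

With DRY_RUN=1 set, add_groups debian audio dialout kvm render video prints the five adduser commands for review. With DRY_RUN unset, the same call runs them.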
User Configuration
Set up the user account. Configure it to use the caching services already available in the cluster.
ccache
There is a Redis ccache server on the tinyrocs network.
Edit ~/.config/ccache/ccache.conf thusly:
remote_storage = redis://192.168.1.2
remote_only = true
reshare = true
PATH
Add the ROCm binary path and ccache (XXX) to ~/.bashrc:
PATH=/usr/lib/ccache:/opt/rocm/bin:$PATH
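Because ~/.bashrc may be sourced more than once, a guarded variant can keep PATH from accumulating duplicates. A minimal sketch, where path_prepend is a hypothetical helper name:

```shell
# path_prepend is a hypothetical helper for ~/.bashrc.
# It prepends a directory to PATH only if it is not already listed,
# so re-sourcing ~/.bashrc does not grow PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
}

# Prepend in reverse order so ccache ends up first, as in the line above.
path_prepend /opt/rocm/bin
path_prepend /usr/lib/ccache
export PATH
```

Putting /usr/lib/ccache first matters: the compiler symlinks there have to shadow the real compilers for ccache to be used.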
Python pip cache
Set up pip to use the LAN pip cache pydev, if available, by editing ~/.config/pip/pip.conf, such as:
[global]
trusted-host = 192.168.1.3
index-url = http://192.168.1.3:4040/root/pypi/+simple/
[search]
index = http://192.168.1.3:4040/root/pypi/
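If the cache address differs between machines, the file above could be generated from a template instead of edited by hand. A sketch only: pip_conf is a hypothetical helper, and the host and port are passed as arguments instead of being hard-coded.

```shell
# pip_conf is a hypothetical helper. Host and port are arguments so the
# same template can point at a different cache.
pip_conf() {
    host=$1
    port=$2
    cat <<EOF
[global]
trusted-host = $host
index-url = http://$host:$port/root/pypi/+simple/

[search]
index = http://$host:$port/root/pypi/
EOF
}
```

To write the file, create ~/.config/pip if needed and redirect the output of pip_conf 192.168.1.3 4040 to ~/.config/pip/pip.conf.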
ROCm
ROCm for Ubuntu.
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms
sudo apt install rocm-hip-libraries
# sudo reboot
sudo apt install rocm-hip-sdk rocm-ml-sdk rocm-opencl-sdk rocm-openmp-sdk \
rocm-bandwidth-test rocm-clang-ocl amdgpu-dkms-headers rocm \
llvm-amdgpu llvm-amdgpu-runtime rocm-dkms rocm-dev rocm-libs \
rocm-khronos-cts rocm-ocltst rocm-validation-suite \
smi-lib-amdgpu smi-lib-amdgpu-dev \
libstdc++-12-dev python-is-python3 \
vulkan-amdgpu libvulkan-dev libvulkan-volk-dev vulkan-tools \
vulkan-validationlayers-dev glslang-dev glslang-tools
# sudo apt purge --autoremove libc6-dev-i386 libc6-dev-x32
sudo apt install gcc-multilib
Misc
More.
sudo systemctl disable ModemManager.service nvmefc-boot-connections.service \
nvmf-autoconnect.service open-iscsi.service ubuntu-advantage.service \
ufw.service unattended-upgrades.service update-notifier-download.timer \
update-notifier-motd.timer \
apport-autoreport.path apport-autoreport.timer apport-forward.socket \
apt-daily.timer apt-daily-upgrade.timer fwupd-refresh.timer \
remote-fs.target iscsid.socket motd-news.timer \
ua-reboot-cmds.service ua-timer.timer
sudo snap install nvtop
# In /etc/default/grub; run sudo update-grub afterwards
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 selinux=0 apparmor=0"
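The kernel command line above lives in /etc/default/grub, and the edit can be scripted. A minimal sketch, assuming GNU sed; set_grub_cmdline is a hypothetical helper that rewrites the variable in whatever file it is given:

```shell
# set_grub_cmdline is a hypothetical helper. It rewrites the
# GRUB_CMDLINE_LINUX_DEFAULT line in the given file (GNU sed assumed).
# The value must not contain "|" or "&", which sed would interpret.
set_grub_cmdline() {
    file=$1
    value=$2
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$value\"|" "$file"
}
```

One cautious way to use it is to run it on a copy of /etc/default/grub, inspect the result, copy it back as root, and then run sudo update-grub so the change takes effect on the next boot.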
sudo lvresize --resizefs -L 500G /dev/ubuntu-vg/ubuntu-lv
XXX Disable sound card.
XXX Long time to wait for network to be configured …
XXX