Ubuntu

Ubuntu is a GNU/Linux distribution downstream from Debian with proprietary bits added. It is used in many Top500 clusters.

It is used by tinygrad on the tinybox.

Documentation

Ubuntu server docs.

Install

Get Ubuntu and install.

The upstream tinybox runs Ubuntu 22.04 LTS. Run that, perhaps.

Write the image to a USB drive. Make sure the device is correct, since dd overwrites it without asking:

sudo dd if=ubuntu-22.04.3-live-server-amd64.iso of=/dev/sdXX bs=16M status=progress oflag=sync
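Because a wrong of= target destroys that disk, a small guard may help. This is a minimal sketch and not part of the original procedure. The write_iso function name is hypothetical, and it assumes bash, lsblk (util-linux) and sudo. It refuses to write unless the target is a block device with nothing mounted, shows the device, and asks before running the same dd command as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: refuse to dd unless the target is an unmounted block device.
write_iso() {
  local iso="$1" dev="$2" answer
  if [ ! -f "$iso" ]; then
    echo "write_iso: no such ISO: $iso" >&2
    return 1
  fi
  if [ ! -b "$dev" ]; then
    echo "write_iso: not a block device: $dev" >&2
    return 1
  fi
  # Any non-empty MOUNTPOINT for the device or its partitions means it is in use.
  if lsblk -no MOUNTPOINT "$dev" | grep -q .; then
    echo "write_iso: $dev or one of its partitions is mounted" >&2
    return 1
  fi
  # Show what is about to be overwritten, then ask for confirmation.
  lsblk -o NAME,SIZE,MODEL,TRAN "$dev"
  read -r -p "Overwrite $dev with $iso? [y/N] " answer
  [ "$answer" = y ] || return 1
  sudo dd if="$iso" of="$dev" bs=16M status=progress oflag=sync
}
```

Usage would be, for example, write_iso ubuntu-22.04.3-live-server-amd64.iso /dev/sdXX.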

Configuration

Set up, perhaps like so:

  • SSH keys.

Packages

Update and install new packages from Ubuntu repos.

# Use IPv4 for apt
echo 'Acquire::ForceIPv4 "true";' | sudo tee /etc/apt/apt.conf.d/99force-ipv4
# Set up apt-cache
echo 'Acquire::http::Proxy "http://192.168.1.1:3142";' | sudo tee /etc/apt/apt.conf.d/90cache
# An HTTP proxy cannot cache HTTPS downloads, so switch extra repos to plain HTTP
sudo sed -i -e 's/https:/http:/g' /etc/apt/sources.list.d/*.list

sudo apt update
sudo apt dist-upgrade
sudo apt install bc bison build-essential ccache cmake-curses-gui colordiff \
  cpufrequtils devscripts dpkg-dev equivs flex gfortran git haveged host \
  libbz2-dev libdrm-dev libedit-dev libegl1-mesa-dev libelf-dev libffi-dev \
  libhdf5-openmpi-dev liblzma-dev libncurses-dev libnuma-dev \
  libopenmpi-dev libpomp2-dev libsqlite3-dev libssl-dev libsystemd-dev \
  libudev-dev libxml2-dev libxml2-utils libz3-dev libzstd-dev lshw \
  lzma-dev mesa-common-dev net-tools ninja-build nlohmann-json3-dev \
  ntpsec-ntpdate nvme-cli ocl-icd-opencl-dev openmpi-bin pahole pkg-config \
  portaudio19-dev python3-argcomplete python3-pip python3-pygments \
  python3-venv python3-virtualenv python3-yaml quilt rsync rsyslog sshfs \
  sudo swig traceroute vim xxd python3-sphinx git-lfs hwdata \
  lua5.3 liblua5.3-dev libmpfr-dev libmsgpack-dev libfmt-dev \
  environment-modules python3-numpy pybind11-dev libopengl-dev zip zsh \
  hpcc gawk googletest libdw-dev libgtest-dev libsigsegv2 \
  libbabeltrace-dev libbabeltrace1 libbison-dev libncurses5-dev \
  libtext-unidecode-perl tex-common texinfo ucx-utils libucx-dev \
  librdmacm-dev libhdf4-dev libnetcdff-dev libnetcdf-c++4-dev \
  libnetcdf-dev libnetcdf-mpi-dev libnetcdf-pnetcdf-dev libpnetcdf-dev \
  netcdf-bin libadios-bin libadios-dev libadios-openmpi-dev csh
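The proxy line makes apt depend on the LAN cache at 192.168.1.1:3142, so apt may stall if the machine is ever off that network. Below is a minimal sketch, assuming bash and coreutils timeout. It writes the 90cache file only when something is listening on the cache port and removes the file otherwise. The function name and the CONF_DIR and SUDO override variables are hypothetical, added so it can be dry-run against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable the apt proxy only when the cache port answers.
# CONF_DIR and SUDO are overridable for testing (SUDO= runs without sudo).
configure_apt_proxy() {
  local host="${1:-192.168.1.1}" port="${2:-3142}"
  local conf="${CONF_DIR:-/etc/apt/apt.conf.d}/90cache"
  # Bash's /dev/tcp/HOST/PORT pseudo-path opens a TCP connection;
  # timeout bounds the wait when the host does not respond at all.
  if timeout 3 bash -c ": >/dev/tcp/$host/$port" 2>/dev/null; then
    echo "Acquire::http::Proxy \"http://$host:$port\";" |
      ${SUDO-sudo} tee "$conf" >/dev/null
    echo "apt proxy enabled: http://$host:$port"
  else
    ${SUDO-sudo} rm -f "$conf"
    echo "apt proxy not reachable, removed $conf" >&2
    return 1
  fi
}
```

Running configure_apt_proxy before sudo apt update would then either point apt at the cache or fall back to the regular mirrors.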

OS Configuration

Operating system configuration.

# Lazy sudo: members of the sudo group need no password
sudo sed -i -e 's/%sudo\tALL=(ALL:ALL) ALL/%sudo ALL=(ALL) NOPASSWD: ALL/g' /etc/sudoers
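A broken /etc/sudoers can lock everyone out of sudo, so it may be safer to apply the same substitution to a copy, check the result, and only then write it back. This is a minimal sketch, assuming GNU sed and, optionally, visudo. The lazy_sudo function name and the SUDOERS override are hypothetical. Editing the real file requires running it as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the "lazy sudo" edit to a copy, validate, then write back.
# SUDOERS can point at a scratch file for a dry run; the real file needs root.
lazy_sudo() {
  local sudoers="${SUDOERS:-/etc/sudoers}" tmp
  tmp=$(mktemp) || return 1
  cp "$sudoers" "$tmp" || { rm -f "$tmp"; return 1; }
  # GNU sed treats \t as a tab, matching the "%sudo<TAB>ALL=(ALL:ALL) ALL" line.
  sed -i -e 's/%sudo\tALL=(ALL:ALL) ALL/%sudo ALL=(ALL) NOPASSWD: ALL/g' "$tmp"
  if ! grep -qx '%sudo ALL=(ALL) NOPASSWD: ALL' "$tmp"; then
    echo "lazy_sudo: expected %sudo line not found in $sudoers" >&2
    rm -f "$tmp"
    return 1
  fi
  # visudo -c -f checks a file's syntax without installing it.
  if command -v visudo >/dev/null && ! visudo -c -f "$tmp" >/dev/null; then
    echo "lazy_sudo: edited copy failed the visudo syntax check" >&2
    rm -f "$tmp"
    return 1
  fi
  # Overwrite in place so the original file's owner and mode are kept.
  cat "$tmp" >"$sudoers"
  rm -f "$tmp"
}
```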
  • After all packages are installed, add the user to these groups:

    sudo adduser debian audio
    sudo adduser debian dialout
    sudo adduser debian kvm
    sudo adduser debian render
    sudo adduser debian video

    # Disable various startup services
    sudo systemctl disable XXX
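The group additions can also be written as a loop that skips any group this system does not have. This is a minimal sketch. The add_to_groups name and the RUN override (sudo by default, echo for a dry run) are hypothetical, and it assumes getent is available, as it is on glibc systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add USER to each GROUP that exists on this system.
# RUN defaults to sudo; RUN=echo prints the commands instead (dry run).
add_to_groups() {
  local user="$1" g
  shift
  for g in "$@"; do
    if getent group "$g" >/dev/null; then
      ${RUN-sudo} adduser "$user" "$g"
    else
      echo "add_to_groups: no group '$g', skipping" >&2
    fi
  done
}

# Example, with the groups listed above:
# add_to_groups debian audio dialout kvm render video
```

New group membership takes effect at the user's next login.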

User Configuration

Set up the user account, configuring it to use the caching services already available on the cluster.

ccache

There is a Redis server for ccache on the tinyrocs network. Edit ~/.config/ccache/ccache.conf thusly:

remote_storage = redis://192.168.1.2
remote_only = true
reshare = true
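To apply these settings without clobbering other lines in ccache.conf, a small "set or append" helper can be used. This is a minimal sketch. The set_conf_key name is hypothetical, and it assumes GNU sed and values that contain no "|" or "&" characters, since those would break the sed replacement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "key = value" in a flat config file such as ccache.conf,
# replacing an existing line for that key or appending a new one.
set_conf_key() {
  local file="$1" key="$2" value="$3"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  if grep -q "^[[:space:]]*$key[[:space:]]*=" "$file"; then
    # Values must not contain "|" or "&" (sed delimiter / match reference).
    sed -i -e "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >>"$file"
  fi
}

# Example, with the settings above:
# conf="$HOME/.config/ccache/ccache.conf"
# set_conf_key "$conf" remote_storage redis://192.168.1.2
# set_conf_key "$conf" remote_only true
# set_conf_key "$conf" reshare true
```

Afterwards, ccache --show-config (or ccache -p) lists the effective settings and where each one came from.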

PATH

Add the ROCm binary path and ccache (XXX) to ~/.bashrc:

PATH=/usr/lib/ccache:/opt/rocm/bin:$PATH
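On Debian and Ubuntu, the ccache package puts compiler-named symlinks (gcc, g++, cc, ...) to ccache in /usr/lib/ccache, so having that directory first on PATH routes compiles through ccache. One possible refinement for ~/.bashrc, sketched below, is to prepend each directory only if it exists and is not already present, so re-sourcing the file does not keep growing PATH. The path_prepend name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for ~/.bashrc: prepend a directory to PATH only if it
# exists and is not already present, so re-sourcing does not grow PATH.
path_prepend() {
  local dir="$1"
  [ -d "$dir" ] || return 0
  case ":$PATH:" in
    *":$dir:"*) ;;            # already on PATH, leave it alone
    *) PATH="$dir:$PATH" ;;
  esac
}

# Prepend ccache last so it ends up first, giving the same order as
# PATH=/usr/lib/ccache:/opt/rocm/bin:$PATH
path_prepend /opt/rocm/bin
path_prepend /usr/lib/ccache
export PATH
```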

Python pip cache

Set up pip to use the LAN devpi cache, if available, by editing ~/.config/pip/pip.conf, for example:

[global]
trusted-host = 192.168.1.3
index-url = http://192.168.1.3:4040/root/pypi/+simple/

[search]
index = http://192.168.1.3:4040/root/pypi/
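The same file can be generated for a different cache host or port. The trusted-host line is there because the index is served over plain HTTP. This is a minimal sketch: the write_pip_conf name is hypothetical, its defaults are the example values above, and it overwrites any existing ~/.config/pip/pip.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the per-user pip.conf shown above for a devpi
# server at HOST:PORT (defaults from the example). Overwrites the file.
write_pip_conf() {
  local host="${1:-192.168.1.3}" port="${2:-4040}"
  local conf="$HOME/.config/pip/pip.conf"
  mkdir -p "$(dirname "$conf")"
  cat >"$conf" <<EOF
[global]
trusted-host = $host
index-url = http://$host:$port/root/pypi/+simple/

[search]
index = http://$host:$port/root/pypi/
EOF
}
```

python3 -m pip config list shows which values pip actually picked up.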

ROCm

ROCm for Ubuntu.

sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms
sudo apt install rocm-hip-libraries
# sudo reboot
sudo apt install rocm-hip-sdk rocm-ml-sdk rocm-opencl-sdk rocm-openmp-sdk \
  rocm-bandwidth-test rocm-clang-ocl amdgpu-dkms-headers rocm \
  llvm-amdgpu llvm-amdgpu-runtime rocm-dkms rocm-dev rocm-libs \
  rocm-khronos-cts rocm-ocltst rocm-validation-suite \
  smi-lib-amdgpu smi-lib-amdgpu-dev \
  libstdc++-12-dev python-is-python3 \
  vulkan-amdgpu libvulkan-dev libvulkan-volk-dev vulkan-tools \
  vulkan-validationlayers-dev glslang-dev glslang-tools

# sudo apt purge --autoremove libc6-dev-i386 libc6-dev-x32
sudo apt install gcc-multilib
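After installing (and rebooting), a quick sanity check may save time. This is a minimal sketch, not an official AMD tool. The rocm_check name and the KFD_DEV override are hypothetical. It assumes the ROCm compute interface is /dev/kfd, that the user needs the render and video groups, and that rocminfo (from /opt/rocm/bin) lists GPU agents with gfx names.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check: is the ROCm kernel interface present and
# usable by the current user? Returns non-zero and says why if not.
rocm_check() {
  local dev="${KFD_DEV:-/dev/kfd}" status=0 g
  if [ ! -e "$dev" ]; then
    echo "rocm_check: $dev missing (is the amdgpu driver loaded?)" >&2
    return 1
  fi
  if [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
    echo "rocm_check: no read/write access to $dev" >&2
    status=1
  fi
  for g in render video; do
    if ! id -nG | tr ' ' '\n' | grep -qx "$g"; then
      echo "rocm_check: current user is not in group '$g'" >&2
      status=1
    fi
  done
  # rocminfo lists agents; GPU agents have names such as gfx1100.
  if command -v rocminfo >/dev/null && ! rocminfo | grep -q 'gfx[0-9]'; then
    echo "rocm_check: rocminfo reports no gfx agents" >&2
    status=1
  fi
  return "$status"
}
```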

Misc

More.

sudo systemctl disable ModemManager.service nvmefc-boot-connections.service \
  nvmf-autoconnect.service open-iscsi.service ubuntu-advantage.service \
  ufw.service unattended-upgrades.service update-notifier-download.timer \
  update-notifier-motd.timer \
  apport-autoreport.path apport-autoreport.timer apport-forward.socket \
  apt-daily.timer apt-daily-upgrade.timer fwupd-refresh.timer \
  remote-fs.target iscsid.socket motd-news.timer \
  ua-reboot-cmds.service ua-timer.timer
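Not every unit in that list exists on every install. A minimal sketch that disables only the units systemd can find, printing a skip message for the rest, is below. The disable_units name and the SYSTEMCTL override (used to substitute a stub when testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disable only the units that exist on this machine and
# report the ones that do not, instead of passing them to systemctl disable.
# SYSTEMCTL can be overridden, e.g. with a stub function for testing.
disable_units() {
  local unit
  for unit in "$@"; do
    # "systemctl cat UNIT" succeeds only if systemd can find the unit file.
    if ${SYSTEMCTL:-sudo systemctl} cat "$unit" >/dev/null 2>&1; then
      ${SYSTEMCTL:-sudo systemctl} disable "$unit"
    else
      echo "disable_units: $unit not found, skipping" >&2
    fi
  done
}

# Example: disable_units ModemManager.service ufw.service apt-daily.timer
```

Note that systemctl disable only affects the next boot. Adding --now also stops a unit immediately.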


sudo snap install nvtop

# In /etc/default/grub, then apply with: sudo update-grub
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 selinux=0 apparmor=0"
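Setting the kernel command line can be scripted as well. This is a minimal sketch, assuming GNU sed and Ubuntu's update-grub. The set_grub_cmdline name and the GRUB_FILE, SUDO and UPDATE_GRUB overrides are hypothetical, added so it can be dry-run on a scratch copy. It keeps a .bak copy and replaces the whole existing value rather than appending to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
# (or GRUB_FILE), keeping a .bak copy, then regenerate the GRUB config.
# SUDO= runs without sudo; UPDATE_GRUB replaces the update-grub command.
set_grub_cmdline() {
  local file="${GRUB_FILE:-/etc/default/grub}" args="$1"
  local line="GRUB_CMDLINE_LINUX_DEFAULT=\"$args\""
  ${SUDO-sudo} cp "$file" "$file.bak" || return 1
  if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file"; then
    # Replace the existing assignment; "|" or "&" in args would break this sed.
    ${SUDO-sudo} sed -i -e "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|$line|" "$file"
  else
    echo "$line" | ${SUDO-sudo} tee -a "$file" >/dev/null
  fi
  ${SUDO-sudo} ${UPDATE_GRUB:-update-grub}
}

# Example: set_grub_cmdline "ipv6.disable=1 selinux=0 apparmor=0"
```

The new options take effect at the next boot.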

# Grow the root logical volume and its filesystem to 500G
sudo lvresize --resizefs -L 500G /dev/ubuntu-vg/ubuntu-lv

XXX Disable sound card. XXX long time to wait for network to be configured … XXX