tinygrad

The tinygrad software is developed by the tiny corp, which also builds the tinybox GPU computer.

git clone --recursive https://github.com/tinygrad/tinygrad
cd tinygrad/
python3 -m venv venv
source venv/bin/activate
pip install -U setuptools pip wheel
# Tinygrad from git repo
pip install -e .
# or from pypi
# pip install tinygrad

# To run the various examples and benchmarks:
# PyTorch's nightly ROCm build of Torch:
# pip install --pre torch torchaudio torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.0
# Or AMD's ROCm build of Torch:
pip install torch==2.1.1 torchvision==0.16.1 -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0
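
# Optional sanity check that the ROCm build of Torch actually sees the GPUs
# (ROCm builds expose the torch.cuda.* API):
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"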

pip install librosa nltk phonemizer protobuf pyyaml \
  sentencepiece tiktoken unidecode gymnasium pytest hypothesis \
  pillow opencv-python tensorflow ultralytics onnx pygame ctypeslib2 \
  tf2onnx lm_eval onnxruntime pydot tensorflow_addons
# If portaudio.h is available
pip install pyaudio
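
Before running the examples, a minimal smoke test of the install is useful. This is a sketch: it uses whatever default device tinygrad picks, and the final print needs numpy installed.

python3 - <<'EOF'
from tinygrad import Tensor
x = Tensor.randn(2, 3)   # random input on tinygrad's default device
w = Tensor.randn(3, 4)
y = (x @ w).relu()       # matmul + ReLU, evaluated lazily
print(y.numpy())         # realizes the computation and prints the result
EOF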

Then run examples such as python examples/coder.py.

See the Output section of this documentation for example tinygrad output.

Note: installing via pip install -e . doesn't appear to pick up the runtime directory, which causes import errors such as failing to import tinygrad.runtime.ops_gpu.
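
A quick way to check whether the runtime modules made it into the install (substitute whichever backend module the machine actually uses):

python3 -c "import tinygrad.runtime.ops_gpu" && echo "runtime OK" || echo "runtime missing"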

llama

Running tinygrad's examples/llama.py with the Phind CodeLlama 34B model:

python examples/llama.py --prompt "Write hello world in Python." \
  --gen code --size 34B-Instruct --shard 5 --model \
  ~/ml/huggingface/phind/Phind-CodeLlama-34B-v2/pytorch_model.bin.index.json

With --shard 5 this fails with an error in device.split. It does begin loading the model across all the GPUs first, so it gets partway through startup before dying.

Running without sharding gives a HIP out-of-memory error, since the whole model then has to fit on a single GPU.
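
For context, sharding in tinygrad is a property of tensors: llama.py splits the weights across devices along an axis. A minimal sketch of the mechanism as I understand it (the device count and backend string here are assumptions for illustration):

python3 - <<'EOF'
# Sketch of tinygrad tensor sharding; assumes at least 2 devices of the
# default backend (e.g. HIP/AMD on a ROCm box).
from tinygrad import Tensor, Device
devices = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))
x = Tensor.randn(4, 8).shard(devices, axis=0)  # rows split across the devices
print(x.device)  # prints the tuple of devices holding the shards
EOF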

mixtral

Mixtral is a Mixture-of-Experts (MoE) model; see the gating sketch after the command below.

# --timing and --profile are boolean flags in mixtral.py; omit them to
# leave timing/profiling off rather than passing False.
python examples/mixtral.py \
  --count 30 \
  --temperature 0.7 \
  --weights ~/ml/huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1-combined
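
The MoE idea in a toy numpy sketch (illustrative only, not Mixtral's actual code): a gating network scores the experts per token, only the top-k experts run, and their outputs are blended by the gate weights.

python3 - <<'EOF'
import numpy as np
n_experts, k, d = 8, 2, 16                # Mixtral-style: 8 experts, top-2 routing
x = np.random.randn(d)                    # one token's hidden state
gate_w = np.random.randn(n_experts, d)    # gating network (toy weights)
scores = gate_w @ x                       # one score per expert
top = np.argsort(scores)[-k:]             # indices of the top-k experts
w = np.exp(scores[top]); w /= w.sum()     # softmax over the winners only
experts = [np.random.randn(d, d) for _ in range(n_experts)]
y = sum(wi * (experts[i] @ x) for wi, i in zip(w, top))
print(y.shape)                            # (16,) - same shape as the input
EOF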