tinygrad
The tinygrad software is developed by the tiny corp, which is also building the tinybox GPU computer.
git clone --recursive https://github.com/tinygrad/tinygrad
cd tinygrad/
python3 -m venv venv
source venv/bin/activate
pip install -U setuptools pip wheel
# Tinygrad from git repo
pip install -e .
# or from pypi
# pip install tinygrad
# To run the various examples and benchmarks:
# PyTorch's nightly ROCm build of Torch:
# pip install --pre torch torchaudio torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.0
# Or AMD's ROCm build of Torch:
pip install torch==2.1.1 torchvision==0.16.1 -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0
pip install librosa nltk phonemizer protobuf pyyaml \
sentencepiece tiktoken unidecode gymnasium pytest hypothesis \
pillow opencv-python tensorflow ultralytics onnx pygame ctypeslib2 \
tf2onnx lm_eval onnxruntime pydot tensorflow_addons
# If portaudio.h is available
pip install pyaudio
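A quick sanity check that the install works (a minimal sketch; it only assumes the tinygrad.tensor.Tensor import path and .numpy(), both part of the public API):

# verify tinygrad can build and run a simple kernel on the default device
from tinygrad.tensor import Tensor

a = Tensor([1.0, 2.0, 3.0])
b = Tensor([4.0, 5.0, 6.0])
print((a + b).numpy())   # expect [5. 7. 9.]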
Then run examples such as python examples/coder.py.
See the Output section of this documentation for example tinygrad output.
Note: installing via pip install -e . doesn't seem to pick up the runtime directory, which causes import errors such as tinygrad.runtime.ops_gpu failing to import.
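A small diagnostic for that import problem (running the examples from the repository root, so the source tree including tinygrad/runtime/ is on sys.path, is a suggested workaround, not verified here):

# try loading the GPU runtime module directly; an ImportError here reproduces the issue
import importlib
importlib.import_module("tinygrad.runtime.ops_gpu")
print("tinygrad.runtime.ops_gpu imported OK")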
llama
Running tinygrad's llama.py with the Phind CodeLlama 34B model.
python examples/llama.py --prompt "Write hello world in Python." \
--gen code --size 34B-Instruct --shard 5 --model \
~/ml/huggingface/phind/Phind-CodeLlama-34B-v2/pytorch_model.bin.index.json
When using --shard 5, this gives an error in device.split.
It does start loading the model across all the GPUs though, so it half starts. Running without sharding instead gives a HIP out-of-memory error, since the whole model then has to fit on one GPU. A sketch of the sharding idea follows.
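For reference, this is roughly how tinygrad splits a tensor across devices (a minimal sketch; it assumes the Tensor.shard API exposed by recent versions, and the device names are illustrative and backend-dependent):

# split a weight matrix row-wise across two devices
from tinygrad.tensor import Tensor

devices = ("GPU:0", "GPU:1")          # illustrative; HIP/AMD backends use different names
w = Tensor.rand(8, 4096)
w_sharded = w.shard(devices, axis=0)  # each device holds half of the rows
print(w_sharded.device)               # a tuple listing both devices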
mixtral
Mixtral is a mixture-of-experts (MoE) model; a sketch of the routing idea follows the command below.
python examples/mixtral.py \
--count 30 \
--temperature 0.7 \
--timing False \
--profile False \
--weights ~/ml/huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1-combined
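The routing idea behind a MoE layer, in plain numpy (illustrative only, not mixtral.py's implementation; the gate, eight experts, and top-2 selection mirror how Mixtral 8x7B is usually described):

# score all experts with a gate, evaluate only the top-2, and mix their outputs
import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    scores = x @ gate_w                                         # one gate logit per expert
    top = np.argsort(scores)[-top_k:]                           # indices of the best experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()   # softmax over the chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

experts = [lambda v: v for _ in range(8)]                       # toy "experts" (identity functions)
x, gate_w = np.random.randn(16), np.random.randn(16, 8)
print(moe_layer(x, gate_w, experts).shape)                      # (16,)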