WIP: bitnet.c – a zero-dependency BitNet implementation in C

This is my attempt to implement neural network training and inference with the BitLinear layer from the BitNet paper, written from scratch in C for learning purposes. The long-term goal is to work towards an implementation of a smaller version of the LLaMA architecture. The repo also implements inference for a BPE tokenizer trained with the tiktoken library.

To keep things concise, the source files for layers, data structures, and other utilities are implemented as single-header libraries.
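
In practice this means a program just includes the headers it needs and is compiled as a single translation unit. A hypothetical example (header names other than tokenizer.h are assumptions, not the repo's actual layout):

/* hypothetical usage of the single-header libraries */
#include "tokenizer.h"        /* BPE tokenizer inference (file listed in the project structure) */
#include "layers/bitlinear.h" /* assumed header name for the BitLinear layer */

int main(void) {
    /* ... load the tokenizer/model and call the *_fwd and *_bkwd functions ... */
    return 0;
}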

Usage

Training

The training program initializes a new model and trains it on the specified dataset. For example:

gcc mnist_train.c -o train_mnist -lm
./train_mnist
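
The roadmap below includes parallelising the code with OpenMP; if those paths are enabled, the same program would typically be built with GCC's -fopenmp flag (an assumption about the build, not a documented command):

gcc -fopenmp mnist_train.c -o train_mnist -lm
./train_mnist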

Project Structure

├── experiments/    # miscellaneous programs used to test ideas
├── layers/         # source files for layers of the LLM
├── utils/          # utility functions (data structures, matrix functions, dataloaders, etc.)
├── tests/          # unit tests for various libraries and functions
├── tokenizer.h     # single header library for inference on BPE tokenizer
└── mnist_bitmlp.c  # train and test bit multi layer perceptron on MNIST dataset

Some conventions

Function names for layers contain a suffix corresponding to their forward and backward passes.

  • _fwd – forward pass
  • _bkwd – backpropagation

Gradient variables are prefixed with d, e.g. the gradient of a layer's output is dy. Additionally, quantised variables have a q suffix, e.g. the quantised activations are xq.
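
To illustrate, a layer's interface following these conventions might look like the following (hypothetical signatures, not the actual declarations in layers/):

/* forward pass: quantise activations x into xq and compute the output y */
void bitlinear_fwd(float *y, int8_t *xq, const float *x, const float *w, int in_dim, int out_dim);

/* backward pass: given dy, the gradient of the output, accumulate dx and dw */
void bitlinear_bkwd(float *dx, float *dw, const float *dy, const float *x, const float *w, int in_dim, int out_dim);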

Roadmap

  • BitLinear implementation
    • RMSNorm layer
    • BitLinear layer
      • Bit matrix multiplications
      • GELU activation
      • Weight and activation quantisation/dequantisation functions (see the sketch after this roadmap)
    • BitLinear MLP Block
    • Cross entropy loss implementation
    • Training weight initialisation and allocation
    • AdamW optimiser implementation
    • Training loop on MNIST dataset for BitMLP
    • Train a multilayer perceptron classifier for the MNIST dataset
    • Parallelize code using OpenMP
  • Tokenizer implementation
    • Loading tokenizer from file
    • Base64 decoding
    • Hashtable implementation
    • PriorityQueue implementation
    • Encode text to input ids using tokenizer
    • Decode input ids to text using tokenizer
    • Verify correctness of tokenizer implementation on sample corpus
  • BitNet transformer implementation
    • Token embedding layer
    • Grouped query attention block
    • Forward and backward pass for BitNet architecture
    • Dataloader implementation
    • Saving and loading model weights
    • Training loop implementation
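
As background for the quantisation items in the BitLinear section above: in the BitNet b1.58 variant, weights are quantised to {-1, 0, +1} by scaling with their mean absolute value (absmean) and rounding, while activations are quantised to 8 bits with an absmax scale. Below is a minimal, self-contained sketch of the weight half of that scheme, assuming this is the variant being implemented (names and signatures are illustrative, not the repo's API):

#include <math.h>
#include <stdint.h>

/* Absmean ternary weight quantisation (BitNet b1.58-style sketch).
 * Writes wq[i] in {-1, 0, +1} and returns the scale used for dequantisation. */
static float weight_quant(int8_t *wq, const float *w, int n) {
    float gamma = 0.0f;
    for (int i = 0; i < n; i++) gamma += fabsf(w[i]);
    gamma = gamma / n + 1e-6f;          /* mean absolute value, epsilon for stability */
    for (int i = 0; i < n; i++) {
        float v = roundf(w[i] / gamma); /* scale, then round to the nearest integer */
        if (v > 1.0f) v = 1.0f;         /* clip to the ternary range {-1, 0, +1} */
        if (v < -1.0f) v = -1.0f;
        wq[i] = (int8_t)v;
    }
    return gamma;                       /* dequantise with w[i] ≈ gamma * wq[i] */
}

Dequantisation is then just a multiplication of wq by the returned scale.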
