Homunculus Project - Experimental Custom Transformer Architecture

By Habibullah Akbar.

Key features:

  • Seamless integration with a vision encoder, along with selective RoPE for the image and text embedding sequences.
  • Internal iteration, enabling deeper abstraction while keeping the parameter count the same.
  • GeGLU activation function, inspired by the Gemma 2 models (a minimal sketch follows this list).
  • Custom KV-caching, ensuring each internal iteration has an independent KV-cache.
  • BPE tokenizer based on KBBI.
  • Grouped Query Attention.
  • PyTorch Lightning implementation.
  • DeepSpeed and ZeRO-3 integration, automatically offloading memory overflow to CPU and NVMe.
  • Example fine-tuning scripts with LoRA adapters, with and without quantization.
  • BitNet implementation.
  • Flash Attention implementation.
  • Speech encoder.
  • 2D and 3D RoPE.
  • Diffusion Transformer for image detokenization.
  • Influential token extraction from attention heatmaps.
  • Jupyter notebook examples, for both training and fine-tuning.
  • Dual license: open source for individuals, paid for commercial use.
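
As a rough illustration of the GeGLU feed-forward block listed above, here is a minimal PyTorch sketch. The module and dimension names (GeGLUFeedForward, d_model, d_ff) are illustrative assumptions, not the repository's actual classes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFeedForward(nn.Module):
    """GeGLU feed-forward block: a GELU-gated linear unit, as used in Gemma-style MLPs.
    Hypothetical sketch; names and shapes are assumptions, not the project's real module."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)  # gated branch
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)    # linear branch
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)  # projection back to the model dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GeGLU(x) = (GELU(x W_gate) * (x W_up)) W_down
        return self.down_proj(F.gelu(self.gate_proj(x), approximate="tanh") * self.up_proj(x))
```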

Internal latent loop

The iterable Transformer model can rethink its internal cognitive process, guided by an internal confidence score, akin to a slow-thinking mechanism. Here is a simple explanation of how it works (a minimal sketch follows this list):

  • An adjustable parameter controls the internal looping; the default value is 1.
  • If the loss value is high, the iteration is triggered, with the maximum number of iterations set to 10.
  • An independent layer is trained to output a confidence score, supervised by the loss value from the main training process.
  • At inference time, both the next token and a confidence score are output, and the confidence score determines how many iterations are needed for the current inference.
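
A minimal inference-time sketch of this loop, assuming a single sequence in the batch; the names (IterativeBlockLoop, confidence_head, threshold) and the exact stopping rule are illustrative assumptions rather than the repository's actual API:

```python
import torch
import torch.nn as nn

class IterativeBlockLoop(nn.Module):
    """Hypothetical sketch: re-run shared transformer blocks on the latent state until a
    learned confidence head is satisfied or the maximum number of iterations is reached."""

    def __init__(self, blocks: nn.Module, d_model: int, max_iterations: int = 10):
        super().__init__()
        self.blocks = blocks                          # shared blocks: every pass reuses the same parameters
        self.confidence_head = nn.Linear(d_model, 1)  # independent layer supervised by the main training loss
        self.max_iterations = max_iterations          # hard cap on extra passes (10 in the description above)

    @torch.no_grad()
    def forward(self, hidden: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
        # hidden: (batch=1, seq_len, d_model) latent states from the first forward pass
        for _ in range(self.max_iterations):
            confidence = torch.sigmoid(self.confidence_head(hidden[:, -1])).item()
            if confidence >= threshold:               # confident enough: stop "rethinking"
                break
            hidden = self.blocks(hidden)              # each extra pass would use its own KV-cache in the full model
        return hidden
```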

YouTube progress documentation playlist:

Soon:

  • Short-term memory injection.
  • SageAttention implementation.
  • Speech generation integration.
  • Discrete Latent Representation.
  • Grokfast.
  • Mamba2 block (?).
  • Kolmogorov Arnold Network (KAN).
  • Mixture of Experts block.
  • Fast object detection integration, possibly YOLO or RT-DETR.
  • OCR model integration.
  • MInference.
  • Pre-trained model integration, possibly Gemma 2, since it uses the same activation function.
  • Citations to all of the papers used as references or inspiration.

UPDATE LICENSE: This software is dual-licensed under the terms of the GNU Affero General Public License (AGPL) and a commercial license. For commercial use, please contact Habibullah Akbar at akbar2habibullah@gmail.com to obtain a commercial license. Commercial use is defined as any use of the software for financial gain, including, but not limited to, selling, licensing, or distributing the software as part of a product or service.

About

A long-term project about a custom AI architecture, consisting of cutting-edge machine-learning techniques such as Flash Attention, Grouped-Query Attention, ZeRO-Infinity, BitNet, etc.
