jingyaogong/minimind-o

SUMMARY

A 0.1B-parameter multimodal model trained from scratch to handle audio, vision, and text. The implementation demonstrates how to build a compact omni model that can listen, speak, and see on a minimal parameter budget. #ai #multimodal #llm
