docs/Multimodal Tutorial.md
GGUF models with vision capabilities are uploaded to Hugging Face alongside an mmproj file. For instance, unsloth/gemma-3-4b-it-GGUF includes one:
As an example, download a GGUF file from that repository to your textgen/user_data/models folder.

Then download

https://huggingface.co/unsloth/gemma-3-4b-it-GGUF/resolve/main/mmproj-F16.gguf?download=true

to your textgen/user_data/mmproj folder, and name it mmproj-gemma-3-4b-it-F16.gguf so it is easy to recognize.
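The mmproj download above can also be scripted. A minimal sketch, assuming the web UI lives in a local `textgen` folder (adjust the path to your installation); the download is wrapped in a try/except so the script degrades gracefully without a network connection:

```python
import urllib.request
from pathlib import Path

# Assumed local layout: change "textgen" to wherever the web UI is installed.
mmproj_dir = Path("textgen/user_data/mmproj")
mmproj_dir.mkdir(parents=True, exist_ok=True)

url = (
    "https://huggingface.co/unsloth/gemma-3-4b-it-GGUF"
    "/resolve/main/mmproj-F16.gguf?download=true"
)
# Save under a recognizable name so it is easy to match with the model.
dest = mmproj_dir / "mmproj-gemma-3-4b-it-F16.gguf"

if not dest.exists():
    try:
        urllib.request.urlretrieve(url, dest)
    except OSError as exc:
        print(f"Download failed ({exc}); fetch the file manually.")
```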
1. Launch the web UI.
2. Navigate to the Model tab.
3. Select the GGUF model in the Model dropdown.
4. Select the mmproj file in the "Multimodal (vision)" menu.
5. Click "Load".
6. Select your image by clicking the 📎 icon and send your message.
The model will reply with a solid understanding of the image contents.
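Images can also be sent programmatically. This is a sketch assuming the web UI was launched with the `--api` flag, which exposes an OpenAI-compatible server on http://127.0.0.1:5000 by default; the message shape follows the OpenAI chat format with a base64 data-URL image part, so verify the endpoint and port against your setup:

```python
import base64
import json
import urllib.request


def build_image_message(image_b64: str, prompt: str) -> dict:
    # OpenAI-style multimodal message: a text part plus a base64 data-URL image.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"},
            },
        ],
    }


def describe_image(path: str, prompt: str = "What is in this image?") -> str:
    # Assumes the OpenAI-compatible API is reachable on the default port.
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    payload = {"messages": [build_image_message(image_b64, prompt)]}
    req = urllib.request.Request(
        "http://127.0.0.1:5000/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

With the server running and a model loaded, `describe_image("photo.png")` returns the model's description of the image.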
Multimodal also works with the ExLlamaV3 loader (the non-HF one). No additional files are necessary: just load a multimodal EXL3 model and send an image.
You can find some ready-to-use example models on the page below: