Back to Mem0

Multimodal Support

docs/open-source/features/multimodal-support.mdx

2.0.18.4 KB
Original Source

Multimodal support lets Mem0 extract facts from images alongside regular text. Add screenshots, receipts, or product photos and Mem0 will store the insights as searchable memories so agents can recall them later.

<Info> **You’ll use this when…** - Users share screenshots, menus, or documents and you want the details to become memories. - You already collect text conversations but need visual context for better answers. - You want a single workflow that handles both URLs and local image files. </Info> <Warning> Images larger than 20 MB are rejected. Compress or resize files before sending them to avoid errors. </Warning>

Feature anatomy

  • Vision processing: Mem0 runs the image through a vision model that extracts text and key details.
  • Memory creation: Extracted information is stored as standard memories so search, filters, and analytics continue to work.
  • Context linking: Visual and textual turns in the same conversation stay linked, giving agents richer context.
  • Flexible inputs: Accept publicly accessible URLs or base64-encoded local files in both Python and JavaScript SDKs.
<AccordionGroup> <Accordion title="Supported formats"> | Format | Used for | Notes | | --- | --- | --- | | JPEG / JPG | Photos and screenshots | Default option for camera captures. | | PNG | Images with transparency | Keeps sharp text and UI elements crisp. | | WebP | Web-optimized images | Smaller payloads for faster uploads. | | GIF | Static or animated graphics | Works for simple graphics and short loops. | </Accordion> </AccordionGroup>

Configure it

Add image messages from URLs

<CodeGroup> ```python Python from mem0 import Memory

client = Memory()

messages = [ {"role": "user", "content": "Hi, my name is Alice."}, { "role": "user", "content": { "type": "image_url", "image_url": { "url": "https://example.com/menu.jpg" } } } ]

client.add(messages, user_id="alice")


```ts TypeScript
import { Memory } from "mem0ai";

const client = new Memory();

const messages = [
  { role: "user", content: "Hi, my name is Alice." },
  {
    role: "user",
    content: {
      type: "image_url",
      image_url: { url: "https://example.com/menu.jpg" }
    }
  }
];

await client.add(messages, { userId: "alice" });
</CodeGroup> <Info icon="check"> Inspect the response payload—the memories list should include entries extracted from the menu image as well as the text turns. </Info>

Upload local images as base64

<CodeGroup> ```python Python import base64 from mem0 import Memory

def encode_image(image_path): with open(image_path, "rb") as image_file: return base64.b64encode(image_file.read()).decode("utf-8")

client = Memory() base64_image = encode_image("path/to/your/image.jpg")

messages = [ { "role": "user", "content": [ {"type": "text", "text": "What's in this image?"}, { "type": "image_url", "image_url": { "url": f"data:image/jpeg;base64,{base64_image}" } } ] } ]

client.add(messages, user_id="alice")


```ts TypeScript
import fs from "fs";
import { Memory } from "mem0ai";

function encodeImage(imagePath: string) {
  const buffer = fs.readFileSync(imagePath);
  return buffer.toString("base64");
}

const client = new Memory();
const base64Image = encodeImage("path/to/your/image.jpg");

const messages = [
  {
    role: "user",
    content: [
      { type: "text", text: "What's in this image?" },
      {
        type: "image_url",
        image_url: {
          url: `data:image/jpeg;base64,${base64Image}`
        }
      }
    ]
  }
];

await client.add(messages, { userId: "alice" });
</CodeGroup> <Tip> Keep base64 payloads under 5 MB to speed up uploads and avoid hitting the 20 MB limit. </Tip>

See it in action

Restaurant menu memory

python
from mem0 import Memory

client = Memory()

messages = [
    {
        "role": "user",
        "content": "Help me remember which dishes I liked."
    },
    {
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/restaurant-menu.jpg"
            }
        }
    },
    {
        "role": "user",
        "content": "I’m allergic to peanuts and prefer vegetarian meals."
    }
]

result = client.add(messages, user_id="user123")
print(result)
<Info icon="check"> The response should capture both the allergy note and menu items extracted from the photo so future searches can combine them. </Info>

Document capture

python
messages = [
    {
        "role": "user",
        "content": "Store this receipt information for expenses."
    },
    {
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/receipt.jpg"
            }
        }
    }
]

client.add(messages, user_id="user123")
<Tip> Combine the receipt upload with structured metadata (tags, categories) if you need to filter expenses later. </Tip>

Error handling

<CodeGroup> ```python Python from mem0 import Memory from mem0.exceptions import InvalidImageError, FileSizeError

client = Memory()

try: messages = [{ "role": "user", "content": { "type": "image_url", "image_url": {"url": "https://example.com/image.jpg"} } }]

client.add(messages, user_id="user123")
print("Image processed successfully")

except InvalidImageError: print("Invalid image format or corrupted file") except FileSizeError: print("Image file too large") except Exception as exc: print(f"Unexpected error: {exc}")


```ts TypeScript
import { Memory } from "mem0ai";

const client = new Memory();

try {
  const messages = [{
    role: "user",
    content: {
      type: "image_url",
      image_url: { url: "https://example.com/image.jpg" }
    }
  }];

  await client.add(messages, { userId: "user123" });
  console.log("Image processed successfully");
} catch (error: any) {
  if (error.type === "invalid_image") {
    console.log("Invalid image format or corrupted file");
  } else if (error.type === "file_size_exceeded") {
    console.log("Image file too large");
  } else {
    console.log(`Unexpected error: ${error.message}`);
  }
}
</CodeGroup> <Warning> Fail fast on invalid formats so you can prompt users to re-upload before losing their context. </Warning>

Verify the feature is working

  • After calling add, inspect the returned memories and confirm they include image-derived text (menu items, receipt totals, etc.).
  • Run a follow-up search for a detail from the image; the memory should surface alongside related text.
  • Monitor image upload latency—large files should still complete under your acceptable response time.
  • Log file size and URL sources to troubleshoot repeated failures.

Best practices

  1. Ask for intent: Prompt users to explain why they sent an image so the memory includes the right context.
  2. Keep images readable: Encourage clear photos without heavy filters or shadows for better extraction.
  3. Split bulk uploads: Send multiple images as separate add calls to isolate failures and improve reliability.
  4. Watch privacy: Avoid uploading sensitive documents unless your environment is secured for that data.
  5. Validate file size early: Check file size before encoding to save bandwidth and time.

Troubleshooting

IssueCauseFix
Upload rejectedFile larger than 20 MBCompress or resize before sending.
Memory missing image dataLow-quality or blurry imageRetake the photo with better lighting.
Invalid format errorUnsupported file typeConvert to JPEG or PNG first.
Slow processingHigh-resolution imagesDownscale or compress to under 5 MB.
Base64 errorsIncorrect prefix or encodingEnsure data:image/<type>;base64, is present and the string is valid.

<CardGroup cols={2}> <Card title="Connect Vision Models" icon="circle-dot" href="/components/llms/models/openai"> Review supported vision-capable models and configuration details. </Card> <Card title="Build Multimodal Retrieval" icon="image" href="/cookbooks/frameworks/multimodal-retrieval"> Follow an end-to-end workflow pairing text and image memories. </Card> </CardGroup>