Describe what you want, and LobeHub turns your text into images and videos. Whether you need product prototypes, design inspiration, illustrations, motion concepts, short clips, or open-ended creative exploration, you choose a model, set your parameters, and generate in seconds. All output lands in your generation feed, where you can download it or save it to your Resource Library.
LobeHub ships two parallel workspaces — Image and Video — built on the same generation pipeline but tuned for each medium.
From the LobeHub sidebar:
- **Image** (`/image`)
- **Video** (`/video`)

Each workspace has the same three-pane layout: prompt input, configuration panel, and a generation feed for past results.
Describe the image you want in the input box. The more specific your description, the more accurate the result.
Effective prompt structure:
[Subject] [Style/Medium] [Setting/Background] [Lighting] [Mood] [Technical details]
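The prompt structure is effectively a fill-in template. As an illustration only, the TypeScript sketch below assembles a prompt from those slots in order and skips any you leave empty. `PromptParts` and `buildImagePrompt` are hypothetical names, not a LobeHub API. The workspace only ever sees the final string you type or paste.

```typescript
// Hypothetical helper (not a LobeHub API): assembles a prompt following the
// [Subject] [Style/Medium] [Setting/Background] [Lighting] [Mood] [Technical details] order.
interface PromptParts {
  subject: string;
  style?: string;
  setting?: string;
  lighting?: string;
  mood?: string;
  technical?: string;
}

function buildImagePrompt(parts: PromptParts): string {
  // Keep the slots in the recommended order.
  const ordered: (string | undefined)[] = [
    parts.subject,
    parts.style,
    parts.setting,
    parts.lighting,
    parts.mood,
    parts.technical,
  ];
  // Drop empty slots, trim the rest, and join them with commas.
  return ordered
    .filter((p): p is string => p !== undefined && p.trim() !== "")
    .map((p) => p.trim())
    .join(", ");
}

// The mood slot is left empty on purpose; it is simply skipped.
const skylinePrompt = buildImagePrompt({
  subject: "A futuristic city skyline at sunset",
  style: "digital art, cyberpunk style",
  setting: "neon lights reflecting on wet streets",
  lighting: "cinematic lighting",
  technical: "4K detail",
});
console.log(skylinePrompt);
```

The result is the futuristic skyline example prompt from this section, built slot by slot.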
Examples:
"A futuristic city skyline at sunset, digital art, cyberpunk style, neon lights reflecting on wet streets, cinematic lighting, 4K detail"
"A cozy coffee shop interior, watercolor illustration, warm golden light streaming through windows, potted plants on windowsills, soft and inviting atmosphere"
"A product photo of a minimalist leather wallet on a clean white background, studio lighting, sharp focus, commercial photography style"
Prompt tips:
LobeHub offers multiple AI image generation models. Different models have different strengths:
| Model | Best For |
|---|---|
| DALL-E 3 | Realistic photos, illustrations, following prompts accurately |
| GPT Image | High-fidelity edits, text rendering inside images |
| Flux | Artistic styles, creative images, fast generation |
| Stable Diffusion | Highly customizable, community styles and fine-tuned models |
| Gemini Imagen | Photoreal scenes, strong global composition |
| fal.ai models | Various specialized styles and fast generation |
Try different models with the same prompt to see which gives the best results for your use case.
If you have reference images, upload them to guide the generation process. Click the upload button, or drag and drop your reference images directly. Depending on the model, you can upload more than one reference image.
Reference images help the model understand your desired style, composition, or color palette — and many models also support reference-based edits (e.g. swap the background, change the outfit) when you describe the change in the prompt.
The right-hand config panel exposes everything the selected model supports. Common controls:
- **Aspect ratio**: presets 1:1, 16:9, 9:16, 4:3, 3:2. Lock the ratio, or unlock it for a free-form size.
- **Resolution**: choose a preset (512px, 1K, 2K, 4K) or set width × height directly.

Aspect ratio cheatsheet:
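An aspect ratio and a resolution preset together determine the pixel size. The sketch below shows one plausible version of that arithmetic under two assumptions that are not taken from LobeHub: a preset names the length of the long edge (for example `1K` = 1024 px), and both sides are snapped to a multiple of 64, a constraint some diffusion models impose. `sizeFor` and `PRESET_LONG_EDGE` are hypothetical. The config panel handles sizing for you, and each model may map presets differently.

```typescript
// Hypothetical: assumes presets give the long edge in pixels and that the
// model wants both sides divisible by `multiple` (64 here, an assumption).
const PRESET_LONG_EDGE: Record<string, number> = {
  "512px": 512,
  "1K": 1024,
  "2K": 2048,
  "4K": 4096,
};

function sizeFor(ratio: string, preset: string, multiple = 64): { width: number; height: number } {
  // Parse "16:9" into 16 and 9; missing parts become NaN and are rejected below.
  const [w = NaN, h = NaN] = ratio.split(":").map(Number);
  if (!(w > 0) || !(h > 0)) throw new Error(`Bad aspect ratio: ${ratio}`);

  const longEdge: number | undefined = PRESET_LONG_EDGE[preset];
  if (longEdge === undefined) throw new Error(`Unknown preset: ${preset}`);

  // Scale so the longer side matches the preset, then snap each side
  // to the nearest multiple (never below one multiple).
  const scale = longEdge / Math.max(w, h);
  const snap = (x: number) => Math.max(multiple, Math.round(x / multiple) * multiple);
  return { width: snap(w * scale), height: snap(h * scale) };
}

console.log(sizeFor("16:9", "1K"));
console.log(sizeFor("3:2", "1K"));
```

Under these assumptions, 16:9 at 1K gives exactly 1024 × 576, while 3:2 at 1K becomes 1024 × 704 instead of roughly 1024 × 683. This is why snapped sizes can drift slightly from the nominal ratio.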
Once generated, images appear in the generation feed. You can:
The Video workspace mirrors Image — same prompt-first flow, same config panel, same feed — but with controls tuned for motion.
Describe the scene, motion, and camera, not just the subject. Models reward verbs and shot language.
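Because video models respond to motion verbs and shot language, a quick self-check before generating can catch prompts that only name a subject. The hypothetical `findShotTerms` helper below scans a prompt for a small, illustrative list of camera terms. The list is an assumption, not a vocabulary that LobeHub or any model is documented to require.

```typescript
// Hypothetical pre-flight check: which camera/shot terms does a prompt use?
// SHOT_TERMS is an illustrative assumption; extend it to suit your own style.
const SHOT_TERMS = [
  "tracking shot", "dolly-in", "dolly-out", "pan", "tilt", "close-up",
  "wide shot", "aerial", "macro", "slow motion", "handheld",
];

// Escape regex metacharacters so each term is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findShotTerms(prompt: string): string[] {
  // Whole-word, case-insensitive match, so "pan" does not fire inside "company".
  return SHOT_TERMS.filter((term) =>
    new RegExp(`\\b${escapeRegExp(term)}\\b`, "i").test(prompt),
  );
}

console.log(findShotTerms("A red fox in the snow"));
```

Run against the three video example prompts in this section, it finds `tracking shot`, `dolly-in`, and `macro` plus `slow motion` respectively. A bare prompt like "A red fox in the snow" returns an empty list, a hint to add camera direction.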
"A red fox trotting through fresh snow at golden hour, breath visible in the cold air, slow tracking shot, cinematic"
"An astronaut floating into a colorful nebula, slow dolly-in, dreamy atmosphere, soft volumetric light"
"A cup of coffee being poured in macro slow motion, steam rising, shallow depth of field, commercial product shot"
Prompt tips for video:
LobeHub integrates the major text-to-video and image-to-video providers:
| Model | Best For |
|---|---|
| OpenAI Sora 2 / Sora 2 Pro | Coherent multi-second clips, strong scene understanding |
| Google Veo 3 / 3.1 | Photoreal motion, native audio generation, cinematic look |
| Kling V3 | High-motion fidelity, image-to-video and omni-video |
| MiniMax Hailuo 2.3 | Fast text-to-video, expressive characters |
| Qwen / Wan | Text-to-video with strong Chinese prompt understanding |
| fal.ai models | Specialized models, fast turnaround |
Different models support different parameter sets — switching models updates the config panel automatically.
Many video models support image conditioning:
When a start frame is set, the prompt placeholder shifts to "Describe the scene you want to generate with the image".
Controls vary by model, but typically include:
- **Aspect ratio**: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9.
- **Resolution**: 480p, 720p, 1080p.

Generated clips appear in the feed and play inline. You can:
A "🎁 N free videos today" badge shows your remaining free quota; once it's used up, credits are consumed per generation.
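The free quota and credits combine in a simple order: free generations are used first, and each further generation consumes credits. The sketch below spells that out for planning a batch of clips. `planVideoRuns` is hypothetical, and `creditsPerVideo` is an assumed flat cost per generation. Actual credit costs are not specified in this guide and may vary by model.

```typescript
// Hypothetical planning helper: free generations are consumed first, and each
// remaining generation is assumed to cost a flat `creditsPerVideo`.
interface RunPlan {
  free: number;    // generations covered by today's free quota
  paid: number;    // generations that consume credits
  credits: number; // total credits, under the flat-cost assumption
}

function planVideoRuns(planned: number, freeLeft: number, creditsPerVideo: number): RunPlan {
  if (planned < 0 || freeLeft < 0 || creditsPerVideo < 0) {
    throw new Error("Inputs must be non-negative");
  }
  const free = Math.min(planned, freeLeft);
  const paid = planned - free;
  return { free, paid, credits: paid * creditsPerVideo };
}

// Five clips planned, two free videos left, an assumed 100 credits per clip.
const plan = planVideoRuns(5, 2, 100);
console.log(plan); // { free: 2, paid: 3, credits: 300 }
```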
Iterate on prompts — If the first result isn't quite right, adjust one element at a time rather than rewriting the whole prompt. Add more detail, change the style descriptor, or specify what you don't want.
Use a reference image or start frame — Uploading a reference helps the model match your intended style, color palette, composition, or — for video — your opening shot.
Try multiple variations — Generate several images per run, or regenerate videos with the same seed and a tweaked prompt, so the prompt change is the main thing that differs between runs. AI generation has inherent randomness, so some variations will be noticeably better than others.
Match model to task — Photorealistic models (DALL-E 3, Flux, Imagen) for product photos and realistic scenes; style-focused models for artistic illustrations; Veo or Sora for cinematic motion; Kling or Hailuo for character-heavy clips.
Bridge image → video — Generate a strong still in the Image workspace, then feed it into the Video workspace as a start frame to animate it.
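Iterating on prompts and trying variations work well together: change one element of the prompt per run and keep everything else fixed (including the seed, where the model exposes one), so any difference in output traces back to that element. The hypothetical `oneAtATime` helper below generates such single-change variants from a prompt split into named slots. It is an illustration of the workflow, not a LobeHub feature.

```typescript
// Hypothetical helper: builds prompt variants that differ from the base in
// exactly one named slot, keeping all other slots in their original order.
type Slots = Record<string, string>;

function oneAtATime(base: Slots, slot: string, alternatives: string[]): string[] {
  if (!(slot in base)) throw new Error(`Unknown slot: ${slot}`);
  return alternatives.map((alt) =>
    Object.entries(base)
      .map(([key, value]) => (key === slot ? alt : value))
      .join(", "),
  );
}

const base: Slots = {
  subject: "A cozy coffee shop interior",
  style: "watercolor illustration",
  lighting: "warm golden light streaming through windows",
};

// Two variants that swap only the style; subject and lighting stay identical.
const variants = oneAtATime(base, "style", ["oil painting", "pencil sketch"]);
for (const prompt of variants) {
  console.log(prompt);
}
```

Generating each variant with the same model and settings makes it easier to judge which style descriptor actually helped.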
<Cards> <Card href={'/docs/usage/getting-started/resource'} title={'Resource Library'} /><Card href={'/docs/usage/getting-started/vision'} title={'Vision & Image Understanding'} />
<Card href={'/docs/usage/providers'} title={'AI Providers'} /> </Cards>