apps/opik-documentation/documentation/fern/docs/opik-university/3_evaluation/3.4-evaluation-evaluate-llm-application.mdx
This video walks through the complete evaluation workflow in Opik, where datasets and metrics come together to systematically assess LLM performance. You'll see a practical comparison of GPT-4 and Gemini models on a RAG application, learn about prompt versioning and experiment management, and discover how to use the results to make data-driven decisions for production deployment. This is where the concepts from the previous lessons combine into actionable insights.
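To make the shape of that workflow concrete, here is a minimal plain-Python sketch of running the same dataset and metric against two applications and comparing the results. It does not use the Opik SDK: the dataset, the two stub applications (`app_a`, `app_b`), the `contains_metric` scorer, and the `run_experiment` helper are all hypothetical stand-ins invented for illustration, and the stubs return canned text rather than calling GPT-4 or Gemini.

```python
from statistics import mean

# Hypothetical dataset: each item pairs an input with a reference answer.
dataset = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "Who wrote Hamlet?", "expected": "Shakespeare"},
]

# Stand-ins for two model-backed applications. Real versions would call
# an LLM; these return canned answers so the sketch runs offline.
def app_a(question: str) -> str:
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
        "Who wrote Hamlet?": "Hamlet was written by Christopher Marlowe.",
    }
    return canned.get(question, "")

def app_b(question: str) -> str:
    canned = {
        "What is the capital of France?": "Paris",
        "What is 2 + 2?": "4",
        "Who wrote Hamlet?": "William Shakespeare wrote Hamlet.",
    }
    return canned.get(question, "")

# Hypothetical metric: 1.0 if the reference answer appears in the output.
def contains_metric(output: str, expected: str) -> float:
    return 1.0 if expected.lower() in output.lower() else 0.0

# One "experiment" = one application scored on every dataset item.
def run_experiment(name, app, items, metric):
    scores = [metric(app(item["question"]), item["expected"]) for item in items]
    return {"experiment": name, "scores": scores, "mean_score": mean(scores)}

results = [
    run_experiment("app-a", app_a, dataset, contains_metric),
    run_experiment("app-b", app_b, dataset, contains_metric),
]

# Pick the experiment with the highest average score.
best = max(results, key=lambda r: r["mean_score"])

for r in results:
    print(f"{r['experiment']}: mean score {r['mean_score']:.2f}")
print(f"Best: {best['experiment']}")
```

Because both applications are scored on the same items with the same metric, the difference in mean score can be attributed to the application rather than to the test data, which is the property that makes the comparison usable for a deployment decision.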