eval_details.md
This document provides additional context on the settings and parameters we used to evaluate the Llama 3 pre-trained and instruct-aligned models.
This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization.
| Category | Count |
|---|---|
| Coding | 150 |
| Mathematical Reasoning | 150 |
| Asking for Advice | 150 |
| Brainstorming | 150 |
| Classification | 150 |
| Closed Question Answering | 150 |
| Creative Writing | 150 |
| Extraction | 150 |
| Inhabiting a Character/Persona | 150 |
| Open Question Answering | 150 |
| Rewriting | 150 |
| Summarization | 150 |
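
As a quick sanity check on the table above, the per-category counts can be tallied to confirm they add up to the stated 1,800 prompts across 12 categories. This is only a minimal sketch: the names `CATEGORY_COUNTS` and `summarize` are made up for illustration and are not part of any released evaluation tooling.

```python
# Per-category prompt counts, copied from the table above.
# The dictionary name and helper below are illustrative assumptions.
CATEGORY_COUNTS = {
    "Coding": 150,
    "Mathematical reasoning": 150,
    "Asking for Advice": 150,
    "Brainstorming": 150,
    "Classification": 150,
    "Closed Question Answering": 150,
    "Creative Writing": 150,
    "Extraction": 150,
    "Inhabiting a Character/Persona": 150,
    "Open Question Answering": 150,
    "Rewriting": 150,
    "Summarization": 150,
}


def summarize(counts):
    """Return (number of categories, total prompts, share of total per category)."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return len(counts), total, shares


num_categories, total_prompts, shares = summarize(CATEGORY_COUNTS)
print(num_categories, total_prompts)  # prints: 12 1800
```

Because every category has the same count, each one accounts for one twelfth of the set, so no single use case dominates an aggregate score computed over all prompts.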