QMD Query Expansion Scoring

finetune/SCORING.md


Goal

Transform an arbitrary user-typed query into a strong set of retrieval-optimized expansions.

Input: "auth config"

Output:

hyde: Authentication can be configured by setting the AUTH_SECRET environment variable and enabling the auth middleware in your application's config file.
lex: authentication configuration
lex: auth settings setup
vec: how to configure authentication settings
vec: authentication configuration options

Output Format

| Prefix | Purpose | Required | Count |
|--------|---------|----------|-------|
| lex: | BM25 keyword variations (shorter, keyword-focused) | Yes | 1-3 |
| vec: | Semantic reformulations (natural language) | Yes | 1-3 |
| hyde: | Hypothetical document passage | Optional | 0-1 |
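
As a minimal sketch of how output in this format could be split up before scoring, the helper below groups lines by prefix and collects anything that does not match. The name `parse_expansions` and the choice to skip blank lines are assumptions made for illustration; they are not taken from the QMD code.

```python
VALID_PREFIXES = ("lex", "vec", "hyde")


def parse_expansions(output: str) -> tuple[dict[str, list[str]], list[str]]:
    """Group prefixed lines by type and return unrecognized lines separately."""
    parsed: dict[str, list[str]] = {prefix: [] for prefix in VALID_PREFIXES}
    invalid: list[str] = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are neither valid nor invalid
        prefix, sep, text = line.partition(":")
        if sep and prefix in VALID_PREFIXES and text.strip():
            parsed[prefix].append(text.strip())
        else:
            invalid.append(line)
    return parsed, invalid
```

Only the first colon is treated as the separator, so a hyde passage that itself contains a colon stays in one piece.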

Scoring Criteria

1. Format Compliance (0-30 points)

| Criterion | Points | Deduction |
|-----------|--------|-----------|
| Has at least one lex: line | +10 | -10 if missing |
| Has at least one vec: line | +10 | -10 if missing |
| All lines have valid prefix (lex:, vec:, hyde:) | +10 | -5 per invalid line |
| No garbage/prose outside of prefixed lines | - | -10 if present |
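
The first three criteria can be sketched roughly as follows. This is an illustrative reading of the table, not the project's scorer: `format_score` is a hypothetical name, each criterion is floored at 0, and the separate -10 deduction for prose is left out because the table does not say how prose differs from an invalid line.

```python
VALID_PREFIXES = ("lex:", "vec:", "hyde:")


def format_score(output: str) -> int:
    """Approximate the format-compliance score (0-30) for raw model output."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    valid = [ln for ln in lines if ln.startswith(VALID_PREFIXES)]
    invalid_count = len(lines) - len(valid)

    score = 0
    if any(ln.startswith("lex:") for ln in valid):
        score += 10
    if any(ln.startswith("vec:") for ln in valid):
        score += 10
    # Assumption: the prefix criterion starts at +10, loses 5 per invalid
    # line, and is only awarded when at least one valid line exists.
    if valid:
        score += max(0, 10 - 5 * invalid_count)
    return score
```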

2. Diversity & Coverage (0-30 points)

| Criterion | Points | Deduction |
|-----------|--------|-----------|
| 2+ different types present (lex + vec) | +10 | -10 if only one type |
| 2+ total expansions | +5 | -5 if only one |
| Multiple lex: lines are diverse (edit distance > 3) | +5 | -2 per duplicate pair |
| Multiple vec: lines are diverse (edit distance > 5) | +5 | -2 per duplicate pair |
| lex/vec not identical to original query | +5 | -5 per line that equals query |
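
The table states the diversity thresholds in terms of edit distance. A minimal sketch of that check, assuming a character-level Levenshtein distance (the table does not say whether the distance is measured in characters or words), could look like this. The names `levenshtein` and `duplicate_pairs` are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance using a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def duplicate_pairs(lines: list[str], threshold: int) -> int:
    """Count pairs whose distance is NOT greater than the threshold."""
    return sum(
        1
        for x, y in combinations(lines, 2)
        if levenshtein(x.lower().strip(), y.lower().strip()) <= threshold
    )
```

With this reading, `duplicate_pairs(lex_lines, 3)` and `duplicate_pairs(vec_lines, 5)` would each feed the "-2 per duplicate pair" deduction.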

3. Hyde Quality (0-20 points, optional bonus)

| Criterion | Points | Deduction |
|-----------|--------|-----------|
| Hyde present and well-formed | +5 | - |
| Hyde is concise (50-200 chars) | +5 | -3 if too short, -5 if too long |
| Hyde has no newlines | +5 | -5 if contains newlines |
| Hyde has no excessive repetition | +5 | -3 if word repeats 3+ times |
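
One way to read this table is that each deduction is taken out of that criterion's own +5. Under that assumption, a rough sketch of the hyde score is shown below. `hyde_score` and `HYDE_STOPWORDS` are hypothetical names, and the repetition deduction is applied once regardless of how many words repeat. The sketch follows the table literally and may not reproduce the numbers shown in the worked examples.

```python
from collections import Counter
from typing import Optional

HYDE_STOPWORDS = {"the", "a", "an", "is", "are", "to", "for", "of", "in", "and", "or"}


def hyde_score(hyde: Optional[str]) -> int:
    """Approximate the hyde score (0-20); 0 when no hyde line is present."""
    if hyde is None or not hyde.strip():
        return 0
    text = hyde.strip()
    score = 5  # present and well-formed

    # Conciseness: full credit inside 50-200 chars, partial or none outside.
    if len(text) < 50:
        score += 5 - 3
    elif len(text) <= 200:
        score += 5
    # longer than 200 chars: +5 - 5 = 0

    if "\n" not in text:
        score += 5

    counts = Counter(text.lower().split())
    repeats = any(c >= 3 for w, c in counts.items() if w not in HYDE_STOPWORDS)
    score += (5 - 3) if repeats else 5
    return score
```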

4. Content Quality (0-20 points)

| Criterion | Points | Deduction |
|-----------|--------|-----------|
| Base relevance | +5 | Subjective |
| Lex lines preserve key terms from query | +5 | -5 if lex is generic |
| Lex lines are keyword-focused (shorter) | +5 | -2 if lex is longer than vec |
| Vec lines are natural language (complete phrases) | +5 | -2 if vec is just keywords |

5. Named Entity Preservation (-65 to +20 points, CRITICAL)

Named entities are proper nouns, brand names, personal names, technical terms, and acronyms that MUST appear in lex queries. This prevents generic expansions that lose the specific topic.

Two-level checking:

| Criterion | Points | Deduction |
|-----------|--------|-----------|
| Per-line: All lex lines contain at least one entity | +15 | - |
| Per-line: Some lex lines contain entities | +5 | - |
| Per-line: NO lex lines contain entities | - | -30 HEAVY PENALTY |
| Per-entity: Entity completely absent from all lex+vec | - | -20 per dropped entity |
| Generic filler phrases in lex | - | -15 per phrase |
| Entities also in vec lines | +5 | - |

Named Entity Detection:

  • All-caps acronyms: TDS, API, GPU, AWS
  • Capitalized proper nouns (any position): React, Docker, Bob, Sarah
  • Personal names at query start: Bob asked about deploy → Bob is an entity
  • Technical terms: node.js, C++, .NET
  • CamelCase: JavaScript, TypeScript
  • Compound names: TDS motorsports → both words are entities
  • Project names: Project Atlas, Horizon team

Generic Filler Phrases (BANNED in lex):

  • "find information about"
  • "search for", "look up"
  • "get information", "learn about"
  • "details about", "guide to"

Examples:

| Query | Bad Lex | Good Lex |
|-------|---------|----------|
| who is TDS motorsports | lex: find information about | lex: TDS motorsports history |
| | lex: company details | lex: TDS motorsports founders |
| meeting with Bob about C++ | lex: c++ meetings | lex: Bob "C++" meeting |
| | vec: programming meeting notes | vec: meeting notes with Bob about C++ |
| how to use React hooks | lex: programming tutorial | lex: React hooks tutorial |
| | lex: how to code | lex: useEffect useState hooks |

Key Rule: If a query mentions a specific entity (person, brand, product, technology, project name), that entity MUST appear somewhere in the lex+vec output. Dropping a person's name is especially costly.
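
The two-level check can be sketched as below, taking the already-extracted entities as input. This is an illustration under several assumptions, not the project's scorer: `entity_score` is a hypothetical name, a query with no entities scores 0, matching is by lowercase substring, and the result is clamped to the -65 to +20 range given in the section heading.

```python
GENERIC_FILLERS = (
    "find information about", "search for", "look up",
    "get information", "learn about", "details about", "guide to",
)


def entity_score(entities: set[str], lex_lines: list[str], vec_lines: list[str]) -> int:
    """Approximate the named-entity score for one set of expansions."""
    if not entities:
        return 0  # assumption: nothing to preserve, so no bonus or penalty
    ents = {e.lower() for e in entities}
    lex = [line.lower() for line in lex_lines]
    vec = [line.lower() for line in vec_lines]

    # Per-line check: how many lex lines mention at least one entity.
    # Substring matching is crude: a short entity can match inside a longer word.
    lex_hits = sum(1 for line in lex if any(e in line for e in ents))
    if lex and lex_hits == len(lex):
        score = 15
    elif lex_hits > 0:
        score = 5
    else:
        score = -30

    # Per-entity check: entities missing from every lex and vec line.
    all_text = " ".join(lex + vec)
    score -= 20 * sum(1 for e in ents if e not in all_text)

    # Generic filler phrases in lex lines.
    score -= 15 * sum(1 for line in lex for phrase in GENERIC_FILLERS if phrase in line)

    if any(e in line for line in vec for e in ents):
        score += 5
    return max(-65, min(20, score))
```

Applied to the first row of the examples table, the good expansions earn the per-line bonus, while the bad ones hit the per-line penalty, the dropped-entity penalty, and the filler penalty at once.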

6. Lex Phrase Quoting (bonus, +3 points)

When a query contains multi-word technical terms or proper nouns, lex output should use quoted phrases for exact matching in BM25.

| Criterion | Points |
|-----------|--------|
| Uses "quoted phrases" in lex when query has multi-word entities | +3 |

When to quote:

  • Multi-word proper nouns: "New York", "Monte Carlo"
  • Specific technical terms: "machine learning", "rate limit"
  • Exact compound terms: "connection pool", "merge conflict"

When to use negation (-term):

  • Disambiguating terms: rust -corrosion, java -coffee, apple -fruit
  • Excluding related-but-wrong topics: "machine learning" -"deep learning"
  • Narrowing scope: docker -kubernetes, python -snake

Example:

Query: python memory leak debugging
Good lex: "memory leak" python -java -javascript
Good lex: tracemalloc "garbage collector" profiler
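
To make the quoting and negation syntax concrete, here is a small sketch that splits a lex line into quoted phrases, plain terms, and excluded terms. It only illustrates the notation used above; `parse_lex` is a hypothetical helper, and how the real BM25 backend tokenizes these queries is not specified here.

```python
import re

# A token is either an optionally negated quoted phrase or a run of non-space characters.
TOKEN_RE = re.compile(r'-?"[^"]+"|\S+')


def parse_lex(lex: str) -> dict[str, list[str]]:
    """Split a lex query into quoted phrases, plain terms, and exclusions."""
    parsed: dict[str, list[str]] = {"phrases": [], "terms": [], "excluded": []}
    for token in TOKEN_RE.findall(lex):
        negated = token.startswith("-") and len(token) > 1
        body = token[1:] if negated else token
        is_phrase = len(body) > 2 and body.startswith('"') and body.endswith('"')
        text = body.strip('"')
        if negated:
            parsed["excluded"].append(text)
        elif is_phrase:
            parsed["phrases"].append(text)
        else:
            parsed["terms"].append(text)
    return parsed
```

A scorer could award the +3 quoting bonus when `parsed["phrases"]` is non-empty and the query contains a multi-word entity.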

Score Calculation

Total Score = Format + Diversity + Hyde + Quality + Entity
Max Score = 120 (100 without hyde)
Normalized = Total / Max (0.0 - 1.0)

Rating:

  • 0.80-1.00: Excellent
  • 0.60-0.79: Good
  • 0.40-0.59: Acceptable
  • 0.20-0.39: Poor
  • 0.00-0.19: Failed

Note: Entity score can go negative, heavily penalizing outputs that miss named entities.
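
The calculation and rating bands above can be sketched as follows. The function names are hypothetical, the +3 quoting bonus is not included, and clamping the normalized value to 0.0-1.0 is an assumption made so that a negative entity score still maps to a rating.

```python
from typing import Optional

RATING_BANDS = [
    (0.80, "Excellent"),
    (0.60, "Good"),
    (0.40, "Acceptable"),
    (0.20, "Poor"),
    (0.00, "Failed"),
]


def normalized_score(format_pts: int, diversity_pts: int, quality_pts: int,
                     entity_pts: int, hyde_pts: Optional[int] = None) -> float:
    """Total / Max, where Max is 120 with a hyde line and 100 without."""
    total = format_pts + diversity_pts + quality_pts + entity_pts
    max_score = 100
    if hyde_pts is not None:
        total += hyde_pts
        max_score = 120
    return max(0.0, min(1.0, total / max_score))


def rating(normalized: float) -> str:
    """Map a normalized score to its rating band."""
    for floor, label in RATING_BANDS:
        if normalized >= floor:
            return label
    return "Failed"
```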

Examples

Good Example (Score: 85/100)

Query: "react hooks"

Output:

hyde: React Hooks allow you to use state and lifecycle features in functional components without writing a class.
lex: react hooks tutorial
lex: usestate useeffect
vec: how to use react hooks in functional components
vec: react hooks best practices guide

Scoring:

  • Format: 30/30 (all prefixes valid)
  • Diversity: 28/30 (2 types, 4 expansions, good diversity)
  • Hyde: 17/20 (present, good length, no repetition)
  • Quality: 10/20 (relevant, good structure)

Bad Example (Score: 15/100)

Query: "auth"

Output:

auth is an important concept that relates to authentication.
The answer should be in Chinese.
The answer should be in Chinese.

Scoring:

  • Format: 0/30 (no valid prefixes)
  • Diversity: 0/30 (no valid expansions)
  • Hyde: 0/20 (N/A)
  • Quality: 15/20 (some relevance but wrong format)

Medium Example (Score: 55/100)

Query: "docker networking"

Output:

hyde: Docker networking is an important concept. Docker networking is used for container communication. Docker networking configuration is essential.
lex: docker networking
vec: docker networking

Scoring:

  • Format: 30/30 (valid prefixes)
  • Diversity: 10/30 (lex=vec=query, no diversity)
  • Hyde: 5/20 (too repetitive - "docker networking" 3x)
  • Quality: 10/20 (relevant but low effort)

Heuristics

Repetition Detection

```python
from collections import Counter

def word_repetition_score(text):
    """Return 0-5; each non-stopword that appears 3+ times costs 2 points."""
    words = text.lower().split()
    counts = Counter(words)
    # Deduct for words appearing 3+ times (excluding stopwords)
    stopwords = {'the', 'a', 'an', 'is', 'are', 'to', 'for', 'of', 'in', 'and', 'or'}
    repeated = sum(1 for w, c in counts.items() if c >= 3 and w not in stopwords)
    return max(0, 5 - repeated * 2)
```

Diversity Check (Simple)

```python
def is_diverse(a, b, min_distance=3):
    """Check if two strings are sufficiently different."""
    a, b = a.lower().strip(), b.lower().strip()
    if a == b:
        return False
    # Simple: check if one is not a substring of the other
    if a in b or b in a:
        return False
    # Cheap stand-in for edit distance: count the words that appear
    # in only one of the two strings (symmetric difference of word sets)
    return len(set(a.split()) ^ set(b.split())) >= min_distance
```

Query Echo Detection

```python
def echoes_query(expansion, query):
    """Check if expansion is just echoing the query."""
    exp = expansion.lower().strip()
    q = query.lower().strip()
    return exp == q or exp in q or q in exp
```

Named Entity Extraction

```python
KEY_TERM_STOPWORDS = {'what', 'is', 'how', 'to', 'the', 'a', 'an', 'in', 'on', 'for', 'of',
                      'and', 'or', 'with', 'my', 'your', 'do', 'does', 'can', 'i', 'me', 'we',
                      'who', 'where', 'when', 'why', 'which', 'find', 'get', 'show', 'tell'}

def extract_named_entities(query: str) -> set:
    """Extract named entities using simple heuristics."""
    entities = set()
    words = query.split()
    prev_was_entity = False

    for i, word in enumerate(words):
        clean = word.strip('.,!?:;()[]"\'')
        if not clean:
            prev_was_entity = False
            continue

        is_entity = False

        # All-caps acronyms: TDS, API, GPU
        if clean.isupper() and len(clean) >= 2:
            entities.add(clean.lower())
            is_entity = True
        # Capitalized proper nouns (not first word)
        elif i > 0 and clean[0].isupper() and clean.lower() not in KEY_TERM_STOPWORDS:
            entities.add(clean.lower())
            is_entity = True
        # Technical terms: node.js, C++
        elif any(c in clean for c in '.+-#@') and len(clean) >= 2:
            entities.add(clean.lower())
            is_entity = True
        # CamelCase: JavaScript
        elif len(clean) > 1 and any(c.isupper() for c in clean[1:]) and clean[0].isupper():
            entities.add(clean.lower())
            is_entity = True
        # Word following an entity (compound names: TDS motorsports)
        elif prev_was_entity and clean.lower() not in KEY_TERM_STOPWORDS:
            entities.add(clean.lower())
            is_entity = True

        prev_was_entity = is_entity

    return entities
```

Generic Phrase Detection

```python
GENERIC_LEX_PHRASES = {
    'find information about', 'search for', 'look up', 'get information',
    'learn about', 'information on', 'details about', 'find out about',
    'what is', 'how to', 'guide to', 'help with'
}

def lex_is_generic(lex_line: str) -> bool:
    """Check if lex line is a useless generic filler.

    Pass the text after the "lex:" prefix; the prefix itself would
    otherwise count as specific content.
    """
    lex_lower = lex_line.lower().strip()
    for phrase in GENERIC_LEX_PHRASES:
        if phrase in lex_lower:
            # Check if there's specific content beyond the generic phrase
            remaining = lex_lower.replace(phrase, '', 1).strip()
            if len(remaining) < 3:  # Nothing specific left
                return True
    return False
```

Training Data Requirements

  1. EOM tokens: Ensure training examples end with proper end-of-message tokens
  2. Diverse examples: Include varied query types (short, long, technical, casual)
  3. Quality hyde: Hyde passages should be informative, not template-y
  4. No repetition: Avoid "This is important. This is very important." patterns