finetune/SCORING.md
Transform an arbitrary typed query into a great set of retrieval-optimized expansions.
Input: "auth config"
Output:
hyde: Authentication can be configured by setting the AUTH_SECRET environment variable and enabling the auth middleware in your application's config file.
lex: authentication configuration
lex: auth settings setup
vec: how to configure authentication settings
vec: authentication configuration options
| Prefix | Purpose | Required | Count |
|---|---|---|---|
| lex: | BM25 keyword variations (shorter, keyword-focused) | Yes | 1-3 |
| vec: | Semantic reformulations (natural language) | Yes | 1-3 |
| hyde: | Hypothetical document passage | Optional | 0-1 |
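A scorer first has to split raw model output into these buckets. A minimal sketch, where the function name and dict shape are illustrative rather than part of the spec:

```python
def parse_expansions(output: str) -> dict:
    """Split raw model output into prefix buckets (hypothetical helper)."""
    parsed = {"lex": [], "vec": [], "hyde": [], "garbage": []}
    for line in output.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        for prefix in ("lex", "vec", "hyde"):
            if line.startswith(prefix + ":"):
                parsed[prefix].append(line[len(prefix) + 1:].strip())
                break
        else:
            # prose outside prefixed lines counts against the format score
            parsed["garbage"].append(line)
    return parsed
```

Anything that does not carry a valid prefix lands in `garbage`, which feeds the format deductions below.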
Format scoring:

| Criterion | Points | Deduction |
|---|---|---|
| Has at least one lex: line | +10 | -10 if missing |
| Has at least one vec: line | +10 | -10 if missing |
| All lines have valid prefix (lex:, vec:, hyde:) | +10 | -5 per invalid line |
| No garbage/prose outside of prefixed lines | - | -10 if present |
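These format rules are mechanical and score directly. A sketch, assuming the lex/vec/invalid line lists were already parsed out (parameter names are illustrative):

```python
def score_format(lex, vec, invalid):
    """Sketch of the format rules; `invalid` holds lines with a bad or missing prefix."""
    score = 0
    score += 10 if lex else -10        # at least one lex: line
    score += 10 if vec else -10        # at least one vec: line
    if invalid:
        score -= 5 * len(invalid)      # -5 per invalid line
        score -= 10                    # garbage/prose present
    else:
        score += 10                    # all prefixes valid
    return score
```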
Diversity scoring:

| Criterion | Points | Deduction |
|---|---|---|
| 2+ different types present (lex + vec) | +10 | -10 if only one type |
| 2+ total expansions | +5 | -5 if only one |
| Multiple lex: lines are diverse (edit distance > 3) | +5 | -2 per duplicate pair |
| Multiple vec: lines are diverse (edit distance > 5) | +5 | -2 per duplicate pair |
| lex/vec not identical to original query | +5 | -5 per line that equals query |
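A sketch of the diversity rules, approximating "edit distance" with the same symmetric word-set difference used by `is_diverse` later in this file:

```python
def score_diversity(lex, vec, query):
    """Sketch of the diversity rules (word-set distance stands in for edit distance)."""
    def distinct(a, b, min_d):
        a, b = a.lower().strip(), b.lower().strip()
        if a == b or a in b or b in a:
            return False
        return len(set(a.split()) ^ set(b.split())) >= min_d

    score = 10 if (lex and vec) else -10               # both types present
    score += 5 if len(lex) + len(vec) >= 2 else -5     # 2+ total expansions
    for group, min_d in ((lex, 3), (vec, 5)):
        if len(group) > 1:
            dups = sum(1 for i in range(len(group))
                       for j in range(i + 1, len(group))
                       if not distinct(group[i], group[j], min_d))
            score += 5 if dups == 0 else -2 * dups
    q = query.lower().strip()
    echoes = sum(1 for line in lex + vec if line.lower().strip() == q)
    score += 5 if echoes == 0 else -5 * echoes
    return score
```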
Hyde scoring:

| Criterion | Points | Deduction |
|---|---|---|
| Hyde present and well-formed | +5 | - |
| Hyde is concise (50-200 chars) | +5 | -3 if too short, -5 if too long |
| Hyde has no newlines | +5 | -5 if contains newlines |
| Hyde has no excessive repetition | +5 | -3 if word repeats 3+ times |
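The hyde rules can be sketched as follows; thresholds come from the table, and the repetition check mirrors `word_repetition_score` at the bottom of this file:

```python
from collections import Counter

STOPWORDS = {'the', 'a', 'an', 'is', 'are', 'to', 'for', 'of', 'in', 'and', 'or'}

def score_hyde(hyde):
    """Sketch of the hyde rules; hyde is optional, so None/empty scores 0."""
    if not hyde:
        return 0
    score = 5                                   # present and well-formed
    if 50 <= len(hyde) <= 200:
        score += 5                              # concise
    elif len(hyde) < 50:
        score -= 3
    else:
        score -= 5
    score += 5 if "\n" not in hyde else -5      # single line
    counts = Counter(hyde.lower().split())
    repeated = any(c >= 3 and w not in STOPWORDS for w, c in counts.items())
    score += -3 if repeated else 5              # no excessive repetition
    return score
```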
Quality scoring:

| Criterion | Points | Deduction |
|---|---|---|
| Base relevance | +5 | Subjective |
| Lex lines preserve key terms from query | +5 | -5 if lex is generic |
| Lex lines are keyword-focused (shorter) | +5 | -2 if lex is longer than vec |
| Vec lines are natural language (complete phrases) | +5 | -2 if vec is just keywords |
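Only two of these checks are purely mechanical; base relevance and key-term preservation need semantic judgment. A sketch of the mechanical pair (the function name and the 4-word threshold for "natural language" are illustrative):

```python
def score_quality_shape(lex, vec):
    """Sketch of the two mechanical quality checks (length and phrasing shape)."""
    avg_len = lambda xs: sum(len(x) for x in xs) / len(xs)
    score = 0
    # lex should be keyword-focused, i.e. shorter than vec on average (+5, or +5-2)
    score += 5 if avg_len(lex) <= avg_len(vec) else 3
    # vec should read as natural language, not bare keywords (+5, or +5-2)
    score += 5 if all(len(v.split()) >= 4 for v in vec) else 3
    return score
```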
Named entities are proper nouns, brand names, personal names, technical terms, and acronyms that MUST appear in lex queries. This prevents generic expansions that lose the specific topic.
Two-level checking:
| Criterion | Points | Deduction |
|---|---|---|
| Per-line: All lex lines contain at least one entity | +15 | - |
| Per-line: Some lex lines contain entities | +5 | - |
| Per-line: NO lex lines contain entities | - | -30 HEAVY PENALTY |
| Per-entity: Entity completely absent from all lex+vec | - | -20 per dropped entity |
| Generic filler phrases in lex | - | -15 per phrase |
| Entities also in vec lines | +5 | - |
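A sketch of the two-level check, assuming `entities` was produced by `extract_named_entities` below; plain substring matching is a simplification:

```python
def score_entities(entities, lex, vec):
    """Sketch of the two-level entity check (per-line, then per-entity)."""
    if not entities:
        return 0
    score = 0
    with_entity = [l for l in lex if any(e in l.lower() for e in entities)]
    if lex and len(with_entity) == len(lex):
        score += 15                    # per-line: every lex line has an entity
    elif with_entity:
        score += 5                     # per-line: some lex lines do
    else:
        score -= 30                    # heavy penalty: entities absent from lex
    joined = " ".join(lex + vec).lower()
    score -= 20 * sum(1 for e in entities if e not in joined)   # dropped entities
    if any(any(e in v.lower() for e in entities) for v in vec):
        score += 5                     # entities echoed in vec
    return score
```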
Named Entity Detection:
- Acronyms: TDS, API, GPU, AWS
- Brand and product names: React, Docker
- Personal names: Bob, Sarah ("Bob asked about deploy" → Bob is an entity)
- Technical terms: node.js, C++, .NET
- CamelCase terms: JavaScript, TypeScript
- Compound names: TDS motorsports → both words are entities
- Project names: Project Atlas, Horizon team

Generic Filler Phrases (BANNED in lex): find information about, search for, look up, get information, learn about, information on, details about, find out about, what is, how to, guide to, help with
Examples:
| Query | Bad Lex | Good Lex |
|---|---|---|
| who is TDS motorsports | lex: find information about | lex: TDS motorsports history |
| | lex: company details | lex: TDS motorsports founders |
| meeting with Bob about C++ | lex: c++ meetings | lex: Bob "C++" meeting |
| | vec: programming meeting notes | vec: meeting notes with Bob about C++ |
| how to use React hooks | lex: programming tutorial | lex: React hooks tutorial |
| | lex: how to code | lex: useEffect useState hooks |
Key Rule: If a query mentions a specific entity (person, brand, product, technology, project name), that entity MUST appear somewhere in the lex+vec output. Dropping a person's name is especially costly.
When a query contains multi-word technical terms or proper nouns, lex output should use quoted phrases for exact matching in BM25.
| Criterion | Points |
|---|---|
| Uses "quoted phrases" in lex when query has multi-word entities | +3 |
When to quote:
"New York", "Monte Carlo""machine learning", "rate limit""connection pool", "merge conflict"When to use negation (-term):
rust -corrosion, java -coffee, apple -fruit"machine learning" -"deep learning"docker -kubernetes, python -snakeExample:
Query: python memory leak debugging
Good lex: "memory leak" python -java -javascript
Good lex: tracemalloc "garbage collector" profiler
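The +3 bonus itself reduces to a simple check once entity detection is handled elsewhere. A deliberately minimal sketch that only looks for quote characters in the lex lines:

```python
def quoting_bonus(lex):
    """Sketch of the +3 quoted-phrase bonus; whether the query actually
    contains a multi-word entity is left to the entity extractor."""
    return 3 if any('"' in line for line in lex) else 0
```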
Total Score = Format + Diversity + Hyde + Quality + Entity
Max Score = 120 (100 without hyde)
Normalized = Total / Max (0.0 - 1.0)
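A sketch of the normalization step; clamping negative totals to 0.0 is an assumption, since the formula only states the 0.0-1.0 range:

```python
def normalize_score(total, has_hyde):
    """Normalize a raw total into 0.0-1.0 (clamping is an assumption)."""
    max_score = 120 if has_hyde else 100
    return max(0.0, min(1.0, total / max_score))
```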
Rating:
Note: Entity score can go negative, heavily penalizing outputs that miss named entities.
Query: "react hooks"
Output:
```
hyde: React Hooks allow you to use state and lifecycle features in functional components without writing a class.
lex: react hooks tutorial
lex: usestate useeffect
vec: how to use react hooks in functional components
vec: react hooks best practices guide
```
Scoring:
Query: "auth"
Output:
```
auth is an important concept that relates to authentication.
The answer should be in Chinese.
The answer should be in Chinese.
```
Scoring:
Query: "docker networking"
Output:
```
hyde: Docker networking is an important concept. Docker networking is used for container communication. Docker networking configuration is essential.
lex: docker networking
vec: docker networking
```
Scoring:
```python
from collections import Counter

def word_repetition_score(text):
    words = text.lower().split()
    counts = Counter(words)
    # Deduct for words appearing 3+ times (excluding stopwords)
    stopwords = {'the', 'a', 'an', 'is', 'are', 'to', 'for', 'of', 'in', 'and', 'or'}
    repeated = sum(1 for w, c in counts.items() if c >= 3 and w not in stopwords)
    return max(0, 5 - repeated * 2)
```
```python
def is_diverse(a, b, min_distance=3):
    """Check if two strings are sufficiently different."""
    a, b = a.lower().strip(), b.lower().strip()
    if a == b:
        return False
    # Simple: check that neither is a substring of the other
    if a in b or b in a:
        return False
    # Approximate edit distance with a symmetric word-set difference
    return len(set(a.split()) ^ set(b.split())) >= min_distance

def echoes_query(expansion, query):
    """Check if expansion is just echoing the query."""
    exp = expansion.lower().strip()
    q = query.lower().strip()
    return exp == q or exp in q or q in exp
```
```python
KEY_TERM_STOPWORDS = {'what', 'is', 'how', 'to', 'the', 'a', 'an', 'in', 'on', 'for', 'of',
                      'and', 'or', 'with', 'my', 'your', 'do', 'does', 'can', 'i', 'me', 'we',
                      'who', 'where', 'when', 'why', 'which', 'find', 'get', 'show', 'tell'}

def extract_named_entities(query: str) -> set:
    """Extract named entities using simple heuristics."""
    entities = set()
    words = query.split()
    prev_was_entity = False
    for i, word in enumerate(words):
        clean = word.strip('.,!?:;()[]"\'')
        if not clean:
            prev_was_entity = False
            continue
        is_entity = False
        # All-caps acronyms: TDS, API, GPU
        if clean.isupper() and len(clean) >= 2:
            entities.add(clean.lower())
            is_entity = True
        # Capitalized proper nouns (not first word)
        elif i > 0 and clean[0].isupper() and clean.lower() not in KEY_TERM_STOPWORDS:
            entities.add(clean.lower())
            is_entity = True
        # Technical terms: node.js, C++
        elif any(c in clean for c in '.+-#@') and len(clean) >= 2:
            entities.add(clean.lower())
            is_entity = True
        # CamelCase: JavaScript
        elif len(clean) > 1 and any(c.isupper() for c in clean[1:]) and clean[0].isupper():
            entities.add(clean.lower())
            is_entity = True
        # Word following an entity (compound names: TDS motorsports)
        elif prev_was_entity and clean.lower() not in KEY_TERM_STOPWORDS:
            entities.add(clean.lower())
            is_entity = True
        prev_was_entity = is_entity
    return entities
```
```python
GENERIC_LEX_PHRASES = {
    'find information about', 'search for', 'look up', 'get information',
    'learn about', 'information on', 'details about', 'find out about',
    'what is', 'how to', 'guide to', 'help with'
}

def lex_is_generic(lex_line: str) -> bool:
    """Check if lex line is a useless generic filler."""
    lex_lower = lex_line.lower().strip()
    for phrase in GENERIC_LEX_PHRASES:
        if phrase in lex_lower:
            # Check if there's specific content beyond the generic phrase
            remaining = lex_lower
            for word in phrase.split():
                remaining = remaining.replace(word, '', 1).strip()
            if len(remaining) < 3:  # Nothing specific left
                return True
    return False
```