Reviewer Guidelines & Evaluation Criteria

This reference documents how reviewers evaluate papers at major ML/AI conferences, helping authors anticipate and address reviewer concerns.

Universal Evaluation Dimensions
NeurIPS Reviewer Guidelines
ICML Reviewer Guidelines
ICLR Reviewer Guidelines
ACL Reviewer Guidelines
AAAI Reviewer Guidelines
COLM Reviewer Guidelines
What Makes Reviews Strong
Common Reviewer Concerns
How to Address Reviewer Feedback

Universal Evaluation Dimensions

All major ML conferences assess papers across four core dimensions:

1. Quality (Technical Soundness)

What reviewers ask:

Are claims well-supported by theoretical analysis or experimental results?
Are the proofs correct? Are the experiments properly controlled?
Are baselines appropriate and fairly compared?
Is the methodology sound?

How to ensure high quality:

Include complete proofs (in the main paper, or in the appendix with proof sketches in the main text)
Use appropriate baselines (not strawmen)
Report variance or error bars, and state how they were computed
Document hyperparameter selection process
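
As a minimal sketch of the error-bar advice above, the snippet below summarizes one metric over several seeds. The function name `summarize_runs` and the accuracy values are hypothetical and for illustration only; it assumes each run yields a single scalar score and that runs are independent.

```python
import math
import statistics


def summarize_runs(scores):
    """Return mean, sample standard deviation, and standard error of the mean."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample std dev (n - 1 denominator)
    sem = std / math.sqrt(len(scores))  # standard error of the mean
    return mean, std, sem


# Hypothetical accuracies from five random seeds (illustrative numbers only).
accuracies = [0.80, 0.82, 0.81, 0.79, 0.83]
mean, std, sem = summarize_runs(accuracies)
print(f"accuracy = {mean:.3f} +/- {std:.3f} (std over {len(accuracies)} seeds), SEM = {sem:.3f}")
```

Whichever statistic is reported, the paper should say which one it is (standard deviation or standard error) and how many runs it covers, since the two differ by a factor of the square root of the number of runs.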

2. Clarity (Writing & Organization)

What reviewers ask:

Is the paper clearly written and well organized?
Can an expert in the field reproduce the results?
Is notation consistent? Are terms defined?
Is the paper self-contained?

How to ensure clarity:

Use consistent terminology throughout
Define all notation at first use
Include reproducibility details (appendix acceptable)
Have non-authors read the paper before submission

3. Significance (Impact & Importance)

What reviewers ask:

Are the results impactful for the community?
Will others build upon this work?
Does it address an important problem?
What is the potential for real-world impact?

How to demonstrate significance:

Clearly articulate the problem's importance
Connect to broader research themes
Discuss potential applications
Compare to existing approaches meaningfully

4. Originality (Novelty & Contribution)

What reviewers ask:

Does this provide new insights?
How does it differ from prior work?
Is the contribution non-trivial?

Key insight from NeurIPS guidelines:

"Originality does not necessarily require introducing an entirely new method. Papers that provide novel insights from evaluating existing approaches or shed light on why methods succeed can also be highly original."

NeurIPS Reviewer Guidelines

Scoring System (1-6 Scale)

Score	Label	Description
6	Strong Accept	Groundbreaking, flawless work; top 2-3% of submissions
5	Accept	Technically solid, high impact; would benefit the community
4	Borderline Accept	Solid work with limited evaluation; leans accept
3	Borderline Reject	Solid but weaknesses outweigh strengths; leans reject
2	Reject	Technical flaws or weak evaluation
1	Strong Reject	Well-known results or unaddressed ethics concerns
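
For authors reading a set of scores, the table above can be encoded as a simple lookup. This is an illustrative sketch only: the labels are copied from the table, while `tally_leanings` and the example scores are hypothetical, and the tally is a reading aid rather than a description of how area chairs decide.

```python
from collections import Counter

# Labels copied from the scoring table above.
NEURIPS_LABELS = {
    6: "Strong Accept",
    5: "Accept",
    4: "Borderline Accept",
    3: "Borderline Reject",
    2: "Reject",
    1: "Strong Reject",
}


def tally_leanings(scores):
    """Count how many reviews lean accept (4-6) versus lean reject (1-3)."""
    for score in scores:
        if score not in NEURIPS_LABELS:
            raise ValueError(f"score {score} is not on the 1-6 scale")
    return Counter("accept" if s >= 4 else "reject" for s in scores)


# Hypothetical scores from four reviewers.
scores = [5, 4, 3, 4]
labels = [NEURIPS_LABELS[s] for s in scores]
leanings = tally_leanings(scores)
print(labels)
print(dict(leanings))
```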

Reviewer Instructions

Reviewers are explicitly instructed to:

Evaluate the paper as written - not what it could be with revisions
Provide constructive feedback - 3-5 actionable points
Not penalize honest limitations - acknowledging weaknesses is encouraged
Assess reproducibility - can the work be verified?
Consider ethical implications - potential misuse or harm

What Reviewers Should Avoid

Superficial, uninformed reviews
Demanding unreasonable additional experiments
Penalizing authors for honest limitation acknowledgment
Rejecting for missing citations to reviewer's own work

Timeline (NeurIPS 2025 — verify dates for current year)

Bidding: May 17-21
Reviewing period: May 29 - July 2
Author rebuttals: July 24-30
Discussion period: July 31 - August 13
Final notifications: September 18

Note: These dates are from the 2025 cycle. Always check the current year's call for papers at the venue website.

ICML Reviewer Guidelines

Review Structure

ICML reviewers provide:

Summary - Brief description of contributions
Strengths - Positive aspects
Weaknesses - Areas for improvement
Questions - Clarifications for authors
Limitations - Assessment of stated limitations
Ethics - Any concerns
Overall Score - Recommendation

Scoring Guidelines

ICML uses a similar 1-6 scale, calibrated roughly as follows:

Top 25% of accepted papers: Score 5-6
Typical accepted paper: Score 4-5
Borderline: Score 3-4
Clear reject: Score 1-2

Key Evaluation Points

Reproducibility - Are there enough details?
Experimental rigor - Multiple seeds, proper baselines?
Writing quality - Clear, organized, well-structured?
Novelty - Non-trivial contribution?

ICLR Reviewer Guidelines

OpenReview Process

ICLR uses OpenReview with:

Public reviews (after acceptance decisions)
Author responses visible to reviewers
Discussion between reviewers and ACs

Scoring

ICLR reviews include:

Soundness: 1-4 scale
Presentation: 1-4 scale
Contribution: 1-4 scale
Overall: 1-10 scale
Confidence: 1-5 scale
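
The score components above can be captured in a small record type, which is handy when collecting reviews into a spreadsheet or script. This is a sketch under assumptions: the class name `ICLRReview` and the example values are hypothetical, and only the ranges come from the list above.

```python
from dataclasses import dataclass

# Allowed range for each score component, taken from the list above.
ICLR_RANGES = {
    "soundness": (1, 4),
    "presentation": (1, 4),
    "contribution": (1, 4),
    "overall": (1, 10),
    "confidence": (1, 5),
}


@dataclass(frozen=True)
class ICLRReview:
    soundness: int
    presentation: int
    contribution: int
    overall: int
    confidence: int

    def __post_init__(self):
        # Reject any component that falls outside its documented scale.
        for name, (low, high) in ICLR_RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")


# Hypothetical review used only to show construction.
review = ICLRReview(soundness=3, presentation=3, contribution=2, overall=6, confidence=4)
print(review)
```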

Unique ICLR Considerations

LLM Disclosure - Reviewers assess whether LLM use is properly disclosed
Reproducibility - Emphasis on code availability
Reciprocal Reviewing - Authors must also serve as reviewers

ACL Reviewer Guidelines

ACL-Specific Criteria

ACL adds NLP-specific evaluation:

Linguistic soundness - Are linguistic claims accurate?
Resource documentation - Are datasets/models properly documented?
Multilingual consideration - If applicable, is language diversity addressed?

Limitations Section

ACL specifically requires a Limitations section. Reviewers check:

Are limitations honest and comprehensive?
Do limitations undermine core claims?
Are potential negative impacts addressed?

Ethics Review

ACL has a dedicated ethics review process for:

Dual-use concerns
Data privacy issues
Bias and fairness implications

AAAI Reviewer Guidelines

Evaluation Criteria

AAAI reviewers evaluate along similar axes to NeurIPS/ICML but with some differences:

Criterion	Weight	Notes
Technical quality	High	Soundness of approach, correctness of results
Significance	High	Importance of the problem and contribution
Novelty	Medium-High	New ideas, methods, or insights
Clarity	Medium	Clear writing, well-organized presentation
Reproducibility	Medium	Sufficient detail to reproduce results

AAAI-Specific Considerations

Broader AI scope: AAAI covers all of AI, not just ML. Papers on planning, reasoning, knowledge representation, NLP, vision, robotics, and multi-agent systems are all in scope. Reviewers may not be deep ML specialists.
Formatting strictness: AAAI reviewers are instructed to flag formatting violations. Non-compliant papers may be desk-rejected before review.
Application papers: AAAI is more receptive to application-focused work than NeurIPS/ICML. Framing a strong application contribution is viable.
Senior Program Committee: AAAI uses SPCs (Senior Program Committee members) who mediate between reviewers and make accept/reject recommendations.

Scoring (AAAI Scale)

Strong Accept: Clearly above threshold, excellent contribution
Accept: Above threshold, good contribution with minor issues
Weak Accept: Borderline, merits outweigh concerns
Weak Reject: Borderline, concerns outweigh merits
Reject: Below threshold, significant issues
Strong Reject: Well below threshold

COLM Reviewer Guidelines

Evaluation Criteria

COLM reviews focus on relevance to language modeling in addition to standard criteria:

Criterion	Weight	Notes
Relevance	High	Must be relevant to language modeling community
Technical quality	High	Sound methodology, well-supported claims
Novelty	Medium-High	New insights about language models
Clarity	Medium	Clear presentation, reproducible
Significance	Medium-High	Impact on LM research and practice

COLM-Specific Considerations

Language model focus: Reviewers will assess whether the contribution advances understanding of language models. General ML contributions need explicit LM framing.
Newer venue norms: COLM is newer than NeurIPS/ICML, so reviewer calibration varies more. Write more defensively and anticipate a wider range of reviewer expertise.
ICLR-derived process: Review process is modeled on ICLR (open reviews, author response period, discussion among reviewers).
Broad interpretation of "language modeling": Includes training, evaluation, alignment, safety, efficiency, applications, theory, multimodality (if language is central), and social impact of LMs.

Scoring

COLM uses an ICLR-style scoring system:

8-10: Strong accept (top papers)
6-7: Weak accept (solid contribution)
5: Borderline
3-4: Weak reject (below threshold)
1-2: Strong reject
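
The bands above translate directly into a range check. The function name `colm_band` is hypothetical, and the sketch assumes integer scores on the 1-10 scale.

```python
def colm_band(score):
    """Map an integer 1-10 score to the band listed above."""
    if not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError(f"score {score!r} is not an integer from 1 to 10")
    if score >= 8:
        return "Strong accept"
    if score >= 6:
        return "Weak accept"
    if score == 5:
        return "Borderline"
    if score >= 3:
        return "Weak reject"
    return "Strong reject"


# Hypothetical scores from three reviewers.
bands = [colm_band(s) for s in [7, 5, 3]]
print(bands)
```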

What Makes Reviews Strong

Following Daniel Dennett's Rules

Good reviewers follow these principles, which Dennett attributes to Anatol Rapoport, in order:

Re-express the position fairly - Show you understand the paper
List agreements - Acknowledge what works well
List what you learned - Credit the contribution
Only then critique - After establishing understanding

Review Structure Best Practices

Strong Review Structure:

Summary (1 paragraph):
- What the paper does
- Main contribution claimed

Strengths (3-5 bullets):
- Specific positive aspects
- Why these matter

Weaknesses (3-5 bullets):
- Specific concerns
- Why these matter
- Suggestions for addressing

Questions (2-4 items):
- Clarifications needed
- Things that would change assessment

Minor Issues (optional):
- Typos, unclear sentences
- Formatting issues

Overall Assessment:
- Clear recommendation with reasoning

Common Reviewer Concerns

Technical Concerns

Concern	How to Pre-empt
"Baselines too weak"	Use state-of-the-art baselines, cite recent work
"Missing ablations"	Include systematic ablation study
"No error bars"	Report std dev/error, multiple runs
"Hyperparameters not tuned"	Document tuning process, search ranges
"Claims not supported"	Ensure every claim has evidence

Novelty Concerns

Concern	How to Pre-empt
"Incremental contribution"	Clearly articulate what's new vs prior work
"Similar to [paper X]"	Explicitly compare to X in Related Work
"Straightforward extension"	Highlight non-obvious aspects

Clarity Concerns

Concern	How to Pre-empt
"Hard to follow"	Use clear structure, signposting
"Notation inconsistent"	Review all notation, create notation table
"Missing details"	Include reproducibility appendix
"Figures unclear"	Self-contained captions, proper sizing

Significance Concerns

Concern	How to Pre-empt
"Limited impact"	Discuss broader implications
"Narrow evaluation"	Evaluate on multiple benchmarks
"Only works in restricted setting"	Acknowledge scope, explain why still valuable

How to Address Reviewer Feedback

Rebuttal Best Practices

Do:

Thank reviewers for their time
Address each concern specifically
Provide evidence (new experiments if possible)
Be concise—reviewers are busy
Acknowledge valid criticisms

Don't:

Be defensive or dismissive
Make promises you can't keep
Ignore difficult criticisms
Write excessively long rebuttals
Argue about subjective assessments

Rebuttal Template

We thank the reviewers for their thoughtful feedback.

## Reviewer 1

**R1-Q1: [Quoted concern]**
[Direct response with evidence]

**R1-Q2: [Quoted concern]**
[Direct response with evidence]

## Reviewer 2

...

## Summary of Changes
If accepted, we will:
1. [Specific change]
2. [Specific change]
3. [Specific change]
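
When there are many reviewers, the template above can be generated from a list of concerns so that none is skipped. The following is a minimal sketch: `build_rebuttal`, its arguments, and the example concerns are hypothetical, and the response placeholders still have to be written by hand.

```python
def build_rebuttal(concerns, planned_changes):
    """Build a rebuttal skeleton.

    concerns: dict mapping reviewer number -> list of quoted concerns.
    planned_changes: list of specific changes promised if accepted.
    """
    lines = ["We thank the reviewers for their thoughtful feedback.", ""]
    for reviewer in sorted(concerns):
        lines.append(f"## Reviewer {reviewer}")
        lines.append("")
        for i, concern in enumerate(concerns[reviewer], start=1):
            lines.append(f"**R{reviewer}-Q{i}: {concern}**")
            lines.append("[Direct response with evidence]")
            lines.append("")
    lines.append("## Summary of Changes")
    lines.append("If accepted, we will:")
    for i, change in enumerate(planned_changes, start=1):
        lines.append(f"{i}. {change}")
    return "\n".join(lines)


# Hypothetical concerns and changes, for illustration only.
text = build_rebuttal(
    {1: ["Baselines too weak", "No error bars"], 2: ["Missing ablations"]},
    ["Add a stronger baseline", "Report std dev over multiple seeds"],
)
print(text)
```

Numbering each concern as R1-Q1, R1-Q2, and so on makes it easy to check that every point raised has a direct response.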

When to Accept Criticism

Some reviewer feedback should simply be accepted:

Valid technical errors
Missing important related work
Unclear explanations
Missing experimental details

Acknowledge these gracefully: "The reviewer is correct that... We will revise to..."

When to Push Back

You can respectfully disagree when:

Reviewer misunderstood the paper
Requested experiments are out of scope
Criticism is factually incorrect

Frame disagreements constructively: "We appreciate this perspective. However, [explanation]..."

Pre-Submission Reviewer Simulation

Before submitting, ask yourself:

Quality:

Would I trust these results if I saw them?
Are all claims supported by evidence?
Are baselines fair and recent?

Clarity:

Can someone reproduce this from the paper?
Is the writing clear to non-experts in this subfield?
Are all terms and notation defined?

Significance:

Why should the community care about this?
What can people do with this work?
Is the problem important?

Originality:

What specifically is new here?
How does this differ from closest related work?
Is the contribution non-trivial?
