CS Conference Writing Style Guide

Comprehensive writing guide for ACL, EMNLP, NAACL (NLP), CHI, CSCW (HCI), SIGKDD, WWW, SIGIR (data mining/IR), and other major CS conferences.

Last Updated: 2024

Overview

CS conferences span diverse subfields with distinct writing cultures. This guide covers NLP, HCI, and data mining/IR venues, each with unique expectations and evaluation criteria.

Part 1: NLP Conferences (ACL, EMNLP, NAACL)

NLP Writing Philosophy

"Strong empirical results on standard benchmarks with insightful analysis."

NLP papers balance empirical rigor with linguistic insight. Human evaluation is increasingly important alongside automatic metrics.

Audience and Tone

Target Reader

NLP researchers and computational linguists
Familiar with transformer architectures, standard benchmarks
Expect reproducible results and error analysis

Tone Characteristics

Characteristic	Description
Task-focused	Clear problem definition
Benchmark-oriented	Standard datasets emphasized
Analysis-rich	Error analysis, qualitative examples
Reproducible	Full implementation details

Abstract (NLP Style)

Structure

Task/problem (1 sentence)
Limitation of prior work (1 sentence)
Your approach (1-2 sentences)
Results on benchmarks (2 sentences)
Analysis finding (optional, 1 sentence)

Example Abstract

Coreference resolution remains challenging for pronouns with distant or 
ambiguous antecedents. Prior neural approaches struggle with these 
difficult cases due to limited context modeling. We introduce 
LongContext-Coref, a retrieval-augmented coreference model that 
dynamically retrieves relevant context from document history. On the 
OntoNotes 5.0 benchmark, LongContext-Coref achieves 83.4 F1, improving 
over the previous state-of-the-art by 1.2 points. On the challenging 
WinoBias dataset, we reduce gender bias by 34% while maintaining 
accuracy. Qualitative analysis reveals that our model successfully 
resolves pronouns requiring world knowledge, a known weakness of 
prior approaches.

NLP Paper Structure

├── Introduction
│   ├── Task motivation
│   ├── Prior work limitations
│   ├── Your contribution
│   └── Contribution bullets
├── Related Work
├── Method
│   ├── Problem formulation
│   ├── Model architecture
│   └── Training procedure
├── Experiments
│   ├── Datasets (with statistics)
│   ├── Baselines
│   ├── Main results
│   ├── Analysis
│   │   ├── Error analysis
│   │   ├── Ablation study
│   │   └── Qualitative examples
│   └── Human evaluation (if applicable)
├── Discussion / Limitations
└── Conclusion

NLP-Specific Requirements

Datasets

Use standard benchmarks: GLUE, SQuAD, CoNLL, OntoNotes
Report dataset statistics: train/dev/test sizes
Data preprocessing: Document all steps

Evaluation Metrics

Task-appropriate metrics: F1, BLEU, ROUGE, accuracy
Statistical significance: Paired bootstrap, p-values
Multiple runs: Report mean ± std across seeds

Human Evaluation

Increasingly expected for generation tasks:

Annotator details: Number, qualifications, agreement
Evaluation protocol: Guidelines, interface, payment
Inter-annotator agreement: Cohen's κ or Krippendorff's α

Example Human Evaluation Table

Table 3: Human Evaluation Results (100 samples, 3 annotators)
─────────────────────────────────────────────────────────────
Method        | Fluency | Coherence | Factuality | Overall
─────────────────────────────────────────────────────────────
Baseline      |   3.8   |    3.2    |    3.5     |   3.5
GPT-3.5       |   4.2   |    4.0    |    3.7     |   4.0
Our Method    |   4.4   |    4.3    |    4.1     |   4.3
─────────────────────────────────────────────────────────────
Inter-annotator κ = 0.72. Scale: 1-5 (higher is better).

ACL-Specific Notes

ARR (ACL Rolling Review): Shared review system across ACL venues
Responsible NLP checklist: Ethics, limitations, risks
Long (8 pages) vs. Short (4 pages): Different expectations
Findings papers: Lower-tier acceptance track

Part 2: HCI Conferences (CHI, CSCW, UIST)

HCI Writing Philosophy

"Technology in service of humans—understand users first, then design and evaluate."

HCI papers are fundamentally user-centered. Technology novelty alone is insufficient; understanding human needs and demonstrating user benefit is essential.

Audience and Tone

Target Reader

HCI researchers and practitioners
UX designers and product developers
Interdisciplinary (CS, psychology, design, social science)

Tone Characteristics

Characteristic	Description
User-centered	Focus on people, not technology
Design-informed	Grounded in design thinking
Empirical	User studies provide evidence
Reflective	Consider broader implications

HCI Abstract

Focus on Users and Impact

Video calling has become essential for remote collaboration, yet 
current interfaces poorly support the peripheral awareness that makes 
in-person work effective. Through formative interviews with 24 remote 
workers, we identified three key challenges: difficulty gauging 
colleague availability, lack of ambient presence cues, and interruption 
anxiety. We designed AmbientOffice, a peripheral display system that 
conveys teammate presence through subtle ambient visualizations. In a 
two-week deployment study with 18 participants across three distributed 
teams, AmbientOffice increased spontaneous collaboration by 40% and 
reduced perceived isolation (p<0.01). Participants valued the system's 
non-intrusive nature and reported feeling more connected to remote 
colleagues. We discuss implications for designing ambient awareness 
systems and the tension between visibility and privacy in remote work.

HCI Paper Structure

Research Through Design / Systems Papers

├── Introduction
│   ├── Problem in human terms
│   ├── Why technology can help
│   └── Contribution summary
├── Related Work
│   ├── Domain background
│   ├── Prior systems
│   └── Theoretical frameworks
├── Formative Work (often)
│   ├── Interviews / observations
│   └── Design requirements
├── System Design
│   ├── Design rationale
│   ├── Implementation
│   └── Interface walkthrough
├── Evaluation
│   ├── Study design
│   ├── Participants
│   ├── Procedure
│   ├── Findings (quant + qual)
│   └── Limitations
├── Discussion
│   ├── Design implications
│   ├── Generalizability
│   └── Future work
└── Conclusion

Qualitative / Interview Studies

├── Introduction
├── Related Work
├── Methods
│   ├── Participants
│   ├── Procedure
│   ├── Data collection
│   └── Analysis method (thematic, grounded theory, etc.)
├── Findings
│   ├── Theme 1 (with quotes)
│   ├── Theme 2 (with quotes)
│   └── Theme 3 (with quotes)
├── Discussion
│   ├── Implications for design
│   ├── Implications for research
│   └── Limitations
└── Conclusion

HCI-Specific Requirements

Participant Reporting

Demographics: Age, gender, relevant experience
Recruitment: How and where recruited
Compensation: Payment amount and type
IRB approval: Ethics board statement

Quotes in Findings

Use direct quotes to ground findings:

Participants valued the ambient nature of the display. As P7 described: 
"It's like having a window to my teammate's office. I don't need to 
actively check it, but I know they're there." This passive awareness 
reduced the barrier to initiating contact.

Design Implications Section

Translate findings into actionable guidance:

**Implication 1: Support peripheral awareness without demanding attention.**
Ambient displays should be visible in peripheral vision but not require 
active monitoring. Designers should consider calm technology principles.

**Implication 2: Balance visibility with privacy.**
Users want to share presence but fear surveillance. Systems should 
provide granular controls and make visibility mutual.

CHI-Specific Notes

Contribution types: Empirical, artifact, methodological, theoretical
ACM format: acmart document class with sigchi option
Accessibility: Alt text, inclusive language expected
Contribution statement: Required per-author contributions

Part 3: Data Mining & IR (SIGKDD, WWW, SIGIR)

Data Mining Writing Philosophy

"Scalable methods for real-world data with demonstrated practical impact."

Data mining papers emphasize scalability, real-world applicability, and solid experimental methodology.

Audience and Tone

Target Reader

Data scientists and ML engineers
Industry researchers
Applied ML practitioners

Tone Characteristics

Characteristic	Description
Scalable	Handle large datasets
Practical	Real-world applications
Reproducible	Datasets and code shared
Industrial	Industry datasets valued

KDD Abstract

Emphasize Scale and Application

Fraud detection in e-commerce requires processing millions of 
transactions in real-time while adapting to evolving attack patterns. 
We present FraudShield, a graph neural network framework for real-time 
fraud detection that scales to billion-edge transaction graphs. Unlike 
prior methods that require full graph access, FraudShield uses 
incremental updates with O(1) inference cost per transaction. On a 
proprietary dataset of 2.3 billion transactions from a major e-commerce 
platform, FraudShield achieves 94.2% precision at 80% recall, 
outperforming production baselines by 12%. The system has been deployed 
at [Company], processing 50K transactions per second and preventing 
an estimated $400M in annual fraud losses. We release an anonymized 
benchmark dataset and code.

KDD Paper Structure

├── Introduction
│   ├── Problem and impact
│   ├── Technical challenges
│   ├── Your approach
│   └── Contributions
├── Related Work
├── Preliminaries
│   ├── Problem definition
│   └── Notation
├── Method
│   ├── Overview
│   ├── Technical components
│   └── Complexity analysis
├── Experiments
│   ├── Datasets (with scale statistics)
│   ├── Baselines
│   ├── Main results
│   ├── Scalability experiments
│   ├── Ablation study
│   └── Case study / deployment
└── Conclusion

KDD-Specific Requirements

Scalability

Dataset sizes: Report number of nodes, edges, samples
Runtime analysis: Wall-clock time comparisons
Complexity: Time and space complexity stated
Scaling experiments: Show performance vs. data size

Industrial Deployment

Case studies: Real-world deployment stories
A/B tests: Online evaluation results (if applicable)
Production metrics: Business impact (if shareable)

Example Scalability Table

Table 4: Scalability Comparison (runtime in seconds)
──────────────────────────────────────────────────────
Dataset     | Nodes  | Edges  | GCN   | GraphSAGE | Ours
──────────────────────────────────────────────────────
Cora        |  2.7K  |  5.4K  |  0.3  |    0.2    |  0.1
Citeseer    |  3.3K  |  4.7K  |  0.4  |    0.3    |  0.1
PubMed      | 19.7K  | 44.3K  |  1.2  |    0.8    |  0.3
ogbn-arxiv  | 169K   | 1.17M  |  8.4  |    4.2    |  1.6
ogbn-papers | 111M   | 1.6B   |  OOM  |   OOM     | 42.3
──────────────────────────────────────────────────────

Part 4: Common Elements Across CS Venues

Writing Quality

Clarity

One idea per sentence
Define terms before use
Use consistent notation

Precision

Exact numbers: "23.4%" not "about 20%"
Clear claims: Avoid hedging unless necessary
Specific comparisons: Name the baseline

Contribution Bullets

Used across all CS venues:

Our contributions are:
• We identify [problem/insight]
• We propose [method name] that [key innovation]
• We demonstrate [results] on [benchmarks]
• We release [code/data] at [URL]

Reproducibility Standards

All CS venues increasingly expect:

Code availability: GitHub link (anonymous for review)
Data availability: Public datasets or release plans
Full hyperparameters: Training details complete
Random seeds: Exact values for reproduction

Ethics and Broader Impact

NLP (ACL/EMNLP)

Limitations section: Required
Responsible NLP checklist: Ethical considerations
Bias analysis: For models affecting people

HCI (CHI)

IRB/Ethics approval: Required for human subjects
Informed consent: Procedure described
Privacy considerations: Data handling

KDD/WWW

Societal impact: Consider misuse potential
Privacy preservation: For sensitive data
Fairness analysis: When applicable

Venue Comparison Table

Aspect	ACL/EMNLP	CHI	KDD/WWW	SIGIR
Focus	NLP tasks	User studies	Scalable ML	IR/search
Evaluation	Benchmarks + human	User studies	Large-scale exp	Datasets
Theory weight	Moderate	Low	Moderate	Moderate
Industry value	High	Medium	Very high	High
Page limit	8 long / 4 short	10 + refs	9 + refs	10 + refs
Review style	ARR	Direct	Direct	Direct

CS Conference Writing Style Guide

CS Conference Writing Style Guide

Overview

Part 1: NLP Conferences (ACL, EMNLP, NAACL)

NLP Writing Philosophy

Audience and Tone

Target Reader

Tone Characteristics

Abstract (NLP Style)

Structure

Example Abstract

NLP Paper Structure

NLP-Specific Requirements

Datasets

Evaluation Metrics

Human Evaluation

Example Human Evaluation Table

ACL-Specific Notes

Part 2: HCI Conferences (CHI, CSCW, UIST)

HCI Writing Philosophy

Audience and Tone

Target Reader

Tone Characteristics

HCI Abstract

Focus on Users and Impact

HCI Paper Structure

Research Through Design / Systems Papers

Qualitative / Interview Studies

HCI-Specific Requirements

Participant Reporting

Quotes in Findings

Design Implications Section

CHI-Specific Notes

Part 3: Data Mining & IR (SIGKDD, WWW, SIGIR)

Data Mining Writing Philosophy

Audience and Tone

Target Reader

Tone Characteristics

KDD Abstract

Emphasize Scale and Application

KDD Paper Structure

KDD-Specific Requirements

Scalability

Industrial Deployment

Example Scalability Table

Part 4: Common Elements Across CS Venues

Writing Quality

Clarity

Precision

Contribution Bullets

Reproducibility Standards

Ethics and Broader Impact

NLP (ACL/EMNLP)

HCI (CHI)

KDD/WWW

Venue Comparison Table

Pre-Submission Checklist

All CS Venues

NLP-Specific

HCI-Specific

Data Mining-Specific

See Also