communications/postmortems/template.md
Date: [Date]
Duration: [Start time] to [End time]
Impact: [Brief description of user/system impact]
Severity: [Critical/High/Medium/Low]
Prepared by: [Name(s)]
[Provide a concise summary of the incident, including what happened, the scope of impact, and how it affected users/systems. This should be 2-3 paragraphs that give readers a clear understanding of the incident without diving into technical details.]
All times in Pacific Time (PT)
[Provide a detailed technical explanation of what caused the incident. Include relevant system components, configuration issues, code problems, or external factors that contributed to the incident.]
[Describe the steps taken to resolve the incident, including temporary mitigations and permanent fixes. Explain why these solutions addressed the root cause.]
[Describe the specific changes being implemented to prevent similar incidents in the future. This could include monitoring improvements, process changes, architectural modifications, or additional safeguards.]
[Assess internal and external communication during the incident. Include what was communicated, to whom, and through what channels. Note any improvements needed for future incidents.]
[Include or link to relevant monitoring data, error rates, system metrics, etc.]
[Reference any similar past incidents or related issues.]