
Incident Post-Mortem: [Incident Title]

communications/postmortems/template.md

2025-12-18 (3.1 KB)


Date: [Date]
Duration: [Start time] to [End time]
Impact: [Brief description of user/system impact]
Severity: [Critical/High/Medium/Low]
Prepared by: [Name(s)]

Issue Summary

[Provide a concise summary of the incident, including what happened, the scope of impact, and how it affected users/systems. This should be 2-3 paragraphs that give readers a clear understanding of the incident without diving into technical details.]

Timeline

All times in Pacific Time (PT)

  • [Time]: [Event - Initial detection/alert/report]
  • [Time]: [Event - Investigation began]
  • [Time]: [Event - Key findings or actions]
  • [Time]: [Event - Additional steps taken]
  • [Time]: [Event - Resolution implemented]
  • [Time]: [Event - Service fully restored]

Root Cause Analysis

[Detailed technical explanation of what caused the incident. Include relevant system components, configuration issues, code problems, or external factors that contributed to the incident.]

Resolution

[Describe the steps taken to resolve the incident, including temporary mitigations and permanent fixes. Explain why these solutions addressed the root cause.]

Impact Assessment

  • Users Affected: [Number or percentage of users impacted]
  • Services Affected: [List of affected services/features]
  • Duration: [Total downtime or degradation period]
  • Business Impact: [Any financial, reputational, or operational impacts]

Action Items

Immediate Actions (0-7 days)

  • [Action Item]: [Description of task] - Owner: [Name], Due: [Date]
  • [Action Item]: [Description of task] - Owner: [Name], Due: [Date]

Short-term Improvements (8-30 days)

  • [Action Item]: [Description of task] - Owner: [Name], Due: [Date]
  • [Action Item]: [Description of task] - Owner: [Name], Due: [Date]

Long-term Projects (31+ days)

  • [Action Item]: [Description of task] - Owner: [Name], Due: [Date]
  • [Action Item]: [Description of task] - Owner: [Name], Due: [Date]

Lessons Learned

What Went Well

  • [Positive aspect of the response]
  • [Positive aspect of the response]

What Went Wrong

  • [Area for improvement]
  • [Area for improvement]

Where We Got Lucky

  • [Fortunate circumstance that limited impact]
  • [Fortunate circumstance that limited impact]

Prevention Measures

[Describe the specific changes being implemented to prevent similar incidents in the future. This could include monitoring improvements, process changes, architectural modifications, or additional safeguards.]

Communication Notes

[Assessment of internal and external communication during the incident. Include what was communicated, to whom, and through what channels. Note any improvements needed for future incidents.]

Appendix

Relevant Metrics/Graphs

[Include or link to relevant monitoring data, error rates, system metrics, etc.]

Related Incidents

[Reference any similar past incidents or related issues]

References

  • [Link to relevant documentation]
  • [Link to related tickets/issues]
  • [Link to relevant code changes]