Security Incident Response Plan
This document outlines the process for responding to a security incident. This plan is distinct from the Disaster Recovery plan, which focuses on operational outages. A security incident involves a suspected breach of system security, data, or confidentiality.
What is a Security Incident?
A security incident is any event that compromises the confidentiality, integrity, or availability of our systems or data. Examples include:
- Unauthorized access to a server or database.
- A data breach (e.g., exposure of user data).
- A malware infection.
- A successful Denial of Service (DoS) attack, as opposed to an outage with an operational cause.
- Discovery of a critical vulnerability that is being actively exploited.
The Security Incident Response Team (SIRT)
- Incident Lead: The single point of contact responsible for coordinating the response.
- Technical/Forensic Lead: Responsible for the technical investigation of the incident.
- Communications Lead: Responsible for all internal and external communications.
- Legal/Compliance Lead: Responsible for legal and regulatory obligations. Engaged when the incident raises legal or regulatory issues (e.g., a data breach under GDPR).
The 6 Phases of Incident Response
We follow the six-phase incident response model popularized by SANS, which maps closely to the incident handling lifecycle described in NIST SP 800-61.
1. Preparation
This is what we do before an incident occurs.
- Tools: Having logging, monitoring, and security tools in place (Cloudflare WAF, Sentry, log aggregation).
- Training: Ensuring the team is aware of this plan.
- Access: Ensuring the SIRT has the necessary access to tools and systems.
2. Identification
This phase begins when a potential incident is detected.
- Detection: An incident can be identified via automated alerts (e.g., Cloudflare WAF block, Sentry alert) or manual discovery.
- Verification: The first step is to quickly verify whether the alert represents a genuine security incident or a false positive.
- Documentation: The Incident Lead starts a log of all actions taken, decisions made, and communications sent. This is critical for the post-mortem and any legal proceedings.
- Escalation: The Incident Lead escalates to the rest of the SIRT and company leadership.
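The documentation step above can be sketched as a small append-only log. This is a minimal illustration, not a prescribed tool: the class names, the three entry kinds, and the "INC-EXAMPLE" identifier are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    """One immutable record in the incident log (hypothetical structure)."""
    timestamp: datetime
    author: str
    kind: str    # "action", "decision", or "communication"
    detail: str


@dataclass
class IncidentLog:
    """Append-only log kept by the Incident Lead."""
    incident_id: str
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, author: str, kind: str, detail: str,
               when: Optional[datetime] = None) -> LogEntry:
        # Reject entry kinds outside the three categories named in the plan.
        if kind not in {"action", "decision", "communication"}:
            raise ValueError(f"unknown entry kind: {kind!r}")
        # Default to the current time in UTC so entries compare consistently.
        entry = LogEntry(when or datetime.now(timezone.utc), author, kind, detail)
        self.entries.append(entry)
        return entry

    def timeline(self) -> List[LogEntry]:
        """Entries in chronological order, for the post-mortem."""
        return sorted(self.entries, key=lambda e: e.timestamp)


log = IncidentLog("INC-EXAMPLE")
log.record("incident-lead", "action", "Verified WAF alert as genuine")
log.record("incident-lead", "communication", "Escalated to SIRT and leadership")
for e in log.timeline():
    print(e.timestamp.isoformat(), e.kind, e.detail)
```

Recording timestamps in UTC and keeping entries immutable makes the log easier to rely on later, both for the post-mortem timeline and for any legal review.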
3. Containment
The goal of this phase is to stop the bleeding and prevent the incident from spreading.
- Short-term containment: This may involve immediate actions like:
  - Blocking a malicious IP address in the Cloudflare WAF.
  - Temporarily disabling a compromised user account.
  - Isolating an affected server from the network.
- Long-term containment: Building a clean environment into which services can be restored.
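Before blocking addresses during short-term containment, it can help to sanity-check the list so that a typo or an internal address does not end up in a block rule. The sketch below uses only Python's standard ipaddress module. The function name and the policy of skipping private, loopback, and link-local addresses are assumptions for illustration, and the code does not call the Cloudflare API.

```python
import ipaddress
from typing import Iterable, List


def blockable_addresses(candidates: Iterable[str]) -> List[str]:
    """Return the candidates that look safe to pass to a block rule.

    Skips malformed input and any private, loopback, or link-local address,
    on the assumption that blocking those could cut off our own systems.
    """
    safe = []
    for raw in candidates:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # not a valid IPv4 or IPv6 address
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            continue  # likely one of our own hosts; needs manual review
        safe.append(str(addr))
    return safe


print(blockable_addresses(["8.8.8.8", "10.0.0.5", "127.0.0.1", "not-an-ip"]))
# prints ['8.8.8.8']
```

Skipped addresses are not necessarily harmless. An internal address that appears in attack traffic may point to a compromised host, which calls for isolation rather than a WAF rule.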
4. Eradication
This phase focuses on removing the root cause of the incident.
- Find and eliminate the root cause: This could involve:
  - Patching a software vulnerability.
  - Removing malware.
  - Fixing a misconfiguration.
- Improve defenses: For example, if the attacker got in through a weak password, this is where we implement a stronger password policy.
5. Recovery
This phase involves restoring the affected systems to normal operation.
- Restore from a known-good backup: If data was corrupted, restore it from a backup taken before the incident.
- Bring systems back online: Carefully monitor the systems as they are brought back online to ensure they are stable and the attacker is gone.
- Communicate: The Communications Lead informs stakeholders that the incident is resolved.
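Choosing a known-good backup comes down to picking the newest backup taken before the earliest known point of compromise. The sketch below shows that selection, assuming backup times and the compromise time are available as timezone-aware datetimes. The function name and the example dates are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def latest_backup_before(backup_times: Iterable[datetime],
                         earliest_compromise: datetime) -> Optional[datetime]:
    """Return the newest backup strictly older than the compromise, or None."""
    candidates = [t for t in backup_times if t < earliest_compromise]
    return max(candidates, default=None)


# Hypothetical nightly backups and a compromise on the afternoon of day two.
backups = [
    datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 3, 2, 0, tzinfo=timezone.utc),
]
compromise = datetime(2024, 1, 2, 14, 30, tzinfo=timezone.utc)
print(latest_backup_before(backups, compromise))
```

If the investigation later moves the earliest compromise time further back, the selection should be rerun. A result of None means no backup predates the incident, and the system needs to be rebuilt rather than restored.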
6. Lessons Learned (Post-Mortem)
This is the most critical phase for long-term improvement.
- Timeline: A detailed timeline of the incident is created.
- Root Cause Analysis: What was the fundamental vulnerability or failure that allowed the incident to happen?
- Response Review: What went well? What went poorly? This is a blame-free analysis of the response itself.
- Action Items: Create a list of concrete actions to improve our security posture and our incident response process. This list is tracked to completion.
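Tracking action items to completion can be as simple as recording an owner and due date for each item and regularly listing the ones that are open and overdue. The sketch below is one way to do that. The field names and example items are assumptions, not an existing tracker's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    """A post-mortem follow-up (hypothetical structure)."""
    description: str
    owner: str
    due: date
    completed_on: Optional[date] = None

    @property
    def is_open(self) -> bool:
        return self.completed_on is None


def overdue(items: List[ActionItem], today: date) -> List[ActionItem]:
    """Open items whose due date has passed, oldest due date first."""
    late = [i for i in items if i.is_open and i.due < today]
    return sorted(late, key=lambda i: i.due)


items = [
    ActionItem("Enforce stronger password policy", "alice", date(2024, 2, 1)),
    ActionItem("Add alert for failed logins", "bob", date(2024, 1, 15)),
    ActionItem("Patch vulnerable library", "carol", date(2024, 1, 10),
               completed_on=date(2024, 1, 9)),
]
for item in overdue(items, today=date(2024, 2, 10)):
    print(item.due.isoformat(), item.owner, item.description)
```

Reviewing the overdue list at a fixed cadence keeps the post-mortem from ending when the meeting does.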
This structured approach ensures that we can respond to security incidents in a calm, methodical, and effective manner.