Dataset Design Goals
Datasets are designed to force entity correlation, not just volume ranking.
Noise is preserved so rules must survive benign lookalikes.
Rows are useful only when they support a triage decision, investigation pivot, or rule validation.
Core Schema
| Field | Type | Why It Exists |
|---|---|---|
| Timestamp | datetime | Normalized event timestamp in UTC for deterministic replay and incident sequencing. |
| User | string | Targeted or authenticated identity used for spread, privilege, and handoff analysis. |
| SrcIp | string | Source IP address of the attempt; the originating (possibly attacking) infrastructure entity used for source-based detections. |
| Action | string | Security-relevant event verb such as LoginFailed or LoginSuccess. |
| Device | string | Host or fingerprint context used to reduce ambiguity and raise confidence. |
| Location | string | Geo/site context for travel, anomaly, and follow-on pivot analysis. |
| ResultCode | string | Platform return code used for exclusion logic and benign noise control. |
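
As a rough illustration of how a row of this schema might be loaded for replay, the sketch below maps the seven fields onto a typed record and normalizes the timestamp to UTC. The names `AuthEvent` and `parse_row` are hypothetical, and the sketch assumes `Timestamp` arrives as ISO 8601 text, with values lacking an offset treated as UTC. It is not a description of how the datasets are actually produced.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuthEvent:
    """One row of the core schema (hypothetical container, not part of the dataset)."""
    timestamp: datetime   # timezone-aware UTC after parsing
    user: str
    src_ip: str
    action: str           # e.g. "LoginFailed" or "LoginSuccess"
    device: str
    location: str
    result_code: str


def parse_row(row: dict) -> AuthEvent:
    """Build an AuthEvent from a dict keyed by the schema's column names."""
    # Assumption: ISO 8601 text; a value without an offset is treated as UTC.
    ts = datetime.fromisoformat(row["Timestamp"])
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    else:
        ts = ts.astimezone(timezone.utc)
    return AuthEvent(
        timestamp=ts,
        user=row["User"],
        src_ip=row["SrcIp"],
        action=row["Action"],
        device=row["Device"],
        location=row["Location"],
        result_code=str(row["ResultCode"]),  # kept as a string, per the schema
    )
```

Converting every timestamp to UTC at load time is what makes sequencing comparisons between rows safe; the later scenarios all depend on ordering events by time.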
Password Spray
T1110.003
Signal Shape
One source, broad user spread, low attempts per user, repeated failures.
False-Positive Trap
Benign SSO issues can inflate failure counts without any real spread across users.
Expected Deliverable
Detection rule with spread threshold and analyst-facing evidence output.
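A spread-based rule of the kind described here could be sketched as follows. The function name `detect_password_spray` and the thresholds `min_users=5` and `max_attempts_per_user=3` are illustrative assumptions, not values taken from the dataset; events are assumed to be dicts keyed by the schema's field names.

```python
from collections import defaultdict


def detect_password_spray(events, min_users=5, max_attempts_per_user=3):
    """Flag sources that fail against many users with few attempts per user.

    Thresholds are hypothetical defaults and would need tuning per environment.
    """
    # failures[src_ip][user] -> number of LoginFailed events
    failures = defaultdict(lambda: defaultdict(int))
    for e in events:
        if e["Action"] == "LoginFailed":
            failures[e["SrcIp"]][e["User"]] += 1

    findings = []
    for src_ip, per_user in failures.items():
        distinct_users = len(per_user)
        max_attempts = max(per_user.values())
        # Spread, not volume: many users, low attempts on each one.
        if distinct_users >= min_users and max_attempts <= max_attempts_per_user:
            findings.append({
                "SrcIp": src_ip,
                "DistinctUsers": distinct_users,
                "TotalFailures": sum(per_user.values()),
                "MaxAttemptsPerUser": max_attempts,
                "Users": sorted(per_user),  # analyst-facing evidence
            })
    return findings
```

Because the rule keys on distinct users rather than raw failure volume, a single account failing twenty times from one address (the SSO-style noise above) does not fire. Using the maximum attempts per user is a deliberately strict choice; a median or average would tolerate a sprayer that retries a few accounts more often.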
Fail-To-Success Compromise Path
T1078
Signal Shape
Repeated failures followed by a success event and post-auth activity.
False-Positive Trap
Legitimate user retries can mimic compromise if you ignore device or source context.
Expected Deliverable
Investigation note with confidence statement and containment recommendation.
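One way to make the device-context point concrete is the sketch below, which looks for a run of failures followed by a success and lowers confidence when the success comes from a device the user has already succeeded on. The function name, the `min_failures=3` and ten-minute `window` defaults, and the "low"/"elevated" labels are all assumptions for illustration. Post-auth activity is not examined here, and `Timestamp` is assumed to be a `datetime`.

```python
from datetime import datetime, timedelta


def find_fail_to_success(events, min_failures=3, window=timedelta(minutes=10)):
    """Return successes preceded by >= min_failures failures inside `window`."""
    recent_failures = {}  # user -> failure timestamps since the last success
    known_devices = {}    # user -> devices seen on earlier successes
    findings = []
    for e in sorted(events, key=lambda ev: ev["Timestamp"]):
        user = e["User"]
        if e["Action"] == "LoginFailed":
            recent_failures.setdefault(user, []).append(e["Timestamp"])
        elif e["Action"] == "LoginSuccess":
            fails = [t for t in recent_failures.get(user, [])
                     if e["Timestamp"] - t <= window]
            devices = known_devices.setdefault(user, set())
            if len(fails) >= min_failures:
                findings.append({
                    "User": user,
                    "SrcIp": e["SrcIp"],
                    "Device": e["Device"],
                    "FailuresBeforeSuccess": len(fails),
                    # A previously seen device suggests a legitimate retry.
                    "Confidence": "low" if e["Device"] in devices else "elevated",
                })
            devices.add(e["Device"])
            recent_failures[user] = []  # reset the run after any success
    return findings
```

A user with no earlier success in the data has no device baseline, so this sketch treats any device as novel for them. An investigation note would need to state that limitation alongside the confidence level.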
Impossible Travel
T1078
Signal Shape
Implausible location shift in a tight time window with supporting device novelty.
False-Positive Trap
VPN egress points and corporate proxies can make a legitimate user appear to jump between locations.
Expected Deliverable
Correlation query plus responder handoff note explaining confidence boundaries.
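The correlation could be sketched as a speed check between consecutive successful logins per user. This example rests on several assumptions: `Location` strings are resolved through a caller-supplied `coords` mapping of site name to (latitude, longitude), known VPN or proxy egress addresses are passed in as `vpn_ips`, and 900 km/h is used as an arbitrary plausibility ceiling. None of these names or values come from the dataset itself.

```python
import math
from datetime import datetime


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def find_impossible_travel(events, coords, max_kmh=900.0, vpn_ips=frozenset()):
    """Flag consecutive successes whose implied travel speed exceeds max_kmh."""
    last_success = {}  # user -> previous successful login event
    seen_devices = {}  # user -> devices seen on earlier successes
    findings = []
    for e in sorted(events, key=lambda ev: ev["Timestamp"]):
        # Skip non-successes, known egress points, and unmapped locations.
        if (e["Action"] != "LoginSuccess" or e["SrcIp"] in vpn_ips
                or e["Location"] not in coords):
            continue
        user = e["User"]
        prev = last_success.get(user)
        devices = seen_devices.setdefault(user, set())
        if prev is not None and prev["Location"] != e["Location"]:
            hours = (e["Timestamp"] - prev["Timestamp"]).total_seconds() / 3600
            km = haversine_km(coords[prev["Location"]], coords[e["Location"]])
            speed = km / hours if hours > 0 else math.inf
            if speed > max_kmh:
                findings.append({
                    "User": user,
                    "From": prev["Location"],
                    "To": e["Location"],
                    "DistanceKm": round(km),
                    "Hours": round(hours, 2),
                    "NewDevice": e["Device"] not in devices,  # supporting signal
                })
        devices.add(e["Device"])
        last_success[user] = e
    return findings
```

Reporting `NewDevice` separately, rather than folding it into the trigger, leaves the responder free to state how much the device evidence contributes to confidence. An incomplete `vpn_ips` list is the main boundary on that confidence and belongs in the handoff note.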
How This Data Should Be Used
The goal is not to memorize event rows. The goal is to learn how to isolate signal, explain confidence, document false positives, and produce detection outputs that would survive review.