AI Integration
DFIRe integrates with large language models (LLMs) to assist with report writing. AI features are optional, user-initiated, and designed to give administrators full control over what is sent to the LLM provider.
Overview
DFIRe connects to LLM providers via API, supporting OpenAI, Anthropic, Google Gemini, Azure OpenAI, and GitHub Models. The same configuration workflow applies regardless of which provider you use.
Key design principles:
- On-demand only: AI generation is always triggered by an explicit user action. Nothing is sent to the LLM automatically.
- Stateless: Case data is sent per request. DFIRe does not maintain conversation history or context across LLM requests.
- Transparent prompts: Administrators can view and customize every prompt sent to the LLM, including the built-in system instructions.
- Output is always a draft: Generated content is presented to the user for review. It is never published or saved without human approval.
Optional feature: AI integration requires configuration. If no LLM provider is configured, all AI-related controls are hidden from the interface. DFIRe is fully functional without AI features.
Supported Providers
DFIRe supports the following LLM providers via API:
| Provider | Model Format | Notes |
|---|---|---|
| OpenAI | `gpt-4o`, `gpt-4`, `o3-mini` | Direct OpenAI API |
| Anthropic | `anthropic/claude-sonnet-4-20250514` | Claude models via Anthropic API |
| Google Gemini | `gemini/gemini-2.5-pro` | Gemini models via Google AI |
| Azure OpenAI | `azure/<deployment-name>` | Requires base URL pointing to your Azure endpoint |
| GitHub Models | `github/<model-name>` | Models available via GitHub Models marketplace |
How It Works
When a user requests AI-generated content, DFIRe performs the following steps:
1. Case data collection: DFIRe assembles a structured JSON snapshot of the case. This includes case metadata, timeline events, evidence items, indicators of compromise, notes, CAN report history, and team information. The same data structure is used regardless of which report type is being generated.
2. Data minification: The raw case JSON is optimized for token efficiency before being sent to the LLM. This post-processing step removes information that does not contribute to report quality while preserving all analytically relevant content. See Data Sent to the LLM for details.
3. Prompt assembly: The minified case data is inserted into the prompt template using the `{case_data}` variable. The prompt is assembled from two parts: a system message (sets the output format and analyst role) and a user message (provides content guidance and the case data). For report sections, the `{section_title}` and `{writing_guide}` variables are also replaced.
4. LLM request: The assembled prompt is sent to the configured LLM provider via LiteLLM. Temperature and max token settings from the configuration are applied.
5. Output sanitization: The LLM response is sanitized to remove any potentially harmful content (script injection, etc.) before being presented to the user. For CAN reports, the response is additionally validated as JSON with the expected structure.
6. User review: The sanitized output is displayed to the user as a draft. The user can accept, edit, or discard the generated content.
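The steps above can be sketched end to end. Everything below is illustrative: the function names, snapshot fields, and the trivial sanitizer are assumptions, not DFIRe's actual internals, and the LLM call is stubbed out.

```python
import json

def collect_snapshot(case: dict) -> dict:
    # Step 1 (sketch): structured JSON snapshot of the case
    return {"metadata": case, "timeline": [], "iocs": [], "notes": ""}

def minify(snapshot: dict) -> dict:
    # Step 2 (sketch): drop empty fields to save tokens
    return {k: v for k, v in snapshot.items() if v not in (None, "", [], {})}

def assemble_prompt(template: str, case_data: dict) -> str:
    # Step 3 (sketch): insert the minified data into the user prompt
    return template.replace("{case_data}", json.dumps(case_data))

def call_llm(system_msg: str, user_msg: str) -> str:
    # Step 4 (stub): a real deployment calls the provider via LiteLLM
    return "## Draft summary\n<script>alert(1)</script>"

def sanitize(markdown: str) -> str:
    # Step 5 (sketch): strip script tags before the draft reaches the user
    return markdown.replace("<script>", "").replace("</script>", "")

case = {"case_number": "IR-2024-007", "title": "Phishing intrusion"}
prompt = assemble_prompt("Summarize this case:\n{case_data}",
                         minify(collect_snapshot(case)))
draft = sanitize(call_llm("You are a forensic analyst.", prompt))
# Step 6: `draft` is shown to the user to accept, edit, or discard
```

Note how nothing is persisted between runs: each generation rebuilds the snapshot and prompt from scratch, which is what makes the integration stateless.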
Data Sent to the LLM
Understanding what data leaves your environment is critical for security and compliance. DFIRe sends a minified JSON snapshot of the case as part of the prompt. The minification step removes noise and internal metadata while retaining everything needed for coherent report writing.
What is included
- Case metadata: Case number, title, status, severity, case mode, creation and closure dates, case type
- Team information: Lead investigator and investigator names
- Timeline events: All visible timeline entries (hidden entries are excluded at the query level). This is typically the most important data source for narrative coherence. The `is_manual` flag is stripped as it is only relevant to the UI.
- Evidence items: Item names, descriptions, types, statuses, notes, and flags (simplified to names only). Internal UUIDs and parent references are removed. Attachment metadata is simplified to filename, category, and description.
- Indicators of compromise: Value, STIX type, classification (benign/suspicious/malicious), confidence level, TLP designation, tags, public notes, and case context notes. Enrichment data is reduced to provider name and finding severity only — raw enrichment blobs are stripped.
- Case notes: Note content and authorship. The `show_on_timeline` display preference is stripped.
- Todo checklist: Only items with status `in_progress` or `done`. Items that are not started or skipped are excluded.
- CAN report history: The two most recent CAN report versions (older versions are trimmed).
- Existing report content: When generating a report section, the content of other report sections is included so the LLM can avoid repetition and maintain cross-section consistency.
What is excluded
- Internal UUIDs and database identifiers
- Encryption keys and security tokens
- File contents and attachment binary data (only metadata is sent)
- Raw enrichment data blobs from threat intelligence providers
- Empty or null fields (stripped to save tokens)
- UI-only flags (`is_manual`, `show_on_timeline`)
- Hidden timeline events
- Inactive todo items
- Indicator details beyond what is analytically relevant (e.g., internal timestamps, normalization data)
Data sensitivity: The case data snapshot includes case content, investigator names, IOC values, and notes. Ensure your LLM provider's data handling terms are compatible with the sensitivity level of your investigation data. TLP designations on indicators are preserved in the data sent to the LLM.
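The minification rules above can be sketched as a small recursive filter. This is a simplified illustration under assumed field names (`todos`, `can_reports`, etc.), not DFIRe's actual pipeline:

```python
UI_ONLY_FLAGS = {"is_manual", "show_on_timeline"}

def minify_case_data(snapshot: dict) -> dict:
    # Recursively drop empty/null fields and UI-only flags
    def clean(value):
        if isinstance(value, dict):
            return {k: clean(v) for k, v in value.items()
                    if k not in UI_ONLY_FLAGS and v not in (None, "", [], {})}
        if isinstance(value, list):
            return [clean(v) for v in value]
        return value

    data = clean(snapshot)
    # Keep only in-progress or completed todo items
    data["todos"] = [t for t in data.get("todos", [])
                     if t.get("status") in ("in_progress", "done")]
    # Trim CAN report history to the two most recent versions
    data["can_reports"] = data.get("can_reports", [])[-2:]
    return data

snap = {
    "case_number": "IR-2024-007",
    "closure_date": None,
    "timeline": [{"summary": "Initial access", "is_manual": True}],
    "todos": [{"title": "Image disk", "status": "done"},
              {"title": "Interview user", "status": "not_started"}],
    "can_reports": ["v1", "v2", "v3"],
}
minified = minify_case_data(snap)
```

After this pass, `closure_date` (null), the `is_manual` flag, the not-started todo, and the oldest CAN report version are all gone, while the analytically relevant content survives intact.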
Prompt Architecture
DFIRe uses a two-part prompt structure for all AI generation requests. Administrators have full visibility into both parts.
System message
The system message is a built-in instruction that sets the LLM's role and output format. It is read-only and cannot be modified by users. Its purpose is to ensure the LLM responds in the expected format (e.g., valid JSON for CAN reports, Markdown for report sections) and behaves as a forensic analyst.
The system message can be viewed in the Settings UI by expanding the "Built-in System Instructions" panel under the CAN Report AI Prompt section.
User prompt
The user prompt provides content guidance and includes the case data. This is where administrators control what the LLM writes about. The user prompt is fully customizable:
- CAN reports: A single prompt template with a `{case_data}` variable, configured under Settings → Reporting → CAN Report AI Prompt. Leave empty to use the built-in default.
- Report sections: Each section template has its own AI prompt with `{case_data}`, `{section_title}`, and `{writing_guide}` variables. A default prompt can be loaded using the "Use Default Prompt" button and then customized per section.
Additional instructions
When generating a CAN report, users can provide free-text additional instructions that are appended to the prompt. This allows per-generation guidance without changing the template (e.g., "Focus on the network intrusion timeline" or "Keep the report concise for executive stakeholders").
Template variables
| Variable | Available In | Replaced With |
|---|---|---|
| `{case_data}` | CAN prompts, section prompts | Minified case data JSON |
| `{section_title}` | Section prompts only | Name of the section being generated (e.g., "Executive Summary") |
| `{writing_guide}` | Section prompts only | The section's writing guide text, if configured |
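Variable substitution plus the appended per-generation instructions can be sketched as follows (the function name and the "Additional instructions:" framing are hypothetical):

```python
def build_user_prompt(template: str, case_data_json: str,
                      section_title: str = "", writing_guide: str = "",
                      additional_instructions: str = "") -> str:
    # Substitute the three template variables described above
    prompt = (template
              .replace("{case_data}", case_data_json)
              .replace("{section_title}", section_title)
              .replace("{writing_guide}", writing_guide))
    # Free-text guidance from the user is appended, not templated
    if additional_instructions:
        prompt += "\n\nAdditional instructions: " + additional_instructions
    return prompt

prompt = build_user_prompt(
    template=("Write the '{section_title}' section.\n"
              "Guide: {writing_guide}\nCase data: {case_data}"),
    case_data_json='{"case_number": "IR-2024-007"}',
    section_title="Executive Summary",
    writing_guide="Two short paragraphs for a non-technical audience.",
    additional_instructions="Focus on the network intrusion timeline.",
)
```

Because the additional instructions are appended rather than stored, the saved template stays stable while each generation can still be steered individually.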
Configuration
LLM integration is configured in Settings → AI / LLM (requires superuser access).
LLM Provider Settings
| Setting | Description |
|---|---|
| Provider | LLM service provider (OpenAI, Azure, Anthropic, Google, Bedrock, or custom) |
| Model | Model identifier string passed to LiteLLM (e.g., gpt-4o, azure/my-deployment) |
| API Key | Authentication credential for the provider. Stored encrypted, never displayed after saving. |
| Base URL | Optional custom API endpoint for Azure deployments or self-hosted models |
| Temperature | Controls response randomness (0.0 = deterministic, 2.0 = creative). Default: 0.4 |
| Max Tokens | Maximum length of the generated response |
After entering your credentials, use the Test Connection button to verify the configuration. A successful test confirms that DFIRe can reach the provider and authenticate.
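A request built from these settings maps directly onto LiteLLM's `completion()` call. The sketch below assumes a `cfg` dict keyed like the settings table; the key names are illustrative:

```python
def request_completion(cfg: dict, system_msg: str, user_msg: str) -> str:
    # Deferred import so this sketch can be read without LiteLLM installed
    import litellm

    response = litellm.completion(
        model=cfg["model"],                  # e.g. "gpt-4o" or "azure/my-deployment"
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        api_key=cfg["api_key"],
        api_base=cfg.get("base_url"),        # only set for Azure or self-hosted endpoints
        temperature=cfg.get("temperature", 0.4),
        max_tokens=cfg.get("max_tokens"),
    )
    # LiteLLM returns an OpenAI-style response object
    return response.choices[0].message.content
```

Because LiteLLM normalizes provider differences behind one interface, switching providers is a matter of changing `model` (and, for Azure, `base_url`) rather than rewriting the request code.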
Prompt Settings
AI prompt templates are configured under Settings → Reporting:
- CAN Report AI Prompt: At the bottom of the Reporting settings page. View the built-in system message, edit the user prompt, or reset to the default.
- Section AI Prompts: Expand any editable section template card and enable AI generation to configure its prompt. Use the "Use Default Prompt" button to start from the built-in default.
See Reporting configuration for the full list of section template settings.
Security and Privacy
Credential handling
- API keys are stored using Fernet encryption (AES-128-CBC) via `EncryptedCharField`
- Keys are never returned in API responses after being saved (write-only field)
- The encryption key (`CREDENTIAL_ENCRYPTION_KEY`) is separate from the Django `SECRET_KEY`
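At-rest encryption of this kind boils down to the Fernet primitives from the `cryptography` package, a minimal sketch (in production the key would come from `CREDENTIAL_ENCRYPTION_KEY`, not be generated per run):

```python
from cryptography.fernet import Fernet

# Stand-in for CREDENTIAL_ENCRYPTION_KEY loaded from the environment
key = Fernet.generate_key()
fernet = Fernet(key)

# What gets written to the database: an authenticated ciphertext token
ciphertext = fernet.encrypt(b"sk-example-api-key")

# Decryption happens only in memory, at the moment an LLM request is made
plaintext = fernet.decrypt(ciphertext)
```

Fernet tokens are also HMAC-authenticated, so a tampered ciphertext fails to decrypt rather than yielding garbage credentials.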
Access control
- LLM configuration requires superuser access
- Report generation requires case team membership (same permissions as editing the report)
- AI prompt template management requires superuser access
- The LLM status check (used to show/hide AI buttons) is available to all authenticated users
Output safety
- All LLM output is sanitized through the same Markdown sanitizer used for user-written content, preventing stored XSS
- CAN report responses are validated as JSON with exactly the three expected keys before being accepted
- Generated content is presented as a draft for human review before being saved
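The CAN response check can be sketched as strict JSON validation. The three key names below (`conditions`, `actions`, `needs`) are assumptions based on the CAN report naming; the document does not specify them:

```python
import json

# Hypothetical key names; DFIRe's actual CAN schema defines the real ones
EXPECTED_KEYS = {"conditions", "actions", "needs"}

def validate_can_response(raw: str) -> dict:
    # Reject anything that is not a JSON object with exactly the expected keys
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("response does not match the expected CAN structure")
    return data
```

Requiring *exactly* the expected keys (rather than at least them) also rejects responses where the model has smuggled in extra fields or wrapped the payload in commentary.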
Audit trail
- All generation requests are logged in the audit system with the requesting user and case context
- The LLM provider and model used are recorded with each generation event
Important: DFIRe sends case data to an external LLM provider. Your organization's information security policies and any applicable regulations (GDPR, data residency requirements, client confidentiality agreements) should be reviewed before enabling this feature. Consider using a self-hosted model or a provider with appropriate data processing agreements in place.
Limitations
- No conversation memory: Each generation request is independent. The LLM does not remember previous requests or maintain context across generations.
- Token limits: Very large cases may exceed the LLM's context window. The minification pipeline reduces this risk, but cases with hundreds of evidence items or extensive timelines may need to be summarized manually.
- Output quality varies: LLM-generated content should always be reviewed by a qualified analyst. The output may contain inaccuracies, miss important context, or draw incorrect conclusions.
- Auto-generated sections cannot use AI: Auto-generated sections (Title Page, Table of Contents, Evidence Inventory, etc.) are populated from structured data and do not support AI generation.
- No streaming: Responses are returned as a complete block after the LLM finishes generating. This may take several seconds for longer outputs.