AI Integration

DFIRe integrates with large language models (LLMs) to assist with report writing. AI features are optional, user-initiated, and designed to give administrators full control over what is sent to the LLM provider.

Overview

DFIRe connects to LLM providers via API, supporting OpenAI, Anthropic, Google Gemini, Azure OpenAI, and GitHub Models. The same configuration workflow applies regardless of which provider you use.

Key design principles:

  • On-demand only: AI generation is always triggered by an explicit user action. Nothing is sent to the LLM automatically.
  • Stateless: Case data is sent per request. DFIRe does not maintain conversation history or context with the LLM across requests.
  • Transparent prompts: Administrators can view and customize every prompt sent to the LLM, including the built-in system instructions.
  • Output is always a draft: Generated content is presented to the user for review. It is never published or saved without human approval.

Optional feature: AI integration requires configuration. If no LLM provider is configured, all AI-related controls are hidden from the interface. DFIRe is fully functional without AI features.

Supported Providers

DFIRe supports the following LLM providers via API:

  • OpenAI: gpt-4o, gpt-4, o3-mini (direct OpenAI API)
  • Anthropic: anthropic/claude-sonnet-4-20250514 (Claude models via the Anthropic API)
  • Google Gemini: gemini/gemini-2.5-pro (Gemini models via Google AI)
  • Azure OpenAI: azure/<deployment-name> (requires a base URL pointing to your Azure endpoint)
  • GitHub Models: github/<model-name> (models available via the GitHub Models marketplace)

How It Works

When a user requests AI-generated content, DFIRe performs the following steps:

  1. Case data collection

    DFIRe assembles a structured JSON snapshot of the case. This includes case metadata, timeline events, evidence items, indicators of compromise, notes, CAN report history, and team information. The same data structure is used regardless of which report type is being generated.

  2. Data minification

    The raw case JSON is optimized for token efficiency before being sent to the LLM. This post-processing step removes information that does not contribute to report quality while preserving all analytically relevant content. See Data Sent to the LLM for details.

  3. Prompt assembly

    The minified case data is inserted into the prompt template using the {case_data} variable. The prompt is assembled from two parts: a system message (sets the output format and analyst role) and a user message (provides content guidance and the case data). For report sections, the {section_title} and {writing_guide} variables are also replaced.

  4. LLM request

    The assembled prompt is sent to the configured LLM provider via LiteLLM. Temperature and max token settings from the configuration are applied.

  5. Output sanitization

    The LLM response is sanitized to remove any potentially harmful content (script injection, etc.) before being presented to the user. For CAN reports, the response is additionally validated as JSON with the expected structure.

  6. User review

    The sanitized output is displayed to the user as a draft. The user can accept, edit, or discard the generated content.
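The six steps above can be sketched end to end. This is a minimal illustration, not DFIRe's actual implementation: the function and field names are assumptions, the minification and sanitization are reduced to placeholders, and `llm_call` stands in for the LiteLLM request.

```python
import json

def generate_section(case, prompt_template, section_title, writing_guide, llm_call):
    """Illustrative sketch of the six-step generation flow (names are hypothetical)."""
    # 1. Case data collection: assemble a structured snapshot of the case
    snapshot = {"metadata": case["metadata"], "timeline": case["timeline"]}
    # 2. Data minification: drop empty fields to save tokens (simplified)
    minified = {k: v for k, v in snapshot.items() if v}
    # 3. Prompt assembly: substitute the template variables
    prompt = (prompt_template
              .replace("{case_data}", json.dumps(minified))
              .replace("{section_title}", section_title)
              .replace("{writing_guide}", writing_guide))
    # 4. LLM request: llm_call stands in for the LiteLLM completion call
    raw = llm_call(prompt)
    # 5. Output sanitization: placeholder for the real Markdown sanitizer
    draft = raw.replace("<script>", "").replace("</script>", "")
    # 6. User review: returned as a draft, never auto-saved
    return {"status": "draft", "content": draft}
```

A real implementation would perform the full minification and sanitization described in the following sections; this sketch only shows how the stages connect.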

Data Sent to the LLM

Understanding what data leaves your environment is critical for security and compliance. DFIRe sends a minified JSON snapshot of the case as part of the prompt. The minification step removes noise and internal metadata while retaining everything needed for coherent report writing.

What is included

  • Case metadata: Case number, title, status, severity, case mode, creation and closure dates, case type
  • Team information: Lead investigator and investigator names
  • Timeline events: All visible timeline entries (hidden entries are excluded at the query level). This is typically the most important data source for narrative coherence. The is_manual flag is stripped as it is only relevant to the UI.
  • Evidence items: Item names, descriptions, types, statuses, notes, and flags (simplified to names only). Internal UUIDs and parent references are removed. Attachment metadata is simplified to filename, category, and description.
  • Indicators of compromise: Value, STIX type, classification (benign/suspicious/malicious), confidence level, TLP designation, tags, public notes, and case context notes. Enrichment data is reduced to provider name and finding severity only — raw enrichment blobs are stripped.
  • Case notes: Note content and authorship. The show_on_timeline display preference is stripped.
  • Todo checklist: Only items with status in_progress or done. Items with status not_started or skipped are excluded.
  • CAN report history: The two most recent CAN report versions (older versions are trimmed).
  • Existing report content: When generating a report section, the content of other report sections is included so the LLM can avoid repetition and maintain cross-section consistency.

What is excluded

  • Internal UUIDs and database identifiers
  • Encryption keys and security tokens
  • File contents and attachment binary data (only metadata is sent)
  • Raw enrichment data blobs from threat intelligence providers
  • Empty or null fields (stripped to save tokens)
  • UI-only flags (is_manual, show_on_timeline)
  • Hidden timeline events
  • Inactive todo items
  • Indicator details beyond what is analytically relevant (e.g., internal timestamps, normalization data)

Data sensitivity: The case data snapshot includes case content, investigator names, IOC values, and notes. Ensure your LLM provider's data handling terms are compatible with the sensitivity level of your investigation data. TLP designations on indicators are preserved in the data sent to the LLM.
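The inclusion and exclusion rules above amount to a recursive cleaning pass. The sketch below illustrates the idea under assumed field names (the real DFIRe pipeline and its schema may differ):

```python
def minify_case_data(snapshot):
    """Illustrative minification pass; field and status names are assumptions."""
    UI_ONLY_FLAGS = {"is_manual", "show_on_timeline"}

    def clean(value):
        # Strip UI-only flags, then drop empty/null fields to save tokens
        if isinstance(value, dict):
            out = {k: clean(v) for k, v in value.items() if k not in UI_ONLY_FLAGS}
            return {k: v for k, v in out.items() if v not in (None, "", [], {})}
        if isinstance(value, list):
            return [clean(v) for v in value]
        return value

    data = clean(snapshot)
    # Keep only active todo items (in_progress or done)
    if "todos" in data:
        data["todos"] = [t for t in data["todos"]
                         if t.get("status") in ("in_progress", "done")]
        if not data["todos"]:
            del data["todos"]
    # Trim CAN report history to the two most recent versions
    if "can_reports" in data:
        data["can_reports"] = data["can_reports"][-2:]
    return data
```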

Prompt Architecture

DFIRe uses a two-part prompt structure for all AI generation requests. Administrators have full visibility into both parts.

System message

The system message is a built-in instruction that sets the LLM's role and output format. It is read-only and cannot be modified by users. Its purpose is to ensure the LLM responds in the expected format (e.g., valid JSON for CAN reports, Markdown for report sections) and behaves as a forensic analyst.

The system message can be viewed in the Settings UI by expanding the "Built-in System Instructions" panel under the CAN Report AI Prompt section.

User prompt

The user prompt provides content guidance and includes the case data. This is where administrators control what the LLM writes about. The user prompt is fully customizable:

  • CAN reports: A single prompt template with a {case_data} variable, configured under Settings → Reporting → CAN Report AI Prompt. Leave empty to use the built-in default.
  • Report sections: Each section template has its own AI prompt with {case_data}, {section_title}, and {writing_guide} variables. A default prompt can be loaded using the "Use Default Prompt" button and then customized per section.

Additional instructions

When generating a CAN report, users can provide free-text additional instructions that are appended to the prompt. This allows per-generation guidance without changing the template (e.g., "Focus on the network intrusion timeline" or "Keep the report concise for executive stakeholders").
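The two-part structure with optional additional instructions can be sketched as follows. The function and its message layout are illustrative assumptions, not DFIRe's actual API:

```python
def build_messages(system_message, user_prompt, case_data_json,
                   additional_instructions=None):
    """Assemble the two-part prompt; names and layout are illustrative."""
    user_content = user_prompt.replace("{case_data}", case_data_json)
    if additional_instructions:
        # Per-generation guidance is appended without changing the stored template
        user_content += "\n\nAdditional instructions: " + additional_instructions
    return [
        {"role": "system", "content": system_message},  # built-in, read-only
        {"role": "user", "content": user_content},      # customizable guidance + data
    ]
```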

Template variables

  • {case_data}: available in CAN and section prompts; replaced with the minified case data JSON
  • {section_title}: section prompts only; replaced with the name of the section being generated (e.g., "Executive Summary")
  • {writing_guide}: section prompts only; replaced with the section's writing guide text, if configured

Configuration

LLM integration is configured in Settings → AI / LLM (requires superuser access).

LLM Provider Settings

  • Provider: LLM service provider (OpenAI, Azure, Anthropic, Google, Bedrock, or custom)
  • Model: Model identifier string passed to LiteLLM (e.g., gpt-4o, azure/my-deployment)
  • API Key: Authentication credential for the provider. Stored encrypted, never displayed after saving.
  • Base URL: Optional custom API endpoint for Azure deployments or self-hosted models
  • Temperature: Controls response randomness (0.0 = deterministic, 2.0 = creative). Default: 0.4
  • Max Tokens: Maximum length of the generated response

After entering your credentials, use the Test Connection button to verify the configuration. A successful test confirms that DFIRe can reach the provider and authenticate.
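Since requests go through LiteLLM, the settings above map roughly onto the arguments of LiteLLM's completion() call. The helper below is a sketch of that mapping, not DFIRe's code; the config keys are assumptions, and the actual request is shown only as a comment because it requires live credentials:

```python
def build_request_kwargs(config):
    """Map DFIRe-style settings onto LiteLLM completion() arguments (illustrative)."""
    kwargs = {
        "model": config["model"],              # e.g. "gpt-4o" or "azure/my-deployment"
        "api_key": config["api_key"],
        "temperature": config.get("temperature", 0.4),
        "max_tokens": config.get("max_tokens", 4096),
    }
    if config.get("base_url"):                 # only needed for Azure or self-hosted
        kwargs["api_base"] = config["base_url"]
    return kwargs

# A real request would then look like:
#   import litellm
#   response = litellm.completion(messages=messages, **build_request_kwargs(config))
```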

Prompt Settings

AI prompt templates are configured under Settings → Reporting:

  • CAN Report AI Prompt: At the bottom of the Reporting settings page. View the built-in system message, edit the user prompt, or reset to the default.
  • Section AI Prompts: Expand any editable section template card and enable AI generation to configure its prompt. Use the "Use Default Prompt" button to start from the built-in default.

See Reporting configuration for the full list of section template settings.

Security and Privacy

Credential handling

  • API keys are stored using Fernet encryption (AES-128-CBC) via EncryptedCharField
  • Keys are never returned in API responses after being saved (write-only field)
  • The encryption key (CREDENTIAL_ENCRYPTION_KEY) is separate from the Django SECRET_KEY

Access control

  • LLM configuration requires superuser access
  • Report generation requires case team membership (same permissions as editing the report)
  • AI prompt template management requires superuser access
  • The LLM status check (used to show/hide AI buttons) is available to all authenticated users

Output safety

  • All LLM output is sanitized through the same Markdown sanitizer used for user-written content, preventing stored XSS
  • CAN report responses are validated as JSON with exactly the three expected keys before being accepted
  • Generated content is presented as a draft for human review before being saved
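The CAN report validation step can be sketched as a strict JSON check. The key names below are placeholders (the document does not name the three expected keys), and the function is an illustration of the check, not DFIRe's implementation:

```python
import json

# Placeholder names; DFIRe defines the actual three expected keys
EXPECTED_KEYS = {"key_a", "key_b", "key_c"}

def validate_can_response(raw):
    """Reject any LLM response that is not a JSON object with exactly the expected keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        return None
    return data
```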

Audit trail

  • All generation requests are logged in the audit system with the requesting user and case context
  • The LLM provider and model used are recorded with each generation event

Important: DFIRe sends case data to an external LLM provider. Your organization's information security policies and any applicable regulations (GDPR, data residency requirements, client confidentiality agreements) should be reviewed before enabling this feature. Consider using a self-hosted model or a provider with appropriate data processing agreements in place.

Limitations

  • No conversation memory: Each generation request is independent. The LLM does not remember previous requests or maintain context across generations.
  • Token limits: Very large cases may exceed the LLM's context window. The minification pipeline reduces this risk, but cases with hundreds of evidence items or extensive timelines may need to be summarized manually.
  • Output quality varies: LLM-generated content should always be reviewed by a qualified analyst. The output may contain inaccuracies, miss important context, or draw incorrect conclusions.
  • Auto-generated sections cannot use AI: Sections populated from structured data (Title Page, Table of Contents, Evidence Inventory, etc.) do not support AI generation.
  • No streaming: Responses are returned as a complete block after the LLM finishes generating. This may take several seconds for longer outputs.