From Whiteboards to LLMs: Automating STRIDE Threat Models with GenAI

Threat modeling is a structured, proactive approach to identifying potential security threats, vulnerabilities, and attack vectors in a system or application. It helps teams understand what could go wrong, and how to fix it before it becomes a real issue. Ideally, threat modeling is performed early in the design phase, and continuously updated as systems evolve.

At its core, threat modeling is about asking four key questions:

  1. What are we building?

  2. What can go wrong?

  3. What are we going to do about it?

  4. Did we do a good job?

Threat modeling involves mapping system components, data flows, and trust boundaries, and then methodically analyzing potential threats using frameworks like STRIDE. Traditionally, this process relies on human-led workshops, whiteboarding, and security experts working closely with architects and developers. While invaluable for security, manual threat modeling is often time-consuming, requires specialized expertise, and can become a bottleneck in modern fast-paced development environments.

Understanding STRIDE: A Common Threat Modeling Framework

Before we talk about AI, it’s worth grounding ourselves in STRIDE, one of the most widely adopted threat modeling methodologies. Developed by Microsoft, STRIDE provides a simple mnemonic for categorizing threats based on security properties:

  • Spoofing – Pretending to be something or someone else

  • Tampering – Modifying data or code

  • Repudiation – Denying actions without a trace

  • Information Disclosure – Exposing sensitive data

  • Denial of Service – Disrupting system availability

  • Elevation of Privilege – Gaining unauthorized access or control

By mapping system components and data flows to these categories, security teams can systematically explore what might go wrong. STRIDE is highly effective, but applying it manually to every service, API, or change can be labor-intensive—especially in fast-paced DevOps environments.

Introducing GenAI to Threat Modeling

With the emergence of generative AI, we can now augment the threat modeling process with automation and intelligence. By feeding LLMs contextual information such as architecture diagrams, API specs, or user stories, we can generate highly relevant threats and mitigations.

This is especially powerful in a DevSecOps environment where speed, automation, and repeatability are critical.

STRIDE GPT: Automating Threat Modeling

STRIDE GPT is an AI-powered threat modelling tool that leverages Large Language Models (LLMs) to generate threat models and attack trees for a given application based on the STRIDE methodology. Users provide application details, such as the application type, authentication methods, and whether the application is internet-facing or processes sensitive data. The model then generates its output based on the provided information.

By providing structured input—such as a data flow diagram (DFD), OpenAPI specification, or infrastructure-as-code snippet—STRIDE GPT can generate:

  • Potential threats for each STRIDE category

  • Relevant mitigations aligned with secure design principles

  • Developer-friendly explanations or stories of how a threat might manifest

Getting Started: Running STRIDE GPT via Docker

Before we dive into how generative AI can help with threat modeling, let’s start with how you can get STRIDE GPT up and running on your own machine. The easiest way to try it is by running it locally using Docker.

What You’ll Need:

  • Docker installed and running on your machine

  • An API key for the LLM provider you plan to use, saved in a local .env file so it can be passed to the container with --env-file

Download and Run the Docker Image

docker pull mrwadams/stridegpt:latest

docker run -p 8501:8501 --env-file .env mrwadams/stridegpt

This will start a local web-based interface at http://localhost:8501/. 
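The .env file referenced above holds the credentials for whichever model provider you configure. As a minimal sketch, assuming an OpenAI-backed setup (check the STRIDE GPT README for the exact variable names it expects):

# .env - variable name assumes an OpenAI-backed model; other providers use different keys
OPENAI_API_KEY=sk-...your-key-here...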

Once it’s running, you can start experimenting by inputting descriptions of your own systems, or follow along with the MicroBlog case study.

Real-World Example: Threat Modeling the MicroBlog App with STRIDE GPT

To show how generative AI can streamline threat modeling, let’s walk through an example using a fictional microblogging platform—MicroBlog—which mirrors the architecture of a real-world app like Twitter or X.

MicroBlog includes core components such as user authentication, tweet posting, follower relationships, and an admin moderation panel. It’s built with a service-oriented architecture, including web and mobile frontends, a REST API, and internal microservices handling tweets, social graph, and notifications.

By providing STRIDE GPT with a structured description of this application—its components, data flows, and trust boundaries—we can quickly generate a contextual threat model.

What We Provided to STRIDE GPT:

  • A narrative application overview (services, data, users, trust zones)

  • Data flow details (e.g., user login, posting a tweet, following someone)

  • Information about authentication mechanisms (e.g., JWT, OAuth)

  • Security controls (e.g., role-based access, rate limiting)

What STRIDE GPT Produced:

  • Threats categorized by STRIDE: For each component and data flow, STRIDE GPT suggested realistic threats such as token replay (spoofing), unauthorized role changes (EoP), or rate-limit bypass (DoS).

  • Mitigation strategies: Suggestions included best practices like token binding, strict input validation, and enforcing principle of least privilege.

  • Human-readable threat scenarios: For example, “An attacker could tamper with tweet metadata using an unvalidated API request.”

This AI-assisted process drastically reduces the time required to produce a meaningful threat model, which is especially valuable in environments where traditional workshops are too slow or too costly.

Breakdown of Prompt Techniques in STRIDE GPT

Because STRIDE GPT is open source, we can look at the prompts it uses to interact with the LLM.

1. Role Prompting (Context Setting)

“Act as a cyber security expert with more than 20 years experience of using the STRIDE threat modelling methodology…”

This is a context-establishing technique. It tells the model to “think like” a seasoned security expert. This does a few things:

  • Sets expectations for expert-level language and reasoning

  • Encourages thoroughness and domain-specific accuracy

  • Triggers the model to recall relevant knowledge from STRIDE methodology

This kind of role definition helps GenAI models output more contextually grounded results, especially in security or technical domains.
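To see the pattern outside of STRIDE GPT itself, here is a minimal sketch of role prompting using the OpenAI Python client. The model name and user message are placeholders, not STRIDE GPT's actual code.

# Minimal role-prompting sketch (illustrative; not STRIDE GPT's actual code)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "Act as a cyber security expert with more than 20 years experience "
    "of using the STRIDE threat modelling methodology."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Produce a STRIDE threat model for the application described below."},
    ],
)

print(response.choices[0].message.content)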

2. Explicit Structure & JSON Output Instruction

“When providing the threat model, use a JSON formatted response with the keys…”

This is a classic structured output prompt that ensures the response is predictable and machine-readable.

Telling the model exactly how to format its output (especially in JSON) increases success in programmatic consumption and pipeline integration.
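A minimal sketch of the idea in Python; the key and field names below are illustrative rather than STRIDE GPT's exact schema:

# Sketch: ask for JSON with named keys, then parse the reply programmatically
# (key and field names are illustrative, not the tool's exact schema)
import json

format_instruction = (
    "When providing the threat model, use a JSON formatted response with the keys "
    "'threat_model' and 'improvement_suggestions'. Each entry under 'threat_model' "
    "should contain 'Threat Type', 'Scenario', and 'Potential Impact'."
)

# raw_response stands in for the text returned by the LLM call
raw_response = '{"threat_model": [], "improvement_suggestions": []}'
parsed = json.loads(raw_response)

for threat in parsed["threat_model"]:
    print(f"{threat['Threat Type']}: {threat['Scenario']}")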

3. Few-Shot Prompting (Single Example)

“Example of expected JSON response format:”

This includes a single-shot example of what a correct response looks like. This technique:

  • Anchors the output style (capitalization, label naming, value types)

  • Demonstrates the level of detail expected in each field

  • Gives the model a “template” to imitate

Models are more consistent and compliant when shown an example, especially for multi-part or nested outputs.
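As a sketch, the single-shot example can simply be appended to the instructions before the prompt is sent; the JSON content below is illustrative:

# Sketch: a single worked example anchors the structure the model should imitate
instructions = (
    "Act as a cyber security expert... When providing the threat model, "
    "use a JSON formatted response."
)

example = """Example of expected JSON response format:
{
  "threat_model": [
    {
      "Threat Type": "Spoofing",
      "Scenario": "An attacker replays a stolen session token to impersonate a user.",
      "Potential Impact": "Unauthorized access to the victim's account."
    }
  ],
  "improvement_suggestions": [
    "Describe how session tokens are issued, validated, and expired."
  ]
}"""

full_prompt = instructions + "\n\n" + example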

4. Instructional Constraints

“Do not provide general security recommendations – focus only on…”

This narrows the scope of what the model should do, keeping it on-task. It avoids generic advice and reinforces specificity.

Without guardrails, models may drift into boilerplate answers like “Always use HTTPS”—which, while true, are not helpful for a threat model tied to a specific app context.

5. Variable Substitution for Automation

APPLICATION TYPE: {app_type}

CODE SUMMARY, README CONTENT, AND APPLICATION DESCRIPTION: {app_input}

This enables dynamic prompting based on user input or CI/CD pipeline data. The prompt-building function receives structured metadata and plugs it into a rich, consistent prompt template.

It allows repeatable automation while maintaining high-quality results from the model.
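A sketch of what this looks like in Python; the placeholder names beyond {app_type} and {app_input} are assumptions for illustration:

# Sketch: one prompt template, reused across applications via variable substitution
# (placeholder names other than app_type and app_input are illustrative)
PROMPT_TEMPLATE = """
APPLICATION TYPE: {app_type}
AUTHENTICATION METHODS: {authentication}
INTERNET FACING: {internet_facing}
SENSITIVE DATA: {sensitive_data}
CODE SUMMARY, README CONTENT, AND APPLICATION DESCRIPTION: {app_input}
"""

prompt = PROMPT_TEMPLATE.format(
    app_type="Web application",
    authentication="OAuth 2.0 with JWT access tokens",
    internet_facing="Yes",
    sensitive_data="Yes",
    app_input="MicroBlog: web/mobile frontends, REST API, tweet, social graph, and notification microservices.",
)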

6. Gap Analysis Prompting

“Under improvement_suggestions, include an array of strings that suggest what additional information or details the user could provide…”

This is a really smart design. It gets the model to do meta-reasoning—not just threat modeling, but assessing what’s missing.

This enables iterative threat modeling—you can refine your inputs based on what the model says it needs more of.
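As a rough sketch of how that iterative loop could be wired up (generate_threat_model is a hypothetical stand-in for the LLM call, assumed to return the raw JSON response as a string):

# Sketch: iterative refinement driven by the model's own gap analysis
# generate_threat_model is a hypothetical stand-in for the LLM call
import json

def refine_threat_model(generate_threat_model, app_details, max_rounds=3):
    result = {}
    for _ in range(max_rounds):
        result = json.loads(generate_threat_model(app_details))
        gaps = result.get("improvement_suggestions", [])
        if not gaps:
            break
        print("The model asked for more detail on:")
        for gap in gaps:
            print(" -", gap)
        app_details += "\n" + input("Add the missing details, then press Enter: ")
    return result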

How It Works in CI/CD

Here’s how this fits into a DevSecOps pipeline:

  1. Design: Engineers commit architecture descriptions or data flow diagrams to version control. A pre-commit hook or pipeline job passes that info to STRIDE GPT.

  2. Threats: AI suggests STRIDE-aligned threats and security controls. These can be included in pull request comments, security tickets, or sent to security dashboards. 

  3. Mitigation: Development and security teams review findings and implement necessary controls.
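As an illustration of the first two steps, a pipeline job could be as simple as the Python script below; the file paths, model name, and direct LLM call are assumptions for illustration rather than how STRIDE GPT ships.

# Sketch of a pipeline job: turn a design doc in the repo into a threat model artifact.
# Paths, model name, and the direct LLM call are assumptions for illustration.
from pathlib import Path
from openai import OpenAI

design = Path("docs/architecture.md").read_text()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "Act as a cyber security expert using the STRIDE threat modelling methodology."},
        {"role": "user", "content": f"Produce a STRIDE threat model for the following design:\n\n{design}"},
    ],
)

# A later pipeline step can post this file as a pull request comment or security ticket
Path("threat-model.md").write_text(response.choices[0].message.content)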

Benefits of GenAI-Augmented Threat Modeling

  • Scalability – Helps smaller security teams support dozens or hundreds of dev teams.

  • Speed – Automates the “what can go wrong?” phase so teams can focus on risk decisions.

  • Consistency – Enforces STRIDE across the organization, reducing variance in threat model quality.

  • Shift Left – Embeds security early into design, not after code is written.

The true value of generative AI in threat modeling isn't just automation—it's augmentation. STRIDE GPT doesn't replace security experts but rather expands their reach and effectiveness:

  • Security experts focus on novel, complex threats while AI handles routine analysis

  • AI suggestions serve as a starting point that human experts can refine

  • The combination of human creativity and machine pattern recognition creates more robust security outcomes

Caveats and Human-in-the-Loop Considerations

While powerful, AI-enhanced threat modeling has limitations:

  • AI systems are only as good as their training data and may miss novel attack vectors

  • Organizations must validate AI-generated findings and avoid over-reliance on automation

  • Privacy considerations around sharing sensitive system designs with AI tools

  • Need for human oversight to contextualize threats within business priorities

Final Thoughts

GenAI won’t replace expert threat modelers, but it can accelerate and scale their efforts. In high-velocity environments, pairing human expertise with AI-augmented insights enables better security outcomes without slowing down innovation.

STRIDE GPT is just the beginning. The future of threat modeling is intelligent, contextual, and continuous, just like modern software delivery.
