It has taken me a long time to get around to writing this post, mostly because having an opinion about threat modeling can be so polarising. I’m expecting to be told “you are wrong!”, “that is not what threat modeling is!”, and “that is not how you threat model!”. Fortunately, this is the internet, and we all get to have our own wrong opinions.
What follows are some of my personal views on threat modeling, how I approach threat modeling and what has worked for me (both as a Platform Security Engineer and vulnerability researcher). I’ve been fortunate enough to work with, and to learn threat modeling from, Wade Winright (@vashta_nerdrada) who has shaped much of my thinking. The rest boils down to things learned during four years of daily threat modeling.
It is about communication and risk
One of the primary functions of threat modeling is to provide a means of effectively communicating security threats. Threat models are not for actively identifying vulnerabilities. The idea that threat modeling will help eliminate all vulnerabilities from the system is a misconception I’ve often seen in new threat modeling programs. Another common use of threat modeling is to simply tick a compliance box. Yes, a mature threat modeling program will help eliminate some vulnerabilities, but this is as much a by-product of increased security awareness as it is of actual issues identified through threat modeling.
Security is often seen as an add-on component that needs to be applied to the SSDL or be applied because “security”. Threat modeling provides the opportunity to talk about security as a functional component of the system. This is something that Jamie Arlen spoke about recently, and probably one of the biggest and most impactful things I’ve experienced at Heroku and Salesforce. For example, it is possible to approach a threat model and warn of denial of service (part of STRIDE), but that might not mean much to engineering, as it is seen as meaning a script kiddie will attempt to crash the service. However, when you start talking about “availability” or “throughput” being affected, the conversation switches to a core functional component of the system that engineering cares about. This is where threat modeling as a communication mechanism gains real value: not in talking about vulnerabilities, threats, risks or mitigations, but rather in identifying and communicating about the parts of the system that matter.
Being deliberate and precise (™ Trey Ford) in these threat modeling sessions is important and ensures each party knows the goal of the exercise before starting. Especially in newer threat modeling programs, or when working with teams that haven’t done threat modeling before, start off each session with the stated aims of the exercise. I like to remind teams that we aren’t looking for vulnerabilities or trying to find fault in the proposed (or deployed) design. Rather, the session is about identifying the key assets and functionality we are aiming to secure and then about identifying how we can secure those.
Approaching threat modeling
I’m going to repeat myself, but one of the first things that turns engineering teams off threat models is the fear that they are “not security people” and are unable to think of or identify all threats. This is mainly because they are led to believe that the threat model is there to identify vulnerabilities or bugs in their code/design. It is when you start breaking it down to the basics, starting with a data flow diagram, walking through the function of the system, and asking “what is it we care about?”, that the real threat and mitigation identification happens. There are three approaches that I like to take when starting a threat model:
- Security solo start - security starts modeling the system without engineering’s direct input
- Engineering solo start - let engineering model the system without security’s direct/immediate input
- Collaborative start - start the threat model together
In my experience, it is worth trying all three to see which works best for the team you are working alongside. Although the three approaches start differently, they should all end in a collaborative effort where the security team and the engineering team work together, walk through the threat model together, and are able to explain the function of the system, what is important, and how it is being protected. Your approach should also take into account the stage a system is in. Is this a threat model for a newly designed system or a retroactive threat model of a system that has been in production for years?
Security solo start - Start the threat model without engineering
As it says on the tin, you as a security team start threat modeling the system as you see it. This means gathering and studying documentation, interacting with the system, and drawing up how you see the system fitting together. With this approach, I commonly end up with one or more data flow diagrams describing the system as I understand it. Once these data flow diagrams have been created, you walk through them with the engineering team. At this point, either the engineering team confirms that you have understood the system correctly, and you can start working together on fleshing out the threat model, or the engineering team tells you that you are completely wrong about how the system is put together. This second situation usually triggers two things. Firstly, the engineering team discovers flaws in their own documentation (or the lack thereof) or over-complexity in the design. Secondly, teams may find that they didn’t fully understand the whole system or how complex parts work together. This can lead to the discovery not only of security risks but of actual functional flaws.
Engineering solo start - engineering models the system
This should only be done with teams that have done threat modeling before. Throwing an inexperienced engineering team a threat modeling challenge is more likely to antagonise them and will lead to a dismissal of threat modeling as a useful tool. The process is nearly the same as the above-mentioned approach and is ideally done in parallel to security’s own threat modeling (or a scaled-down version of it). In my experience, solo starts represent the point at which teams start organically adopting threat modeling as part of their process, either thinking about the system with threats and mitigations in mind or using threat modeling (the data flows) as part of their documentation of the actual system function. Engineering teams also record the threats as they see them. These won’t always map directly to a model such as STRIDE, which is absolutely fine! The whole point here is to identify what is important in the system (functionality, assets, logic).
Collaborative start - start the threat model together
In this approach the security team and engineering team start the threat model together. Engineering walks security through the system functionality, and security guides them in drawing up data flows. Threats and mitigations can be identified during this functional walk-through. This collaborative effort continues with security raising possible threats and asking engineering’s opinion on them. You will likely find that engineering immediately starts throwing out mitigations for these threats. The best thing about these mitigations being mentioned is that they won’t always be inherently direct security mitigations but rather mitigations that exist to solve other problems in the system. This helps bridge that gap between security and engineering and brings us to a common communication platform as to what each mitigation does.
Keep it simple
A daunting concept when starting out with threat modeling is “how do I capture all the threats?”. A common trap I’ve seen folks falling into is the attempt to precisely model every threat and map those threats directly to vulnerabilities or bugs. For security practitioners starting off in threat modeling, you might try to create a “threat” of “SQL injection could lead to code execution”. While that is accurate, it also pushes you into a realm of trying to model every single vulnerability you can think of, turning this into a check-boxing exercise or vulnerability assessment, rather than modeling the system state. This also applies to engineering teams starting out with threat modeling; you will often hear “I can’t threat model this because I’m not into security. I don’t know security vulnerabilities”. This mindset frustrates engineers and turns them off the idea of a threat model, as they see it as a vulnerability/bug hunting exercise.
A good threat model, in my opinion, does not dive into those individual vulnerabilities or bugs, but rather starts off with the mitigations that exist for a set of vulnerabilities, or more accurately said, a set of threats. When working at SensePost I was introduced to the concept of assumed breach, or the zero-day card. This was a concept Haroon Meer (@haroonmeer) brought into every assessment. Sure, you can spend all your time trying to find every single vulnerability, but is that bringing value to the target? At the same time, you could spend five days proving that there is no low-hanging fruit to exploit on the target perimeter, but does that give a true picture of the target’s security posture? With the zero-day card, you start from the position of assumed breach: that an attacker possesses a zero-day and is able to get into the target environment. This is where the real assessment starts. How do you go from that assumed-breach system to identifying what is of value to the target, and how would I gain access to (and how would I protect) that target asset? This is the scenario from which I try to start all my threat models. I assume there is going to be a vulnerability; it isn’t the role of the threat model to identify or prevent that possible vulnerability, but rather to identify what we need to protect and the mitigations necessary to make that happen.
Applying this assumed breach outlook to the previously mentioned SQL injection case, the threat model would start out with the fact that you have an application that accepts user input and stores it in a database. The threat model assumes that SQL injection will occur and tries to limit the impact of that. What are the mitigations in this case? Least privilege access for the user role used in the query is a start. This limits both the amount of data that can be accessed and what can be done on the database (COPY FROM PROGRAM being one example of command execution). Next, you assume a flaw exists in your least privileged access, or maybe a zero-day exists that allows privilege escalation in the database. What mitigation exists to limit the impact of command execution from within the database process? Is the database process running in an isolated system, for example a container? What are the protections applied to that container? Is there a least privileged system user used for the database process? How is the network access from the database host limited? All of these mitigations apply not to prevent SQL injection but rather to prevent further escalation from that point or to limit the blast radius.
This is an iterative process whereby you identify your mitigations and, at each step, assume that the mitigation is going to fail. Then move into identifying further mitigations; think of a Swiss cheese model. You aren’t looking for one magic mitigation, you are looking for a set of mitigations that can work in unison. And even then, you may assume a catastrophic failure of the whole system. Just don’t drive yourself crazy trying to go down the rabbit hole of having a mitigation for every other mitigation. The goal is to reduce relative risk, while balancing system requirements and usability. An oft-overlooked mitigation, and one Wade always reminds me of, is the use of instrumentation. It may not prevent the black-swan event from happening, but it sure is nice knowing that it has happened. Knowing how to react is as important as being able to prevent.
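To make the iteration a bit more concrete, here is a minimal Python sketch of walking the SQL injection case layer by layer. The layer names and the `Mitigation`, `LAYERS`, `walk_assumed_failures` and `describe` names are my own illustrative assumptions, not a prescribed format. All the sketch does is assume each layer fails in turn and ask what is left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mitigation:
    name: str
    limits: str  # what the attacker is still kept from while this layer holds


# Illustrative layers for the SQL injection case, ordered from the point of
# breach outwards. This is an assumption for the example, not a complete list.
LAYERS = [
    Mitigation("least-privilege database role",
               "reading other data or running commands in the database"),
    Mitigation("isolated database process (for example a container)",
               "reaching the host from the database process"),
    Mitigation("least-privilege system user",
               "acting as a privileged user on the host"),
    Mitigation("restricted network access from the database host",
               "moving on to other systems"),
]


def walk_assumed_failures(layers):
    """Yield (failed_layers, next_layer), assuming each layer fails in turn."""
    for depth in range(len(layers) + 1):
        next_layer = layers[depth] if depth < len(layers) else None
        yield layers[:depth], next_layer


def describe(layers):
    """Return one human-readable line per step of the walk."""
    lines = []
    for failed, next_layer in walk_assumed_failures(layers):
        assumed = ", ".join(m.name for m in failed) or "nothing beyond the initial breach"
        if next_layer is None:
            lines.append(f"Assume failed: {assumed} -> no layer left; "
                         "rely on instrumentation to detect and react")
        else:
            lines.append(f"Assume failed: {assumed} -> next layer: {next_layer.name} "
                         f"(limits {next_layer.limits})")
    return lines


if __name__ == "__main__":
    for line in describe(LAYERS):
        print(line)
```

The last line of output is the interesting one: once every layer is assumed to have failed, the only thing left is knowing that it happened.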
Keeping it simple is not only about viewing the system from 1,000 ft in terms of threats, but also about keeping it simple when communicating about the system. Don’t try and cram every single bit of information into the data flow diagram. That is what architecture diagrams are for. Use the data flow diagram to simplify the function of the system and make it possible to talk about the distinct components of the system. The most important part of the data flow diagram is your trust boundaries. This goes back to communication as well: engineering teams use sequence diagrams to show the sequence of steps taken in a particular process, and a data flow diagram often maps cleanly to these sequence diagrams.
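If it helps to see how little a data flow diagram needs to carry, here is a rough Python sketch. The system, the zone names and the `Element`, `Flow` and `boundary_crossings` names are all hypothetical. The only thing it computes is which flows cross a trust boundary, since those are the ones worth talking about first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    zone: str  # the trust zone this element lives in


@dataclass(frozen=True)
class Flow:
    source: Element
    target: Element
    data: str

    def crosses_boundary(self) -> bool:
        # A flow crosses a trust boundary when its two ends sit in different zones.
        return self.source.zone != self.target.zone


# Hypothetical system, deliberately kept to a handful of elements.
user = Element("user", zone="internet")
web = Element("web app", zone="application")
worker = Element("background worker", zone="application")
db = Element("database", zone="data")

FLOWS = [
    Flow(user, web, "form input"),
    Flow(web, worker, "job request"),
    Flow(web, db, "SQL query"),
    Flow(db, web, "query result"),
]


def boundary_crossings(flows):
    """Return only the flows whose ends are in different trust zones."""
    return [flow for flow in flows if flow.crosses_boundary()]


if __name__ == "__main__":
    for flow in boundary_crossings(FLOWS):
        print(f"{flow.source.name} -> {flow.target.name}: {flow.data}")
```

Anything beyond names, zones and flows probably belongs in the architecture diagram instead.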
Your threat model is not my threat model
The threat model you have in mind for a particular use-case won’t be the same as for another use-case. Yes, the same concepts are transferable across systems and threat models. However, building up a catch-all threat database will have limited effectiveness, and your efforts are better spent refining your threat modeling process (and simplifying it).
This applies not only when comparing risk between systems, for example your bank’s threat model versus my lemonade stand’s threat model. The environment you are operating in plays a huge role, and before starting your threat modeling process it is important to actually understand that environment. To threat model something, you need to understand what it is trying to achieve. Clear communication and keeping it simple will help to understand the system and the components that matter.
The threat model won’t always map cleanly to the classifications in STRIDE, and this is a trap I’ve seen many fall into when starting out with threat modeling. When something doesn’t map 1:1 to the definition of STRIDE, then all of a sudden it can’t be a threat, but that is not the case! The threats you are identifying are the threats to your system and the assets that matter most to you. If you look closely enough at these threats, you might just find that there is actually some overlap with STRIDE. Remember, STRIDE is there as another tool for communication; it is a means of classifying threats, not a definition of what constitutes a threat.
A good example for me is a PaaS platform where the free tier is multi-tenant. The immediate temptation when threat modeling this is to look at that multi-tenant aspect as the part that needs protecting. Threats around Information Disclosure, Elevation of Privilege, Spoofing, and Tampering would be at the top of the list. What about Denial of Service? Yes, it is on the list, but what is the cause of that denial of service, and why does it matter? Denial of service can be the result of platform abuse, for example via cryptomining. This doesn’t only impact the customers of the platform but also the cost to run the platform: for each cryptominer, you need more resources, meaning higher costs. The negative experience of other users on the platform, which drives them away from the platform, also matters. In this case, denial of service has just as high an impact on the value of the platform as any other issue.
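One way to keep STRIDE in its place as a label rather than a gatekeeper is to make the classification optional when recording threats. The Python sketch below is only an illustration of that idea. The `Threat` record, the example threats and the `group_by_label` helper are my own assumptions, loosely based on the PaaS example above.

```python
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information Disclosure"
    DENIAL_OF_SERVICE = "Denial of Service"
    ELEVATION_OF_PRIVILEGE = "Elevation of Privilege"


@dataclass(frozen=True)
class Threat:
    description: str
    why_it_matters: str  # in the team's own words
    stride: frozenset = frozenset()  # optional labels; empty is perfectly fine


# Hypothetical entries for a multi-tenant free tier.
THREATS = [
    Threat("tenant reads another tenant's data",
           "customer trust in the multi-tenant free tier",
           frozenset({Stride.INFORMATION_DISCLOSURE})),
    Threat("cryptomining on the free tier",
           "higher running costs and a worse experience for other users",
           frozenset({Stride.DENIAL_OF_SERVICE})),
    Threat("users leave because the platform feels slow",
           "the value of the platform itself"),  # no STRIDE label, still a threat
]


def group_by_label(threats):
    """Group threat descriptions by STRIDE label, keeping unlabelled ones."""
    groups = {}
    for threat in threats:
        labels = sorted(s.value for s in threat.stride) or ["unclassified"]
        for label in labels:
            groups.setdefault(label, []).append(threat.description)
    return groups


if __name__ == "__main__":
    for label, descriptions in group_by_label(THREATS).items():
        print(label, "->", descriptions)
```

The unlabelled entry stays in the model, and the `why_it_matters` field carries the conversation rather than the category.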
Threat model guided bug hunting
Yes, threat modeling is not about identifying vulnerabilities, but don’t let that stop you from using it to find vulnerabilities. That whole assumed breach thing turns out to be really useful for working backwards towards finding vulnerabilities. Some of the best bugs I’ve found have been during the threat modeling of existing systems, assuming compromise and then asking myself “how would I have gotten to this point?”. Your data flow diagrams are invaluable here since they provide a simplified breakdown of what comes in, which systems process it, what comes back out, and, most importantly, where trust boundaries are being crossed. Mitigations in themselves sometimes function to remediate vulnerabilities. For example, TLS mitigates tampering, information disclosure, and spoofing through PitM attacks, but it also addresses the vulnerability of clear-text transport.
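Working backwards can be sketched as a walk over the data flows in reverse. The Python below is a minimal illustration under my own assumptions: the flows, the element names and the `paths_to_asset` helper are hypothetical. Starting from the asset assumed to be compromised, it lists every path an untrusted entry point could have taken to get there.

```python
from collections import defaultdict

# Hypothetical (source, target) data flows taken from a data flow diagram.
FLOWS = [
    ("internet user", "web app"),
    ("web app", "job queue"),
    ("job queue", "worker"),
    ("worker", "database"),
    ("web app", "database"),
    ("admin laptop", "database"),
]

# Elements sitting outside the trust boundary (an assumption for this example).
UNTRUSTED = {"internet user"}


def paths_to_asset(flows, asset, entry_points):
    """Walk flows backwards from the asset; return paths that start at an entry point."""
    incoming = defaultdict(list)
    for source, target in flows:
        incoming[target].append(source)

    found = []
    stack = [[asset]]
    while stack:
        path = stack.pop()
        head = path[0]
        if head in entry_points:
            found.append(path)
            continue
        for source in incoming[head]:
            if source not in path:  # do not loop around cycles
                stack.append([source] + path)
    return sorted(found, key=len)


if __name__ == "__main__":
    for path in paths_to_asset(FLOWS, "database", UNTRUSTED):
        print(" -> ".join(path))
```

Each returned path is a candidate answer to “how would I have gotten to this point?”, and every hop on it is a place to go looking for a bug or a missing mitigation.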
There is a lot more to say about threat modeling and I hope to do some of that here. This post barely scratches the surface; however, I feel it hits on some points worth considering and discussing. Threat modeling is an extremely useful tool in creating secure systems. At its base, threat modeling is simple, so keep it simple. Security and engineering often struggle to find a common means of communication; threat modeling, when used correctly, can help bridge this communication gap. At the end of the day there are no wrong ways of threat modeling, and even the most basic attempts at modeling a system will yield results. Keep in mind what matters (function, assets, logic) in the system and assume that it has been compromised, work backwards and identify mitigations. And don’t forget instrumentation.
A final thanks to Wade for showing me how effective threat modeling can be and how a well implemented threat modeling program not only improves overall security, but also goes a long way in strengthening the security-engineering partnership.
I haven’t covered a lot of the formal concepts that go into threat modeling. These are all important and go hand-in-hand with the “soft skills” mentioned above. For me, these concepts are mostly grounded in the work of Adam Shostack, whose book I’d recommend as the best starting point for anyone wanting to get familiar with threat modeling.
- Wade Winright - https://twitter.com/vashta_nerdrada
- Jamie Arlen - https://twitter.com/myrcurial
- Security as a function - https://twitter.com/myrcurial/status/1365268663490740227
- Haroon Meer - https://twitter.com/haroonmeer
- SensePost (Orange Cyberdefense) - https://sensepost.com/
- Swiss Cheese model - https://en.wikipedia.org/wiki/Swiss_cheese_model
- Black Swan events - https://blog.heroku.com/bug-bounties-black-swans
- STRIDE - https://en.wikipedia.org/wiki/STRIDE_(security)
- Threat Modeling: Designing for Security (book) - https://adam.shostack.org/blog/category/threat-modeling/