What I've Learned in Five Years of Conducting Code Audits

META

When I was at PKC, my team led about thirty code audits. Many of them were for startups that had raised a Series A or B. This is the stage when founders typically have money in the bank, can finally look up from their all-out focus on going to market, and realize they need to pay more attention to security.

The work was fascinating: we dug deep into projects with a wide variety of stacks and architectures, and from various programming domains. We found security issues of all kinds, from catastrophic to simply amusing. We also had the opportunity to interview senior developers and CTOs about more general topics—for example, what technical and other challenges they encountered during their initial growth period.

Another interesting thing is that seven or eight years have now passed since the first audits, which lets us see which projects succeeded and which fizzled. I'd like to share some unexpected insights I drew from these observations. I'll start with the most general things, and then move on to what concerns security specifically.
1. You don't need hundreds of developers to create a great product. I've written about this in more detail before, but to reiterate the gist: although the startups we analyzed were at roughly the same stage of development, team sizes varied wildly. Surprisingly, the most impressive products with a wide range of features were sometimes the brainchild of small teams. And those same "small but mighty" teams went on to dominate the market years later.

2. Simplicity produces better results than cleverness. As a self-proclaimed snob, it's hard for me to say this, but the fact remains: of all the startups we examined, the ones that resolutely adhered to the KISS principle in development fared best. These teams rejected cleverness for its own sake. The projects that made us say "wow, how cleverly done," on the other hand, have largely faded into oblivion. The most common ways to shoot yourself in the foot (I discussed them in another article) were adopting microservices too early, building architectures that rely on distributed computing, and message-driven designs.

3. The most significant results were achieved in the first and last hours of the audit. If you think about it, this makes sense. In the first hours of an audit, you pick out everything that's on the surface. What's really obvious is easily discovered with grep and basic functionality testing. By the last hours, you're fully immersed in the context, and insights begin to dawn.

4. Over the past ten years, securing your software has become much easier. I don't have reliable statistics to back this up, but it seems to me that code written before 2012 had a much higher number of vulnerabilities per line than code written later (we started auditing in 2014). Perhaps it's Web 2.0 frameworks, or perhaps programmers have simply become more security-conscious. Whatever the case, I believe this means security has indeed improved in terms of the tools and baseline available to developers.

5. The most dangerous vulnerabilities are always the most obvious. In about a fifth of our audits, we encountered a Major Miscalculation: a vulnerability so serious that we immediately called the client and told them to fix it as quickly as possible. I don't recall a single case where such a vulnerability turned out to be something clever. In fact, its banality was partly what made the threat so serious: we were concerned precisely because these flaws were so easy to detect and exploit. "Probability of detection" has long been a part of impact analysis metrics, so the idea itself is not new. However, I still think this factor is not given enough attention. When it comes to real attacks, probability of detection is crucial. Hackers are lazy; they follow well-trodden paths. They won't bother with heap spraying, however serious the memory-management errors, when they can set a new password for a user because the reset token was returned in the response to the request (Uber had the opportunity to see this in 2016). One could argue that focusing on the probability of detection leads to the principle of "security through obscurity," which prioritizes guesswork about what a hacker might or should know. However, I repeat: in my personal experience, the probability of detection is directly related to the probability of an exploit.

6. Security by default in frameworks and infrastructure has significantly improved software security. I've written about this before, but in short: things like React's automatic escaping of any HTML to prevent cross-site scripting, or serverless stacks that configure the operating system and web server for the developer, have radically strengthened security at the companies that use them. By comparison, our audits of PHP projects were simply teeming with XSS. It's not that new stacks and frameworks are impenetrable, but they have a smaller attack surface in precisely the areas that matter in practice.
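To make "escaping by default" concrete, here is a minimal sketch in Python rather than React. The `render_comment` helper and its markup are hypothetical, invented for illustration; the only real API used is the standard library's `html.escape`. The point is that when every user-supplied value passes through an escaping step automatically, a developer has to go out of their way to create an XSS hole.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Render a user comment as an HTML fragment (hypothetical helper).

    Every user-supplied value is escaped before it reaches the markup,
    which is roughly what auto-escaping frameworks do on your behalf.
    """
    # html.escape replaces &, < and >, plus both quote characters when quote=True.
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body, quote=True)
    return f'<p class="comment"><b>{safe_author}</b>: {safe_body}</p>'


# The script tag arrives in the page as inert text, not as executable markup.
rendered = render_comment("mallory", '<script>alert("xss")</script>')
```

In the PHP projects we audited, the equivalent of that escaping call was something each developer had to remember at every output site, which is exactly where things went wrong.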

7. Audits are much easier in monorepositories. From a security analysis perspective, auditing monorepositories was easier than checking a series of services spread across different codebases. With monorepositories, there was no need to write wrapper scripts for our various tools. It was easier to determine whether a particular code fragment was used elsewhere. And most importantly, there was no need to worry about a different version of a common library in another repository.

8. You could spend an entire audit delving into the labyrinth of vulnerabilities in dependency libraries. It's incredibly difficult to determine whether a vulnerability in a dependency poses a real attack threat. It's safe to say that our industry invests far too little in securing foundational libraries, which is why incidents like Log4j hit so hard. Node and npm are simply terrible in this regard: their dependency chains are a complete mess. When GitHub released Dependabot, it was a godsend. Now we can mostly just recommend that clients upgrade in order of priority.

9. Never deserialize untrusted data. We saw this most often in PHP: for some reason, PHP developers love serializing and deserializing objects instead of using JSON. I'd say almost every case we saw where a server deserialized and parsed a client-supplied object resulted in a horrific exploit. For those who haven't seen it yet, PortSwigger has a good breakdown of what can go wrong in these scenarios (using PHP as an example, by the way; coincidence?). In short, the common thread running through all deserialization vulnerabilities is this: when a user gains control over an object that the server will later use, the user gains very powerful capabilities with a wide surface area. Conceptually, this is close to prototype pollution and user-generated HTML templates. How do you fix it? It's much better to let the user send JSON (which is very limited in the data types it can express) and construct the object manually from the fields it contains. It's a bit more work, but definitely worth it.
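The "accept JSON and build the object by hand" approach from point 9 might look roughly like this in Python. The `ProfileUpdate` type, its fields, and the validation limits are all assumptions made up for the example; the idea is simply that the server decides which fields exist and what types they have, and anything else the client sends is ignored.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileUpdate:
    """Hypothetical server-side object built from client input."""
    display_name: str
    age: int


def parse_profile_update(raw: str) -> ProfileUpdate:
    """Parse untrusted JSON and construct the object field by field."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    name = data.get("display_name")
    age = data.get("age")

    if not isinstance(name, str) or not 1 <= len(name) <= 64:
        raise ValueError("display_name must be a string of 1 to 64 characters")
    # bool is a subclass of int in Python, so it has to be excluded explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or not 0 <= age <= 150:
        raise ValueError("age must be an integer between 0 and 150")

    # Only the validated fields are copied; unknown keys never reach the object.
    return ProfileUpdate(display_name=name, age=age)


update = parse_profile_update('{"display_name": "Ann", "age": 30, "is_admin": true}')
```

Note that the extra `is_admin` key is silently dropped here. Rejecting unknown keys outright would be an equally reasonable design; either way, the client never gets to choose the class or the shape of the object.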

10. Business logic flaws were rare, but when we found one, it tended to be severe. If you think about it, that makes sense: a flaw in business logic hits the business directly. There's an interesting corollary here: even if your protocol was designed with provable security in mind, human error in the form of flawed business logic makes its presence felt surprisingly often (just recall the series of disastrous exploits that resulted from poorly written smart contracts).

11. Custom fuzzing has proven surprisingly effective. After a couple of years of code auditing, I began insisting that all audits include the creation of custom fuzzers for testing the product's API, authentication, and so on. This is a fairly common practice; I personally stole the idea from Thomas Ptacek, who mentions it briefly in his article on recruiting. Until we started doing this ourselves, I was inclined to consider fuzzing a waste of time. I thought it was a poor use of developer hours, which would be better spent reading code and testing specific hypotheses. But contrary to my expectations, fuzzing turned out to be an effective and profitable investment of time, especially when working on large codebases.
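For readers who have never written one, a custom fuzzer can be very small. The sketch below is not what we used on audits; it is a toy mutation fuzzer under stated assumptions. `parse_version` is a hypothetical target with a bug planted on purpose, and the convention that `ValueError` means "input rejected cleanly" while any other exception counts as a finding is also an assumption of the example.

```python
import random


def parse_version(text: str) -> tuple[int, int]:
    """Hypothetical target: parses 'MAJOR.MINOR'.

    Planted bug: it assumes a dot is always present in the input.
    """
    if not text or not all(ch.isdigit() or ch == "." for ch in text):
        raise ValueError("invalid characters")
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def fuzz(target, seeds, iterations=2000, rng_seed=1234):
    """Mutate seed inputs at random and collect unexpected exceptions."""
    rng = random.Random(rng_seed)  # fixed seed so a run can be reproduced
    alphabet = "0123456789.-x "
    crashes = {}  # exception type name -> first input that triggered it
    for _ in range(iterations):
        sample = list(rng.choice(seeds))
        for _step in range(rng.randint(1, 3)):
            op = rng.choice(("insert", "delete", "replace"))
            if op == "insert" or not sample:
                sample.insert(rng.randint(0, len(sample)), rng.choice(alphabet))
            elif op == "delete":
                del sample[rng.randrange(len(sample))]
            else:
                sample[rng.randrange(len(sample))] = rng.choice(alphabet)
        text = "".join(sample)
        try:
            target(text)
        except ValueError:
            continue  # clean rejection of bad input, not a finding
        except Exception as exc:  # anything else is worth a look
            crashes.setdefault(type(exc).__name__, text)
    return crashes


crashes = fuzz(parse_version, seeds=["1.0", "10.25"])
```

On a real audit, the target would be an HTTP endpoint or an authentication flow rather than a local function, and the seeds would be captured valid requests, but the loop has the same shape: mutate, send, and triage whatever does not fail cleanly.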

12. Company mergers significantly complicated security work. More programming patterns to learn, more AWS accounts to review, more diversity in SDLC tools. And of course, mergers often meant the emergence of an entirely new language and/or framework with its own applicable patterns.

13. There's usually at least one data security enthusiast lurking among developers. The identity of these enthusiasts was usually a surprise to everyone, and they themselves weren't even aware of it. Nowadays, cybersecurity and programming skills have become more convergent, so identifying such a person can be quite rewarding.

14. The overall professional level of the company generally correlated with the speed of vulnerability remediation. Our best clients usually asked us to simply report everything we found in real time so they could address the issues immediately.

15. With JWTs and webhooks, almost no one got it right on the first try. With webhooks, people often forgot to authenticate incoming requests (or the service they were using didn't allow it... a pretty strange situation, if you think about it). This type of problem led Josh, one of our employees, to ask clients a series of questions and eventually prepare a talk at DEF CON/Black Hat. It's no secret that JWTs are difficult to work with, even if you use a library. We encountered numerous implementations that didn't expire tokens correctly on logout, checked the JWT's authenticity incorrectly, or didn't check it at all by default.
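As an illustration of what "authenticating incoming webhook requests" usually means, here is a minimal sketch of HMAC signature verification. It assumes the sending service signs the raw request body with a shared secret using HMAC-SHA256 and puts the hex digest in a header; the function names and that scheme are assumptions for the example, and real providers differ in header names and encoding.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature the sender is assumed to attach."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Check a webhook body against the signature taken from a request header."""
    expected = sign_payload(secret, payload)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


secret = b"example-shared-secret"  # placeholder; load from configuration in practice
body = b'{"event": "invoice.paid", "id": 42}'
header = sign_payload(secret, body)

accepted = is_authentic(secret, body, header)
rejected = is_authentic(secret, b'{"event": "invoice.paid", "id": 43}', header)
```

The signature has to be computed over the raw bytes of the body as received. Re-serializing parsed JSON before verifying is a common way to end up with signatures that never match, which then tempts people to switch verification off.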

16. MD5 is still actively used, but most of the findings are false positives. It turns out that MD5 has many uses besides (insufficiently collision-resistant) password hashing. For example, because it's fast, it's often used in automated testing to quickly generate a bunch of pseudo-random GUIDs. In such cases, MD5's weak properties don't matter at all, no matter what your static code analysis tool might tell you.
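The test-GUID use case looks something like the sketch below. `fixture_guid` is a hypothetical helper, and the example assumes Python 3.9 or later, where `hashlib.md5` accepts the `usedforsecurity` flag to mark the call as non-cryptographic.

```python
import hashlib
import uuid


def fixture_guid(label: str) -> uuid.UUID:
    """Derive a repeatable, random-looking GUID for test fixtures.

    MD5 is used only as a fast way to mix bits; nothing here relies on
    collision resistance, so its known weaknesses are irrelevant.
    """
    digest = hashlib.md5(label.encode("utf-8"), usedforsecurity=False).digest()
    # An MD5 digest is exactly 16 bytes, which is what uuid.UUID expects.
    return uuid.UUID(bytes=digest)


user_id = fixture_guid("user-1")
order_id = fixture_guid("order-1")
```

The same label always yields the same GUID, which keeps test data stable between runs. If you need identifiers that carry proper UUID version bits, the standard library's `uuid.uuid3` does much the same thing and is also MD5-based.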

I'm curious: have any of you noticed the same patterns? Or maybe others? If you disagree, please let me know in the comments!
 