Q: Is the NVD digging out of the deep hole it dug for itself since Feb. 2024? A: No

Note from Tom:

I have moved from Blogspot to Substack as the main platform for my blog. Because of my long relationship with Energy Central, I will continue to publish my new posts that have to do with the electric power industry (including my posts on NERC CIP) on EC for free. However, if you want to see all of my other new posts (including posts about AI, the cloud, and developments with vulnerability management and vulnerability databases), you need to become a paid subscriber.

A charge of $30 per year (or $5 for one month) gets you a subscription to all my new posts on Substack, as well as 1200+ legacy posts that go back to 2013 (when I started my blog). Close to half of those posts deal with NERC CIP. Please start reading me on both Substack and Energy Central!

 

If you’ve been reading my blog posts for a year or so, you probably know I’ve written a lot about what I consider to be the biggest problem in vulnerability management today. Starting in February 2024, the National Vulnerability Database (NVD) stopped doing something it had done for close to two decades: adding a “CPE name”, as quickly as possible, to every new CVE record it downloaded from the CVE.org database.[i] CPE was one of the first machine-readable software identifiers, and it is the only identifier that the NVD has supported for most of its existence.

In the last few days, my friend Andrey Lukashenkov of Vulners, along with Kyle Pazandak, has been posting more on this problem on LinkedIn. I asked Andrey whether the NVD is at least beginning to keep the promise it has regularly been making since February 2024; his answer was effectively “No”. To back that up, he sent me a chart that he regularly updates and publishes:

The vertical axis of this chart shows the percentage of CVEs in each of the five categories listed below the graph; of course, this axis runs from 0 to 100. Since the NVD downloads almost all CVEs added to the CVE.org database during the time period shown on the bottom axis, every CVE created during that period (with a lag of probably a couple of days) is represented in this graph. Since the bottom axis covers January 1 through August 25, the graph shows in aggregate what happened to every new CVE published so far this year.

Andrey’s answer to my question in the title of this post isn’t intuitively obvious from the graph, but it’s more understandable if you just look at two colors. The green area (Received) describes CVE records that the NVD downloaded but hasn’t touched since then; therefore, they don’t usually include a CPE name[ii]. The dark orange area (Analyzed) describes CVE records that the NVD has “analyzed”, which usually means that the NVD added a CPE name to them (along with other things like CVSS score).

I think you’ll agree that the Received[iii] and Analyzed areas of the graph are about equal; this was the case last year as well. The green area represents CVE records that were downloaded but not enriched. Thus, the green area represents most (but not all) of the NVD’s shortfall from their former goal of enriching every CVE record soon after they download it from CVE.org. Since the green area is about half the graph and since the 2024 graph was about the same, this means the NVD’s shortfall has been on average 50%[iv] since their problems began.
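To make the arithmetic behind that estimate concrete, here is a minimal Python sketch. The counts are made-up placeholders, not numbers read from Andrey’s chart; the four status names come from the chart as I describe it in this post, and "Other" is just my stand-in for the fifth category. It also shows why the true shortfall is a little higher than the green area alone (see endnote iv).

```python
# Hypothetical counts per NVD status. These are illustrative placeholders,
# NOT figures taken from the chart discussed in this post.
counts = {
    "Received": 480,
    "Analyzed": 470,
    "Awaiting Analysis": 20,
    "Undergoing Analysis": 10,
    "Other": 20,  # stand-in for the fifth category in the chart
}


def share(counts, statuses):
    """Return the percentage of all CVEs that fall into the given statuses."""
    total = sum(counts.values())
    return 100.0 * sum(counts[s] for s in statuses) / total


# The green area alone.
received_only = share(counts, ["Received"])

# The green area plus the two small categories that also lack a CPE name.
not_yet_enriched = share(
    counts, ["Received", "Awaiting Analysis", "Undergoing Analysis"]
)

print(f"Received only: {received_only:.1f}%")
print(f"Received + awaiting + undergoing analysis: {not_yet_enriched:.1f}%")
```

With real status counts substituted for the placeholders, the second figure is the one that best approximates the share of CVE records that cannot be found by a product search.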

Since Andrey’s 2024 graph includes a green area of about the same relative size, it’s clear that the NVD has made little or no progress in fixing the problem that started in February 2024. You can think of the problem this way: Before 2024, the graph would have consisted almost entirely of the dark orange color. There would never have been a significant amount of green in the graph, since almost all CVE Records were enriched by the NVD within a few days of when they were downloaded.

But now, about half of the graph is green; it’s been that way for 18 months. Why is this a problem? It’s a problem because:

1. Software products that are listed as “affected” in the text of a CVE record are thereby reported as being affected by the vulnerability described in the record.

2. However, even though the text of the record for, say, CVE-2025-12345 may describe a product and version that your organization utilizes, you will never see that vulnerability listed if you search the NVD using the name of the product and the version number you have installed. This is because, for the CVE record to be found by a normal search, the product must be named in the record with a CPE name, not just a textual description.

3. Therefore, given the results of Andrey’s analysis, a normal search of the NVD for any software product or intelligent device is likely to miss, on average, half of the new CVEs that have been identified as affecting the product this year. And since the 50% number has held since the NVD’s troubles began in February 2024, the same statement can be made regarding CVEs identified in the last 18 months. In other words, NVD searches today identify, on average, only about half of the vulnerabilities that have been reported in the last 18 months for the software product or device that is the subject of the search.
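The three points above can be illustrated with a toy model in Python. This is not how the NVD’s search is actually implemented; it is only a sketch, and the record class, the vendor and product names, and the CVE IDs are all hypothetical. Two records describe the same product and version in their text, but only one carries a CPE name, so only that one is returned by a product search.

```python
from dataclasses import dataclass, field


@dataclass
class CveRecord:
    """Hypothetical, simplified stand-in for a CVE record."""
    cve_id: str
    description: str
    cpe_names: list = field(default_factory=list)


def search_by_cpe(records, vendor, product, version):
    """Return IDs of records that have a CPE name for this vendor, product and version."""
    prefix = f"cpe:2.3:a:{vendor}:{product}:{version}:"
    return [
        r.cve_id
        for r in records
        if any(name.startswith(prefix) for name in r.cpe_names)
    ]


def search_by_text(records, phrase):
    """Return IDs of records whose free-text description mentions the phrase."""
    return [r.cve_id for r in records if phrase.lower() in r.description.lower()]


# Made-up records: both mention the product, only the first is "enriched".
records = [
    CveRecord(
        "CVE-0000-0001",
        "Buffer overflow in ExampleApp 2.1",
        ["cpe:2.3:a:examplevendor:exampleapp:2.1:*:*:*:*:*:*:*"],
    ),
    CveRecord("CVE-0000-0002", "Path traversal in ExampleApp 2.1", []),
]

cpe_hits = search_by_cpe(records, "examplevendor", "exampleapp", "2.1")
text_hits = search_by_text(records, "ExampleApp 2.1")
print("Found by CPE search:", cpe_hits)
print("Mentioned in the text:", text_hits)
```

The second record is invisible to the product search even though its description names the product, which is exactly the situation of a CVE record left in the Received state.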

This might not be so bad if the NVD warned you about this problem. However, it doesn’t do that. The NVD simply displays the message “There are 0 matching records” if no CVEs are found for a search by product name. There are many other reasons why you might receive the same message, but most NVD users will probably assume the message means that no vulnerability has ever been reported for the product - meaning it doesn’t have any vulnerabilities and never did.

Unfortunately, that is not likely to be the truth. In fact, this blog post describes a single open source product that probably contains at least 40,000 identified vulnerabilities; yet, anyone searching the NVD for that product will probably get the message “There are 0 matching records.” That same post mentions a very large online software/services organization whose policy is only to purchase software from suppliers that have reported vulnerabilities in their products. In other words, they think the only reason a software or device supplier wouldn’t have reported any vulnerabilities for their products is that they’ve never looked for any – not an attitude that should comfort their customers.

While we wait for the Age of Gold when streams flow with milk and honey and all software is vulnerability-free, what can we do to make things better?

In a recent post, I wrote about changes I would like to see in the CVE Program; I also wrote that I am optimistic these changes will be made in the next year or two, especially if the CVE Foundation takes over control of MITRE’s contract to run the Program next March (which I believe is likely). Do I see something similar happening with the NVD (which is, of course, separate from the CVE Program, including being part of a different federal Department)?

No, I don’t. The big problem with the NVD, along with the fact that it’s built on database technology that’s a couple of decades old, is that CPE is an inherently unreliable identifier (as the OWASP SBOM Forum described in this 2022 white paper). Of course, since the NVD houses over 250,000 CVE records that contain only CPE identifiers for vulnerable software products, CPE will be with us for at least the next decade or two. This will be the case even if, starting tomorrow, CVE Numbering Authorities (CNAs) all switch to using purl, which is the only good alternative to CPE today (and even purl is deficient in one important way, which I hope will be addressed in the next year or two).

Filling in missing CPEs can’t produce a “fix” for this problem, since the CPE specification itself is incomplete. For example, the company that sells Windows might be referred to in a CPE name as “Microsoft”, “Microsoft, Inc.”, “Microsoft Inc”, etc. All of these are perfectly legitimate entries for the “vendor” field in CPE. Which one will be chosen for a CPE name for a Microsoft product in any instance? There’s no way to predict that, since the spec is silent and since the NVD has never published any guide to how they make those decisions.

Clearly, these decisions are being made by the NIST employee (or more likely a contractor) who happens to be assigned the task of creating a particular CPE name. What guides their decision? Who knows? Perhaps something the person had for breakfast, perhaps whether they had a fight with their spouse the previous evening, etc. Most likely, it’s just an eeny-meeny-miny-moe decision. This shows why there can never be truly automated vulnerability searches in the NVD. You need to use fuzzy logic, machine learning, or plain dumb luck to guess what vendor name the NIST contractor used to create the CPE name for the product you’re searching for (the same reasoning applies to the product name and even to how the version number is expressed – “v2.1”, “version2.1”, “Version_2.1”, etc.).
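To show what that kind of guessing looks like in practice, here is a rough Python sketch that uses only the standard library. The suffix list, the similarity cutoff and the function names are my own assumptions; nothing here comes from the CPE specification or from any NVD tooling, and it will certainly guess wrong in some cases.

```python
import difflib
import re

# Assumed list of corporate suffixes to ignore; extend as needed.
SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd"}


def normalize(name):
    """Lowercase the name, drop punctuation and drop common corporate suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(w for w in cleaned.split() if w not in SUFFIXES)


def normalize_version(text):
    """Pull the dotted number out of strings like 'v2.1' or 'Version_2.1'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    return match.group(0) if match else None


def best_vendor_match(query, candidates, cutoff=0.8):
    """Return the candidate vendor string closest to the query, or None."""
    by_normalized = {}
    for candidate in candidates:
        by_normalized.setdefault(normalize(candidate), candidate)
    close = difflib.get_close_matches(
        normalize(query), list(by_normalized), n=1, cutoff=cutoff
    )
    return by_normalized[close[0]] if close else None


# Hypothetical vendor strings, as they might appear in existing CPE names.
known_vendors = ["microsoft", "mozilla"]
print(best_vendor_match("Microsoft, Inc.", known_vendors))
print(normalize_version("Version_2.1"))
```

Even a sketch like this has to make arbitrary choices (which suffixes to strip, how similar is similar enough), which is another way of saying that the matching can be improved but never made fully reliable.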

I believe the only long-term solution to the NVD’s CPE problem is a Global Vulnerability Database, like what I described in this post. However, that is years away. What can you do in the near term to fill the NVD’s gaps? As long as the NVD remains in charge of creating CPE names, names that someone else creates can’t be counted on to be usable in the NVD, even a couple of years from now.

However, if an organization that maintains their own vulnerability database – built on top of the NVD – assigns CPE names to vulnerable products in otherwise “unenriched” CVE records, users of that database will be able to find those records when they search it using the product name and version number. This means they will get the same functionality they would have received if the NVD itself had created those CPE names – even though the names created by the organization won’t work in the NVD itself, or in other databases built on the NVD. And if the organization goes further by including other information in CVE records that is not available in the NVD (e.g., whether the CVE is on CISA’s KEV list, or the most recent EPSS score for the vulnerability), the user will come out ahead by utilizing that organization for their vulnerability data searches.
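As a rough illustration of the idea, the Python sketch below layers locally maintained data over a record that arrived without a CPE name. I want to be clear that this is my own simplified model and not a description of how any particular vendor does it; the field names, the overlay structures and the sample values are all hypothetical.

```python
def enrich(record, cpe_overlay, kev_ids, epss_scores):
    """Return a copy of the record with locally maintained fields added.

    record      -- dict with at least "id" and "cpe_names" (hypothetical layout)
    cpe_overlay -- dict mapping CVE ID to locally assigned CPE-like names
    kev_ids     -- set of CVE IDs that appear on a KEV list
    epss_scores -- dict mapping CVE ID to its most recent EPSS score
    """
    cve_id = record["id"]
    enriched = dict(record)
    if enriched.get("cpe_names"):
        # The upstream record already carries a CPE name; keep it.
        enriched["cpe_source"] = "nvd"
    else:
        # Fill the gap from the local overlay, if it has an entry.
        enriched["cpe_names"] = list(cpe_overlay.get(cve_id, []))
        enriched["cpe_source"] = "local" if enriched["cpe_names"] else None
    enriched["in_kev"] = cve_id in kev_ids
    enriched["epss"] = epss_scores.get(cve_id)
    return enriched


# Made-up sample data for illustration only.
record = {"id": "CVE-0000-0002", "cpe_names": []}
cpe_overlay = {
    "CVE-0000-0002": ["cpe:2.3:a:examplevendor:exampleapp:2.1:*:*:*:*:*:*:*"]
}
kev_ids = {"CVE-0000-0002"}
epss_scores = {"CVE-0000-0002": 0.42}

result = enrich(record, cpe_overlay, kev_ids, epss_scores)
print(result)
```

Note that the locally assigned name is tagged with its source. That matters because, as I said above, such a name only works inside the database that created it.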

If you’re interested in what I just described, you might want to engage with Andrey Lukashenkov of Vulners, since his organization has been doing their own CVE “enrichment” for years. Moreover, since February 2024, they have been creating their own “CPE names” where there is none in the CVE record[v], so that 65-70% of CVE records in their database now contain either true CPE names or “CPE-like” identifiers for affected products, along with additional CVE information not found in the NVD.

I will also point out that there are at least two or three other companies that offer similar services to Vulners’. If you work for one of those companies, please email me and I’ll let my readers know about you as well – without taking sides regarding which company has the better solution.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at [email protected] or comment on this blog’s Substack community chat.


[i] For an introduction to the CVE Program (aka “MITRE”) and the NVD, see this post.

[ii] In some cases, the CVE Numbering Authority (CNA) that originally created a CVE record included a CPE name with it. However, it is normally the NVD that adds the CPE name; indeed, in the past they have followed a policy of overwriting a CPE name created by a CNA with one that they created.

[iii] Of course, the “Received” category in fact includes the whole graph, since almost 100% of CVEs are received by the NVD. When the NVD does something with the CVE (including adding a CPE name), the CVE falls into one of the other four categories (colors) in the graph, so the green area includes the remaining CVEs.

[iv] In fact, the shortfall is more than 50%, since both the “Awaiting Analysis” and “Undergoing Analysis” categories include CVEs that have not yet been assigned a CPE name. As you can see, those two categories account for a tiny percentage of total CVEs downloaded by the NVD. In fact, it’s not clear what purpose those two categories serve, other than to make the backlog appear a little smaller than it is.

[v] Vulners doesn’t create CPE names for some CVE records that are missing those names, due to some technical reason that Andrey can explain to you.
