Automatic removal of unused standard header #192800

Open
serge-sans-paille wants to merge 6 commits into llvm:main from
serge-sans-paille:feature/cleanup-standard-includes

@serge-sans-paille
Collaborator

@serge-sans-paille serge-sans-paille commented Apr 18, 2026

patch automatically generated through

$ diskarzhan --fix `find llvm/lib llvm/include -name '*.cpp' -or -name '*.h'`

@serge-sans-paille serge-sans-paille force-pushed the feature/cleanup-standard-includes branch from f39fd4f to faef13e, April 18, 2026 17:30
@androm3da
Member

patch automatically generated through

$ diskarzhan --fix `find llvm/lib llvm/include -name '*.cpp' -or -name '*.h'`

Sorry if it's a stupid question - does the tool consider whether C/C++ library includes incidentally/indirectly are taking place? In which case it might appear to be unused only because some other prior include pulled it in (for this particular C/C++ library implementation, for this particular version of it etc).

@serge-sans-paille
Collaborator Author

serge-sans-paille commented Apr 18, 2026

Sorry if it's a stupid question - does the tool consider whether C/C++ library includes incidentally/indirectly are taking place?

It doesn't :-)

In which case it might appear to be unused only because some other prior include pulled it in (for this particular C/C++ library implementation, for this particular version of it etc).

The algorithm of diskarzhañ is stupidly simple: each standard header is associated with the set of symbols it defines (e.g. <vector> is mapped to std::vector only, even though some implementations also bring in <cstddef> and thus std::size_t). If a standard header is included but none of the symbols it defines is referenced in the same file, the include is removed.
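The rule described above fits in a few dozen lines. The C++ below is a minimal sketch under stated assumptions, not diskarzhan's actual code (the tool itself is distributed as a Python package): the three-entry `kHeaderSymbols` table and the helpers `mentions` and `removable_includes` are hypothetical, and a symbol counts as used whenever its qualified name appears as a whole token anywhere in the file.

```cpp
#include <cassert>  // only for the usage checks shown after this block
#include <cctype>
#include <cstddef>
#include <map>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, deliberately tiny header -> symbol table. A real mapping has
// to list every symbol each header is required to declare.
const std::map<std::string, std::set<std::string>> kHeaderSymbols = {
    {"vector", {"std::vector"}},
    {"numeric", {"std::accumulate", "std::iota", "std::gcd", "std::lcm"}},
    {"string", {"std::string", "std::to_string", "std::getline"}},
};

bool is_ident_char(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// True if `symbol` occurs in `text` as a whole token, so "std::string" does
// not match inside "std::stringstream". Comments and string literals are not
// skipped: this is a purely textual scan.
bool mentions(const std::string& text, const std::string& symbol) {
  for (std::size_t pos = text.find(symbol); pos != std::string::npos;
       pos = text.find(symbol, pos + 1)) {
    const std::size_t end = pos + symbol.size();
    const bool starts_ok = pos == 0 || !is_ident_char(text[pos - 1]);
    const bool ends_ok = end == text.size() || !is_ident_char(text[end]);
    if (starts_ok && ends_ok) return true;
  }
  return false;
}

// Returns, in include order, the standard headers the rule would remove:
// headers present in the table none of whose symbols appear in `source`.
// Headers missing from the table are always kept.
std::vector<std::string> removable_includes(const std::string& source) {
  static const std::regex include_re(R"(^\s*#\s*include\s*<([^>]+)>)");
  std::vector<std::string> removable;
  std::istringstream lines(source);
  std::string line;
  while (std::getline(lines, line)) {
    std::smatch match;
    if (!std::regex_search(line, match, include_re)) continue;
    const auto entry = kHeaderSymbols.find(match[1].str());
    if (entry == kHeaderSymbols.end()) continue;
    bool used = false;
    for (const std::string& symbol : entry->second)
      used = used || mentions(source, symbol);
    if (!used) removable.push_back(entry->first);
  }
  return removable;
}
```

For example, `removable_includes("#include <vector>\n#include <numeric>\nstd::vector<int> v;\n")` yields `{"numeric"}`. Because the scan is textual, a symbol written inside an `#ifdef` branch that is disabled in the current build still keeps its header, which is what makes this style of check independent of the build configuration; the flip side is that a mention inside a comment also counts as a use.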

@serge-sans-paille serge-sans-paille force-pushed the feature/cleanup-standard-includes branch from 3b348e9 to 46581d3, April 18, 2026 18:00
@llvmbot llvmbot added the clang:frontend Language frontend issues, e.g. anything involving "Sema" label Apr 18, 2026
Contributor

@aengelke aengelke left a comment

Why? Please add a motivation to the PR description. For larger changes/long-term efforts/adding CI checks, an RFC would be good.

From a compile-time perspective, standard headers are not the problem, because many expensive ones get pulled in almost everywhere through Support/ headers anyway. If anything, we should focus on reducing unneeded includes from LLVM headers, also by using forward declarations where possible. Just focusing on standard headers provides little value, IMO.

Comment thread llvm/include/llvm/Support/pch.h
@serge-sans-paille serge-sans-paille force-pushed the feature/cleanup-standard-includes branch from 1ed39da to 34238db, April 18, 2026 19:02
@serge-sans-paille serge-sans-paille requested a review from a team as a code owner April 18, 2026 19:58
@llvmbot llvmbot added the libc++abi libc++abi C++ Runtime Library. Not libc++. label Apr 18, 2026
@serge-sans-paille
Collaborator Author

Why? Please add a motivation to the PR description. For larger changes/long-term efforts/adding CI checks, an RFC would be good.

The goal would be to add a CI step that prevents future regression on that topic. Agreed for the RFC.

From a compile-time perspective, standard headers are not the problem, because many expensive ones get pulled in almost everywhere through Support/ headers anyway. If anything, we should focus on reducing unneeded includes from LLVM headers, also by using forward declarations where possible. Just focusing on standard headers provides little value, IMO.

Based on that change, I can confirm that the number of preprocessed lines shrank by only 24,507 after the patch, which is mostly negligible compared to the 395,187,372 original ones (compiling LLVM only).
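For scale, the reduction quoted above is a tiny fraction of the total:

$$
\frac{24\,507}{395\,187\,372} \approx 6.2 \times 10^{-5} \approx 0.0062\,\%,
$$

that is, roughly one preprocessed line in 16,000.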

As such, the only value of this PR would be to pave the way for an automatic regression check, if that's a path we want to follow.

@RKSimon
Collaborator

RKSimon commented Apr 20, 2026

We have commits that do IWYU cleanup every so often - how well does diskarzhañ match with that? I'm worried we'll end up in an infinite loop of commits from different tools micro-optimizing include placement....

@firewave
Copy link
Copy Markdown

For reference the location of the tool in question: https://pypi.org/project/diskarzhan/.

FYI'ing @kimgr and @bolshakov-a from IWYU

@firewave
Copy link
Copy Markdown

FWIW looking at the source of diskarzhan we now have another set of header-to-symbol mappings in the cosmos of LLVM (LLVM itself, IWYU and the new tool in town).

@firewave
Copy link
Copy Markdown

CC'ing @alejandro-colomar and @AaronBallman as they are involved with mapping stuff

Comment on lines 17 to 24
#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/Support/Compiler.h"

#include <cassert>

namespace clang {

namespace tok {
Contributor

The commit message seems to be incomplete. It says removal, but for example in this file, we only have additions.

Comment on lines 58 to 65
#ifndef LLVM_PROFILEDATA_CTXINSTRCONTEXTNODE_H
#define LLVM_PROFILEDATA_CTXINSTRCONTEXTNODE_H

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

namespace llvm {
namespace ctx_profile {
Contributor

@alejandro-colomar alejandro-colomar Apr 20, 2026

I'd like to learn the rules that diskarzhan uses, and how and why they differ from iwyu(1).

Why does diskarzhan exist in the first place? Is there anything from iwyu(1) that you believe is better addressed by diskarzhan?

@AaronBallman
Collaborator

We have commits that do IWYU cleanup every so often - how well does diskarzhañ match with that? I'm worried we'll end up in an infinite loop of commits from different tools micro-optimizing include placement....

+1 to this concern, also this is a significant amount of churn for an NFC change that likely will cause merge conflicts for downstreams. So definitely +1 to having some kind of RFC to see if there's community buy-in, but also CC @llvm/clang-vendors for awareness of this PR directly.

@alejandro-colomar
Contributor

Cc: @bolshakov-a , @kimgr

@bolshakov-a
Contributor

I think the tool is generally good because, for standard library symbols, a simple textual lookup is usually sufficient for doing IWYU (but not always, see https://discourse.llvm.org/t/should-stacktrace-provide-string/89605). However, I've noticed that the mapping in diskarzhan is incomplete even for C++17 (std::lcm is missing for <numeric>, at least). @serge-sans-paille, you can use https://github.com/include-what-you-use/include-what-you-use/blob/master/std_symbol_map.inc for reference, but keep in mind that it is not complete either.
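A concrete case for the missing entry: the translation unit below uses <numeric> only through std::lcm. Under the rule described earlier in the thread (drop a standard include when none of its mapped symbols is referenced), a mapping whose <numeric> entry lacks std::lcm would find no listed symbol here and remove the include, which breaks the build. `common_period` is a hypothetical name used only for illustration, and C++17 or later is assumed.

```cpp
#include <numeric>  // the only <numeric> facility used here is std::lcm (C++17)

// Hypothetical helper: the first tick at which two periodic events coincide.
constexpr int common_period(int a, int b) { return std::lcm(a, b); }

static_assert(common_period(4, 6) == 12, "lcm(4, 6) is 12");
```

A mapping used for this kind of check therefore has to be complete for every header it claims to know about; headers absent from the mapping are safer to leave untouched than to judge with a partial symbol list.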

@serge-sans-paille
Copy link
Copy Markdown
Collaborator Author

hey folks,
first, sorry for dropping this in the face of so many people. The downside of automatic review assignment, I guess.

As for why diskarzhan exists: we tried to deploy clang-include-cleaner and iwyu on the Firefox codebase, but we have a lot (a lot) of different configurations, leading to various preprocessor guards, so whether a header is needed or not depends on the configuration. That is fixable, but requires a lot of work.
So we came up with an approximate but fast approach that removed many unused headers independently of the actual configuration, and that's what diskarzhan does. It's very fast and independent of the build configuration, at the expense of only working for standard headers and having some false negatives.

@bolshakov-a I'll make sure to look at the database from iwyu, that will avoid duplicating effort.

@firewave

We have commits that do IWYU cleanup every so often - how well does diskarzhañ match with that? I'm worried we'll end up in an infinite loop of commits from different tools micro-optimizing include placement....

There already is flip-flopping with include-what-you-use and clang-include-cleaner. As we already integrated both in the Cppcheck CI I will do an exploratory PR integrating diskarzhan as well and see how much applying the suggested changes would boomerang.

@firewave

As we already integrated both in the Cppcheck CI I will do an exploratory PR integrating diskarzhan as well and see how much applying the suggested changes would boomerang.

See cppcheck-opensource/cppcheck#8474.

Having a short look at the include-what-you-use results only, it is a mixed bag of includes mostly thrown back in, agreements not reflected in suggestions, different symbols being associated and also an issue with canonical headers (probably a diskarzhan issue). Most of these appear to be already known/tracked and I will provide a list after I had a closer look.

@bolshakov-a
Contributor

Note that diskarzhan and iwyu are indeed incompatible. When IWYU encounters e.g. str1 == str2, it should report <string>, because it cannot assume that std::string was explicitly spelled in the declarations of str1 or str2, nor that #include <string> appeared somewhere before them (maybe <stacktrace> provides the std::string definition but not operator== for it; see https://discourse.llvm.org/t/should-stacktrace-provide-string/89605). And I don't know which standard library classes could be exempted from such a precautionary measure.
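The str1 == str2 case can be made concrete by squeezing two files into one translation unit: a stand-in for a hypothetical header "api.h", which is the only place std::string is spelled, and a stand-in for a user file that compares two strings without writing any name from <string>. The names `api::first_name`, `api::second_name`, and `same_name` are hypothetical.

```cpp
#include <cassert>  // only for the usage check shown after this block

// ---- Stand-in for a hypothetical header "api.h" ----
// This is the only place where std::string is written.
#include <string>

namespace api {
inline std::string first_name() { return "llvm"; }
inline std::string second_name() { return "llvm"; }
}  // namespace api

// ---- Stand-in for a hypothetical "user.cpp" that includes "api.h" ----
// No name declared by <string> is spelled below, so a rule that only looks
// for written symbols would not ask this file to include <string>, even
// though the comparison calls the operator== that <string> declares.
bool same_name() { return api::first_name() == api::second_name(); }
```

Here `assert(same_name())` holds. In a real project, a symbol-spelling rule would leave the user file without its own #include <string>, while IWYU, as described above, would still report it because operator== for std::string is used there.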

There are more cases where a standard library header is required even though the type is not explicitly written:

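#include <initializer_list>  // Required even though std::initializer_list is never spelled out.
#include <typeinfo>          // Required even though std::type_info is never spelled out.
void do_smth_with(int) {}
void example() {
  for (int i : {1, 2, 3})  // the range {1, 2, 3} deduces to std::initializer_list<int>
    do_smth_with(i);
  const auto& ti = typeid(int);  // ti is a const std::type_info&
  (void)ti;
}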

That said, iwyu and diskarzhan implement different approaches: while iwyu exhibits more paranoid behavior, diskarzhan requires a header only when some symbol from it is explicitly written, and thus provides a reasonable approximation for most use cases.

10 participants