What to Do When Creating Your CodeQL Database Fails – and How to Report the Perfect Reproducer Using cvise

Recently, a colleague was trying to create a CodeQL database for a specific version of the monad project to perform some security analysis.

Everything seemed to work fine during the database creation process. The build succeeded, CodeQL didn’t report any errors, and the database was created successfully.

However, when trying to query the database, something was clearly wrong.

The Problem

My colleague wanted to find a specific class in the database. Even a simple query to select everything that has a location in a specific folder failed to return any results:

import cpp

from Element e
where e.getLocation().getFile().getAbsolutePath().matches("%transaction%")
select e

This should have returned a few results, but instead returned nothing. Something was clearly broken with the database.

Looking at the Build Tracer Log

When CodeQL database creation fails silently like this, the first thing to check is the build tracer log. This log contains detailed information about what happened during the build process and can reveal issues that aren’t immediately obvious.

The build tracer log is located at $DB/log/build-tracer.log inside your CodeQL database directory.

If we open this file and scroll through it, we notice something alarming: many “catastrophic errors”.

[T 00:45:26 93] CodeQL CLI version 2.23.2
[T 00:45:26 93] Initializing tracer.
...
64 errors and 1 catastrophic error detected in the compilation of "/app/monad/category/execution/ethereum/core/transaction.cpp".

The log shows many traced compilations, but also 129 catastrophic errors detected during compilation! If a compilation unit fails catastrophically, the extractor cannot extract any information from it, which explains why our queries returned no results.

To find what caused the catastrophic error, we need to scroll up a bit from where we see the catastrophic failures and look for actual error messages.

Finding the Root Cause

After scrolling through the build tracer log, we eventually find error messages that look like this:

error: assertion failed at: "decls.c", line 18401 in add_src_seq_end_of_variable_if_needed

This is the smoking gun! The CodeQL C/C++ extractor is hitting an internal assertion failure when processing certain source files¹. When this happens, the extractor fails to extract any information from that compilation unit, which explains why our queries returned no results.

The error points to a specific file (decls.c) and line number (18401) in the CodeQL extractor’s internal code where an assertion failed. While we can’t fix the extractor directly, we can create a minimal reproducer to report the bug to the CodeQL team.

Creating a Minimal Reproducer with cvise

When reporting bugs to the CodeQL team (or any compiler/static analysis tool team), providing a minimal reproducer is incredibly valuable. Instead of asking them to clone and build the entire monad project, we can use a tool called cvise (or its predecessor, C-Reduce) to automatically reduce our failing test case to a minimal example.

What Is cvise?

cvise is a tool for reducing C/C++ programs. It takes a large program that triggers a bug and automatically removes code while ensuring the bug still reproduces. The result is a minimal test case that’s much easier to understand and debug.

I cannot recommend cvise enough for this purpose - it saved me hours of manual reduction work! Whether you’re dealing with compiler crashes, static analysis tool bugs, or any other C/C++ code issues, cvise is an invaluable tool in your debugging arsenal. In many cases, it even works pretty well for non-C/C++ languages, such as JavaScript or Java, by treating them as plain text files and applying similar reduction strategies!

Setting Up the Interestingness Test

To use cvise, we need to create an “interestingness test” - a script that returns 0 (success) if the bug reproduces and non-zero (failure) if it doesn’t.

Here’s the interestingness test script we’ll use:

#!/bin/bash

set -e

cleanup() {
    rm -rf "$mytmpdir"
}
trap cleanup EXIT

mytmpdir=$(mktemp -d 2>/dev/null || mktemp -d -t 'mytmpdir')
codeql database create "$mytmpdir" --language=cpp --command="/usr/lib/llvm-19/bin/clang -std=gnu++23 -c minimal.cpp" --overwrite
cat "$mytmpdir/log/build-tracer.log" | grep 'error: assertion failed at: "decls.c", line 18401 in add_src_seq_end_of_variable_if_needed'
status=$?
exit $status

This script:

Creates a temporary directory for the CodeQL database
Tries to create a CodeQL database by compiling minimal.cpp with the same compiler and flags used in the original build
Searches the build tracer log for our specific error message
Returns 0 (success) if the error is found, non-zero (failure) if it’s not
Cleans up the temporary directory when done

Finding the Failing Source File

Before we can run cvise, we need to identify which source file is causing the problem. We can grep through the build tracer log for the error message and look at the preceding compilation commands to find the problematic file.

Once we’ve identified the file, we copy it to minimal.cpp and verify that our interestingness test works:

cp /path/to/monad/consensus/problematic_file.cpp minimal.cpp
chmod +x test.sh
./test.sh
echo $?  # should print 0

In our case, the log shows that the problematic file is from the GNU C++ standard library header alloc_traits.h, so we copy that file into minimal.cpp.

CodeQL C++ extractor: Current location: /app/monad/category/vm/core/assert.cpp:62055,3
CodeQL C++ extractor: Current physical location: /usr/lib/gcc/x86_64-linux-gnu/15/../../../../include/c++/15/bits/alloc_traits.h:146,3
"/usr/lib/gcc/x86_64-linux-gnu/15/../../../../include/c++/15/bits/alloc_traits.h", line 146: internal error: assertion failed at: "decls.c", line 18401 in add_src_seq_end_of_variable_if_needed

  	};
  	 ^

Running cvise

Now we can run cvise to reduce the file:

cvise --n 8 test.sh minimal.cpp

The --n 8 flag tells cvise to use 8 parallel processes to speed up the reduction.

cvise will now automatically try removing various parts of the code - functions, statements, expressions, type qualifiers, and more - while continuously checking that the bug still reproduces. This process can take anywhere from a few minutes to several hours depending on the size of the original file.

What cvise Does

During the reduction process, cvise will:

Try removing entire functions
Try removing statements and expressions
Try simplifying complex expressions
Try removing template parameters and type qualifiers
Try renaming identifiers to simpler names
Try many other transformations

At each step, it runs our interestingness test to verify the bug still reproduces. If a transformation causes the bug to disappear, it’s reverted. If the bug still reproduces, the transformation is kept.

The Final Result

After cvise finishes, we’ll have a minimal.cpp file that might look something like this:

struct __allocator_traits_base {
  template < typename >
  static constexpr int __can_construct_at{
# 1
  };
};

This is much simpler than the original thousands of lines of code, but it still triggers the same assertion failure in the CodeQL extractor!

Reporting the Bug

Now that we have a minimal reproducer, we can create a bug report for the CodeQL team. The report should include:

Description: A clear description of the problem (“CodeQL C/C++ extractor crashes with assertion failure on this code”)
CodeQL version: The version where the bug occurs (e.g., “CodeQL CLI version 2.23.2”)
Minimal reproducer: The reduced minimal.cpp file
Command to reproduce: The exact command that triggers the bug
Expected behavior: What should happen (“The code should be extracted successfully”)
Actual behavior: What actually happens (“Assertion failure: error: assertion failed at: ‘decls.c’, line 18401”)

With this information, the CodeQL team can quickly reproduce the issue, debug it, and create a fix.

Conclusion

When CodeQL database creation appears to succeed but queries return no results:

Check the build tracer log at codeql-db/log/build-tracer.log
Look for error messages and assertion failures
Identify the failing source file(s)
Use cvise to create a minimal reproducer
Report the bug with all relevant details

By following this process, you can turn a frustrating debugging experience into a valuable bug report that helps improve CodeQL for everyone.

The bug has been fixed after just 9 days and released in CodeQL CLI version 2.23.5!

Appendix: Dockerfile for Reproducing the Issue

# syntax=docker/dockerfile:1-labs

FROM ubuntu:25.04 AS base

RUN apt update && apt upgrade -y

RUN apt update && apt install -y apt-utils

RUN apt update && apt install -y dialog

RUN apt update && apt install -y \
    ca-certificates \
    curl \
    gnupg \
    software-properties-common \
    wget \
    git

RUN apt update && apt install -y \
    clang-19 \
    gcc-15 \
    g++-15

RUN apt update && apt install -y \
    libarchive-dev \
    libbrotli-dev \
    libcap-dev \
    libcli11-dev \
    libgmp-dev \
    libtbb-dev \
    libzstd-dev

RUN git clone https://github.com/category-labs/monad/ /monad && \
cd monad && git checkout 3f1f0063468e04f48ff068d388167af1c4ab5635 && \
cp /monad/scripts/ubuntu-build/* /opt/ && rm -rf /monad


RUN /opt/install-boost.sh
RUN /opt/install-tools.sh
RUN /opt/install-deps.sh


FROM base AS codeql

WORKDIR /app

RUN apt install -y unzip libstdc++-15-dev
# Change to v2.23.5 (fixed) or v2.23.3 (broken) to test different versions
RUN curl -LO "https://github.com/github/codeql-cli-binaries/releases/download/v2.23.3/codeql-linux64.zip"
RUN unzip codeql-linux64.zip && rm codeql-linux64.zip

ENV PATH="/app/codeql:$PATH"
ENV ASMFLAGS=-march=haswell
ENV CFLAGS=-march=haswell
ENV CXXFLAGS=-march=haswell


RUN git clone --recursive https://github.com/category-labs/monad/ && cd monad && git checkout 3f1f0063468e04f48ff068d388167af1c4ab5635 && mkdir build
WORKDIR /app/monad

RUN cmake -S . -B build/ -DCMAKE_C_COMPILER=/usr/bin/clang-19 -DCMAKE_CXX_COMPILER=/usr/bin/clang++-19
RUN codeql database create codeql-db/ --language=cpp --command="cmake --build build/ --target monad -- -j" --overwrite

Why does this only happen when CodeQL “compiles” the code? The CodeQL C/C++ extractor intercepts the compilation process to extract additional information about the command line, macros, types, and so on. During this process, it runs its own compiler frontend that is based on EDG. This frontend is separate from the actual compiler used to build the code (e.g., Clang or GCC) and can have its own bugs and limitations. So even if the original code compiles fine with Clang or GCC, the CodeQL extractor might still hit bugs in its own frontend! ↩