Cyber Security Rumble Finals CTF 2023 – elkcip – Writeup

Please reverse this simple¹ flag checker implemented in 8 lines² of Python³ code.

TLDR: Flag checker implemented in the Python pickle VM using NAND gates. The gates define 128 equations over 128 variables which can be solved using sage.
Depending on the SMT/SAT solver used, this challenge can be solved in a few minutes or not at all!
500 base points + 438 dynamic scoring points and 2 solves.
Flag: CSR{you_solved!}.

For this challenge, we are given a Python chall.py script and a code.pickle object.

Analyzing the Python Script

The Python script takes 16 flag bytes, combines them with a header (b"C\x10") and the pickle bytes and then uses the pickle module to load the pickle object.
If the result is truthy (True, or 1, or …), the flag is correct.

#!/usr/bin/env python3

import pickle

flag = input("Flag: ").encode().ljust(16)[:16]

with open("code.pickle", "rb") as f:
    code = f.read()

if pickle.loads(b"C\x10" + flag + code):
    print("Correct")
else:
    print("Wrong")

What is a Pickle Object? What is the Pickle VM?

The pickle module is part of the Python standard library and allows serializing and deserializing Python objects. To do this, it defines a pickle protocol that is used to serialize and deserialize Python objects ¹.

A pickle object is a sequence of bytes that is executed by a “pickle machine” (PM). The PM is a very simple machine:

there are no looping, testing, or conditional instructions, no arithmetic and no function calls. Opcodes are executed once each, from first to last, until a STOP opcode is reached ².

The result of the execution – the unpickled value – is the value that is left on the stack after the STOP opcode is executed.

Analyzing the Pickle Object Using pickletools

To analyze the pickle object, we can use the pickletools module, which is part of the standard library.

import pickletools
with open("code.pickle", "rb") as f:
    code = f.read()
flag = b'CSR{ABCD_EFGH_I}' # 16 bytes

final_code = b"C\x10" + flag + code
open("final_code.pickle", "wb").write(final_code) # save for later use

print(pickletools.dis(final_code))

Unfortunately, pickletools crashes:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/pickletools.py", line 2531, in dis
    raise ValueError(errormsg)
ValueError: memo key 4 already defined

Instead of using pickletools.dis we could have used pickletoos.genops to get the opcodes, but I wasn’t aware of that at the time. Also, the output of pickletoos.genops is harder to read than the output of the next tool.

Analyzing the Pickle Object Using fickling

Luckily, one of my teammates (Liam Wachter) remembered a tool from Trail of Bits called fickling that can be used to analyze pickle objects.

$ pip install fickling && fickling final_code.pickle

However, this also crashes:

Traceback (most recent call last):
  [...]
  File "fickling/pickle.py", line 135, in __new__
    raise NotImplementedError(f"TODO: Add support for Opcode {info.name}")
NotImplementedError: TODO: Add support for Opcode SHORT_BINSTRING

To me, a NotImplementedError sounded easier to fix than a ValueError, so I decided to fix this issue.

Bugfixing fickling – SHORT_BINSTRING

By looking at the source code of fickling we can find a definition for SHORT_BINBYTES ³.

class ShortBinBytes(DynamicLength, ConstantOpcode):
    name = "SHORT_BINBYTES"
    priority = Unicode.priority + 1
    length_bytes = 1

    @classmethod
    def validate(cls, obj):
        if not isinstance(obj, bytes):
            raise ValueError(
                f"{cls.__name__} must be instantiated with an object of type bytes, not {obj!r}"
            )
        return super().validate(obj)

and STRING ⁴:

class String(ConstantOpcode):
    name = "STRING"
    priority = Unicode.priority + 1

    def encode_body(self) -> bytes:
        return repr(self.arg).encode("utf-8")

    @classmethod
    def validate(cls, obj):
        if not isinstance(obj, str):
            raise ValueError(f"String must be instantiated from a str, not {obj!r}")
        return obj

By combining parts of the two, we arrive at this definition for SHORT_BINSTRING:

class ShortBinString(DynamicLength, ConstantOpcode):
    name = "SHORT_BINSTRING"
    priority = Unicode.priority + 1
    length_bytes = 1

    def encode_body(self) -> bytes:
        return repr(self.arg).encode("utf-8")

    @classmethod
    def validate(cls, obj):
        if not isinstance(obj, str):
            raise ValueError(f"String must be instantiated from a str, not {obj!r}")
        return obj

If we now run the command again, we get output that looks like this:

_var0 = getattr({}, '__class__')
_var1 = getattr(_var0, '__getitem__')
# ...
_var279 = _var1({_var8: 1, _var8: 1, 0: 0}, 0)
_var280 = _var1({0: 1, 0: 1, 0: 0}, 0)
# Lots of lines similar to the next one follow
# _varFOOBAR = _var1({_varFOO: 1, _varBAR: 1, 0: 0}, 0)

_var1 is the __getitem__ method of the dict class. So the code is basically doing this:

_var279 = {}.__class__.__getitem__({_var8: 1, _var8: 1, 0: 0}, 0)

which is the same as this:

_var279 = {_var8: 1, _var8: 1, 0: 0}[0]

If you are familiar with Python, you might notice that this is equivalent to this:

_var279 = 0

This is because the last value of a key in a dictionary is the one that is used ⁵.

So are all _varFOOBAR variables set to 0? No, because this is another bug in fickling, the order of dict entries is wrong.

_varFOOBAR = _var1({_varFOO: 1, _varBAR: 1, 0: 0}, 0)
# ^should really be this:
_varFOOBAR = _var1({0: 0, _varBAR: 1, _varFOO: 1}, 0)

Here is the PR that adds support for SHORT_BINSTRING: trailofbits/fickling#68.

Bugfixing fickling – Wrong Order of dict() Entries

How do we find the bug in fickling? We can enable tracing (fickling --trace) to find the instruction that builds the dictionary:

DICT
	Popped 1
	Popped _var8
	Popped 1
	Popped _var8
	Popped 0
	Popped 0
	Popped MARK
	Pushed {_var8: 1, _var8: 1, 0: 0}

If we cross-reference this with the documentation for the DICT opcode ⁶, we can see that the order of the entries is wrong:

Stack before: … markobject 1 2 3 ‘abc’
Stack after: … {1: 2, 3: ‘abc’}

In our case the stack looks like this:

Stack before: … markobject 0 0 _var8 1 _var8 1

And should look like this afterwards:

Stack after: … {0: 0, _var8: 1, _var8: 1}

The fix is pretty easy, we just have to reverse the keys and values in the fickling Dict class ⁷:

interpreter.stack.append(ast.Dict(keys=reversed(keys), values=reversed(values)))
# vs the original
interpreter.stack.append(ast.Dict(keys=keys, values=values))

Here is the PR that fixes the bug: trailofbits/fickling#67.

Cleaning Up the Output

After fixing all these bugs, we can finally look at the output which I’ve split into three parts:

initialization of variables
flag bit extraction
flag check

Make sure to save the whole output to a file (for example decompiled_code.py), because we will need it later or use my version as a reference.

Raw Output

# initialization of variables
_var0 = getattr({}, '__class__')
_var1 = getattr(_var0, '__getitem__')
_var2 = getattr(b'', '__class__')
_var3 = getattr(_var2, '__getitem__')
_var4 = getattr(0, '__class__')
_var5 = getattr(_var4, '__and__')
_var6 = getattr(_var4, '__rshift__')

# flag bit extraction
_var7 = _var3(b'CSR{ABCD_EFGH_I}', 0)
_var8 = _var5(_var7, 1)
_var9 = _var6(_var7, 1)
# ...

# flag check
_var279 = _var1({0: 0, _var8: 1, _var8: 1}, 0)
_var280 = _var1({0: 0, 0: 1, 0: 1}, 0)
# ...
result0 = _var50515

Initialization of Variables

_var0 = getattr({}, '__class__')
_var1 = getattr(_var0, '__getitem__')
_var2 = getattr(b'', '__class__')
_var3 = getattr(_var2, '__getitem__')
_var4 = getattr(0, '__class__')
_var5 = getattr(_var4, '__and__')
_var6 = getattr(_var4, '__rshift__')

_var1 allows us to get items from a dictionary, _var3 allows us to get items from a bytes object, _var5 allows us to do bitwise AND while _var6 allows us to do bitwise right shift.

Flag Bit Extraction

_var7 = _var3(b'CSR{ABCD_EFGH_I}', 0) # _var7 = b'CSR{ABCD_EFGH_I}'[0]
_var8 = _var5(_var7, 1)               # _var8 = _var7 & 1
_var9 = _var6(_var7, 1)               # _var9 = _var7 >> 1
_var10 = _var5(_var9, 1)              # _var10 = _var9 & 1
_var11 = _var6(_var9, 1)              # _var11 = _var9 >> 1
_var12 = _var5(_var11, 1)             # _var12 = _var11 & 1

_var7 is the first byte of the flag, _var8 is the first bit of the flag.
_var9 is the first byte of the flag shifted to the right, _var10 is therefore the second bit of the flag, and so on. This is done for all 8 bits of all 16 bytes of the flag.

Flag Check – Using dict() as NAND Gates

_var279 = _var1({0: 0, _var8: 1, _var8: 1}, 0)     # _var279 = {0: 0, _var8: 1, _var8: 1}[0]
_var280 = _var1({0: 0, 0: 1, 0: 1}, 0)             # _var280 = {0: 0, 0: 1, 0: 1}[0]
_var281 = _var1({0: 0, _var280: 1, _var279: 1}, 0) # _var281 = {0: 0, _var280: 1, _var279: 1}[0]
_var282 = _var1({0: 0, _var8: 1, 0: 1}, 0)         # _var282 = {0: 0, _var8: 1, 0: 1}[0]
# lots of lines similar to the above follow

If we take a closer look at the code, we can see that the dict is actually used as a NAND gate 🤯.

Let’s take a look at _var281 = {0: 0, _var280: 1, _var279: 1}[0]; we can create a truth table for this as the inputs are integers that can only be 0 or 1:

`_var279`	`_var280`	`_var281`/`result`
0	0	1
0	1	1
1	0	1
1	1	0

Only if both _var279 and _var280 are 1, _var281 is 0. Because then the dict will look like this: {0: 0, 1: 1, 1: 1} and getting the value for key 0 will return 0.

We can further confirm that this is a NAND gate by looking at the Wikipedia page for NAND gates and comparing the truth table with the one above.

Variable Cleanup

We can now start to clean up the code, by parsing the extracted python code as an AST and performing different transformations.

# Load the code from a file
with open("decompiled_code.py", "rt") as file:
    code = file.read()

# Parse the code into an AST
ast_tree = ast.parse(code)

# Cleanup the AST
ast_tree = ReplaceVar1Visitor().visit(ast_tree)
ast_tree = CleanupAssignmentsVisitor().visit(ast_tree)
ast_tree = ReplaceAssignVisitor().visit(ast_tree)
ast_tree = ReplaceNameVisitor().visit(ast_tree)

ReplaceVar1Visitor first replaces all calls of the form _var1({0: 0, _var280: 1, _var279: 1}, 0) with Nand(_var280, _var279). CleanupAssignmentsVisitor removes assignments that have become dead code and ReplaceAssignVisitor and ReplaceNameVisitor are used to replace _varXXX with flag_{char_index}_{bit_index} where appropriate.

Our code now looks much cleaner and we can reverse the NAND gates.

# OLD CODE                                         # NEW CODE
_var0 = getattr({}, '__class__')                   #
_var1 = getattr(_var0, '__getitem__')              #
_var2 = getattr(b'', '__class__')                  #
_var3 = getattr(_var2, '__getitem__')              #
_var4 = getattr(0, '__class__')                    #
_var5 = getattr(_var4, '__and__')                  #
_var6 = getattr(_var4, '__rshift__')               #
_var7 = _var3(b'CSR{ABCD_EFGH_I}', 0)              # 
_var8 = _var5(_var7, 1)                            # flag_0_0 = Bool('flag_0_0')
_var9 = _var6(_var7, 1)                            # 
_var10 = _var5(_var9, 1)                           # flag_0_1 = Bool('flag_0_1')
_var11 = _var6(_var9, 1)                           # 
_var12 = _var5(_var11, 1)                          # flag_0_2 = Bool('flag_0_2')
# ...
_var279 = _var1({0: 0, _var8: 1, _var8: 1}, 0)     # _var279 = Nand(flag_0_0, flag_0_0)
_var280 = _var1({0: 0, 0: 1, 0: 1}, 0)             # _var280 = Nand(0, 0)
_var281 = _var1({0: 0, _var280: 1, _var279: 1}, 0) # _var281 = Nand(_var280, _var279)
_var282 = _var1({0: 0, _var8: 1, 0: 1}, 0)         # _var282 = Nand(flag_0_0, 0)

NAND Gates Cleanup

If we look closely at the code, we can spot a certain repeating pattern:

_var285 = Nand(flag_0_1, flag_0_1)
_var286 = Nand(_var284, _var284)
_var287 = Nand(_var286, _var285)
_var288 = Nand(flag_0_1, _var284)
_var289 = Nand(_var287, _var288)
_var290 = Nand(_var289, _var289)

If we visualize all six NAND gates, we can see that this is actually an XOR gate. A XOR gate based on an XNOR gate and a NOT gate. The XNOR gate is realized using 5 NAND gates and the NOT gate is realized using 1 NAND gate."

If we repeatedly replace all lines that match the given structure (using some very ugly python code), we get 71 XOR gates and some NAND gates that are still left. We can identify them as a NOT gate and an AND gate such that we get this code:

_var284 = Xor(flag_0_0, 0)
_var290 = Xor(flag_0_1, _var284)
# 67 XOR gates ignored
_var698 = Xor(flag_15_5, _var692)
_var704 = Xor(flag_15_6, _var698)
_var705 = Not(_var704)
_var707 = And(_var705, 1)

In the end we have 128 blocks that each roughly contain 65 XOR gates. Each block has two inputs (in our case flag_0_0 and 0) and XORs them. Then XORs the result with another flag bit and so on. This effectively adds a different amount of flag bits (over the field GF(2)⁸), then optionally adds a 1 (a NOT is the same as adding/subtracting a 1 in GF(2)).

The result of each block is then ANDed with the result of the previous block such that the final result is the following:

_var707 = And(_var705, 1)
_var1094 = And(_var1092, _var707)
_var1498 = And(_var1496, _var1094)
# ...
_var49747 = And(_var49745, _var49366)
_var50116 = And(_var50114, _var49747)
_var50515 = And(_var50513, _var50116)
result0 = _var50515

We want result0 to be 1/True so every individual block must be 1/True.

A Linear Equation System Over GF(2)

In the end, all those ANDs define a linear equation system over GF(2).

If we extract the 128x128 matrix and 128 vector that is the right side of the equation, we can use sage to get a solution for it like this:

from sage.all import *
F = GF(2)
vec = vector(F, right_side)
mat = matrix(F, rows)
sol = mat.solve_right(vec)

print(int("".join(map(str, reversed(sol))), 2).to_bytes(16, "little"))

We finally get the flag: b'CSR{you_solved!}'.

(Full source code for the sage solve script.)

SMT and SAT Solvers – Why They Didn’t Work

Initially, I left all the NAND gates as is and didn’t realize that I’m actually solving a linear equation system. So using Z3 should be enough to give us an easy solve, right?

from z3 import *
# ...
_var50512 = Not(_var50511)
_var50513 = Not(_var50512)
_var50514 = Not(And(_var50116, _var50513))
_var50515 = Not(_var50514)
result0 = _var50515
s = Solver()
s.add(result0 != 0)

After running this for 2+ hours, I decided to instead transform the constraints to CNF (using Z3 for the conversion) and use the SAT solver CaDiCal.

Unfortunately, besides heating the room, they never solved the problem. This is because the solvers don’t see the problem as a linear equation system, but just as many “individual constraints” making it exponentially hard ⁹.

While writing this post, I realized that there exist SAT solvers that can detect XOR gates and perform Gauss-Jordan elimination on them! One such solver is cryptominisat.

Unfortunately there is another thing we have to be aware of. If we use Z3 with only a single constraint (s.add(result0 != 0)), transform to CNF and then use cryptominisat, we will NOT get a solution in a reasonable time. Instead, we have to introduce variables for every intermediate computation like this:

from z3 import *
# ...
_var50513 = Int('_var50513')
_var50514 = Int('_var50514')
_var50515 = Int('_var50515')
#
s = Solver()
s.add(_var50513 == Not(_var50512))
s.add(_var50514 == Not(And(_var50116, _var50513)))
s.add(_var50515 == Not(_var50514))
s.add(result0 == _var50515)
s.add(result0 != 0)

If we do this, we also get the flag b'CSR{you_solved!}' in about a minute.

(Full source code for the cryptominisat solve script.)

There are multiple protocols that evolved over time. The latest protocol is version 5, which was introduced in Python 3.8. The pickle protocol is documented in the pickle module documentation. ↩
More information about the PM and more information about the available instructions. ↩
Source ↩
Source ↩
If a key occurs more than once, the last value for that key becomes the corresponding value in the new dictionary. Source.

↩
Source ↩
Source ↩
GF(2): We have two numbers 0 and 1 such that 1+1=0. Adding two 1-bit numbers is therefore equivalent to XORing them. ↩
The larger hurdle is that once the XORs are in the CNF using the translation, the CNF becomes exponentially hard to solve using standard CDCL as used in most SAT solvers. This is because Gauss-Jordan elimination is exponentially hard for the standard CDCL to perform — but we need Gauss-Jordan elimination because the XORs will interact with each other, as we expect there to be many of them. Without being able to resolve the XORs with each other efficiently and derive information from them, it will be practically impossible to solve the CNF [emphasis mine]. Source

↩