Advent of Code 2020 Day 8 LLVM Compiler

When I got into this day’s puzzle, I was reminded of the horror that is called Intcode. Intcode was an imaginary architecture used in 2019’s AoC puzzles. While it was fun, I think it was horrendously overused, so you can imagine what I thought when I had first seen this puzzle.

But thankfully, as of date, no additions were made to the architecture used on Day 8. It had a refreshingly simple architecture.

Instruction	Description
nop	Does nothing, we just ignore it
acc	Increases the internal register
jmp	jumps to an instruction, relative from the current one

More detailed description can be found on the puzzle page.

I finished it as quickly as I can and after doing nothing for a whole day I had a thought: could I compile this into an executable?

I have heard about LLVM a lot, and decided that it’s finally time to try it out. The idea is simple: take an already existing language and turn it into an intermediate representation (LLVM IR). LLVM IR is the language that the LLVM compiler compiles into assembly for the destination processor.

It uses SSA (Static Single Assignment) which means that each variable can only be assigned once. This does some problems in your case, because our acc instruction needs a piece of memory (our one register) it could override. The problem can be solved with LLVM’s alloca command, that allocates a memory range, probably on the stack. This returns a pointer to that memory, so the data there can be overwritten at any time.

%reg = alloca i32                     ;Allocate memory for register
store i32 0, i32* %reg                ;Initialize register to 0

When we encounter an acc instruction, we can load the data at this memory into a variable, modify it and write it back.

%regvalue = load i32, i32* %reg       ;Load data from register
%regnewvalue = add i32 -13, %regvalue ;Add value to register (in this case -13)
store i32 %regnewvalue, i32* %reg     ;Write new value back into memory

The jmp instruction also poses some problems. LLVM only supports jumps to labels. In order to work around this, I first converted every jmp instruction into a version that jumps to an absolute position instead of a relative one. Then I looked at where these jumps point to and put a label before each of them.

Labels work slightly differently in LLVM than in most other assembly languages. Each label defines a new basic block. A basic block starts with a label or the function signature and ends with a command that redirects execution to another block. In the previous paragraph we didn’t do anything other than putting labels before certain instruction, but in reality this wouldn’t work. It would start a new basic block, leaving the previous one unfinished. This can be fixed easily by adding an unconditional jump before each label.

br label %label_6                     ;Unconditional jump
label_6:                              ;Label

Once we converted all instructions into LLVM IR commands, we can print out the contents of the register.

;Note: In reality this variable wouldn't be called called regvalue
;regvalue was already used before, and since variables can't be overwritten, 
;the compiler would report an error
%regvalue = load i32, i32* %reg
call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %regvalue)
ret i32 0

Now I just had to write some code that translates these instructions into LLVM IR ones. When it was done, I got this as the output:

declare i32 @printf(i8*, ...) #1
@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1
define i32 @main(){
	%a1 = alloca i32
	store i32 0, i32* %a1
	br label %label_1
	label_1:
	%a2 = load i32, i32* %a1
	%a3 = add i32 1, %a2
	store i32 %a3, i32* %a1
	br label %label_6
	%a4 = load i32, i32* %a1
	%a5 = add i32 3, %a4
	store i32 %a5, i32* %a1
	br label %label_1
	%a6 = load i32, i32* %a1
	%a7 = add i32 -99, %a6
	store i32 %a7, i32* %a1
	br label %label_6
	label_6:
	%a8 = load i32, i32* %a1
	%a9 = add i32 1, %a8
	store i32 %a9, i32* %a1
	%a10 = load i32, i32* %a1
	%a11 = add i32 6, %a10
	store i32 %a11, i32* %a1
	%a12 = load i32, i32* %a1
	call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %a12)
	ret i32 0
}

At this point something struck me: this isn’t anything more than a bunch of absolute jumps and additions. “Duh” you might say, but what if LLVM looks at this, reorders the basic blocks, combines the add commands and generates a single constant value. There is one way to find out.

I decided to use my puzzle input, run llc (the LLVM compiler) aaaaand…

pushq	%rax
movl	$2212, 4(%rsp)
leaq	.L.str(%rip), %rdi
movl	$2212, %esi
xorl	%eax, %eax
callq	printf@PLT
xorl	%eax, %eax
popq	%rcx
retq

Maybe not so surprisingly, running this code gave me the result: 2212.

Either way, if you want to give it a try, the source code is available in the blog’s repository.