Advent of Languages 2024, Day 1: C

Monday, the second of December, A.D. 2024

The C Programming Language
Experiments in C
The Puzzle, Part 1
Loading data
Splitting hairs strings
Integer-ation hell
Who needs arrays, anyway?
Sorting and finishing touches
Interlude: Arrays in C
Part 2
And That’s It

A s time goes on, it’s becoming increasingly clear to me that I’m a bit of a programming-language dilletante. I’m always finding weird niche languages like Pony or Roc, going “wow that looks cool,” spending a bunch of time reading the documentation, and then never actually using it and forgetting all about it for the next three years.

This year, I’ve decided I’m going either buck that trend or double down on it, depending on your point of view. Instead of not engaging at all with whatever random language strikes my fancy, I’m going to engage with it to the absolute minimum degree possible, then move on. Win-win, right? I get to feel like I’m being more than a dilletante, but I don’t have to do anything hard like really learn a new language.

I should probably mention here, as a disclaimer, that I’ve never gotten all the way through an AoC in my life, and there’s no way I’m going to do better with more problems to worry about. I’m guessing I’ll peter out by day 12 or so, that’s about as far as I usually get. Oh, and there’s no way I’m going to stick to the one-day cadence either. It’ll probably be May or so before I decide that enough is enough and I’m going to call it.1

1 It’s already December 2nd and I just finished the first half of Day 1, so clearly I’m shooting for more of a “slow and steady wins the race” cadence here.

Also, figuring out a new programming language every day is going to take enough time as it is, so I’m going to do this very stream-of-consciousness style. I apologize in advance for the haphazard organization of this and subsequent posts.

Anyway, I’ve decided to start with C, mostly because I’m scared of C and Day 1 of AoC is always the easiest, so I won’t have to really get into it at all.2

2 Ok, it’s also because I know that C doesn’t have much in the way of built-in datastructures like hash maps and whatnot, so if you need one you end up having to either figure out how to use third-party C libraries (eugh) or write your own (even worse).

The C Programming Language

C, of course, needs no introduction. It’s known for being small, fast, and the language in which Unix was implemented, not to mention most (all?) other major OS kernels. It’s also known for being the Abode of Monsters, i.e. there are few-to-no safeguards, and if you screw up the consequences might range from bad (segfaults in userland), to worse (kernel panics), to catastrophic (your program barfs out millions of users’ highly sensitive data to anyone who asks).3

3 To be fair, this last category isn’t limited to C. Any language can be insecure if you try hard enough. Yes, even Rust.

I’ve seen it described as “Like strapping a jet engine to a skateboard - you’ll go really fast, but if you screw up you’ll end up splattered all over the sidewalk.”4

4 Sadly I can no longer locate the original source for this, but I have a vague recollection of it being somewhere on the varnish website, or possibly on the blog of someone connected with Varnish.

All of which explains why I’m just a tad bit apprehensive to dip my toes into C. Thing is, for all its downsides, C is everywhere. Not only does it form the base layer of most computing infrastructure like OS kernels and network appliances, it’s also far and away the most common language used for all the little computers that form parts of larger systems these days. You know, like cars, industrial controllers, robotics, and so on. So I feel like it would behoove me to at least acquire a passing familiarity with C one of these days, if only to be able to say that I have.

Oh, but to make it extra fun, I’ve decided to try to get through at least the first part of Day 1 without using any references at all, beyond what’s already available on my computer (like manpages and help messages of commands). This is a terrible idea. Don’t do things this way. Also, if you’re in any way, shape, or form competent in C, please don’t read the rest of this post, for your own safety and mine. Thank you.

Experiments in C

Ok, let’s get the basics out of the way first. Given a program, can I actually compile it and make it run? Let’s try:

#include "stdio.h" // pretty sure I've seen this a lot, I think it's for stuff like reading from stdin and writing to stdout

int main() { // the `int` means that this function returns an int, I think?
    printf("hello, world!");
}

Now, I’m not terribly familiar with C toolchains, having mostly used them from several layers of abstraction up, but I’m pretty sure I can’t just compile this and run it, right? I think compiling will turn this into “object code”, which has all the right bits in it that the computer needs to run it, but in order to put it all in a format that can actually be executed I need to “link” it, right?

Anyway, let’s just try it and see.

$ cc 01.c
$ ls

>>> 01.c  a.out

$ ./a.out

>>> "hello, world!"

Well, what do you know. It actually worked.5

5 Amusingly, I realized later that it was totally by accident that I forgot to put a semicolone after the #include, but apparently this is the correct syntax so it just worked.

I guess the linking part is only necessary if you have multiple source files, or something?

The Puzzle, Part 1

This is pretty encouraging, so let’s tackle the actual puzzle for Day 1. There’s a bunch of framing story like there always is, but the upshot is that we’re given two lists arranged side by side, and asked to match up the smallest number in the first with the smallest number in the second, the second-smallest in the first with the second-smallest in the second, etc. Then we have to find out how far apart each of those pairs is, then add up all of those distances, and the total is our puzzle answer.

This is conceptually very easy, of course (it’s only Day 1, after all). Just sort the two lists, iterate over them to grab the pairs, take abs(a - b) for each pair, and sum those all up. Piece of cake.

Except of course, that this is C, and I haven’t the first idea how to do most of those things in C.6

6 Ok, I’m pretty sure I could handle summing up an array of numbers in C. But how to create the array, or how to populate it with the numbers in question? Not a clue.

Loading data

Ok, so first off we’ll need to read in the data from a file. That shouldn’t be too hard, right? I know fopen is a thing, and I am (thankfully) on Linux, so I can just man fopen and see what I get, right? type type Aha, yes! half a moment, I’ll be back.

Mmmk, so man fopen gives me these very helpful snippets:

SYNOPSIS
       #include <stdio.h>

       FILE *fopen(const char *pathname, const char *mode);

(...)

The argument mode points to a string beginning with one of the following sequences (possibly followed by additional characters, as described below):

r      Open text file for reading.  The stream is positioned at the beginning of the file.

(...)

Ok, so let’s just try opening the file and then dumping the pointer to console to see what we have.

#include "stdio.h"

int main() {
    int f_ptr = fopen("data/01.txt", "r");
    printf(f_ptr);
}

$ cc 01.c

>>> 01.c: In function ‘main’:
01.c:4:17: warning: initialization of ‘int’ from ‘FILE *’ makes integer from pointer without a cast [-Wint-conversion]
    4 |     int f_ptr = fopen("data/01.txt", "r");
      |                 ^~~~~
01.c:5:12: warning: passing argument 1 of ‘printf’ makes pointer from integer without a cast [-Wint-conversion]
    5 |     printf(f_ptr);
      |            ^~~~~
      |            |
      |            int
In file included from 01.c:1:
/usr/include/stdio.h:356:43: note: expected ‘const char * restrict’ but argument is of type ‘int’
  356 | extern int printf (const char *__restrict __format, ...);
      |                    ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
01.c:5:5: warning: format not a string literal and no format arguments [-Wformat-security]
    5 |     printf(f_ptr);

…oh that’s right, this is C. we can’t just print an integer, it would interpret that integer as a pointer to a string and probably segfault. In fact…

$ ./a.out

>>> Segmentation fault (core dumped)

Right. Ok, well, man was our friend last time, maybe it can help here too?

man printf

Why, yes! Yes it—oh wait, no. No, this isn’t right at all.

Oh yeah, printf is also a standard Unix shell command, so man printf gives you the documentation for that. I guess man fopen only worked because fopen is a syscall, as well as a library function. Oh well, let’s just see if we can guess the right syntax.

#include "stdio.h"

int main() {
    int f_ptr = fopen("data/01.txt", "r");
    printf("%i", f_ptr);
}

$ cc 01.c
$ ./a.out

>>> 832311968

Hey, would you look at that! Weirdly enough, so far it’s been my Python experience that’s helped most, first with the fopen flags and now this. I guess Python wears its C heritage with pride.

I’m cheating a little, by the way. Well, kind of a lot. I switched editors recently and am now using Zed primarily (for languages it supports, at least), and Zed automatically runs a C language server by default when you’re working in C.7

7 Actually, it might be a C++ language server? At least, it keeps suggesting things from std:: which I think is a C++ thing.

Which is pretty helpful, because now I know the proper (at least, more proper) way to do this is:

FILE *file = fopen("data/01.txt", "r");

so now we have a pointer to a FILE struct, which we can give to fread() I think? man fread gives us this:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

Which means, I think, that fread() accepts a pointer to a region of memory into which it’s reading data, an item size and a number of items,8

8 Which surprised me, I was expecting just a number of bytes to read from the stream. Probably, again, because of my Python experience (e.g. this is how sockets work in Python).

and of course the pointer to the FILE struct.

Ok, great. Before we can do that, though, we need to get that first pointer, the one for the destination. man malloc is helpful, telling me that I just need to give it a number and it gives me back a void * pointer. I think it’s void because it doesn’t really have a type—it’s a pointer to uninitialized memory, so you can write to it, but if you try to read from it or otherwise interpret it as being of any particular type it might blow up in your face.

Anyway:

#include "stdio.h"
#include "stdlib.h"

int main() {
    FILE *file = fopen("data/01.txt", "r");
    void *data = malloc(16384);
    size_t data_len = fread(data, 1, 16384, file);
    printf("%zu", n_read);
}
}

I happen to know that my personal puzzle input is 14KB, so this will be enough. If the file were bigger, I’d have to either allocate more memory or read it in multiple passes. Oh, the joys of working in a non-memory-managed language.

Running this outputs 14000, so I think it worked. I’m not sure if there’s a performance penalty for using an item size of 1 with fread, but I’m guessing not. I highly doubt, for instance, that under the hood this is translating to 14,000 individual syscalls, because that would be a) completely bonkers and b) unnecessary since it already knows ahead of time what the max size of the read operation is going to be.9

9 Reading up on this further suggests that the signature of fread is mostly a historical accident, and most people either do fread(ptr, 1, bufsize, file) (if reading less than the maximum size is acceptable) or fread(ptr, bufsize, 1, file) (if incomplete reads are to be avoided.)

Splitting hairs strings

Ok, next up we’re going to have to a) somehow split the file into lines, b) split each line on the whitespace that separates the two columns, and c) parse those strings as integers. Some poking at the language server yields references to strsep, which appears to do exactly what I’m looking for:

char *strsep(char **stringp, const char *delim);

If  *stringp  is NULL, the strsep() function returns NULL and does nothing else.
Otherwise, this function finds the first token in the string *stringp, that is
delimited by one of the bytes in the string delim.  This token is terminated by
overwriting the delimiter with a null byte (''), and *stringp is updated to
point past the token.

I’m not quite sure what that **stringp business is, though. It wants a pointer to a pointer, I guess?10

10 Coming back to this later, I realized: I think this is just how you do mutable arguments in C? If you just pass in a regular argument it seems to get copied, so changes to it aren’t visible to the caller. So instead you pass in a pointer. But in this case, what needs to be mutated is already a pointer, so you have to pass a pointer to a pointer.

The language server suggests that & is how you create a pointer to something that you already have, so let’s try that (by the way, I’m going to stop including all the headers and the int main() and all, and just include the relevant bits from now on):

#include "string.h"

char *test = "hello.world";
char* res = strsep(&test, ".");

$ cc 01.c && ./a.out

>>> Segmentation fault (core dumped)

Hmm. That doesn’t look too good.

However, further reflection suggests that my issue may just be that I’m using a string literal as stringp here, which means (I think) that it’s going to be encoded into the data section of my executable, which makes it not writable when the program is running. So you know what, let’s just YOLO it. Going back to what we had:

FILE *file = fopen("data/01.txt", "r");
void *data = malloc(16384);
size_t data_len = fread(data, 1, 16384, file);

char* first_line = strsep(&data, "\n");
printf("%s", first_line);

Compiling this generates dire warnings about passing an argument of type void ** to a function expecting char **, but this is C, so I can just ignore those and it will operate on the assumption that I know what I’m doing11

11 Oh, you sweet, simple summer child.

and treat that pointer as if it were char ** anyway. And lo and behold:

$ ./a.out

>>> 88450   63363

It works!

Next question: Can I strsep on a multi-character delimiter?

char* first_word = strsep(&first_line, "   ");
printf("%s", first_word);

$ cc 01.cc && ./a.out

>>> 88450   6336388450

Aw, it didn’t w—wait, no it did. It’s just still printing first_line from above, and not printing a newline after that, so first_word gets jammed right up against it. Hooray!

Integer-ation hell

Ok, last piece of the data-loading puzzle is to convert that string to an integer. I’m pretty sure I remember seeing a strtoi function in C examples that I’ve seen before, so let’s try that.

Wait, no. There is no strtoi, but there is a strtol (“string to long integer”), so let’s try that instead.

int i = strtol(first_word, NULL, 10);
printf("%i", i);

and…

$ cc 01.c && ./a.out

>>> 88450

Aww yeah. We got integers, baby! (Apparently int and long are synonymous? At least, they are for me right now on this machine, which is enough to be going on with.)

That second argument to strtol, by the way, is apparently endptr, about which the manpage has this to say:

If  endptr  is  not NULL, strtol() stores the address of the first invalid character
in *endptr.  If there were no digits at all, strtol() stores the original value of
nptr in *endptr (and returns 0).  In particular, if *nptr is not '\0' but **endptr
is '\0' on return, the entire string is valid.

Sounds kind of like we could use that to avoid a second call to strsep, but it seems like a six-of-one, half-a-dozen-of-the-other situation, and I’m too lazy to figure it out, so whatever.

Who needs arrays, anyway?

Ok, so we have the basic shape of how to parse our input data. Now we just need somewhere to put it, and for that we’re obviously going to need an array. Now, as I understand it, C arrays are basically just pointers. The compiler keeps track of the size of the type being pointed to, so when you access an array you’re literally just multiplying the index by the item size, adding that to the pointer that marks the start of the array, and praying the whole thing doesn’t come crashing down around your ears.

I’m not sure of the appropriate way to create a new array, but I’m pretty sure malloc is going to have to be involved somehow, so let’s just force the issue. sizeof tells me that an int (or long) has a size of 4 (so it’s a 32-bit integer). I don’t know exactly how many integers are in my puzzle input, but I know that it’s 14,000 bytes long, and each line must consume at least 6 bytes (first number, three spaces, second number, newline), so the absolute upper bound on how many lines I’m dealing with is 2333.333… etc. Since each integer is 4 bytes that means each array will need to be just under 10 KB, but I think it’s standard practice to allocate in powersof 2, so whatever, let’s just do 16 KiB again.

Not gonna lie here, this one kind of kicked my butt. I would have expected the syntax for declaring an array to be int[] arr = ..., but apparently no, it’s actually int arr[] = .... Ok, that’s fine, but int arr[] = malloc(16384) gets me error: Invalid initializer, without telling me what the initializer is.

Okay, fine. I’ll look up the proper syntax for Part 2. For now let’s just use pointers for everything. Whee! Who now? Safety? Never heard of her. BEHOLD!

void* nums_l = malloc(16384);
void* nums_r = malloc(16384);

int nlines = 0;
while (1) {
    char* line = strsep(&data, "\n");
    int left = strtol(line, &line, 10);
    int right = strtol(line, NULL, 10);

    // if `strtol` fails, it apparently just returns 0, how helpful
    if (left == 0 && right == 0) {
        break;
    }

    int *addr_l = (int *)(nums_l + nlines * 4);
    *addr_l = left;

    int *addr_r = (int *)(nums_r + nlines * 4);
    *addr_r = right;

    nlines++;
    }

Doesn’t that just fill you with warm cozy feelings? No? Huh, must be just me then.

Oh yeah, I did end up figuring out how to do the endptr thing with strtol, it wasn’t too hard.

Sorting and finishing touches

Ok, next up, we have to sort these arrays. Is there even a sorting algorithm of any kind in the C standard library? I can’t find anything promising from the autosuggestions that show up when I type #include ", and I don’t feel like trying to implement quicksort without even knowing the proper syntax for declaring an array,12

12 Or having a very deep understanding of quicksort, for that matter.

so I guess it’s To The Googles We Must Go.

…Gosh darn it, it’s literally just called qsort. Ok, fine, at least I won’t use Google for the usage.

You have to pass it a comparison function, which, sure, but that function accepts two arguments of type const void *, which makes the compiler scream at me when I attempt to a) pass it a function that takes integer pointers instead, or b) cast the void pointers to integers. Not sure of the proper way to do this so I’m just going to ignore the warnings for now because it seems to work, and…

int cmp(const int *a, const int *b) {
    int _a = (int)(*a);
    if (*a > *b) {
        return 1;
    }
    else if (*a == *b) {
        return 0;
    }
    else {
        return -1;
    }
}

// later, in main()

qsort(nums_l, nlines, 4, cmp);
qsort(nums_r, nlines, 4, cmp);

int sum = 0;
for (int i = 0; i < nlines - 1; i++) {
    int *left = (int *)(nums_l + i * 4);
    int *right = (int *)(nums_r + i * 4);

    int diff = *left - *right;
    if (diff < 0) {
        diff = diff * -1;
    }

    sum += diff;
}

printf("%i", sum);

$ cc 01.c && ./a.out

>>> (compiler warnings)
2580759

Could it be? Is this it?

…nope. Knew it was too good to be true.

Wait, why am I using nlines - 1 as my upper bound? I was trying to avoid an off-by-one error, because of course the “array” is “zero-indexed” (or would be if I were using arrays properly) and I didn’t want to go past the end. But, of course, I forgot that i < nlines will already stop the loop after the iteration where i = 999. Duh. That’s not even a C thing, I could have made that mistake in Javascript. Golly. I guess my excuse is that I’m so busy focusing on How To C that I’m forgetting things I already knew?

Anyway, after correcting that error, my answer does in fact validate, so hooray! Part 1 complete!

Ok, before I go on to Part 2, I am definitely looking up how to do arrays.

Interlude: Arrays in C

Ok, so turns out there are fixed-size arrays (where the size is known at compile time), and there are dynamically-sized arrays, and they work a little differently. Fixed-sized arrays can be declared like this:

int arr[] = {1, 2, 3, 4}; // array-literal syntax
int arr[100]; // declaring an array of a certain size, but uninitialized

Then you have dynamically-sized arrays, where the size of the array might not be known until runtime, so you have to use malloc (or alloca I guess, but you have to be real careful not to overflow your stack when you do that):

int *arr = malloc(1000 * sizeof(int));

That’s it, you’re done. Apparently the fact that arr is declared as what looks like (to me, anyhow) a pointer to an int is enough to tell the compiler, when accessed with square brackets, that this is actually a pointer to an array of ints, and to multiply the index by the appropriate size (so I guess 4 in this case) to get the element at that array index.

Interestingly, with the above snippet, when I started accessing various indexes over 1000 to see what would happen, I got all the way to 32768 before it started to segfault.13

13 The actual boundary is somewhere between 33000 and 34000, not sure where exactly because I got bored trying different indices.

I guess malloc doesn’t even get out of bed for allocations less than 128 KiB?14

14 Actually, what’s probably happening is that malloc is requesting a bigger chunk from the system in order to speed up any future allocations I might want to do. So if I were to call malloc again, it would just give me another chunk from that same region of memory. But of course C doesn’t care what memory I access, it’s only the system that enforces memory boundaries, and those are process-wide, so as long as I’m within that same chunk I’m fine. Just a guess, though.

Armed with this new superpower, my full solution to Part 1 becomes:15

15 I could (and probably should) also be using fixed-size arrays here, since it doesn’t seem like there’s any advantage to using malloc.

#include "stdio.h"
#include "stdlib.h"
#include "string.h"

int cmp(const int *a, const int *b) {
    if (*a > *b) {
        return 1;
    }
    else if (*a == *b) {
        return 0;
    }
    else {
        return -1;
    }
}

int main() {
    FILE *file = fopen("data/01.txt", "r");
    char *data = (char *)malloc(16384);
    size_t data_len = fread(data, 1, 16384, file);

    int *nums_l = malloc(16384);
    int *nums_r = malloc(16384);

    int nlines = 0;
    while (1) {
        char* line = strsep(&data, "\n");
        int left = strtol(line, &line, 10);
        int right = strtol(line, NULL, 10);

        // if `strtol` fails, it apparently just returns 0, how helpful
        if (left == 0 && right == 0) {
            break;
        }

        nums_l[nlines] = left;
        nums_r[nlines] = right;

        nlines++;
    }

    qsort(nums_l, nlines, 4, cmp);
    qsort(nums_r, nlines, 4, cmp);

    int sum = 0;
    for (int i = 0; i < nlines; i++) {
        int diff = nums_l[i] - nums_r[i];
        if (diff < 0) {
            diff = diff * -1;
        }
        sum += diff;
    }

    printf("%i", sum);
}

Still getting compiler warnings about cmp not matching the required signature, though. Maybe I’ll figure that out for Part 2.16

16 Later addendum: I did end up figuring this out. Short version, I was just forgetting that the arguments are pointers. Instead of casting to int I needed to cast to int *.

Part 2

Part 2 has us counting frequencies instead of doing one-for-one comparisons. For each integer in the left column, we need to multiply it by the number of times it occurs in the right column, then add all those products together.

Obviously the right way to do this would be to count occurrences for every integer in the right column, store those counts in a hash table, and then use those counts as we work through the left column. But C doesn’t have a native hash table, and I don’t particularly feel like trying to implement one (although I’m sure I would learn a lot more about C that way). But you know what? C is fast, and our arrays of numbers here are only 4 KB. My CPU has 64 KiB of L1 cache, so I’m pretty sure that I can just be super-duper naive about this and iterate over the entirety of the right column for every value in the left column. Sure, it’s O(N^2), but N in this case is 1000, and a million operations on data in L1 cache isn’t going to take hardly any time at all. So let’s give it a shot.

int count_occurrences(int num, int arr[], int len) {
    int total = 0;
    for (int i = 0; i < len; i++) {
        if (arr[i] == num) {
            total++;
        }
    }
    return total;
}

int part2(int nums_l[], int nums_r[], int len) {
    int score = 0;
    for (int i = 0; i < len; i++) {
        score += nums_l[i] * count_occurrences(nums_l[i], nums_r, len);
    }
    return score;
}

int main() {
    // ...

    int solution_2 = part2(nums_l, nums_r, nlines);
    printf("Part 2: %i\n", solution_2);
}

And what do you know? It works!

And it takes about 30ms to run. Shlemiel the Painter? Never heard of him.

I was going to make impressed comments here about how fast C is, but then I decided to try it in Python, and it takes less then a second there too, so… you know. Day 1 is just easy, even for brute-force solutions.

And That’s It

Hey, that wasn’t so bad! I’m sure it would have been a lot harder had I waited until one of the later days, but even so, I can kind of see where C lovers are coming from now. It’s a little bit freeing to be able to just throw pointers around and cast types to other types because hey, they’re all just bytes in the end. I’m sure if I tried to do anything really complex in C, or read someone else’s code, it would start to fall apart pretty quickly, but for quick-and-dirty one-off stuff—it’s actually pretty good! Plus all the *nix system interfaces are C-native, so next time I’m fiddling with something at the system level I might just whip out the ol’ cc and start throwing stuff at the wall to see what sticks.

By the way, if you’d like to hear more of my thoughts on C, I expect to be invited to speak at at least three major C conferences next year17

17 Are there even three major C conferences? I know there are lots for C++, but C++ is a lot more complex than C.

since I am now a Certified Expert Practitioner, and my schedule is filling up fast. Talk to my secretary before it’s too late!