Advent of Languages 2024, Day 1: C
Monday, the second of December, A.D. 2024
As time goes on, it’s becoming increasingly clear to me that I’m a bit of a programming-language dilettante. I’m always finding weird niche languages like Pony or Roc, going “wow that looks cool,” spending a bunch of time reading the documentation, and then never actually using it and forgetting all about it for the next three years.
This year, I’ve decided I’m going to either buck that trend or double down on it, depending on your point of view. Instead of not engaging at all with whatever random language strikes my fancy, I’m going to engage with it to the absolute minimum degree possible, then move on. Win-win, right? I get to feel like I’m being more than a dilettante, but I don’t have to do anything hard like really learn a new language.
I should probably mention here, as a disclaimer, that I’ve never gotten all the way through an AoC in my life, and there’s no way I’m going to do better with more problems to worry about. I’m guessing I’ll peter out by day 12 or so, since that’s about as far as I usually get. Oh, and there’s no way I’m going to stick to the one-day cadence either. It’ll probably be May or so before I decide that enough is enough and I’m going to call it.
Anyway, I’ve decided to start with C, mostly because I’m scared of C and Day 1 of AoC is always the easiest, so I won’t have to really get into it at all.
The C Programming Language
C, of course, needs no introduction. It’s known for being small, fast, and the language in which Unix was implemented, not to mention most (all?) other major OS kernels. It’s also known for being the Abode of Monsters, i.e. there are few-to-no safeguards, and if you screw up the consequences might range from bad (segfaults in userland), to worse (kernel panics), to catastrophic (your program barfs out millions of users’ highly sensitive data to anyone who asks).
All of which explains why I’m just a tad bit apprehensive to dip my toes into C. Thing is, for all its downsides, C is everywhere. Not only does it form the base layer of most computing infrastructure like OS kernels and network appliances, it’s also far and away the most common language used for all the little computers that form parts of larger systems these days. You know, like cars, industrial controllers, robotics, and so on. So I feel like it would behoove me to at least acquire a passing familiarity with C one of these days, if only to be able to say that I have.
Oh, but to make it extra fun, I’ve decided to try to get through at least the first part of Day 1 without using any references at all, beyond what’s already available on my computer (like manpages and help messages of commands). This is a terrible idea. Don’t do things this way. Also, if you’re in any way, shape, or form competent in C, please don’t read the rest of this post, for your own safety and mine. Thank you.
Experiments in C
Ok, let’s get the basics out of the way first. Given a program, can I actually compile it and make it run? Let’s try:
#include "stdio.h" // pretty sure I've seen this a lot, I think it's for stuff like reading from stdin and writing to stdout
int main() { // the `int` means that this function returns an int, I think?
printf("hello, world!");
}
Now, I’m not terribly familiar with C toolchains, having mostly used them from several layers of abstraction up, but I’m pretty sure I can’t just compile this and run it, right? I think compiling will turn this into “object code”, which has all the right bits in it that the computer needs to run it, but in order to put it all in a format that can actually be executed I need to “link” it, right?
Anyway, let’s just try it and see.
$ cc 01.c
$ ls
>>> 01.c a.out
$ ./a.out
>>> "hello, world!"
Well, what do you know. It actually worked.
#include, but apparently this is the correct syntax so it just worked.
The Puzzle, Part 1
This is pretty encouraging, so let’s tackle the actual puzzle for Day 1. There’s a bunch of framing story like there always is, but the upshot is that we’re given two lists arranged side by side, and asked to match up the smallest number in the first with the smallest number in the second, the second-smallest in the first with the second-smallest in the second, etc. Then we have to find out how far apart each of those pairs is, then add up all of those distances, and the total is our puzzle answer.
This is conceptually very easy, of course (it’s only Day 1, after all). Just sort the two lists, iterate over them to grab the pairs, take abs(a - b) for each pair, and sum those all up. Piece of cake.
Except of course, that this is C, and I haven’t the first idea how to do most of those things in C.
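For reference, here is a minimal sketch of that plan in C. It assumes the two columns have already been parsed into plain int arrays of equal length, which is the part the rest of this post is fighting with. The names total_distance and cmp_int are made up for illustration; qsort and labs are standard library functions from stdlib.h.

```c
#include <stdlib.h>

/* Comparison callback for qsort: orders ints ascending.
 * Written as (x > y) - (x < y) so it cannot overflow the way x - y can. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts both arrays in place, then sums |left[i] - right[i]| over all i.
 * Hypothetical helper; assumes both arrays hold exactly n elements. */
long total_distance(int *left, int *right, size_t n) {
    qsort(left, n, sizeof left[0], cmp_int);
    qsort(right, n, sizeof right[0], cmp_int);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += labs((long)left[i] - (long)right[i]);
    }
    return total;
}
```

Sorting in place keeps the sketch short, at the cost of scrambling the caller's arrays; copy them first if the original order matters.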
Loading data
Ok, so first off we’ll need to read in the data from a file. That shouldn’t be too hard, right? I know fopen is a thing, and I am (thankfully) on Linux, so I can just man fopen and see what I get, right? type type Aha, yes! half a moment, I’ll be back.
Mmmk, so man fopen gives me these very helpful snippets:
SYNOPSIS
#include <stdio.h>
FILE *fopen(const char *pathname, const char *mode);
(...)
The argument mode points to a string beginning with one of the following sequences (possibly followed by additional characters, as described below):
r Open text file for reading. The stream is positioned at the beginning of the file.
(...)
Ok, so let’s just try opening the file and then dumping the pointer to console to see what we have.
#include "stdio.h"
int main() {
int f_ptr = fopen("data/01.txt", "r");
printf(f_ptr);
}
$ cc 01.c
>>> 01.c: In function ‘main’:
01.c:4:17: warning: initialization of ‘int’ from ‘FILE *’ makes integer from pointer without a cast [-Wint-conversion]
4 | int f_ptr = fopen("data/01.txt", "r");
| ^~~~~
01.c:5:12: warning: passing argument 1 of ‘printf’ makes pointer from integer without a cast [-Wint-conversion]
5 | printf(f_ptr);
| ^~~~~
| |
| int
In file included from 01.c:1:
/usr/include/stdio.h:356:43: note: expected ‘const char * restrict’ but argument is of type ‘int’
356 | extern int printf (const char *__restrict __format, ...);
| ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
01.c:5:5: warning: format not a string literal and no format arguments [-Wformat-security]
5 | printf(f_ptr);
…oh that’s right, this is C. we can’t just print an integer, it would interpret that integer as a pointer to a string and probably segfault. In fact…
$ ./a.out
>>> Segmentation fault (core dumped)
Right. Ok, well, man was our friend last time, maybe it can help here too?
man printf
Why, yes! Yes it—oh wait, no. No, this isn’t right at all.
Oh yeah, printf is also a standard Unix shell command, so man printf gives you the documentation for that. I guess man fopen only worked because fopen is a syscall, as well as a library function. Oh well, let’s just see if we can guess the right syntax.
#include "stdio.h"
int main() {
int f_ptr = fopen("data/01.txt", "r");
printf("%i", f_ptr);
}
$ cc 01.c
$ ./a.out
>>> 832311968
Hey, would you look at that! Weirdly enough, so far it’s been my Python experience that’s helped most, first with the fopen flags and now this. I guess Python wears its C heritage with pride.
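A couple of things worth knowing at this point. Library functions live in section 3 of the manual, so man 3 printf brings up the C function rather than the shell command, and fopen is documented in that same section. Also, stuffing the result of fopen into an int can truncate the pointer on a 64-bit system. Below is a minimal sketch of a more careful version: it keeps the result as a FILE *, checks for NULL, and prints the pointer with %p. The helper name open_or_report is hypothetical; fopen, perror, and printf are the standard stdio.h functions.

```c
#include <stdio.h>

/* Hypothetical helper: opens `path` for reading and reports what happened.
 * Returns the open stream, or NULL if the file could not be opened. */
FILE *open_or_report(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        /* perror prints the path followed by the reason for the failure. */
        perror(path);
        return NULL;
    }
    /* %p is the conversion for pointers; it expects a void *. */
    printf("opened %s at %p\n", path, (void *)f);
    return f;
}
```

The address printed by %p is only good for confirming that something came back. It says nothing about the file's contents, and it will generally differ from run to run.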
I’m cheating a little, by the way. Well, kind of a lot. I switched editors recently and am now using Zed primarily (for languages it supports, at least), and Zed automatically runs a C language server by default when you’re working in C.
std:: which I think is a C++ thing.
FILE *file = fopen("data/01.txt", "r");
so now we have a pointer to a FILE struct, which we can give to fread() I think? man fread gives us this:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
Which means, I think, that fread() accepts a pointer to a region of memory into which it’s reading data, an item size and a number of items, and a pointer to a FILE struct. Ok, great. Before we can do that, though, we need to get that first pointer, the one for the destination. man malloc is helpful, telling me that I just need to give it a number and it gives me back a void * pointer. I think it’s void because it doesn’t really have a type: it’s a pointer to uninitialized memory, so you can write to it, but if you try to read from it or otherwise interpret it as being of any particular type it might blow up in your face.
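To make the void * business a little more concrete: in C a void * is a generic pointer that can't be dereferenced directly, but it converts to any other object pointer type without a cast. The memory malloc hands back has indeterminate contents until something is written to it, and malloc returns NULL if the allocation fails. A small sketch, with a made-up helper name (zeroed_bytes) purely for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns n bytes that are all zero,
 * or NULL if the allocation fails. The caller frees the result. */
unsigned char *zeroed_bytes(size_t n) {
    void *raw = malloc(n);          /* untyped; contents are indeterminate */
    if (raw == NULL) {
        return NULL;
    }
    unsigned char *bytes = raw;     /* void * converts implicitly in C */
    memset(bytes, 0, n);            /* now every byte has a known value */
    return bytes;
}
```

The standard library already offers calloc for zero-filled allocations, so this helper exists only to show the conversion and the initialization as separate steps.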
Anyway:
#include "stdio.h"
#include "stdlib.h"
int main() {
    FILE *file = fopen("data/01.txt", "r");
    void *data = malloc(16384);
    // fread returns the number of items read; with an item size of 1, that's bytes
    size_t data_len = fread(data, 1, 16384, file);
    printf("%zu", data_len);
}
I happen to know that my personal puzzle input is 14KB, so this will be enough. If the file were bigger, I’d have to either allocate more memory or read it in multiple passes. Oh, the joys of working in a non-memory-managed language.
Running this outputs 14000, so I think it worked. I’m not sure if there’s a performance penalty for using an item size of 1 with fread, but I’m guessing not. I highly doubt, for instance, that under the hood this is translating to 14,000 individual syscalls, because that would be a) completely bonkers and b) unnecessary since it already knows ahead of time what the max size of the read operation is going to be.
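If the input size weren't known ahead of time, one common approach is to read in a loop and grow the buffer as it fills. Here is a sketch of that idea, assuming doubling the buffer is an acceptable growth strategy. The helper name read_all and the 4096-byte starting size are arbitrary choices for illustration; fread, realloc, and ferror are standard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the rest of `f` into a heap buffer.
 * On success returns the buffer (NUL-terminated) and stores the number
 * of bytes read in *out_len. On failure returns NULL.
 * The caller is responsible for calling free() on the result. */
char *read_all(FILE *f, size_t *out_len) {
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    for (;;) {
        /* Leave one byte spare for the terminating NUL. */
        size_t n = fread(buf + len, 1, cap - len - 1, f);
        len += n;
        if (n == 0) break;              /* end of file or read error */
        if (len == cap - 1) {           /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return NULL; }
            buf = bigger;
            cap *= 2;
        }
    }
    if (ferror(f)) { free(buf); return NULL; }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

The trailing NUL byte isn't needed for fread itself, but it means the result can be handed straight to the string functions from string.h.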
The item-size/item-count split in fread is mostly a historical accident, and most people either do fread(ptr, 1, bufsize, file) (if reading less than the maximum size is acceptable) or fread(ptr, bufsize, 1, file) (if incomplete reads are to be avoided).
Splitting hairs strings
Ok, next up we’re going to have to a) somehow split the file into lines, b) split each line on the whitespace that separates the two columns, and c) parse those strings as integers. Some poking at the language server yields references to strsep, which appears to do exactly what I’m looking for:
char *strsep(char **stringp, const char *delim);
If *stringp is NULL, the strsep() function returns NULL and does nothing else.
Otherwise, this function finds the first token in the string *stringp, that is
delimited by one of the bytes in the string delim. This token is terminated by
overwriting the delimiter with a null byte ('