Advent of Languages 2024, Day 4: Fortran

Tuesday, the tenth of December, A.D. 2024

The old that is strong does not wither
On the four(tran)th day of Advent, my mainframe gave to me
The Puzzle
Loading data is hard this time
The fun part
Part 2
Lessons and carols

O h, you thought we were done going back in time? Well I’ve got news for you, Doc Brown, you’d better not mothball the ol’ time machine just yet, because we’re going back even further. That’s right, for Day 4 I’ve decided to use Fortran!1

1 Apparently it’s officially called Fortran now and not FORTRAN like it was in days of yore, and has been ever since the 1990s. That’s right, when most languages I’ve used were just getting their start, Fortran was going through its mid-life identity crisis.

2 When I told my wife that I was going to be using a language that came out in the 1950s, she wanted to know if the next one would be expressed in Egyptian hieroglyphs.

Really, though, it’s because this is day four, and I had to replace all those missed Forth jokes with something.

The old that is strong does not wither

Fortran dates back to 1958, making it the oldest programming language still in widespread use.3

3 Says Wikipedia, at least. Not in the article about Fotran, for some reason, but in the one about Lisp.

Exactly how widespread is debatable—the TIOBE index puts it at #8, but the TIOBE index also puts Delphi Pascal at #11 and Assembly at #19, so it might have a different idea of what makes a language “popular” than you or I.4

4 For contrast, Stack Overflow puts it at #38, right below Julia and Zig, which sounds a little more realistic to me.

Regardless, it’s undeniable that it gets pretty heavy use even today—much more than Forth, I suspect—because of its ubiquity in the scientific and HPC sectors. The website mentions “numerical weather and ocean prediction, computational fluid dynamics, applied math, statistics, and finance” as particularly strong areas. My guess is that this largely comes down to intertia, plus Fortran being “good enough” for the things people wanted to use it for that it was easier to keep updating Fortran than to switch to something else wholesale.5

5 Unlike, say, BASIC, which is so gimped by modern standards that it doesn’t even have a call stack. That’s right, you can’t do recursion in BASIC, at least not without managing the stack yourself.

And update they have! Wikipedia lists 12 major versions of Fortran, with the most recent being Fortran 2023. That’s a pretty impressive history for a programming language. It’s old enough to retire!

The later versions of Fortran have added all sorts of modern conveniences, like else-if conditionals (77), properly namespaced modules (90), growable arrays (also 90), local variables (2008), and finally, just last year, ternary expressions and the ability infer the length of a string variable from a string literal! Wow!

I have to say, just reading up on Fortran is already feeling modern than it did for Forth, or even C/C++. It’s got a snazzy website6

6 With a dark/light mode switcher, so you know it’s hip.

with obvious links to documentation, sitewide search, and even an online playground. This really isn’t doing any favors for my former impression of Fortran as a doddering almost-septegenarian with one foot in the grave and the other on a banana peel.

On the four(tran)th day of Advent, my mainframe gave to me

The Fortran getting-started guide literally gives you hello-world, so I won’t bore you with that here. Instead I’ll just note some interesting aspects of the language that jumped out at me:

There’s no main() function like C and a lot of other compiled languages, but there are mandatory program <name> ... end program delimiters at the start and end of your outermost layer of execution. Modules are defined outside of the program ... end program block. Not sure yet whether you can have multiple program blocks, but I’m leaning towards no?
Variables are declared up-front, and are prefixed with their type name followed by ::. You can leave out the type qualifier, in which case the type of the variable will be inferred not from the value to which it is first assigned, but from its first letter: variables whose names start with i, j, k, l, m, n are integers, everything else is a real (floating-point). Really not sure what drove that decision, but it’s described as deprecated, legacy behavior anyway, so I plan to ignore it.
Arrays are 1-indexed. Also, multi-dimensional arrays are a native feature! I’m starting to see that built-for-numerical-workloads heritage.
It has break and continue, but they’re named exit and cycle.
There’s a built-in parallel-loop construct,7

7 It uses different syntax to define its index and limit. That’s what happens when your language development is spread over the last 65 years, I guess.

which “informs the compiler that it may use parallelization/SIMD to speed up execution”. I’ve only ever seen this done at the library level before. If you’re lucky your language has enough of a macro system to make it look semi-natural, otherwise, well, I hope you like map/reduce.
It has functions, but it also has “subroutines”. The difference is that functions return values and are expected not to modify their arguments, and subroutines don’t return values but may modify their arguments. I guess you’re out of luck if you want to modify an argument and return a value (say, a status code or something).
Function and subroutine arguments are mentioned in the function signature (which looks like it does in most languages), but you really get down to brass tacks in the function body itself, which is where you specify the type and in-or-out-ness of the parameters. Reminds me of PowerShell, of all things.
The operator for accessing struct fields is %. Where other languages do sometype.field, in Fortran you’d do sometype%field.
Hey look, it’s OOP! We can have methods! Also inheritance, sure, whatever.

Ok, I’m starting to get stuck in the infinite docs-reading rut for which I criticized myself at the start of this series, so buckle up, we’re going in.

The Puzzle

We’re given a two-dimensional array of characters and asked to find the word XMAS everywhere it occurs, like those word search puzzles you see on the sheets of paper they hand to kids at restaurants in a vain attempt to keep them occupied so their parents can have a chance to enjoy their meal.

Hey, Fortran might actually be pretty good at this! At least, multi-dimensional arrays are built in, so I’m definitely going to use those.

First things first, though, we have to load the data before we can start working on it.8

8 Getting a Fortran compiler turned out to be as simple as apt install gfortran.

My word-search grid appears to be 140 characters by 140, so I’m just going to hard-code that as the dimensions of my array. I’m sure there’s a way to size arrays dynamically, but life’s too short.

Loading data is hard this time

Not gonna lie here, this part took me way longer than I expected it to. See, the standard way to read a file in Fortran is with the read() statement. (It looks like a function call, but it’s not.) You use it something like this:

read(file_handle, *) somevar, anothervar, anothervar2

Or at least, that’s one way of using it. But here’s the problem: by default, Fortran expects to read data stored in a “record-based” format. In short, this means that it’s expected to consist of lines, and each line will be parsed as a “record”. Records consist of some number of elements, separated by whitespace. The “format” of the record, i.e. how the line should be parsed, can either be explicitly specified in a slightly arcane mini-language reminiscent of string-interpolation placeholders (just in reverse), or it can be inferred from the number and types of the variables specified after read().

Initially, I thought I might be able to do this:

character, dimension(140, 140) :: grid

! ...later
read(file_handle, *) grid

The top line is just declaring grid as a 2-dimensional array characters, 140 rows by 140 columns. Neat, huh?

But sadly, this kept spitting out errors about how it had encountered the end of the file unexpectedly. I think what was happening was that when you give read() an array, it expects to populate each element of the array with one record from the file, and remember records are separated by lines, so this was trying to assign one line per array element. My file had 140 lines, but my array had 140 * 140 elements, so this was never going to work.

My next try looked something like this:

do row = 1, 100
    read(file_handle, *) grid(row, :)
end do

But this also resulted in end-of-file errors. Eventually I got smart and tried running this read statement just once, and discovered that it was populating the first row of the array with the first letter of each line in the input file. I think what’s going on here is that grid(1, :) creates a slice of the array that’s 1 row by the full width (so 140), and the read() statement sees that and assumes that it needs to pull 140 records from the file each time this statement is executed. But records are (still) separated by newlines, so the first call to read() pulls all 140 rows, dumps everything but the first character from each (because, I think, the type of the array elements is character), puts that in and continues on. So after just a single call to read() it’s read every line but dumped most of the data.

I’m pretty sure the proper way to do this would be to figure out how to set the record separator, but it’s tricky because the “records” (if we want each character to be treated as a record) within each line are smashed right up against each other, but have newline characters in between lines. So I’d have to specify that the separator is sometimes nothing, and sometimes \n, and I didn’t feel like figuring that out because all of the references I could find about Fortran format specifiers were from ancient plain-HTML pages titled things like “FORTRAN 77 INTRINSIC SUBROUTINES REFERENCE” and hosted on sites like web.math.utk.edu where they probably do date back to something approaching 1977.

So instead, I decided to just make it dumber.

program advent04
    implicit none

    character, dimension(140, 140) :: grid
    integer :: i

    grid = load()
    do i = 1, 140
        print *, grid(i, :)
    end do

    contains

    function load() result(grid)
        implicit none

        integer :: handle
        character, dimension(140, 140) :: grid
        character(140) :: line
        integer :: row
        integer :: col

        open(newunit=handle, file="data/04.txt", status="old", action="read")

        do row = 1, 140
            ! `line` is a `character(140)` variable, so Fortran knows to look for 140 characters I guess
            read(handle, *) line
            do col = 1, 140
                ! just assign each character of the line to array elements individually
                grid(row, col) = line(col:col)
            end do
        end do

        close(handle)
    end function load
end program advent04

I am more than sure that there are several dozen vastly better ways of accomplishing this, but look, it works and I’m tired of fighting Fortran. I want to go on to the fun part!

The fun part

The puzzle specifies that occurrences of XMAS can be horizontal, verical, or even diagonal, and can be written either forwards or backwards. The obvious way to do this would be to scan through the array, stop on every X character and cheak for the complete word XMAS in each of the eight directions individually, with a bunch of loops. Simple, easy, and probably more than performant enough because this grid is only 140x140, after all.9

9 Although AoC has a way of making the second part of the puzzle punish you if you were lazy and went with the brute-force approach for the first part, so we’ll see how this holds up when we get there.

But! This is Fortran, and Fortran’s whole shtick is operations on arrays, especially multidimensional arrays. So I think we can make this a lot more interesting. Let’s create a “test grid” that looks like this:

S . . S . . S
. A . A . A .
. . M M M . .
S A M X M A S
. . M M M . .
. A . A . A .
S . . S . . S

Which has all 8 possible orientationS of the word XMAS starting from the central X. Then, we can just take a sliding “window” of the same size into our puzzle grid and compare it to the test grid. This is a native operation in Fortran—comparing two arrays of the same size results in a third array whose elements are the result of each individual comparison from the original arrays. Then we can just call count() on the resulting array to get the number of true values, and we know how many characters matched up. Subtract 1 for the central X we already knew about, then divide by 3 since there are 3 letters remaining in each occurrence of XMAS, and Bob’s your uncle, right?

…Wait, no. That won’t work because it doesn’t account for partial matches. Say we had a “window” that looked like this (I’m only showing the bottom-right quadrant of the window for simplicity):

X M X S
S . . .
A . . .
X . . .

If we were to apply the process I just described to this piece of the grid, we would come away thinking there was 1 full match of XMAS, because there are one each of X, M, A, and S in the right positions. Problem is, they aren’t all in the right places to be part of the same XMAS, meaning that there isn’t actually a match here at all.

To do this properly, we need some way of distinguishing the individual “rays” of the “star”, which is how I’ve started thinking about the test grid up above, so that we know whether all of any given “ray” is present. So what if we do it this way?

Apply the mask to the grid as before, but this time, instead of just counting the matches, we’re going to convert them all to 1s. Non-matches will be converted to 0.
Pick a prime number for each “ray” of the “star”. We can just use the first 8 prime numbers (excluding 1, of course). Create a second mask with these values subbed in for each ray, and 1 in the middle. So the ray extending from the central X directly to the right, for instance, would look like this, assuming we start assigning our primes from the top-left ray and move clockwise: 1 7 7 7
Multiply this array by the array that we got from our initial masking operation. Now any matched characters will be represented by a prime number specific to that ray of the star.
Convert all the remaining 0s in the resulting array to 1s, then take the product of all values in the array.
Test whether that product is divisible by the cube of each of the primes used. E.g. if it’s divisible by 8, we know that there must have been three 2’s in the array, so we know that the top-left ray is entirely present. So we can add 1 to our count of valid XMASes originating at this point.

Will this work? Is it even marginally more efficient than the stupidly obvious way of just using umpty-gazillion nested for loops—excuse me, “do loops”—to test each ray individually? No idea! It sure does sound like a lot more fun, though.

Ok, first things first. Let’s adjust the data-loading code to pad the grid with 3 bogus values on each edge, so that we can still generate our window correctly when we’re looking at a point near the edge of the grid.

grid = '.' ! probably wouldn't matter if we skipped this, uninitialized memory just makes me nervous

open(newunit=handle, file="data/04.txt", status="old", action="read")
do row = 4, 143
    read(handle, *) line
    do col = 1, 140
        grid(row, col + 3) = line(col:col)
    end do
end do

Turns out assigning a value element to an array of that type of value (like grid = '.' above) just sets every array element to that value, which is very convenient.

Now let’s work on the whole masking thing.

Uhhhh. Wait. We might have a problem here. When we take the product of all values in the array after the various masking and prime-ization stuff, we could _conceivably end up multiplying the cubes of the first 8 prime numbers. What’s the product of the cubes of the first 8 prime numbers?

912585499096480209000

Hm, ok, and what’s the max value of a 64-bit integer?

9223372036854775807

Oh. Oh, noooo.

It’s okay, I mean, uh, it’s not that much higher. Only two orders of magnitude, and what are the odds of all eight versions of XMAS appearing in the same window, anyway? Something like 1/4²⁵? Maybe we can still make this work.

integer function count_xmas(row, col) result(count)
    implicit none

    integer, intent(in) :: row, col
    integer :: i
    integer(8) :: prod
    integer(8), dimension(8) :: primes
    character, dimension(7, 7) :: test_grid, window
    integer(8), dimension(7, 7) :: prime_mask, matches, matches_prime

    test_grid = reshape( &
        [&
            'S', '.', '.', 'S', '.', '.', 'S', &
            '.', 'A', '.', 'A', '.', 'A', '.', &
            '.', '.', 'M', 'M', 'M', '.', '.', &
            'S', 'A', 'M', 'X', 'M', 'A', 'S', &
            '.', '.', 'M', 'M', 'M', '.', '.', &
            '.', 'A', '.', 'A', '.', 'A', '.', &
            'S', '.', '.', 'S', '.', '.', 'S' &
        ], &
        shape(test_grid) &
    )

    primes = [2, 3, 5, 7, 11, 13, 17, 19]

    prime_mask = reshape( &
        [ &
            2,  1,  1,  3,  1,  1,  5, &
            1,  2,  1,  3,  1,  5,  1, &
            1,  1,  2,  3,  5,  1,  1, &
            19, 19, 19, 1,  7,  7,  7, &
            1,  1,  17, 13, 11, 1,  1, &
            1,  17,  1, 13, 1,  11, 1, &
            17,  1,  1, 13, 1,  1,  11 &
        ], &
        shape(prime_mask) &
    )

    window = grid(row - 3:row + 3, col - 3:col + 3)
    matches = logical_to_int64(window == test_grid)
    matches_prime = matches * prime_mask
    prod = product(zero_to_one(matches_prime))

    count = 0
    do i = 1, 8
        if (mod(prod, primes(i) ** 3) == 0) then
            count = count + 1
        end if
    end do
end function count_xmas

elemental integer(8) function logical_to_int64(b) result(i)
    implicit none

    logical, intent(in) :: b

    if (b) then
        i = 1
    else
        i = 0
    end if
end function logical_to_int64

elemental integer(8) function zero_to_one(x) result(y)
    implicit none

    integer(8), intent(in) :: x

    if (x == 0) then
        y = 1
    else
        y = x
    end if
end function zero_to_one

Those &s are line-continuation characters, by the way. Apparently you can’t have newlines inside a function call or array literal without them. And the whole reshape business is a workaround for the fact that there isn’t actually a literal syntax for multi-dimensional arrays, so instead you have to create a 1-dimensional array and “reshape” it into the desired shape.

Now we just have to put it all together:

total = 0
do col = 4, 143
    do row = 4, 143
        if (grid(row, col) == 'X') then
            total = total + count_xmas(row, col)
        end if
    end do
end do

print *, total

These elemental functions, by the way, are functions you can ~~explain to Watson~~ apply to an array element-wise. So logical_to_int64(array) returns an array of the same shape with all the “logical” (boolean) values replaced by 1s and 0s.

This actually works! Guess I dodged a bullet with that 64-bit integer thing.10

10 Of course I discovered later, right before posting this article, that Fortran totally has support for 128-bit integers, so I could have just used those and not worried about any of this.

I did have to go back through and switch out all the integer variables in count_xmas() with integer(8)s (except for the loop counter, of course). This changed my answer significantly. I can only assume that calling product() on an array of 32-bit integers, then sticking the result in a 64-bit integer, does the multiplication as 32-bit first and only then converts to 64-bit, after however much rolling-over has happened. Makes sense, I guess.

Ok, great! On to part 2!

Part 2

It’s not actually too bad! I was really worried that it was going to tell me to discount all the occurrences of XMAS that overlapped with another one, and that was going to be a royal pain the butt with this methodology. But thankfully, all we have to do is change our search to look for two occurrences of the sequence M-A-S arranged in an X shape, like this:

M . S
. A .
M . S

This isn’t too difficult with our current approach. Unfortunately it will require four test grids applied in sequence, rather than just one, because again the sequence can be written either forwards or backwards, and we have to try all the permutations. On the plus side, we can skip the whole prime-masking thing, because each test grid is going to be all-or-nothing now. In fact, we can even skip checking any remaining test grids whenver we find a match, because there’s no way the same window could match more than one.

Hmm, I wonder if there’s a way to take a single starting test grid and manipulate it to reorganize the characters into the other shapes we need?

Turns out, yes! Yes there is. We can use a combination of slicing with a negative step, and transposing, which switches rows with columns, effectively rotating and flipping the array. So setting up our test grids looks like this:

character, dimension(3, 3) :: window, t1, t2, t3, t4

t1 = reshape( &
    [ &
        'M', '.', 'S', &
        '.', 'A', '.', &
        'M', '.', 'S' &
    ], &
    shape(t1) &
)
t2 = t1(3:1:-1, :) ! flip t1 top-to-bottom
t3 = transpose(t1) ! swap t1 rows for columns
t4 = t3(:, 3:1:-1) ! flip t3 left-to-right

Then we can just compare the window to each test grid:

window = grid(row - 1:row + 1, col - 1:col + 1)
if ( &
    count_matches(window, t1) == 5 &
    .or. count_matches(window, t2) == 5 &
    .or. count_matches(window, t3) == 5 &
    .or. count_matches(window, t4) == 5 &
) then
    count = 1
else
    count = 0
end if

To my complete and utter astonishment, this actualy worked the first time I tried it, once I had figured out all of the array-flipping-and-rotating I needed to create the test grids. It always makes me suspicious when that happens, but Advent of Code confirmed it, so I guess we’re good!11

11 Or I just managed to make multiple errors that all cancelled each other out.

It did expose a surprisingly weird limitation in the Fortran parser, though. Initially I kept trying to write the conditions like this: if(count(window == t1) == 5), and couldn’t understand the syntax errors it was throwing. Finally I factored out count(array1 == array2) into a separate function, and everything worked beautifully. My best guess is that the presence of two == operators inside a single if condition, not separated by .and. or .or., is just a no-no. The the things we learn.

Lessons and carols

(Whoa now, we’re not that far into Advent yet.)

Despite being one of the oldest programming languages still in serious use, Fortran manages to feel surprisingly familiar. There are definite archaisms, like having to define the types of all your variables at the start of your program/module/function,12

12 Even throwaway stuff like loop counters and temporary values.

, having to declare function/subroutine names at the beginning and end, and the use of the word “subroutine”. But overall it’s kept up surprisingly well, given—and I can’t stress this enough—that it’s sixty-six years old. It isn’t even using CAPITAL LETTERS for everything any more,13

13 Although the language is pretty much case-insensitive so you can still use CAPITALS if you want.

which puts it ahead of SQL,14

14 Actually, I suspect the reason the CAPITALS have stuck around in SQL is that more than most languages, you frequently find yourself writing SQL in a string from another language. Occasionally editors will be smart enough to syntax-highlight it as SQL for you, but for the times they aren’t, using CAPITALS for all the KEYWORDS serves as a sort of minimal DIY syntax highlighting. That’s what I think, at least.

and SQL is 10+ years younger.

It still has support for a lot of really old stuff. For instance, you can label statements with numbers and then go to a numbered statement, but there’s really no use for that in new code. We have functions, subroutines, loops, if-else-if-else conditionals—basically everything you would (as I understand it) use goto for back in the day.

Runs pretty fast, too. I realized after I already had a working solution that I had been compiling without optimizations the whole time, so I decided to try enabling them, only to discover that the actual execution time wasn’t appreciably different. I figured the overhead of spawning a process was probably eating the difference, so I tried timing just the execution of the main loop and sure enough, without optimizations it took about 2 milliseconds whereas with optimizations it was 690 microseconds. Whee! Native-compiled languages are so fun. I’m too lazy to try rewriting this in Python just to see how much slower it would be, but I’m pretty sure that this time it would be quite noticeable.

Anyway, that about wraps it up for Fortran. My only remaining question is: What is the appropriate demonym for users of Fortran? Python has Pythonistas, Rust has Rustaceans, and so on. I was going to suggest “trannies” for Fortran users, but everyone kept giving me weird looks for some reason.