Going down a rabbithole…

Good gracious. This morning I thought I’d just do a quick task on my migrated Lightsail sites: setting up a Lambda function to check every 5 minutes, see if the page contains some specific text, and send me an alarm if the site isn’t up. My first thought was to use CloudWatch Synthetics, but the pricing is a lot higher than just doing it yourself with a Lambda function (though you don’t get spiffy screenshots and such). I kept it simple and happily discovered there’s an existing lambda-canary blueprint available. So I set that up, pointed it at https://www.roalddahlfans.com, and tested it out. It worked great!
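For the curious, here’s a minimal sketch of what that kind of check boils down to. This is not the blueprint’s actual code, just my approximation of the idea: fetch the page, look for some text, and raise an error if it’s missing. The names `fetch_body` and `check_page` and the expected text `"Roald Dahl"` are placeholders I made up for illustration.

```python
import urllib.request


def fetch_body(url, timeout=10):
    """Download a page and return its body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "site-canary"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_page(url, expected, fetch=fetch_body):
    """Raise if the page cannot be fetched or does not contain the expected text."""
    body = fetch(url)  # network errors and HTTP error statuses raise here
    if expected not in body:
        raise RuntimeError(f"{url} is up but does not contain {expected!r}")
    return True


def lambda_handler(event, context):
    # Hard-coded for illustration; "Roald Dahl" is a placeholder for the text to look for.
    return check_page("https://www.roalddahlfans.com", "Roald Dahl")
```

A failed invocation counts towards the function’s Errors metric, so an alarm on that metric is one way to get notified. The schedule itself (every 5 minutes) lives outside the function.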

Then the Snook, looking over my shoulder, said, “That’ll be going through CloudFront. Why don’t you point it at the origin subdomain so you know it’s hitting the real WordPress underneath?” Okay, sure. Just to double-check it was working, I went to the origin subdomain in the browser… and was redirected to www. What. Uh, that’s not good. Thinking I had screwed something up, I tried hitting the origin subdomain of web-goddess.org… and that worked correctly, not redirecting to www. What the hell. Why were my two sites behaving differently? And why wasn’t RoaldDahlFans’s CloudFront distro barfing that I had set it up with an origin that was redirecting to itself??

Over the next seven hours – seriously – the Snook and I beat our heads against this problem. I tried turning on and off various plugins; I grepped both filesystems multiple times looking for differences; I completely rebuilt the CloudFront distribution for RoaldDahlFans; I turned SSL off and on repeatedly; I fiddled with heaps of htaccess settings… and we got nowhere. We determined that on web-goddess, if I went to https://web-goddess.org, it would be redirected to www; but for any other subdomain (foo.web-goddess.org, etc) it would not. But on RoaldDahlFans, it would always go to www regardless of whether you used a subdomain or not. It was so frustrating.

https://twitter.com/web_goddess/status/1282169357104103424

Various people chimed in pointing us to various things, without much success. My buddy Peter Wilson mentioned thinking that WordPress had some special behaviour to redirect between www and non-www domains, which eventually had us poring over the redirect_canonical code. The Snook noticed that another thing this module does is try to redirect you to the correct page if you type in a URL wrong. For example, if you try to access https://web-goddess.org/about, WordPress will automatically redirect you to https://www.web-goddess.org/about-me (which is the real address). However, if you do that on any subdomain other than www or the bare domain, it gives a 404. He went to test whether that also held true for RoaldDahlFans, and to his surprise, the origin was not redirected! What the hell.

We determined that the origin subdomain was only redirecting to www on RoaldDahlFans on the homepage. Every other page on the origin subdomain would not redirect. So what’s special about the homepage for RoaldDahlFans.com compared to web-goddess? Well, web-goddess has the homepage set to show the most recent posts, but RoaldDahlFans uses a static page. I changed RoaldDahlFans.com to use the most recent posts, hit up the origin subdomain, and it did NOT redirect. But when I changed it back to a static page, it went back to redirecting.

SO – there is something in the way WordPress handles sites with static homepages that causes requests for the homepage to be redirected to the Site URL, even if you’re using a random subdomain. If you add anything to the path (subdomain.roalddahlfans.com/index.php, for example), it won’t trigger the redirect. How weird is that?
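If you want to see this for yourself without a browser silently following the redirect, something like the sketch below would do it. It makes a single request and reports the status and `Location` header rather than chasing them. The helper names `probe` and `redirects_to_other_host` are my own inventions, and I’m assuming the site answers over HTTPS.

```python
import http.client
from urllib.parse import urlsplit


def probe(host, path="/"):
    """Make one HTTPS GET without following redirects; return (status, Location)."""
    connection = http.client.HTTPSConnection(host, timeout=10)
    try:
        connection.request("GET", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def redirects_to_other_host(status, location, requested_host):
    """True when the response is a redirect whose target is a different hostname."""
    if status not in (301, 302, 303, 307, 308) or not location:
        return False
    target_host = urlsplit(location).hostname  # None for relative redirects
    return target_host is not None and target_host != requested_host.lower()
```

Calling `probe("subdomain.roalddahlfans.com", "/")` and then again with `"/index.php"`, and passing each result to `redirects_to_other_host`, should show the homepage bouncing to www while the second path does not, going by what we saw in the browser.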

Okay, so that finally explains the difference in behaviour between the two sites. My origin subdomain for RoaldDahlFans.com was going to redirect requests for the homepage to www, and there was nothing I could do about it. Which meant that when CloudFront needed to refresh its cache for the homepage, it would hit the origin subdomain… and be redirected to itself? Why wasn’t I seeing an infinite redirect loop crashing my site?

Cue another hour of poking around. The only way it wasn’t going to crash, the Snook reasoned, was if CloudFront was passing the Host header through to the origin as part of the request. I was not aware of telling it to do that, but…

CloudFront behaviour

It turns out that the AWS WordPress plugin, when it set up my CloudFront distribution, helpfully whitelisted the Host header as part of the default behaviour for the site. This is why CloudFront isn’t barfing every time the homepage cache expires.
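To convince myself of the logic, it helps to model it as a toy. The sketch below is not WordPress or CloudFront code, just a simulation of the Snook’s reasoning: the origin redirects the homepage whenever the Host header isn’t the Site URL host, and the CDN either forwards the viewer’s Host header or sends the origin’s own hostname. The hostnames and function names are all hypothetical.

```python
SITE_HOST = "www.example.com"        # stands in for the WordPress Site URL host
ORIGIN_HOST = "origin.example.com"   # hypothetical origin subdomain


def origin_response(host_header, path):
    """Toy WordPress: a static homepage redirects to the Site URL host if Host differs."""
    if path == "/" and host_header != SITE_HOST:
        return 301, f"https://{SITE_HOST}/"
    return 200, None


def cdn_response(path, forward_host):
    """Toy CDN: ask the origin, sending either the viewer's Host or the origin's own."""
    host_header = SITE_HOST if forward_host else ORIGIN_HOST
    return origin_response(host_header, path)


def viewer_fetch(path, forward_host, max_hops=5):
    """Follow redirects as a browser would; return the final status and hops taken."""
    for hops in range(max_hops):
        status, _location = cdn_response(path, forward_host)
        if status != 301:
            return status, hops
        # The redirect points back at the CDN-fronted www host, so we ask the CDN again.
    return "redirect loop", max_hops
```

With the Host header forwarded, `viewer_fetch("/", True)` comes back with a 200 straight away. Without it, `viewer_fetch("/", False)` never escapes the redirect, which is the loop I was expecting to see and didn’t.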

So there you have it. What I thought would be a fifteen-minute task sent us down a rabbithole of WordPress, redirects, and content delivery network intricacies. The irony is that after all that, nothing is actually incorrect on my site! It’s all working as intended. We just didn’t know how. The only catch is that if I myself want to bypass CloudFront on RoaldDahlFans.com, I need to append /index.php when I hit the origin subdomain.

This was not how I intended to spend my Sunday… 😅
