It was coming down to the wire: the new Spreedly site was launching in less than 48 hours, and I really wanted to be able to try out the way we were going to be stitching two different applications together on one domain. It was the one piece of the puzzle that we couldn’t “pre-deploy” to production, since turning it on by definition would launch the site publicly everyone. So I started hacking together a Rack-based proxy (more about that in the future) to simulate the production setup, and within an hour or so I had something that was mostly working.
The problem was the “mostly” part. The proxy was working, the right things were loading on the right paths, but I couldn’t log into the proxied application. Logging in was actually working and redirecting me into the app, but then I was immediately being redirected back to the login page. It was like I couldn’t write to the session. Maybe a cookie problem?
I do most of my browsing with Chrome, and am generally a fan of the developer tools it ships with. So I started poking, going back and forth between the server logs and the browser’s request log, trying to figure out what was going on. It didn’t take long to confirm that the server was definitely writing out to the session correctly, but that for some reason the browser wasn’t returning the right session information back to the browser. I also confirmed that the site worked fine when not proxied, and it was something proxy related that was breaking things.
And then I spent multiple hours spinning my wheels, trying to figure out what was different between the working direct requests and the broken proxied requests. I tried looking at requests in Chrome for a while, and then I switched to Firefox and tried a few different tools there as well. It was getting late, and I was burning out, and I was making no progress, and I should just quit…
But my conundrum was that while I was pretty sure this was a problem that was specific to my hacked up proxy, I wasn’t 100% sure, and I didn’t want to flip the switch on Monday in production only to find out that nobody could log in. But finally, with no real progress to show for a few hours of debugging, I gave in and gave it a rest.
And sure enough, while I didn’t figure the answer out after quitting, I did realize I’d been approaching my debugging the wrong way. Debugging is an art, and I’d been doing the equivalent of trying to play a violin with a hack saw. What I needed was a tool that would let me textually compare the headers of the working and broken requests.
curl, which I should’ve pulled out right away. I knew at this point that something was up with the cookies, but I had no earthly idea what. So I dug into the curl man page and started reading through the options (there are a lot of them) looking for how I could see what was happening with the cookies. OK, actually I just ran
man curl | grep -A10 cookie since I didn’t have time to read the whole tome.
Working with what I learned from the man page, I first got myself set up so I could easily run the same request against the (working) unproxied app and the (broken) proxied app. Then I used curl’s handy
-c option to have it write out a cookiefile when I made a request:
$ curl \ -i \ -c cookies.working \ -d "login=bob" -d "password=test" \ http://workingapp/login
Nothing magic here;
-i tells curl to print out the headers along with the response,
-c stores the cookies in a curl-formatted file, and the
-d switches pass in the login information (and make
curl do a POST). I then repeated this against the broken version of the app, storing the cookies in a different file. By looking at the headers and responses from the two requests, I could see that both redirected me into the app, indicating that as far as the server was concerned, I had auth’d successfully.
Next step was to verify that curl would re-present the cookies correctly, and that it could reproduce the login issue. To do that I simply passed the written out cookie file to the app and noted whether I got redirected (indicating a missing session) or a 200 OK:
$ curl \ -i \ -b cookies.working \ http://workingapp/innerpage
-b switch does the magic here, reading the cookie file written out by the previous command and passing the cookies to the app just like a browser would. And sure enough, the working version gave me back a 200 and the broken version redirected me back to the login page.
Now to dig into the cookie files themselves and see what was different about them! It took a few minutes of staring and poking, but very quickly I noticed that the broken version only had one cookie in its file, while the working version had written 4+ cookies. How was that happening?
Long story short, rack-proxy uses Ruby’s Net::HTTP heavily, and Net::HTTP tries to treat headers like a hash. But headers aren’t a hash; you can pass multiple headers of the same type, and that’s exactly how Set-Cookie works. Net::HTTP papers over this by joining with a comma the values of header keys there’s more than one. And thus I was ending up with one big invalid cookie rather than the four cookies I should’ve had.
Now it was as simple as figuring out how to get Net::HTTP to give me the header values individually and submitting a pull request with a fix. I locked rack-proxy to the fixed revision in our Gemfile, and everything just worked. And while this problem wouldn’t have affected us in production (since nginx is doing all the proxying for us), I went into the launch on Monday morning much more confident since I knew this wouldn’t bite us when we flipped the switch.
Moral of the story? Browsers are the wrong tool for debugging HTTP. You need to be able to see and textually compare what’s traversing the wire, and even though the developer tools provided by modern browsers can give you a peek, they often obfuscate issues rather than making them obvious. curl and other text-based tools like it are the best way to see what’s really happening and quickly track down the problems that arise with HTTP.