Last year, a major train operator in the Eastern US came to us and asked for help building a system on Acquia Cloud for their train API. They wanted to open up their train schedules, arrival times and delay information to everyone. This meant that the infrastructure behind the API had to withstand a flood of requests every second while remaining very responsive. At the same time, we had to make sure that only authorized applications, those approved by the operator, could access the API.
The requests would look something like this:
Our client had their production service API ready, but they wanted to shield it from direct hits by introducing an additional layer that could cache and answer queries without bothering the backend (imagine 100 eager metro riders requesting the same status: a very easy caching task). Drawn out, the architecture at this point would look something like this:
To manage the API keys, the client needed user accounts and content management, so Drupal was an obvious choice for these tasks. Performance was another matter: responses to the service requests would be difficult to scale purely with Drupal, and each request triggering a Drupal bootstrap would make many colleagues at Operations sad.
We needed to introduce a cache layer, but it could not be Drupal. The same train information could be requested by different applications (a different application means a different API key, and therefore a different request), but for the backend it is the same request (e.g. apps A and B requesting the same arrival times: the applications differ, but the information does not). The only difference between the requests is the API key passed along with them, so we would need to discard the key once we had verified that it was valid.
If only we could somehow separate the part that validates the client request from the part that retrieves the real data from the rail backend! Thankfully Varnish, which is part of Acquia Cloud, fits this role perfectly, albeit with a little twist. The authorization step must act as a sort of router, telling Varnish that the request is valid so that the real rail backend can be accessed:
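To make the "different key, same request" point concrete, here is a minimal Python sketch of separating the API key from the rest of the request. It is only an illustration: the `api_key` parameter name, the `/arrivals` path and the `split_api_key` helper are assumptions, not the operator's actual API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the key travels in a query parameter named "api_key".
API_KEY_PARAM = "api_key"


def split_api_key(url):
    """Return (api_key, url_without_key) for a request URL."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    api_key = next((v for k, v in pairs if k == API_KEY_PARAM), None)
    remaining = [(k, v) for k, v in pairs if k != API_KEY_PARAM]
    # Sort the remaining parameters so their order does not create
    # separate cache entries for what is really the same question.
    cache_url = urlunsplit(parts._replace(query=urlencode(sorted(remaining))))
    return api_key, cache_url


# Two hypothetical apps asking for the same arrival times.
key_a, url_a = split_api_key("/arrivals?station=A01&api_key=key-of-app-a")
key_b, url_b = split_api_key("/arrivals?api_key=key-of-app-b&station=A01")
```

Once the key has been checked, only the second value matters for caching, and here it is identical for both apps.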
But how do we know that the authorization was a success? How do we know where to go on the rail backend? This has to be embedded somehow in the response from the authorization app, which was built in Drupal with some custom PHP for performance.
Initially we attempted to use ESI, returning a single ESI tag in the Drupal/PHP response. Varnish intercepted this and swapped the backend in vcl_recv, based on the request path. Unfortunately ESI was not designed for this, and the approach turned out to be inadequate: it also stripped the response headers from the rail backend, which we needed to forward to the app.
So returning a 200 status with a page containing an ESI tag from Acquia Drupal did not work. What if we used a different status and intercepted the success message in Varnish based on that? We could send the new URL for the rail backend in a response header and ignore the response body completely.
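As a rough sketch of that idea, the authorization step could answer like this. It is written in Python for illustration, although the real script was custom PHP. The status code `299`, the `X-Backend-Url` header name, the key set and the `/backend` path prefix are all hypothetical placeholders; the post does not say which values were actually used.

```python
# Assumptions (not from the original project): the status code, the header
# name, the key store and the backend path layout are all hypothetical.
AUTH_OK_STATUS = 299
BACKEND_URL_HEADER = "X-Backend-Url"
VALID_KEYS = {"key-of-app-a", "key-of-app-b"}


def authorize(api_key, path):
    """Return (status, headers, body) the way the authorization script might."""
    if api_key not in VALID_KEYS:
        return 403, {}, "Invalid API key"
    # Success: the body stays empty, the header carries the backend URL.
    return AUTH_OK_STATUS, {BACKEND_URL_HEADER: "/backend" + path}, ""


ok = authorize("key-of-app-a", "/arrivals?station=A01")
rejected = authorize("unknown-key", "/arrivals?station=A01")
```

The point of the distinct status is that the caching layer can tell "authorized, go here next" apart from an ordinary page without parsing any body.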
All we had to do was set up the script to return the particular HTTP status and the new URL for the backend request. Varnish was then configured to restart the whole request process and to direct the new request to the rail backend, with a different TTL than for all other requests.
Once the VCL was complete, we deployed it on the spare Varnish instance of the project's active/hot-spare pair and began testing. As the responses from the backend were simply returned with no modification, we could easily verify all requirements, such as TTL, authorization and so on.
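The two-pass flow can be modelled in a few lines of Python. This is a toy model of the logic, not the VCL we deployed: the `Edge` class, the status code, the header name and the 30-second TTL are assumptions made for the example, and the authorization call is deliberately left uncached to keep it short.

```python
import time

# Hypothetical values; the post does not name the real ones.
AUTH_OK_STATUS = 299
BACKEND_URL_HEADER = "X-Backend-Url"
BACKEND_TTL = 30  # seconds


class Edge:
    """Toy model of the two-pass flow: authorize, restart, fetch with a TTL."""

    def __init__(self, authorize, fetch_backend, clock=time.monotonic):
        self.authorize = authorize
        self.fetch_backend = fetch_backend
        self.clock = clock
        self.cache = {}  # backend URL -> (expires_at, body)

    def handle(self, api_key, path):
        # First pass: ask the authorization app.
        status, headers = self.authorize(api_key, path)
        if status != AUTH_OK_STATUS:
            return status, "denied"
        # "Restart": the second pass targets the backend URL from the header.
        # The API key is not part of it, so all apps share one cache entry.
        backend_url = headers[BACKEND_URL_HEADER]
        now = self.clock()
        hit = self.cache.get(backend_url)
        if hit and hit[0] > now:
            return 200, hit[1]
        body = self.fetch_backend(backend_url)
        self.cache[backend_url] = (now + BACKEND_TTL, body)
        return 200, body


calls = []  # records every request that actually reaches the fake backend


def fake_authorize(api_key, path):
    if api_key in {"key-a", "key-b"}:
        return AUTH_OK_STATUS, {BACKEND_URL_HEADER: "/backend" + path}
    return 403, {}


def fake_backend(url):
    calls.append(url)
    return "arrivals for " + url


edge = Edge(fake_authorize, fake_backend)
first = edge.handle("key-a", "/arrivals/A01")
second = edge.handle("key-b", "/arrivals/A01")
denied = edge.handle("bad-key", "/arrivals/A01")
```

Two different keys asking for the same path reach the fake backend only once within the TTL, which is the behaviour we wanted from the real stack.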
The end result was the best we could hope for: a very happy client and a lightweight, fast stack, with Drupal storing and managing the keys, the custom PHP script validating the keys, and Varnish caching and routing the requests.