Web Services 101

January 26, 2016
13 minute read

Web developers are discovering that Web services have become critical to interacting with third parties -- whether on Web sites or in applications.

Many Drupal developers now have the need to expose content and features on their site via an API. Fortunately, Drupal 8 now has this capability in core. And some contributed modules are attempting to make it even better.

In this 4-part series, Larry Garfield, from the Chicago-based Web strategy, development, and design firm Palantir.net, and Kyle Browning, from Acquia, will discuss what you can do with Web services in Drupal 8. This includes capabilities that are in Drupal 8 core, and those that are in contributed modules.

Topics covered will include what is new with REST in Drupal 8, and what it aims to solve.

But first, Larry Garfield takes a step back and provides a broad view of what we mean by "Web services."

Introducing Web services

One of the major new features people often talk about in Drupal 8 is Web services. Drupal 8 is so much better at Web services! Drupal 8 enables Web services!

That's the tagline, but what does that mean in practice? What exactly is it that Drupal 8 enables, and why is it such a big deal?

In this series, we'll cover what Web services actually are (and are not), how Drupal 8 core makes them first-class citizens of the Drupalverse, and multiple ways to leverage that new capability for both developers and site builders.

But first, there's quite a bit of confusion about what this marketing buzzword means, so let's take a moment to clarify what "Web services" means in 2016.

The Internet

"The Internet" is, roughly, the public global TCP/IP network. It is a loose confederation of networks and computers all around the world, all of them communicating with each other over the TCP/IP protocol stack. All kinds of information are exchanged on the Internet, in all sorts of ways. In general, a "server" opens a "port," which simply means it is a computer waiting for someone to contact it at its IP address on a given channel. Each channel is identified by a port number, and other computers, called "clients," can connect to that address and port. Once that connection is established, the two computers can talk back and forth pretty much however they want.

(There is also UDP, which unlike TCP does not have an established connection but is otherwise the same concept. But that's getting too far into the weeds for our purposes.)
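To make the server, port, and client vocabulary concrete, here is a minimal Python sketch of one TCP exchange on a single machine. The loopback address, the OS-assigned port, and the upper-casing "protocol" are all assumptions made for the demo; once connected, the two sides could exchange any bytes they agree on.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    """Accept one client, send its message back in upper case, then hang up."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def demo() -> bytes:
    # The "server" opens a port: it binds to an address and waits for clients.
    # Port 0 asks the operating system for any free port (a demo convenience).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))
        server_sock.listen(1)
        host, port = server_sock.getsockname()

        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()

        # The "client" connects to that address and port, then both sides talk.
        with socket.create_connection((host, port)) as client:
            client.sendall(b"hello")
            reply = client.recv(1024)

        worker.join()
        return reply


if __name__ == "__main__":
    print(demo())
```

Nothing here is specific to the Web yet. The rules for what the two sides say to each other are what a protocol such as HTTP adds on top.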

The Web

A subset of that Internet traffic is "the Web." Roughly speaking, "the Web" is traffic over the Internet that uses the Hypertext Transfer Protocol (HTTP). HTTP is one particular set of rules for how two computers talk to each other after they've established a TCP connection. HTTP is usually set up to use port 80 (or 443 if it's encrypted), but that's not a hard requirement. Anything that transfers over HTTP is, technically, "the Web." That includes HTML pages, CSS files, JavaScript, images, ZIP file downloads, or anything else.
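As a rough illustration that HTTP is just an agreed-upon text format spoken over a TCP connection, the Python sketch below starts a throwaway server from the standard library on a local port and then sends it a hand-written request over a plain socket. The handler name, the response body, and the port choice are assumptions for the demo, not anything from a real site.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<p>Hello, Web</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_raw(host: str, port: int, path: str = "/") -> bytes:
    """Send a hand-written HTTP request and return the raw response bytes."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        raw = fetch_raw(host, port)
    finally:
        server.shutdown()
        server.server_close()
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("ascii")
    return status_line, body


if __name__ == "__main__":
    print(demo())
```

The request is a few lines of text, and the response is a status line, some headers, a blank line, and a body. A browser does the same thing, only with more headers.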

It's important to note, therefore, that the Web is not the whole Internet. In fact, it's not even half the Internet. Email servers communicate over a protocol called Simple Mail Transfer Protocol (SMTP), which is not part of the Web. FTP (File Transfer Protocol) is not part of the Web. NNTP (Network News Transfer Protocol), SSH (Secure Shell), IMAP, POP, and a whole host of other fancy acronyms are all alternative protocols built on top of the Internet (TCP/IP) that live next to the Web (HTTP).

Web services

As its name suggests, HTTP was originally designed to transfer hypertext documents, specifically HTML pages, from a server to a web browser on someone's desktop computer. Over time, though, people realized that they could send all sorts of other types of files over HTTP (and thus over "the Web"). There are many reasons to do so. HTTP is, compared to many alternatives, a fairly simple protocol. It also has a lot of really nice features baked in, such as caching and validation. There's no central authority, which means anyone can set up a client or server and it "just works." HTTP uses ad-hoc, as-needed connections, which means it scales extremely well. Perhaps most importantly, though, many corporate networks in the late 1990s blocked nearly all outgoing traffic in the name of security, except for HTTP over port 80. That meant for many applications, using HTTP was the only way they could get through corporate firewalls.
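The "caching and validation" mentioned above can be sketched without any network at all. The Python below models, in simplified form, how a server might answer a conditional request: it tags each response with an ETag validator, and if the client sends that tag back in If-None-Match and nothing has changed, the server replies 304 Not Modified with no body. The hashing scheme, the max-age value, and the function names are assumptions for illustration. Real servers compute ETags in various ways and also handle lists of tags.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the content (one possible scheme, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still valid: no need to resend the body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "max-age=60"}, body


def demo():
    page = b"<h1>Download Drupal</h1>"
    first_status, first_headers, _ = respond(page, {})
    cached_tag = first_headers["ETag"]

    # Same content, client revalidates with the tag it was given.
    second_status, _, second_body = respond(page, {"If-None-Match": cached_tag})

    # Content changed on the server, so the old tag no longer matches.
    third_status, _, _ = respond(b"<h1>Download Drupal 8</h1>",
                                 {"If-None-Match": cached_tag})
    return first_status, second_status, second_body, third_status


if __name__ == "__main__":
    print(demo())
```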

For various reasons, then, developers started sending messages back and forth over HTTP that were not HTML pages destined for browsers, but arbitrary data intended for other programs to read and respond to. Collectively, these are known as "Web services," that is, a "service" that a program can connect to "over the Web" and do stuff. Strictly speaking, browsers and HTML pages are a subset of that (the service being a web server, the client being a browser, and "do stuff" being "send an HTML page"), but in practice a "Web service" has come to mean "doing stuff over HTTP that is not showing an HTML page in a web browser."

That leaves a whole host of options. Various formal or informal specifications for communicating non-HTML-stuff over HTTP sprang up, such as SOAP or XML-RPC. Both are Web service formats (essentially alternatives to HTML that mean different things), but they are only two examples. These days both are rarely used; developers instead favor a variety of more loosely defined approaches that fit better with the way HTTP works.

REpresentational State Transfer

One particular style of Web service approach is called REST, short for REpresentational State Transfer. REST is not a specific API format per se; rather, it is an abstract model of how an API could be designed. HTTP/1.1 itself was deliberately written to support a REST-style architecture, but not everything that happens over HTTP follows that architectural style.

The RESTful architectural style includes a number of features:

It is a client-server architecture; any client can connect to any server over a known, specified protocol.
It's stateless; or rather, the server's knowledge of the client is stateless. Every time a client connects to a server, the server starts with no knowledge of any prior communication. Keeping track of state or history is the client's job. That's part of what makes HTTP so scalable.
Responses are cacheable; because communication is stateless, an identical request should, in most cases, return an identical response. We can rely on that fact to allow either a client or an intermediary server to cache responses, vastly improving performance.
Uniformly identified resources; every "resource" in a RESTful network is identified by some address, and that address is unique to that resource. On the Web, this means a URL or, as it's more formally called, a URI (Uniform Resource Identifier). For example, the URI http://www.drupal.org/download has a strict 1:1 relationship with "the download page of Drupal."
Resources are represented in some format, and that's how clients talk to the server. That format could be HTML, but also a JSON string, or a PDF, or an SVG image. All are just representations of the underlying platonic "resource thing."
Messages are self-descriptive; what the message wants to do, what representation it's using, and so on are contained in the message itself rather than relying on some external context or assumption. (This goes back to the stateless part.)
Hypermedia links; a server informs a client what it can do to a given resource by means of links to other resources that are included with the resource. The client knows only what the links tell it, and has no implicit knowledge of what else it can do.

Or in graphical form:

[Figure: a diagram of Internet services]

That sounds like a lot of heavy mumbo-jumbo, but in practice we've all seen this pattern a million times. An ordinary web page follows every single one of those requirements: it exists at a unique address (URI); it's accessed in a stateless fashion by a client (browser) in a representation (HTML) using a self-descriptive message (HTTP); it contains links, for example <a> tags, to other resources that the client can simply traverse (click on); and in most cases it's cacheable.

Your website is, by design, a valid REST service!
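The same ideas apply when the client is a program rather than a browser. Here is a small Python sketch, using only an in-memory dictionary, of one resource offered in two representations (HTML and JSON), both carrying links that tell the client where it can go next. The URIs, the titles, and the "_links" layout are hypothetical choices for the example; REST itself does not prescribe any particular JSON shape.

```python
import json
from html import escape

# Hypothetical resources, keyed by URI. Each one lists links to other resources.
RESOURCES = {
    "/articles/1": {"title": "First article", "links": {"next": "/articles/2"}},
    "/articles/2": {"title": "Second article", "links": {"prev": "/articles/1"}},
}


def represent(uri: str, media_type: str) -> str:
    """Return one representation of the resource identified by `uri`."""
    resource = RESOURCES[uri]
    if media_type == "application/json":
        return json.dumps(
            {
                "title": resource["title"],
                "_links": {rel: {"href": href}
                           for rel, href in resource["links"].items()},
            },
            sort_keys=True,
        )
    if media_type == "text/html":
        anchors = "".join(
            f'<a rel="{escape(rel)}" href="{escape(href)}">{escape(rel)}</a>'
            for rel, href in resource["links"].items()
        )
        return f"<h1>{escape(resource['title'])}</h1>{anchors}"
    raise ValueError(f"no representation available for {media_type}")


def follow(uri: str, rel: str) -> str:
    """A client that knows only the link relation, not the target URI."""
    document = json.loads(represent(uri, "application/json"))
    return document["_links"][rel]["href"]


if __name__ == "__main__":
    print(represent("/articles/1", "text/html"))
    print(follow("/articles/1", "next"))
```

The client in `follow` never builds a URI itself. It only follows what the server's representation tells it, which is exactly what a person does when clicking a link.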

Most Web services in widespread use today do not fully follow all the principles of REST, and even many web pages go out of their way to break one or more assumptions, usually to their detriment. Many early Web services (especially those based on SOAP or XML-RPC) eschewed that design entirely, and even today many Web services violate one (or many) of the design principles of HTTP. The common way of classifying Web services is with something called the Richardson Maturity Model, which has four levels of API, numbered 0 through 3:

Level 0: POX, "Plain Old XML"; this generally involves sending every message as a POST request and treating it as an RPC call. In essence, HTTP is adding no value at all at this point. (This applies even if the message body is JSON. Just using JSON doesn't make an API RESTful.)
Level 1: Resources; every resource has a unique URI.
Level 2: HTTP verbs; when communicating over the API, use the methods built into HTTP correctly: GET (for retrieving a resource), PUT (overwrite a resource), DELETE (remove a resource), and POST (send instructions to a resource, usually a create command or form submission).
Level 3: Hypermedia; including links between resources and using those to drive the interaction.

In practice, very few APIs in the wild reach all the way to level 3 and do it well. In fact, not all need to. Not all Web service APIs should, necessarily, be RESTful. However, HTTP provides such useful features when you let it that I would argue all APIs should strive to reach at least level 2: unique resources at their own addresses, manipulated with correctly used HTTP verbs.
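A level 2 API can be sketched in a few lines of Python. The class below is an in-memory stand-in for a server, not a real HTTP implementation: each note lives at its own URI, and the HTTP method alone decides what happens to it. The "/notes" collection, the class name, and the exact status codes chosen are assumptions for the example.

```python
from itertools import count


class NoteStore:
    """Hypothetical in-memory 'server' where every note has its own URI."""

    def __init__(self):
        self._notes = {}
        self._ids = count(1)

    def handle(self, method: str, uri: str, body=None):
        """Return (status, headers, body) for one request."""
        is_item = uri.startswith("/notes/")

        if method == "POST" and uri == "/notes":
            # POST sends instructions to the collection: create a new note.
            new_uri = f"/notes/{next(self._ids)}"
            self._notes[new_uri] = body
            return 201, {"Location": new_uri}, body

        if method == "GET" and is_item:
            # GET retrieves the resource and changes nothing.
            if uri in self._notes:
                return 200, {}, self._notes[uri]
            return 404, {}, None

        if method == "PUT" and is_item:
            # PUT overwrites whatever is at that URI (or creates it).
            status = 200 if uri in self._notes else 201
            self._notes[uri] = body
            return status, {}, body

        if method == "DELETE" and is_item:
            # DELETE removes the resource.
            if uri in self._notes:
                del self._notes[uri]
                return 204, {}, None
            return 404, {}, None

        # Any other method/URI combination is not supported in this sketch.
        return 405, {}, None


if __name__ == "__main__":
    store = NoteStore()
    status, headers, _ = store.handle("POST", "/notes", {"text": "hello"})
    print(status, headers["Location"])
    print(store.handle("GET", headers["Location"]))
```

Because the verbs keep their standard meanings, generic tools such as caches and proxies can reason about the traffic without knowing anything about notes.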

Sadly, in practice, many people casually use "REST" to mean any Web service that isn't SOAP or XML-RPC. That usage has become so common that many developers have started using the phrase "Hypermedia API" to refer to an API that reaches all the way to level 3 of the Richardson Maturity Model, that is, what REST meant originally.

In practice, if someone starts talking about a REST API, make sure you know what they mean: do they just mean a Web service that isn't XML, or do they really mean unique resource URIs manipulated through HTTP verbs? A good hint is the word "endpoint." There is no such thing in REST. Endpoints are an RPC concept, and are more akin to a function name. REST doesn't expose function names; it exposes resources, which are more akin to data objects.

An API with RPC endpoints is not a bad thing; in fact, that's often the best way to solve a given problem. But that API is not, by definition, RESTful.
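The difference between the two styles is easy to see side by side. In this Python sketch, the RPC-style function receives a payload naming the procedure to call, while the resource-style function receives an HTTP method and the URI of the thing being acted on. The procedure names, the payload shape, and the "/notes/<id>" URI pattern are all hypothetical.

```python
def rpc_endpoint(payload: dict, notes: dict):
    """RPC style: one endpoint; the message body names the function to run."""
    procedures = {
        "getNote": lambda note_id: notes[note_id],
        "deleteNote": lambda note_id: notes.pop(note_id),
    }
    return procedures[payload["method"]](**payload["params"])


def resource_request(method: str, uri: str, notes: dict):
    """Resource style: the URI names a thing; the HTTP verb says what to do."""
    note_id = int(uri.rsplit("/", 1)[1])
    if method == "GET":
        return notes[note_id]
    if method == "DELETE":
        return notes.pop(note_id)
    raise ValueError(f"unsupported method: {method}")


if __name__ == "__main__":
    notes = {1: "buy milk", 2: "write article"}
    # Typically sent as a POST to a single URL such as /api.
    print(rpc_endpoint({"method": "getNote", "params": {"note_id": 1}}, notes))
    # Sent as GET /notes/2.
    print(resource_request("GET", "/notes/2", notes))
```

Both functions do the same work. What differs is what the API exposes to the outside world: a list of function names in the first case, a set of addressable resources in the second.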