There's a big hole in reusability on the web. An entertaining statistic - not the most accurate but still fascinating - was generated by Simon Wardley from a Twitter poll. He calculated that basic user registration had been written over a million times. The average developer had written user registration about 5 times. I'm sure you've built it a few times yourself.
The solution to this would be to create a reusable 'package' for user registration. So why isn't there such a thing? It's a really interesting question, because if you could fix it, you could make web development so much faster.
Three kinds of application
The first problem is that user registration has both front end and back end elements. So you would need a package that included both of these. I think the closest to that would be a Wordpress plugin. PHP (on which Wordpress is built) mixes markup and processing on the back end. Although PHP doesn't have a neat way of packaging a group of pages and back end, you can do this in Wordpress. The problem though is that then you have to do everything else on your site using Wordpress. Empirically, people are still building registration over and over again, so the numbers tell us that's still quite a problem. But even so, this is key to the huge success of Wordpress on the web.
Prebuilt application, plugin components
The limitation Wordpress has is that it's a container that contains plugins. Whatever you build with Wordpress, it will still be a Wordpress site containing plugins, bits of your code, and your content. It won't be a mobile app (although it could get close these days). It won't be a data-backed web application. With Wordpress or with other conventional Content Management Systems, or with systems like Salesforce or Sharepoint, what you have is basically an application which you are able to extend and vary with plugins. You're limited in what functionality you can reuse with plugins, and because the application around them is fixed, you're limited in what the final application can do.
This is the least flexible kind of environment I would identify for reuse. There are two other more flexible types.
Custom application, library components
The next is a module library for a programming language. The classic example I'd use here is NPM, the Node Package Manager, which has a huge range of almost entirely open source packages for programming in JavaScript. Most programming languages have some equivalent. NPM itself reports the staggering statistic that its packages make up 97% of the code in web applications which use it. If you assume that cost scales with the amount of code you write yourself, that means only 3% (about 1/33) of the code is custom, which makes those applications roughly 33 times cheaper to develop.
Clearly this is a very successful model of reuse. It does have some limitations. A code package can only be used in the environment of the language in which it is written. However this is an environment which will let you build almost anything.
Interestingly, despite being much more flexible in this way than something like Wordpress, I've never seen a software package for a whole user registration process. The reason is that packages tend to be either for front end or back end. More generally, there's still (even if it's only 3% of the total) quite a bit of custom coding required to glue code packages together, and a lot of this glue is very repetitive. This is mainly because code packages need a frame built around them to turn them into applications.
Custom application, application components
On to the next reuse environment. This is an environment for composing small applications into larger ones. A well-known example of this is shell scripting in Unix/Linux. It lets you script Linux commands, which are mostly little applications, into a more complex command, i.e. a new application. These scripts are also a very successful model of reuse, having been used widely for decades. They are still in 2020 the most effective way of automating administrative work, a testament to how powerful they are. I don't know if it's ever been measured, but I would guess the ratio of the lines of code in a script to the lines of code required to write the script's function from scratch would be better than 1:100.
Pipelines
If we look at the characteristics of Bash scripts, one remarkable property of them is how much work can be done with how little scripting. The model of composition they use involves creating pipelines. This is a very common idiom widely used in many fields: the key idea is to have a range of functional components with a common interface between them. Obviously plumbing is the source of the term and plumbing components work very much like this. Electrical circuits, even Lego bricks, make use of this concept. Its value is to create the highest possible degree of interoperability and interchangeability between components: because of the uniform interface, any group of components can be connected together. And the new component formed by connecting others has the same interface as an individual component itself.
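The two properties described here (a uniform interface, and a composed pipeline having the same interface as its parts) can be sketched in a few lines of Python. This is only an illustration: the stage names `grep`, `sort_lines` and `head` are hypothetical stand-ins loosely modelled on the Unix commands, not wrappers around them.

```python
from functools import reduce
from typing import Callable, Iterable

# The common interface: every stage takes lines of text and yields lines of text.
Stage = Callable[[Iterable[str]], Iterable[str]]


def grep(needle: str) -> Stage:
    """Keep only lines containing `needle` (loosely like `grep`)."""
    return lambda lines: (line for line in lines if needle in line)


def sort_lines() -> Stage:
    """Sort lines alphabetically (loosely like `sort`)."""
    return lambda lines: iter(sorted(lines))


def head(n: int) -> Stage:
    """Keep the first `n` lines (loosely like `head -n`)."""
    def stage(lines: Iterable[str]) -> Iterable[str]:
        for i, line in enumerate(lines):
            if i >= n:
                break
            yield line
    return stage


def pipe(*stages: Stage) -> Stage:
    """Connect stages left to right; the result is itself a Stage."""
    return lambda lines: reduce(lambda acc, stage: stage(acc), stages, lines)


log = ["error: disk full", "info: started", "error: auth failed", "info: stopped"]

errors = pipe(grep("error"), sort_lines())
# A pipeline can be used as a component of a bigger pipeline.
first_error = pipe(errors, head(1))

print(list(first_error(log)))  # ['error: auth failed']
```

Because `pipe` returns a `Stage`, `errors` can be dropped into `first_error` exactly as if it were a single built-in component.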
The 3 models of reuse
So to summarise we have 3 models of reuse:
- Application with plugins as components
- Application built from library components
- Application built from application-components
My argument here is that the last of these 3 is the most powerful. It is much more flexible and general than the first, because you can build any application rather than modifying one existing application by choosing plugins. It requires less effort and expertise than the second, because you don't have to write much code to glue everything together and to provide the wrapper needed to turn a library into an application. However you retain the flexibility of being able to build anything, especially when you allow for the custom coding of constituent applications. Pipelines are a key means of making it possible to compose applications like this.
Pipelines on the web
This all suggests that if there were a way of building pipelines on the web, and composing services to form bigger services, this could be a powerful way of approaching building for the web. The web is built on the client-server model. The client application (often a web browser) makes a request to a server which hosts the service, and receives a response. Requests and responses are both messages in the HTTP protocol.
With only a minor restructure, this fits perfectly into the pipeline model because you have a common interface between your functional units which is the HTTP message. By sending the response from one service as a request to the next, you have a pipeline. And like with all good ideas, this has been done before. In fact with some success, but as yet in a very limited way.
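As a rough sketch of what "sending the response from one service as a request to the next" could look like, here is a toy in-memory model. The `Message` class is a heavily simplified stand-in for an HTTP message, and the services `uppercase` and `add_length_header` are hypothetical examples; no real HTTP traffic is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """A much-simplified HTTP message: just enough to show the idea."""
    body: str
    headers: Dict[str, str] = field(default_factory=dict)
    status: int = 200


# The common interface: a service turns one message into another.
Service = Callable[[Message], Message]


def uppercase(msg: Message) -> Message:
    """Hypothetical service: upper-cases the body."""
    return Message(body=msg.body.upper(), headers=dict(msg.headers))


def add_length_header(msg: Message) -> Message:
    """Hypothetical service: records the body length in a header."""
    headers = {**msg.headers, "X-Body-Length": str(len(msg.body))}
    return Message(body=msg.body, headers=headers)


def run_pipeline(services: List[Service], request: Message) -> Message:
    """Feed each service's response to the next service as its request."""
    msg = request
    for service in services:
        msg = service(msg)
        if msg.status >= 400:
            break  # assumption: stop at the first error response
    return msg


result = run_pipeline([uppercase, add_length_header], Message(body="hello"))
print(result.body, result.headers)
```

Stopping at the first error status is a design choice made for this sketch; a real pipeline manager might instead let later stages see and handle the error.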
Yahoo Pipes
This popular and innovative service was born from the enthusiasm for 'mashups' in 2007, combining data from multiple web sites and services. The service was directly inspired by Unix pipes (see the about this). It created pipelines from single or multiple sources on the web and supplied built-in processors to generate outputs. Sadly shut down in 2015, it inspired services such as IFTTT and Zapier which are popular today. of another service, import.io, describes this change.
While Pipes was a successful tool, it was limited in how it fulfilled the possibilities of pipes and application composition on the web:
- While it could take input from the web, it couldn't output to the web or process using web services
- While it had the concept of pipes, it didn't let you use them to link web services.
- It was a closed system, not open to the web.
IFTTT and Zapier
These platforms, which are widely used today, work by creating event-driven pipelines between services. They often form part of the back end functionality of no-code projects. They wrap third party services with a common input/output interface, allowing chains of processors to be built, for instance to trigger SMS messages or emails when a certain log message arrives in a logging service. While these services create pipelines out of web services, they still have limitations:
- They can connect existing services, but they can't be used to create new APIs or services.
- The wrappers around services are not open and based on HTTP, but closed and only connectable within these platforms.
A possible reason for the limitations of these services is the main difficulty with pipelines on the web: latency. Latency is the time between a request and the first arrival of the response. When we get to tens or hundreds of services in a pipeline, this delay builds up to the point where the pipeline is extremely slow. By containing the pipelines being built within the platform, these systems minimise this problem as latency is tiny within the same service. If they were open and the pipeline connections were being made between different services on the web, the much greater latency would become more of an issue.
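To make the build-up concrete, here is a back-of-envelope calculation. The per-hop figures are illustrative assumptions, not measurements, and the model assumes a strictly sequential pipeline where each hop waits for the previous response.

```python
def pipeline_latency_ms(hops: int, per_hop_ms: float) -> float:
    """Total added latency for a sequential pipeline of `hops` calls."""
    return hops * per_hop_ms


# Illustrative figures only (assumptions, not measurements).
CROSS_WEB_MS = 50.0    # assumed latency between services on different hosts
IN_PLATFORM_MS = 0.5   # assumed latency for a hop inside one platform

for hops in (5, 20, 100):
    across = pipeline_latency_ms(hops, CROSS_WEB_MS)
    inside = pipeline_latency_ms(hops, IN_PLATFORM_MS)
    print(f"{hops:>3} hops: {across:>7.1f} ms across the web, {inside:>5.1f} ms in-platform")
```

Under these assumed figures, a 100-step pipeline adds 5 seconds of latency across the web but only 50 ms inside one platform, which is the gap the closed platforms exploit.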
Serverless functions
Another model which comes close to the ideal of pipelines on the web is the idea of serverless functions such as AWS Lambda. In fact the 'serverless' aspect of these functions is not the interesting part in relation to this. The concept of a function which transforms a web request into a web response and which is published on the web and accessible at a URL is a core element of what I am proposing. Many have expected serverless functions to form a basis for reuse on the web, but this hasn't materialised to any real degree. This is due to these limitations of serverless functions:
- While latency between serverless functions within the same cloud platform is low, they have a so-called 'cold start' delay. The platform on which these functions run has to start the code for the function on a physical server whenever it is called and no running copy is available. It then keeps that code around for a short while before removing it again: if another request for that function comes in during that time it can immediately satisfy it, but otherwise it has to pay the startup cost again. This delay, while counted in milliseconds, can be significant and is quite variable. A chain of serverless functions could create a very significant delay.
- Another issue with serverless functions calling one another is that they are paid for on the basis of the time they are running. When one invokes another and it has to wait for the response, both are active and being paid for. This is less of a problem for building pipelines as a pipeline manager can call one function, get the result, end the function and send the result to the next etc.
- The Yahoo Pipes based platforms all provide a number of internal basic services for e.g. transforming data between one step and the next. There's nothing of this kind for serverless functions as they were never really conceived of as an environment for composition.
- Setting up a serverless function takes a significant time, and while AWS has a service called Step Functions which helps with composition, working with serverless functions is not nearly as simple as the drag and drop style functionality provided by Yahoo Pipes and friends.
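The billing point in the list above (functions that call each other pay for waiting time, whereas a pipeline manager calling them in turn does not) can be illustrated with a simplified cost model. It assumes billing is purely proportional to wall-clock running time, and it ignores cold starts and any cost of running the pipeline manager itself.

```python
from typing import Sequence


def nested_billed_ms(durations: Sequence[float]) -> float:
    """Each function calls the next and waits for it.

    Function i is therefore billed for its own work plus the
    running time of everything downstream of it.
    """
    total = 0.0
    downstream = 0.0
    for duration in reversed(durations):
        downstream += duration   # time this function stays alive
        total += downstream
    return total


def orchestrated_billed_ms(durations: Sequence[float]) -> float:
    """A pipeline manager calls each function in turn.

    Each function ends before the next starts, so only its own
    work is billed.
    """
    return float(sum(durations))


work = [100.0, 100.0, 100.0]  # three functions doing 100 ms of work each
print(nested_billed_ms(work), orchestrated_billed_ms(work))
```

In this model the nested chain is billed for 600 ms against 300 ms for the orchestrated pipeline, and the gap grows roughly with the square of the chain length.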
The new model
Combining the best aspects of these prior examples and adding some new ideas leads to a new model of pipelines for the web which resolves their limitations and brings this concept to its full power. A name for this model is a service space.
Runtime
The functions of this new model are provided by a runtime, i.e. a piece of code which can be installed on different servers with different characteristics, or even run within a serverless function. This means it's possible either to provide a platform which manages the runtime and the servers on which it is running (making it a serverless service), or to control how and where the runtime runs yourself, adding more flexibility.
Multiple services on one server
The runtime hosts a number of services on one server in one process. These services have internal routing which lets them message each other without any network overhead. Latency cost only exists for calls from one server to another. This greatly reduces the issue of latency in a complex service pipeline. The services which exist, the URL paths on which they are accessed, and their configuration are specified in a configuration file.
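A minimal sketch of the idea follows. The configuration layout, the `Runtime` class and the `echo`/`shout` services are all hypothetical, invented for illustration; they are not the format or API of any actual runtime.

```python
from typing import Callable, Dict

# A service handler takes (path within the service, request body) and
# returns a response body. Real services would exchange full HTTP messages.
Handler = Callable[[str, str], str]


def echo_service(path: str, body: str) -> str:
    return body


def shout_service(path: str, body: str) -> str:
    return body.upper()


# Stand-in for the library of service code available to the runtime.
SERVICE_CODE: Dict[str, Handler] = {"echo": echo_service, "shout": shout_service}

# Hypothetical configuration: which services exist and on which base paths.
CONFIG = {
    "services": {
        "/echo": {"source": "echo"},
        "/shout": {"source": "shout"},
    }
}


class Runtime:
    """Hosts several services in one process and routes between them."""

    def __init__(self, config: dict, code: Dict[str, Handler]) -> None:
        self.routes = {
            base: code[spec["source"]]
            for base, spec in config["services"].items()
        }

    def call(self, path: str, body: str) -> str:
        """Internal call: a plain function call, with no network hop."""
        # The longest matching base path wins.
        for base in sorted(self.routes, key=len, reverse=True):
            if path == base or path.startswith(base + "/"):
                return self.routes[base](path[len(base):], body)
        raise LookupError(f"no service configured for {path}")


runtime = Runtime(CONFIG, SERVICE_CODE)
print(runtime.call("/shout/anything", "hi"))
```

Because `call` is an ordinary function call inside one process, chaining many services this way adds no network latency; only calls that leave the server pay that cost.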
Services provide an HTTP interface
The interface to every service is that it receives an HTTP message and emits another HTTP message, allowing them to be pipelined together. HTTP messages are standardised somewhat to improve interoperability. The key standardisation is on JSON as the data representation. An HTTP message can then have essentially 3 types of body: binary data, text data or JSON.
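One way a service might sort an incoming body into these three types is by inspecting the `Content-Type` header. The function below is a sketch under two assumptions that are mine rather than the model's: text and JSON are always decoded as UTF-8, and any media type that is neither JSON nor `text/*` is treated as binary.

```python
import json
from typing import Any, Tuple


def decode_body(content_type: str, raw: bytes) -> Tuple[str, Any]:
    """Classify a message body as 'json', 'text' or 'binary' and decode it.

    Assumes UTF-8 for text and JSON; ignores any charset parameter.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return "json", json.loads(raw.decode("utf-8"))
    if media_type.startswith("text/"):
        return "text", raw.decode("utf-8")
    return "binary", raw


print(decode_body("application/json; charset=utf-8", b'{"name": "Ann"}'))
print(decode_body("text/plain", b"hello"))
print(decode_body("image/png", b"\x89PNG")[0])
```

With a convention like this, every service in a pipeline can rely on the same small set of body shapes instead of negotiating formats pairwise.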
Services are hosted on URLs and open to the web
Every service on a server is configured on a URL within that server's domain, and so can be called across the web. Authentication is built-in and can be used to restrict access where required. This means it's possible to call any service or pipeline made of services like any other service on the web.
Referential transparency
The model provides referential transparency, a useful quality of a distributed system: if you have a reference to a service (which is its URL), you don't need to care where it is running. This is provided by a built-in service that simply forwards a web request from one server to another. A service can be moved from the runtime on one server to the runtime on another simply by replacing it with this forwarding service, so the service you moved can still be found on the same URL. This ability to transparently distribute services where you like simplifies scaling and helps to run workloads on appropriate infrastructure.
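The move-and-forward step can be shown with a toy model in which a dictionary stands in for the network. The host names, the `resize` service and the `make_forwarder` helper are hypothetical, and URLs are simplified to `host/path` with no scheme.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]

# host -> {path: handler}. This dictionary stands in for the network.
SERVERS: Dict[str, Dict[str, Handler]] = {"a.example": {}, "b.example": {}}


def call(url: str, body: str) -> str:
    """Resolve a simplified 'host/path' URL and invoke the service there."""
    host, _, path = url.partition("/")
    return SERVERS[host]["/" + path](body)


def make_forwarder(target_url: str) -> Handler:
    """The forwarding service: passes the request on to another server."""
    return lambda body: call(target_url, body)


def resize(body: str) -> str:
    """Hypothetical service that is about to be moved."""
    return f"resized:{body}"


# Initially the service runs on server A.
SERVERS["a.example"]["/resize"] = resize
before = call("a.example/resize", "img")

# Move it to server B and leave a forwarder behind on server A.
SERVERS["b.example"]["/resize"] = SERVERS["a.example"]["/resize"]
SERVERS["a.example"]["/resize"] = make_forwarder("b.example/resize")
after = call("a.example/resize", "img")

print(before == after)  # True: callers of the old URL see no difference
```

Callers keep using `a.example/resize` throughout; the only observable change in a real deployment would be the extra network hop to the new server.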
Library of core and custom-built services
The model requires that you have a central public library of service code. A runtime can pull the code for any service it needs to run from this library. The runtime itself contains a core set of services that provide for such things as constructing pipelines from other services, storing data, authentication etc. The library can be extended by user contribution. Private libraries can exist for teams to share their service code.
It will be possible for all or nearly all the services needed to be library services, just leaving the job of configuring the services themselves.
Another desirable property of this platform is that it should define consistent APIs to common functionality. For instance, the basic API for data storage should be that data is written to a URL via a PUT request, then read back from the URL via a GET request. The best way to structure this is that a service exists to provide this interface, into which you then plug drivers for different underlying data sources, whether it be file storage, a relational database, a NoSQL database etc. You can then access the more powerful features of the underlying data source such as querying or search through a separate service.
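A sketch of this service-plus-driver structure might look like the following. The `Driver` protocol, `MemoryDriver` and `DataService` are hypothetical names, and the status codes returned (200, 404, 405) are one reasonable choice rather than something the model prescribes.

```python
from typing import Dict, Optional, Protocol, Tuple


class Driver(Protocol):
    """What the data service needs from any underlying data source."""

    def write(self, key: str, data: bytes) -> None: ...
    def read(self, key: str) -> Optional[bytes]: ...


class MemoryDriver:
    """Simplest possible driver; a file or database driver would
    implement the same two methods."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def read(self, key: str) -> Optional[bytes]:
        return self._items.get(key)


class DataService:
    """Exposes the consistent PUT/GET interface over any driver."""

    def __init__(self, driver: Driver) -> None:
        self.driver = driver

    def handle(self, method: str, path: str, body: bytes = b"") -> Tuple[int, bytes]:
        if method == "PUT":
            self.driver.write(path, body)
            return 200, b""
        if method == "GET":
            data = self.driver.read(path)
            return (200, data) if data is not None else (404, b"")
        return 405, b""  # other methods are outside this basic API


store = DataService(MemoryDriver())
store.handle("PUT", "/users/1", b'{"name": "Ann"}')
print(store.handle("GET", "/users/1"))
```

Swapping the data source means passing a different driver to `DataService`; code that talks to the service through PUT and GET does not change.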
User interface
Services contain their own documentation and advertise which consistent APIs they provide. This then enables a modular user interface which is built of modules which provide user interaction with those APIs, as well as overall configuration for runtimes. As an example of a consistent API, many services will store files and data for their use in different ways, and expose these on URLs. The UI should allow those files and data to be navigated, edited, created and deleted.
Discussion and examples
As part of creating and investigating this model, a runtime named Restspace has been created. The documentation can give an idea of how a platform like this can work in practice. It's a work in progress and lacks certain aspects of the model as yet.
It's striking how fast it is possible to build a wide range of web services, APIs and websites using a system like this.
So how do you reuse user registration? Here's a way it could work with Restspace.
- A special service mediates changes to the configuration of each server. When a service is added via this service, it ensures any other services it depends on are also present in the configuration.
- Adding user registration simply involves adding a user registration service.
- This provides a URL on which there is a script file which builds the registration user interface in the browser.
- As dependencies of the user registration service we have the authentication service and the user store service which are added if necessary when the user registration service is added.
- The script knows where it can find the user store (which includes a JSON Schema which specifies the fields of a user record) and adds a new user record when a site user registers.
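The dependency handling in the steps above could be sketched as follows. This is not Restspace's actual implementation: the service names, the `DEPENDENCIES` table and the `add_service` function are hypothetical, and the assumption that authentication itself depends on the user store is mine.

```python
from typing import Dict, List

# Hypothetical dependency table, as the service library might declare it.
DEPENDENCIES: Dict[str, List[str]] = {
    "user-registration": ["authentication", "user-store"],
    "authentication": ["user-store"],   # assumption made for this sketch
    "user-store": [],
}


def add_service(
    config: List[str],
    name: str,
    dependencies: Dict[str, List[str]] = DEPENDENCIES,
) -> List[str]:
    """Return a new configuration containing `name` and any dependencies
    that were missing, with dependencies listed before their dependents."""
    result = list(config)
    in_progress = set()

    def visit(service: str) -> None:
        if service in result:
            return  # already configured, nothing to add
        if service in in_progress:
            raise ValueError(f"circular dependency involving {service!r}")
        in_progress.add(service)
        for dep in dependencies[service]:
            visit(dep)
        in_progress.discard(service)
        result.append(service)

    visit(name)
    return result


print(add_service([], "user-registration"))
print(add_service(["user-store"], "user-registration"))
```

Adding user registration to an empty configuration pulls in the user store and authentication first; adding it where the user store already exists leaves that entry alone and adds only what is missing.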
Conclusion
A number of past and present web platforms have attempted to foster reuse and composition of web services, but have had only partial success. The most successful approaches to reuse involve composition of applications. In the realm of the web, composition is best done by the creation of pipelines of web services. A system which fully realises the possibilities in this space needs a number of elements and capabilities in place as listed above. I propose the way forward is the model outlined in this article.
Previously published at