Dr. Dobb's Digest June 2009
Eric Bruno is a technology consultant specializing in software architecture, design, and development. Write to us at iweekletters@techweb.com.
In considering the history of distributed computing, CIOs should focus on how companies knew that it was the right time to make a big architectural bet on the next wave of technology. This past experience could help CIOs who are being asked to consider a similar move today. In the 1970s, for instance, Reuters introduced Monitor, through which journalists entered information via dumb terminals and a mainframe computer sent it out to readers.
In the early 1990s, about the time I joined Reuters as a developer, it was taking that distributed computing concept further by building a pioneering electronic trading system, Globex, for the Chicago Mercantile Exchange, drawing on a mainframe and Windows-based PCs. As Globex's costs grew with its popularity, CME moved the Globex architecture to an even more distributed model: a pair of mainframe-class computers coupled with about 1,500 workstation-class servers running Linux and Solaris.
Now a new question is arising for CME: What role will cloud computing play in Globex and other platforms?
IT leaders face real pressures in today's economy to push the limits of technologies such as virtualization and cloud computing that extend distributed computing models. The decisions CME and Reuters faced in choosing which distributed model to adopt, and when, are much the same as the ones companies face today. One difference, however, is that economic pressures are forcing companies to consider emerging, often immature, distributed computing technologies.
Distributed computing refers broadly to applications built on the client-server model, clustering, an N-tier architecture, or some combination of the three.
While there are variations on these base models, what they have in common is that they divide computing across multiple computers to achieve greater application scale and availability. Large Web sites like eBay use a combination of these models, with database and app servers clustered within each tier of the design.
With the increasing use of Ajax at the browser level, many sites have added a client-server element to the mix. As a result, large-scale distributed applications such as Google and Yahoo leverage all three computing models.
Web services take the distributed model a step further by sharing the data-processing load. Because Web services are based on HTTP, it's straightforward to deploy a single service to multiple servers to share the load. This design lets developers distribute even single application components, yielding greater scalability, more code reuse, and reduced costs. Thanks to open standards, message formats and protocols can be defined in XML, implemented in a language such as C++ or Java, and then easily reimplemented on other platforms, reducing costs further.
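To see why HTTP-based services replicate so easily, consider a minimal stateless quote service built on the JDK's built-in com.sun.net.httpserver package (the service name, URL path, and XML format here are invented for illustration). Because each request is self-contained, identical copies of the service can run on any number of servers behind a load balancer:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class QuoteService {
    // Stateless handler: every request carries everything it needs, so
    // identical copies of this service can sit behind a load balancer
    // on as many servers as demand requires.
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/quote", exchange -> {
            String symbol = exchange.getRequestURI().getQuery(); // e.g. "IBM"
            // Hypothetical XML message format shared by every copy of the service.
            String xml = "<quote><symbol>" + symbol + "</symbol>"
                       + "<price>42.00</price></quote>";
            byte[] body = xml.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/xml");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        start(8000);
        System.out.println("Quote service listening on :8000");
    }
}
```

Since the XML on the wire is the contract, a client written in any language on any platform can consume it, which is the portability point made above.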
But the biggest recent change to the distributed computing model is virtualization, which lets IT divide a physical server into multiple virtual servers. Beyond the oft-cited hardware, energy, and space savings, this division helps solve the problem of getting the most value from multicore computers.
For instance, even if individual software components aren't yet written to take advantage of multicore architectures, developers should still consider using virtualization to run multiple software components on one physical computer as though they were each running on separate computers. This maintains the security and robustness of the software, while squeezing the most value from multithreaded-capable multicore computers.
As a result, individual application components can execute on multiple virtual servers, all running on a single physical server. One approach is to co-locate the applications that communicate most heavily on the same physical server, eliminating the network-induced latency they would incur if separated.
Combining virtualization with the other models of distributed computing can result in a cost-effective, scalable architecture.
But effectively monitoring virtualization requires administrator oversight, a time-consuming process that companies should look to minimize using configuration management and other life-cycle tools.
Instead of manually building configurations, for instance, administrators can link multiple templates -- containing the operating system, build scripts, and applications -- to a new virtual machine volume. Companies that need to speed up the process also should provide developers with self-service facilities for submitting and acting on virtual resource requests, instead of formal submission processes and the delays they entail.
While virtualization is the here and now of distributed computing, cloud computing is its future. Although its full impact has yet to be realized, it's becoming clear that cloud computing will be the service-delivery option of choice for distributed applications, thanks in part to today's CPU, memory, and bandwidth capabilities. Once vendors can combine a high level of developer control, a reasonable and flexible cost model, and compelling services, cloud computing will change the face of distributed computing.
When will that happen for enterprise systems? Some of the most important factors to watch are design decisions for security, data protection and recovery guidelines, and application architectures. For instance, developers working with cloud computing need to contend not only with familiar dedicated security problems such as firewalls and intrusion detection, but with security in a shared cloud environment as well. Heavy lifting in these areas is being done by groups such as the Cloud Security Alliance, which has released its "Security Guidance for Critical Areas of Focus in Cloud Computing," a best-practices document that provides security guidelines for enterprise cloud computing.
Perhaps most importantly, as with other distributed computing models, cloud computing isn't an all-or-nothing proposition. Hybrid models of dedicated/cloud resource implementations may be just the thing for applications with bursty usage patterns.
With Globex, when CME distributed the architecture from a central mainframe to a pair of mainframe-class computers coupled with workstation-class servers running Linux and Solaris, it was able to handle more than 2 million transaction requests daily, with response times reduced to 150 milliseconds or less. To maintain an advantage in terms of cost, customer response time, and overall reliability, CME relies heavily on open source software and low-cost, increasingly powerful x86-based servers. CME also uses Novell's ZENworks configuration management software for distributed-application management and as a step toward introducing virtualization into the architecture. What's next? Like a lot of companies, CME has its eye on cloud computing, but it doesn't have a detailed plan it's ready to share since, like a lot of companies, it's taking small steps while waiting for cloud standards to mature.
Another distributed computing project worth examining is eBay. Although users sell millions of dollars of goods through the system daily, eBay's product is a Web site that it must maintain with the most efficient software architecture possible. That architecture has grown to become a three-tier design with business logic, presentation code, and database code separated into different physical tiers.
Over time, servers have been continuously added to each tier, including the database, which has been partitioned and distributed among many servers across multiple geographic locations. The application tier has been rewritten in Java to enable even more computing distribution. Entire feature sets -- search, indexing, transaction management, billing, and fraud detection -- have been moved out of the core application and into separate pools of servers. The architecture has moved to this level of distribution so that eBay can keep up with growth in the number of users and active auctions. Like CME, eBay hasn't said how it will use cloud resources, but it's a founding member of the Cloud Security Alliance.
As companies consider moving distributed computing to cloud resources, Forrester Research's Ted Schadler advises that they launch pilot projects that have milestones for measuring use, renegotiating pricing, and increasing employee training.
The cloud is tomorrow's promise, but the biggest gains today come from the increasing speed and parallelization of low-cost servers, which let companies reduce costs while meeting growing demands. There's no magic in this; the heavier demands placed on networking and communication technologies are being met with software advances in the form of enterprise service buses, Web services, and virtualization.
Tools for building highly scalable applications based on these distributed computing models are readily available, many as open source. But open source isn't a panacea when it comes to building distributed systems. While freely available open source software can cut licensing costs, it also can drive up staff costs because of the special skills and knowledge required to work in an environment where formal technical support may not be available.
An example of how developers can use open source to extend distributed computing is detailed below. Open source application servers such as Sun Microsystems' GlassFish, together with the Sun Cloud service, let developers write and integrate application components in Java, Python, and Ruby. Sun -- at least before its planned acquisition by Oracle -- said its Sun Cloud would be a public cloud for developers. To further attract developers, Sun is also providing a set of REST-based open APIs published under the Creative Commons license, letting anyone use them at no cost. Developers can immediately deploy applications to the Sun Cloud by leveraging prepackaged virtual machine images of Sun's open source software.
Say your developers want to build an application using a set of Web services written in Ruby, application logic written in Java and PHP, presentation logic in JavaServer Pages, and running on OpenSolaris (or Linux). To add both desktop and mobile device support, they write a JavaFX application as an alternate presentation layer. With a combination of XML and an enterprise service bus, they can reliably tie all these components into one scalable application that runs on a farm of cheap x86 servers.
To illustrate, the following sample application -- a company's research portal for investment bankers -- combines HTML, an XML Web service, Ajax/JavaScript, Java, and JavaFX. (Source code that implements this application is available at informationweek.com/1231/ddj/code.htm.) The application consists of three Web services, including the Quotes and News services discussed here.
The key to this architecture is that not only is it distributed by design, but it's also based on both open standards and open source. Thanks in part to XML and HTTP, the result is platform and language independence for each component at every tier. Some of the components, such as the Quotes and News services, use a messaging protocol (JMS) to distribute and scale their internal components. For instance, the Quotes service is distributed across multiple nodes for redundancy, and uses a reliable message queue to coordinate and distribute requests among them.
Multiple Java Servlets are deployed to serve requests for quote data over HTTP. This data is retrieved from an internal cache, if available and up to date. Otherwise a request to a third-party provider is enqueued on a reliable JMS queue. Separately, one of the redundant Quote Requestor components will dequeue the request, retrieve the data from the provider, and then return it to the client in the form of XML. This distributed architecture provides scalability to the quotes service (multiple Quote Servlets share the load), and reliability (the queue has guaranteed delivery, and multiple queue readers ensure uptime).
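The shape of that pattern can be sketched in a few dozen lines. In this sketch, an in-memory BlockingQueue stands in for the reliable JMS queue and a ConcurrentHashMap for the shared cache; the class names and the provider call are placeholders, and a production system would use a real JMS provider to get the guaranteed delivery described above:

```java
import java.util.Map;
import java.util.concurrent.*;

public class QuotesPattern {
    // In-memory stand-ins for the real infrastructure: a JMS queue with
    // guaranteed delivery, a reply registry, and a shared quote cache.
    static final BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>();
    static final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Servlet side: serve from the cache if possible, otherwise enqueue a
    // request and wait for one of the Quote Requestors to fulfil it.
    static String handleQuoteRequest(String symbol) throws Exception {
        String cached = cache.get(symbol);
        if (cached != null) return cached;               // cache hit
        CompletableFuture<String> reply =
            pending.computeIfAbsent(symbol, s -> new CompletableFuture<>());
        requestQueue.put(symbol);                        // enqueue for a requestor
        return reply.get(5, TimeUnit.SECONDS);
    }

    // Requestor side: one of several redundant readers dequeues a symbol,
    // fetches the data from the third-party provider, caches it, and
    // completes the waiting servlet's request with XML.
    static void runRequestor() {
        while (true) {
            try {
                String symbol = requestQueue.take();
                String xml = fetchFromProvider(symbol);
                cache.put(symbol, xml);
                CompletableFuture<String> reply = pending.remove(symbol);
                if (reply != null) reply.complete(xml);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Placeholder for the real third-party market-data call.
    static String fetchFromProvider(String symbol) {
        return "<quote><symbol>" + symbol + "</symbol><price>42.00</price></quote>";
    }

    public static void main(String[] args) throws Exception {
        Thread requestor = new Thread(QuotesPattern::runRequestor);
        requestor.setDaemon(true);
        requestor.start();
        System.out.println(handleQuoteRequest("IBM")); // miss: goes via the queue
        System.out.println(handleQuoteRequest("IBM")); // hit: served from the cache
    }
}
```

Scaling the sketch to the real architecture means running many copies of the servlet side and several requestor threads on separate nodes; because the queue is the only coordination point, adding readers increases both throughput and availability.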
Whether companies are implementing mission-critical service-based applications that span virtual servers around the globe or prototyping a simple application based on Web services and widgets like this, the underlying technology is fundamentally the same. Of course, the scale and complexity are different. But with advances in CPU, memory, and bandwidth capabilities, the promise of distributed computing is on the way to becoming fully realized.