A distributed system is a computing environment in which various components are spread across multiple computers (or other computing devices) on a network. These devices split up the work, coordinating their efforts to complete the job more efficiently than if a single device had been responsible for the task.
Distributed systems are an important development for IT and computer science because a growing number of computing jobs are so massive and complex that a single computer could not handle them alone. Distributed computing also offers advantages over traditional computing environments. Distributed systems reduce the risks involved with having a single point of failure, bolstering reliability and fault tolerance. Modern distributed systems are generally designed to scale in near real time: you can spin up additional computing resources on the fly, increasing performance and further reducing time to completion.
Historically, distributed computing was expensive, complex to configure and difficult to manage. But thanks to software as a service (SaaS) platforms that offer expanded functionality, distributed computing has become more streamlined and affordable for businesses large and small. As a result, all types of computing jobs, from database management to video games, use distributed computing. In fact, many types of software, such as cryptocurrency systems, scientific simulations, blockchain technologies and AI platforms, wouldn’t be possible at all without distributed computing.
In this article, we’ll explore the operation of such systems, the challenges and risks of these platforms, and the myriad benefits of distributed computing.
Distributed systems have evolved over time, but today’s most common implementations are largely designed to operate via the internet and, more specifically, the cloud. A distributed system begins with a task, such as rendering a video to create a finished product ready for release. The web application (or distributed application) managing this task, such as a video editor on a client computer, splits the job into pieces. In this simple example, the algorithm gives one frame of the video to each of a dozen different computers (or nodes) to render. Once a node finishes its frame, the managing application gives it a new frame to work on. This process continues until every frame is rendered and all the pieces are put back together. A system like this doesn’t have to stop at just 12 nodes: the job may be distributed among hundreds or even thousands of nodes, turning a task that might have taken a single computer days to complete into one that is finished in a matter of minutes.
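The frame-by-frame hand-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a local thread pool stands in for the dozen networked nodes, and `render_frame` and `render_video` are hypothetical names, not part of any real rendering product.

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame_number: int) -> str:
    """Hypothetical stand-in for the rendering work one node performs."""
    return f"rendered-frame-{frame_number:04d}"


def render_video(frame_count: int, node_count: int = 12) -> list:
    """Hand frames to a pool of workers and reassemble the results in order."""
    # Each worker thread plays the role of one node. When a worker finishes
    # a frame, the pool gives it the next pending frame.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        # map() yields results in submission order, so the frames come back
        # already in sequence even if they finished out of order.
        return list(pool.map(render_frame, range(frame_count)))


video = render_video(frame_count=48)
print(len(video), video[0], video[-1])
```

In a real deployment the workers would be separate machines reached over the network, and the managing application would also need to handle nodes that fail or respond slowly, which this sketch leaves out.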
There are many models and architectures of distributed systems in use today. Client-server systems, the most traditional and simple type of distributed system, involve a multitude of networked computers that interact with a central server for data storage, processing or another common goal. Cell phone networks are an advanced type of distributed system that shares workloads among handsets, switching systems and internet-based devices. Peer-to-peer networks, in which workloads are distributed among hundreds or thousands of computers all running the same software, are another example of a distributed system architecture. The most common forms of distributed systems in the enterprise today are those that operate over the web, handing off workloads to dozens of cloud-based virtual server instances that are created as needed, then terminated when the task is complete.
Distributed systems are commonly defined by the following key characteristics and features:
Distributed tracing, sometimes called distributed request tracing, is a method for monitoring applications, typically those built on a microservices architecture, that are commonly deployed on distributed systems. It is closely tied to distributed computing because its main use is monitoring the operations of applications running on distributed systems.
In software development and operations, tracing is used to follow the course of a transaction as it travels through an application — an online credit card transaction as it winds its way from a customer’s initial purchase to the verification and approval process to the completion of the transaction, for example. A tracing system monitors this process step by step, helping a developer to uncover bugs, bottlenecks, latency or other problems with the application.
Distributed tracing is necessary because of the considerable complexity of modern software architectures. A distributed tracing system is designed to operate on a distributed services infrastructure, where it can track multiple applications and processes simultaneously across numerous concurrent nodes and computing environments. Without distributed tracing, an application built on a microservices architecture and running on a system as large and complex as a globally distributed system environment would be impossible to monitor effectively.
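As a rough illustration of the idea, the sketch below follows the credit card example through three steps that share one trace ID. It assumes everything runs in a single process and that an in-memory list stands in for a tracing backend. The names `Span`, `span`, `verify_card`, `approve_payment` and `purchase` are hypothetical and do not come from any particular tracing library.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str            # shared by every step of one transaction
    name: str                # the step being timed
    parent: Optional[str]    # name of the calling step, if any
    duration_ms: float = 0.0


collected = []  # assumption: stands in for a real tracing backend


@contextmanager
def span(trace_id, name, parent=None):
    """Time one step and record it when the step finishes."""
    record = Span(trace_id, name, parent)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record.duration_ms = (time.perf_counter() - start) * 1000
        collected.append(record)


def verify_card(trace_id, parent):
    with span(trace_id, "verify_card", parent):
        time.sleep(0.01)  # simulated work in a verification service


def approve_payment(trace_id, parent):
    with span(trace_id, "approve_payment", parent):
        time.sleep(0.02)  # simulated work in an approval service


def purchase():
    # The trace ID is created once and passed to every downstream call.
    trace_id = uuid.uuid4().hex
    with span(trace_id, "purchase") as root:
        verify_card(trace_id, root.name)
        approve_payment(trace_id, root.name)
    return trace_id


trace_id = purchase()
for s in collected:
    print(s.name, s.parent, f"{s.duration_ms:.1f} ms")
```

Because every span carries the same trace ID, a developer can group the records for one transaction and see which step contributed the most latency. In a real distributed system the ID would travel between services in request metadata instead of as a function argument.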
A software design pattern is a description of an ideal solution to a contextualized programming problem. Patterns are reusable solutions to common problems that represent the best practices available at the time, and while they don’t provide finished code, they can be replicated from one project to the next and offer guidance on how to solve a certain issue or implement a needed feature.
When thinking about the challenges of a distributed computing platform, the trick is to break it down into a series of interconnected patterns; simplifying the system into smaller, more manageable and more easily understood components helps abstract a complicated architecture. Patterns such as command and query responsibility segregation (CQRS) and two-phase commit (2PC) are commonly used to describe distributed systems. Different combinations of patterns are used to design distributed systems, and each approach has unique benefits and drawbacks.
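To make one of these patterns concrete, here is a minimal sketch of two-phase commit. It assumes in-memory participants with no network calls, timeouts or crash recovery, all of which a production implementation would need. The `Participant` class and `two_phase_commit` function are hypothetical names used only for illustration.

```python
class Participant:
    """One node taking part in the transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Coordinator logic: commit everywhere or nowhere."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2 (completion): commit only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True

    for p in participants:
        p.rollback()
    return False


orders = Participant("orders")
payments = Participant("payments")
print(two_phase_commit([orders, payments]), orders.state, payments.state)

inventory = Participant("inventory", can_commit=False)
billing = Participant("billing")
print(two_phase_commit([inventory, billing]), inventory.state, billing.state)
```

The trade-off is typical of patterns in this space: the transaction is all-or-nothing across nodes, but every participant has to wait on the coordinator before it can finish.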
Distributed systems offer a number of advantages over monolithic, or single, systems, including:
Distributed systems are considerably more complex than monolithic computing environments, and raise a number of challenges around design, operations and maintenance. These include:
The challenges of distributed systems as outlined above create a number of correlating risks. These include:
Administrators use a variety of approaches to manage access control in distributed computing environments, ranging from traditional access control lists (ACLs) to role-based access control (RBAC). One of the most promising access control mechanisms for distributed systems is attribute-based access control (ABAC), which controls access to objects and processes using rules that include information about the user, the action requested and the environment of that request. Administrators can also refine these rules to restrict access to certain times of day or certain locations.
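An ABAC decision can be pictured as a function over the attributes of a request. The sketch below is a simplified illustration, not a reference to any real policy engine. The attribute names, the 09:00 to 17:00 window and the approved locations are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Request:
    user_department: str       # attribute of the user
    action: str                # the action requested
    resource_department: str   # attribute of the object being accessed
    local_time: time           # environment: time of day
    location: str              # environment: where the request originates


def is_allowed(req: Request) -> bool:
    """Evaluate hypothetical ABAC rules for a single request."""
    same_department = req.user_department == req.resource_department
    business_hours = time(9, 0) <= req.local_time < time(17, 0)
    approved_location = req.location in {"HQ", "VPN"}

    if req.action == "read":
        # Reads depend only on user and resource attributes.
        return same_department
    if req.action == "write":
        # Writes are further restricted by time of day and location.
        return same_department and business_hours and approved_location
    return False  # deny anything not explicitly covered by a rule


daytime_write = Request("finance", "write", "finance", time(10, 30), "HQ")
late_write = Request("finance", "write", "finance", time(22, 0), "HQ")
print(is_allowed(daytime_write), is_allowed(late_write))
```

Unlike a role check, nothing here depends on a fixed list of roles: changing the policy means changing the rules over attributes, which is what allows restrictions such as time of day or location.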
Distributed systems are used when a workload is too great for a single computer or device to handle. They’re also helpful in situations when the workload is subject to change, such as e-commerce traffic on Cyber Monday. Today, virtually every internet-connected web application that exists is built on top of some form of distributed system.
Some of the most common examples of distributed systems:
Distributed deployments can range from tiny, single-department deployments on local area networks to large-scale, global deployments. In addition to size and overall complexity, organizations can plan deployments based on the size and capacity of their computer network, the amount of data the system will consume, how frequently processes run and whether they are scheduled or ad hoc, the number of users accessing the system, the capacity of the data center, and the necessary data fidelity and availability requirements.
Based on these considerations, distributed deployments are categorized as departmental, small enterprise, medium enterprise or large enterprise. While there are no official taxonomies delineating what separates a medium enterprise from a large enterprise, these categories represent a starting point for planning the needed resources to implement a distributed computing system. Distributed systems can also evolve over time, transitioning from departmental to small enterprise as the enterprise grows and expands.
Modern computing wouldn’t be possible without distributed systems. They’re essential to the operations of wireless networks, cloud computing services and the internet. If distributed systems didn’t exist, neither would any of these technologies.
But do we still need distributed systems for enterprise-level jobs that don’t have the complexity of an entire telecommunications network? In most cases, the answer is yes. Distributed systems provide scalability and improved performance in ways that monolithic systems can’t, and because they can draw on the capabilities of other computing devices and processes, distributed systems can offer features that would be difficult or impossible to develop on a single system.
This includes things like performing an off-site server and application backup — if the master catalog doesn’t see the segment bits it needs for a restore, it can ask the other off-site node or nodes to send the segments. Virtually everything you do now with a computing device takes advantage of the power of distributed systems, whether that’s sending an email, playing a game or reading this article on the web.
Distributed systems are well-positioned to dominate computing as we know it for the foreseeable future, and almost any type of application or service will incorporate some form of distributed computing. The need for always-on, available-anywhere computing is driving this trend, particularly as users increasingly turn to mobile devices for daily tasks. Looking ahead, distributed systems are certain to cement their importance in global computing as enterprise developers increasingly rely on distributed tools to streamline development, deploy systems and infrastructure, facilitate operations and manage applications.