Observability and Telecommunications Network Management [Part 1]

Observability January 15, 2024 William Cappelli

The border between the management of telecommunications networks and the services that they support and the management of IT infrastructures and the applications that they support has always been a porous one. One might say that they are like two dialects of the same language rather than different languages. Nonetheless, these areas, whether characterised by technology or practice, are different and have, for the most part, been served by different vendors and products. This difference has persisted even in the wake of the profound digitalisation of telecommunications networks that has been under way since the year 2000, and although gestures towards convergence are made from time to time, they have all eventually been withdrawn.

The management of IT infrastructure and applications has, however, undergone a major revolution over the last seven years with the abandonment of the old four-part monitoring design plan based on sampling, event records, batch processing, and pre-defined models in favour of design plans inspired by the concept of Observability lifted from Optimal Control Theory. These design plans dictate that a monitoring technology should eschew sampling, provide direct access to system telemetry in the form of metrics, traces, and logs, and surface data-driven patterns in real time using machine learning-generated algorithms and other AI techniques. This revolution, one may recall, is itself the result of a radical change in the underlying IT infrastructure and application architecture, a change that rendered that old design plan obsolete.

So, it can be asked, have telecommunications networks undergone a parallel change and does that parallel change require a parallel revolution in network management? Let’s quickly review the nature of the architectural change on the IT side of the picture and then see whether or not telecommunications networks have seen anything similar. On the IT side, the change may be characterised on two levels:

The first level is that of the systems themselves. Systems, whether characterised as infrastructure or application, have become orders of magnitude more modular, distributed, dynamic, and ephemeral.
The second level is that of the data generated by these systems, which is then captured by monitoring technology and used as a basis for determining what is taking place within the digital environment. The data itself has increased (and continues to increase) by orders of magnitude, as have its noise level, its dimensionality (the number of independent attributes needed to distinguish one data item from another), and, once the noise has been eliminated, the data set’s entropy, i.e., the amount of genuinely new information contributed by each data item in the set.
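The notion of entropy used above can be made concrete with a small calculation. The following is a minimal sketch, assuming that "entropy" is read as Shannon entropy measured in bits per item; the event names and both event streams are invented purely for illustration and do not come from any real system.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Return the Shannon entropy, in bits per item, of a sequence of hashable items."""
    counts = Counter(items)
    total = len(items)
    # Sum p * log2(1/p) over every distinct item, where p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical event streams (assumptions for illustration only).
# A stream that repeats one event carries no new information per item.
repetitive = ["link_up"] * 8
# A stream of eight equally likely, distinct events carries log2(8) bits per item.
varied = ["link_up", "link_down", "cpu_high", "pod_start",
          "pod_stop", "auth_fail", "latency", "retry"]

low = shannon_entropy(repetitive)
high = shannon_entropy(varied)
print(low, high)
```

On this reading, a stream in which each item is as likely as any other is the hardest case for a monitoring tool built on pre-defined models, because no item can be predicted from the ones that came before it.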

Even from this brief description, one can see how a monitoring technology built along the lines of the original four-part design plan would fail utterly in its task to present and analyse the reasons for the flow of state changes in the digital environment.

Telecommunications Transformation

Telecommunications networks have undergone a similar transformation. In fact, in many ways, the transition to packet-switched network technology brought about similar levels of increased modularisation, distribution, dynamism, and ephemerality a decade before the change took place in the world of IT. Despite this, however, there has, to date, been little modification of the technologies and practices associated with network management.

Let us take a look at those technologies and practices and attempt to understand why the telecommunications world has proven to be so conservative. Since the early 1980s, telecommunications network management has been conceptually organised around five domains or dimensions originally articulated by the OSI de jure standard: Fault, Configuration, Asset (Accounting, in the standard's own wording), Performance, and Security, or FCAPS, for short.

Fault management deals with the discovery and remediation of operational incidents and problems occurring within the network. Faults are generally seen to be the result of discrete events occurring at specific points in time and specific places in the network topology. Hence, the primary aim of fault management is to discover where the events have occurred and to fix what is found in that location so that the fault is no longer present in the network.
Configuration management deals with the placement of network components both absolutely and in relation to one another and in the setting and changing of values of attributes possessed by those components. The idea of location here can have both a physical and a logical significance.
Asset management deals with the financial attributes of the components which have been configured in the network. The idea of financial attribute here is a broad one, including both on-going costs associated with the depreciation and maintenance of a component and with accounting and charging for the contribution of a component to a service provider’s revenue stream.
Performance management deals with ensuring that the services offered over a telecommunications network are behaving as expected and promised according to service level agreements and other internal and external metrics.
Finally, security management, unsurprisingly, deals with ensuring that the network infrastructure is protected against malicious interventions and that unauthorised individuals or organisations cannot access network services.

FCAPS Must Evolve

At first glance, the FCAPS scheme appears anodyne, a bit of common sense really, imposing no particular constraints on practitioners. A closer look, however, suggests that FCAPS is, in fact, tied to an antiquated view of the communications environment and, at best, requires supplementation if telecommunications network management is to be effective in a digital world.

FCAPS envisions a largely static centralised infrastructure composed of long-lived components that are organised in an even longer-lived topology. Components may be added or removed and the values of their attributes changed, but these change events a) are relatively rare and b) take place outside of the provision of any network service. The network services themselves are seen to be entities of a completely different nature than the underlying infrastructure that supports them and, hence, their management requires a completely separate set of technologies and processes. Finally, FCAPS insists on a segregation between operational and security-related aspects of network management, effectively assuming that the consequences of malicious intervention and the consequences of operational faults can be easily distinguished early in the process of observation and analysis.

The FCAPS vision is clearly the product of an era when infrastructure was perceived as a more or less permanent capital investment whose costs significantly outweighed the costs of actually delivering a service on top of that infrastructure. Of course, when we look at the reality of the current situation, we see that the FCAPS model of the world, for the most part, no longer holds. As is the case for IT-oriented digital environments, the components that make up the telecommunications network have become disaggregated and virtualised and, as a result, have multiplied greatly in number. The components’ life-spans have, on average, shortened tremendously (although they have not yet reached the micro-second levels found in some IT systems), and the concept of a rigid network topology only makes sense at the lowest physical levels of the network stack. The lines between service and infrastructure have long since blurred: many network services are now essentially services that deliver infrastructures on top of which another layer of services can be delivered. Finally, just as in the IT realm, although culture and history still segregate security-oriented and operations-oriented management, to the detriment of both, the truth of the matter is that it is almost impossible to make an early decision as to whether an incident is a consequence of malicious intervention or of operational breakdown. In fact, the one dimension that has not significantly changed over the past two decades is the degree of centralisation, both with regard to the actual networks themselves and the markets within which they earn their keep. The irony here is that, in this regard, the IT realm has come to resemble the telecommunications network realm as the large cloud service providers have come to dominate the commercial and technical landscape.

In any case, telecommunications networks have changed in much the same way as have IT systems, so the question arises: should FCAPS be abandoned, just as the four-part monitoring design plan has been abandoned? The answer is a partial yes. For the moment, let us lay aside issues surrounding security management and asset management. Traditional network management, as we have seen, assumes that the bulk of network behaviour, good or bad, is driven by a configuration of effectively permanent components locked into an effectively rigid topology that determines the paths by which messages and causal impacts can travel from one component to another. Fault management and Performance management are then ultimately carried out with this configuration in mind.

It should be clear that the configuration is a close cousin of the predefined data model that sits at the centre of the legacy monitoring design plan. Events and performance behaviours are deemed problematic in so far as they depart from what the configuration leads one to expect, and remedies are to be sought ultimately through interventions that will alter the configuration. Just as is the case in the IT realm, if, in fact, the underlying stack is not fixed but is in constant flux, any configuration model is likely to be out of date from the moment it is brought online. The result is, of course, a significant uptick in the mismatches between expectations and reality, because any match would be purely coincidental.
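The way a fixed configuration model falls out of step with a churning network can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the component names (`vnf-0`, `vnf-1`, and so on), the churn rate of one replacement per step, and the use of Jaccard similarity as the measure of agreement are all hypothetical choices made for illustration, not a description of any real network management product.

```python
def model_accuracy(model, observed):
    """Jaccard similarity between the modelled and the observed component sets."""
    union = model | observed
    if not union:
        return 1.0  # two empty sets agree trivially
    return len(model & observed) / len(union)


# Hypothetical configuration model, captured once and never updated.
model = {f"vnf-{i}" for i in range(10)}

# Simulated churn: at each step one component is retired and a new one appears.
observed = set(model)
history = []
for step in range(5):
    observed.discard(f"vnf-{step}")
    observed.add(f"vnf-{10 + step}")
    history.append(model_accuracy(model, observed))

for step, score in enumerate(history, start=1):
    print(f"after {step} replacement(s): agreement = {score:.2f}")
```

Even at this gentle rate of churn, agreement between the model and the network falls with every step, which is the sense in which a configuration captured at one moment stops describing the environment it is meant to govern.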
