Riding the storm of uncertainty through the eyes of a network professional
The telecom world is fraught with uncertainty. It is in such situations, when customer demand surges and the load on the network increases sharply, that the true role of a network professional comes to the fore. During these moments, when the network is under stress, network professionals need to put in a tremendous effort to ensure 24×7 service continuity.
But what is it about ‘uncertainty,’ whether it stems from a natural calamity or from massive site outages, that causes stress on the network?
Generally, a mobile network undergoes numerous changes. These could be planned changes like parameter tuning, physical optimization, new feature trials or implementations, SW upgrades, etc., most of which are targeted towards improving network performance and the end-user experience.
On the other hand, it is the ‘unplanned’ changes that tend to throw a monkey wrench into the system. These changes could be seasonal or due to system-level fluctuations in network element(s), and they can severely degrade performance. If such an incident were to happen in the busiest hour of the day, the impact would be felt in the business as well.
In both cases, planned or unplanned, the action to resolve a problem could be targeted towards any domain of the network, e.g., radio, transport, core, etc. It would also require end-to-end troubleshooting and root-cause analysis.
Traditionally, network professionals manage the impact of such changes reactively, which often results in instances where the ‘mean time to acknowledge’ (MTTA) and ‘mean time to respond’ (MTTR) potentially take hours or, in some cases, days. In fact, some complex cross-domain cases can even remain unresolved.
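To make the two metrics concrete, here is a minimal sketch of how MTTA and MTTR could be computed from an incident log. The timestamps and the three-field record layout (detected, acknowledged, responded) are hypothetical and only serve as an illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (detected, acknowledged, responded) timestamps.
incidents = [
    (datetime(2020, 4, 1, 9, 0), datetime(2020, 4, 1, 9, 30), datetime(2020, 4, 1, 12, 0)),
    (datetime(2020, 4, 2, 18, 0), datetime(2020, 4, 2, 19, 30), datetime(2020, 4, 2, 23, 0)),
]

def mean_delta(deltas):
    """Average a non-empty list of timedelta values."""
    return sum(deltas, timedelta()) / len(deltas)

def mtta(log):
    """Mean time to acknowledge: detection to acknowledgement."""
    return mean_delta([ack - detected for detected, ack, _ in log])

def mttr(log):
    """Mean time to respond: detection to response."""
    return mean_delta([responded - detected for detected, _, responded in log])

print("MTTA:", mtta(incidents))
print("MTTR:", mttr(incidents))
```

Tracking both values per domain (radio, transport, core) would show where the reactive approach loses the most time.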
Over the years, however, thanks to advancements in automation, network professionals have been able to evolve towards handling such issues proactively. By leveraging tools embedded with subject matter expertise, coupled with statistical methods, network professionals can bring a significant reduction in MTTA and MTTR. However, this approach is far from perfect as necessary action can only be taken after the impact on the network has already occurred.
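As an illustration of the kind of statistical method such tools might embed, the sketch below flags KPI samples that deviate strongly from a trailing baseline using a z-score. The window size, the threshold, and the sample values are assumptions chosen for the example, not values taken from any real tool.

```python
from statistics import mean, stdev

def flag_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value deviates from the trailing window
    by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        sigma = stdev(baseline)
        if sigma == 0:
            # Flat baseline: the z-score is undefined, so skip this sample.
            continue
        z = (values[i] - mean(baseline)) / sigma
        if abs(z) > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily call setup success rate (%), with a dip on day 7.
cssr = [99.1, 99.0, 99.2, 99.1, 99.0, 99.2, 99.1, 92.0, 99.1]
print(flag_anomalies(cssr))
```

Note that a check like this only fires once the degraded sample has been recorded, which is exactly the limitation described above.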
To understand the right approach that needs to be taken, let us take a step back and delve into the life of a network planning and optimization expert involved in day-to-day performance management of the network.
Conventionally, a network professional performs network impact analysis on a set of critical KPIs and takes necessary steps to improve the network performance. A lot of effort is invested towards another round of analysis to conclude if the action had a cascading impact on any other major KPIs or not. Once the impact of the action taken is deemed successful, the analysis moves to the next set of actions. Very often, such actions are taken by multiple stakeholders to improve the performance of their respective regions/clusters/sites.
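The second round of analysis described above can be pictured as a before/after comparison across several KPIs at once. The following is a minimal sketch under stated assumptions: the KPI names, the sample values, and the 2% tolerance are all hypothetical, and every baseline mean is assumed to be non-zero.

```python
from statistics import mean

def kpi_impact(before, after, tolerance_pct=2.0):
    """Compare mean KPI values before and after a change.

    Returns {kpi: (change in percent, exceeds tolerance?)}.
    Assumes every baseline mean is non-zero.
    """
    report = {}
    for kpi, samples in before.items():
        base = mean(samples)
        change_pct = (mean(after[kpi]) - base) / base * 100
        report[kpi] = (round(change_pct, 2), abs(change_pct) > tolerance_pct)
    return report

# Hypothetical samples collected around a parameter change.
before = {"throughput_mbps": [20.0, 20.0], "drop_rate_pct": [1.0, 1.0]}
after = {"throughput_mbps": [22.0, 22.0], "drop_rate_pct": [1.0, 1.0]}

for kpi, (change, significant) in kpi_impact(before, after).items():
    print(kpi, change, significant)
```

A KPI that was not the target of the change but still exceeds the tolerance is a candidate for the cascading impact the analyst is looking for.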
In a 4G network, there could be thousands of counters/KPIs and hundreds of parameters, making it almost impossible for a human to analyze the impact of a change of a set of parameters on all relevant counters/KPIs, end-to-end, daily. Services like video streaming and VoLTE often require cross-domain analysis to ensure seamless service continuity.
It is in such scenarios that Machine Learning (ML) starts cropping up as the answer to handling these complications from an end-to-end perspective.
However, the term ‘Machine Learning’ does have a way of creeping into such discussions, sometimes to its own detriment: it is looked at as a work of magic at best, or as just another marketing gimmick at worst. Almost everyone brings it up at some point.
So, let me take a moment to defend why I bring it up now, and why I believe machine learning-driven advanced analytics can significantly solve some of the critical problems faced by network professionals.
The short two-word answer would be ‘Agility’ and ‘Accuracy.’ Allow me to elaborate.
A CSP can leverage ML models that consistently capture all types of network impact from the counters, KPIs, and parameters the network generates daily. These models provide network professionals with actionable insights that would otherwise go unnoticed.
Furthermore, these insights come in especially handy when planning and managing special events like the screening of a global sports event. For such an event, where traffic is expected to surge, the model will learn from the performance of previous events and will provide critical inputs for event planning.
By the same logic, ML models will generate actionable insights when a crisis strikes and will equip network professionals to deal with any abnormal events happening in times of uncertainty. Furthermore, by leveraging ML models, network professionals can be equipped to run multiple simulations to accurately perform network impact analysis for any actions they intend to perform in the future. A few key scenarios that could be managed with better agility at any point of time, but take on an additional dimension of importance during a “network under stress” situation, would be:
- Accurately forecasting the peak traffic, resource utilization, and application usage to plan well in advance
- Optimizing and fine-tuning network parameters for capacity expansion and quality improvement in times of abnormal network trends
- Improving customer experience by performing Quality of Experience (QoE) correlation with end-to-end QoS for streaming services like full HD video
- Maintaining seamless end-to-end VoLTE service continuity
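To give a feel for the first scenario, the sketch below extrapolates daily peak traffic with a plain least-squares trend line. This is a deliberately simple stand-in for an ML model, not the kind of model a CSP would deploy, and the traffic figures are hypothetical.

```python
def fit_line(ys):
    """Least-squares line through points (0, ys[0]), (1, ys[1]), ...
    Needs at least two samples."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def forecast_peak(ys, days_ahead):
    """Extrapolate the fitted trend `days_ahead` days past the last sample."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + days_ahead)

# Hypothetical daily peak traffic in Gbps.
daily_peak_gbps = [100.0, 110.0, 120.0, 130.0]
print(forecast_peak(daily_peak_gbps, days_ahead=2))
```

A production model would also need to account for seasonality and for one-off events, which a straight line cannot capture.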
As they say, unique situations call for unique measures, and during times when the network is under stress, network professionals need to put in a great deal of effort just to keep the network afloat. It is an effort that often goes unnoticed, but it is one that helps keep customers connected and the company’s reputation intact. Machine Learning can take on an essential role in transforming these ‘network heroes’ into ‘network superheroes’ by equipping them with powers of enhanced agility and accuracy.
The truth is that the telecom industry is at an inflection point where decision-makers need to take a call on what capabilities they must bring into the network and how they can equip their network heroes today. These critical decisions taken now will provide benefits for years to come.