As one of the largest capital line items in every telecom operator’s budget, network Capex continues to grow every year, since network augments and migrations to newer technologies are unavoidable. Today, operators are spending huge sums on new network infrastructure for advanced telecom services like LTE / 4G and IPX without adequate visibility into revenue growth. A recent survey points out that 20% of assets fail to return their cost of capital and that 5-15% of these network assets are ‘stranded’.
Hence, effective capital expenditure and network asset lifecycle management are rapidly becoming a boardroom issue for telecom operators. This is only possible when all functions work together to maximize the returns on their investments. Both the CFO’s and the CTO’s teams within an operator should have a holistic and collaborative view of network asset investment.
The urgent need is for a strategic asset assurance program that manages and substantially reduces network Capex. ROC Asset Assurance differs from ERP services in its workflow and analytics elements: it can initiate workflows to ensure that all applicable assets are procured and deployed when needed. ROC Asset Assurance helps the CFO and CTO functions within the operator tackle the following pain points:
Planning of capital spend vs budget
Tracking deployed assets and ROI on those assets
What to buy, when to buy, where to buy, and for what reason?
Information related to assets
Ensuring all available assets are used at maximum efficiency
Network resource capacity and the need to respond
To learn how an effective asset assurance program will provide complete confidence to operators that their network will grow to meet market demands while also guaranteeing optimal value for every dollar of capital budget spent, download our newsletter, Asset Assurance – Preserving Capex While Improving Network Efficiency, featuring research from the leading analyst firm Gartner.
My previous blog on Network Discovery and Analytics revolved mostly around some of the core but basic functionality that a discovery system needs to possess. In this post, let’s discuss some of the key challenges discovery systems face when they are first deployed.
1) Resistance from network operations teams.
Network operations teams, for example, want the network left undisturbed during peak traffic hours and during maintenance windows. To that end, they need confidence that the discovery system can be configured so that it will not touch the network at all for a given duration, and that any polling activity in progress, whatever its stage, is suspended the moment a blackout window begins.
Non-intrusive discovery is what earns a network operations team’s confidence. By non-intrusive discovery, I mean that a device which is already overloaded in terms of resource consumption should not be further burdened by queries for discovery information.
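The blackout-window behavior described above can be sketched as a small scheduler. This is a hypothetical illustration, not any product’s implementation; the window values and `poll_fn` callback are assumptions.

```python
from datetime import datetime, time

# Illustrative blackout windows, e.g. peak traffic hours (assumed values).
BLACKOUT_WINDOWS = [(time(18, 0), time(23, 0))]

def in_blackout(now=None):
    """Return True if the given (or current) time falls inside any blackout window."""
    now = (now or datetime.now()).time()
    return any(start <= now <= end for start, end in BLACKOUT_WINDOWS)

def poll_devices(devices, poll_fn):
    """Poll devices one by one, suspending immediately when a blackout window opens.

    Returns (results_so_far, remaining_devices) so the caller can resume later.
    """
    remaining = list(devices)
    results = []
    while remaining:
        if in_blackout():
            # Suspend mid-activity, whatever stage the polling run is at.
            return results, remaining
        results.append(poll_fn(remaining.pop(0)))
    return results, remaining
```

The key design point is that the blackout check happens before every device touch, so a run already underway is suspended rather than completed when the window opens.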
2) Resistance from IT teams managing N/EMS systems.
When discovering from the NBI of an N/EMS, or from gateways that maintain stateful sessions, the discovery system is often required to consume no more than a configured number of sessions. This may be needed to ensure that other northbound OSS systems are not denied a session by the N/EMS or gateway. It may also be because only a few NBI sessions have been purchased from the equipment vendor, and using more sessions than purchased may lead to a contract violation or errors in discovery.
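The session cap described above is essentially a concurrency limit, and a semaphore is one simple way to sketch it. This is an assumption-laden illustration: `MAX_SESSIONS` and the `fetch_inventory` callback are hypothetical, not a real vendor API.

```python
import threading

# Assumed purchased/configured NBI session limit.
MAX_SESSIONS = 3

# Bounded semaphore: the discovery system can never hold more than
# MAX_SESSIONS concurrent sessions toward the N/EMS or gateway.
_session_slots = threading.BoundedSemaphore(MAX_SESSIONS)

def discover_via_nbi(query, fetch_inventory):
    """Run one NBI query, blocking until a session slot is free."""
    with _session_slots:  # acquired here, released even if the query raises
        return fetch_inventory(query)
```

Queries beyond the limit simply wait for a slot rather than opening an extra session, so other northbound OSS consumers are not starved and contractual session counts are respected.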
OSS network discovery applications have to evolve to meet the above challenges by introducing functionality like blackout support and network-hit throttling.
Moving on to the next stage of the evolution: Tier-1 and Tier-2 operators have discovery systems deployed and use this information to keep inventories in sync with the real world. But inventories typically store services or end-to-end circuits, while what is discovered are the individual components of a service or circuit provisioned within a network element. It is important for the service provider to view the current state of a circuit in the network and compare it with the inventory to fix misalignment. This places a responsibility on the discovery system: it must assimilate the service components discovered within each network element, together with network element interconnectivity information, and plot end-to-end circuits for various technologies.
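The circuit-assembly idea above can be sketched as a walk over two maps: cross-connects inside each network element, and links between elements. The data shapes and port names here are purely illustrative assumptions.

```python
# Hypothetical sketch: stitch an end-to-end circuit from per-NE cross-connects
# plus NE interconnectivity, by following port-to-port hops until the path ends.

def stitch_circuit(start_port, cross_connects, links):
    """Return the ordered list of ports traversed from start_port.

    cross_connects: port -> port mappings provisioned inside a network element.
    links:          port -> port physical interconnects between elements.
    """
    path = [start_port]
    port = start_port
    while True:
        # At each hop, try the NE-internal cross-connect first, then the link out.
        nxt = cross_connects.get(port) or links.get(port)
        if nxt is None or nxt in path:  # circuit terminates, or loop guard trips
            return path
        path.append(nxt)
        port = nxt
```

A real topological engine must of course handle multiplexing layers, protection paths, and technology-specific rules, but the core operation is this kind of hop-by-hop assembly.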
As of today, a few Tier-1 operators with mature processes are looking to evolve their discovery system’s capabilities toward near-real-time sync with the state of the network: a system that listens to traps/events from the network and refreshes its database with the latest network state, and then uses that near-real-time discovery to keep inventories up to date with reality in the network.
It is another matter that a lot of inventories even today are far from accurate, despite the many tools in the market specifically designed to solve inventory data integrity issues. The reasons range from the lack of a sound practice around the usage of these tools, to a lack of commitment to a data integrity program, to a failed OSS inventory transformation project.
In the interim, while operators work to get their inventories cleaned up, we believe that a discovery system paired with intelligent and actionable analytics has a lot more to offer planning and service assurance teams, as outlined below.
1) The discovered physical fabric can be compared with and/or enriched by OSS inventories and ERP systems to build a 360° view of each asset. That 360° view, when recorded over a period of time, becomes a very powerful source for analytical functions that provide actionable intelligence to planning and network operations teams, helping with Capex avoidance.
2) The discovered logical/service component information, when captured over a period of time, can likewise feed other analytical functions (time-series trending and forecasting, what-if modeling and optimization) that help network operations and planning teams perform network capacity management on accurate, as-discovered information.
3) The discovered information, when assimilated with topological analytics to calculate end-to-end circuits/services, can be a powerful source of information for service assurance teams, enabling them to overlay alarms and generate a service-impact view in their NOC.
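To make the time-series trending and forecasting point concrete, here is a minimal sketch: fit a least-squares linear trend to recorded utilization samples and project when a resource exhausts its capacity. The sampling cadence (daily) and the capacity normalization are assumptions for illustration.

```python
# Hypothetical sketch: project time-to-exhaustion from recorded utilization.
def days_to_exhaustion(utilization, capacity=1.0):
    """Fit a least-squares slope over daily utilization samples (0..1 scale).

    Returns the projected number of days until utilization reaches capacity,
    or None if the trend is flat or decreasing (no exhaustion projected).
    """
    n = len(utilization)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(utilization) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, utilization)) \
        / sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None
    return (capacity - utilization[-1]) / slope
```

A production analytics function would use seasonality-aware models rather than a straight line, but even this sketch turns raw discovered counters into an actionable "when will this fill up" signal for planners.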
Most CSPs today adopt a traditional capacity management approach that consists of planning their network resource requirements over the next 12 months based on past consumer trends.
Reality check! With today’s fast-paced end-user consumption and service demand, trying to predict resource needs 12 months out based on past end-user behavior is like playing the lottery based on past outcomes in hopes of hitting it big: more often than not, you will lose big time.
The reality is that the past no longer predicts the future, and that end-users are causing unpredictable shifts in network resource consumption as they tune into major events and build their lives around real-time communications. Sure, many CSPs will read this and think: we have the latest probes in the network giving us loads of real-time data and complex flows, so we know whether packets are traveling left or right in the network. And yet with all this information, CSPs still can’t keep ahead of today’s data tsunami without being choked by escalating CapEx. Having loads of low-level information more often than not causes data overload: you have so much raw data that you don’t know what it means from an overall congestion perspective without weeks or months of analysis. Or even worse, you may interpret trends differently depending on the data sample you examine, making it virtually impossible to project congestion and business impacts. Many CSPs with whom I have spoken face the same problem: when in doubt, pour more CapEx into the network in the hope of adding the right resources to alleviate congestion.
What if there is a way to more precisely target the CapEx spend needed in the network, to deliver the services the CSPs need to thrive? In fact, there is, and it’s called “Real-time Capacity Analytics”!
Real-time Capacity Analytics is about understanding all capacity-related data together, rather than looking at it per attribute or per device – which provides little more clarity than a blip in an ocean of traffic – and instead viewing capacity consumption as it relates to end-to-end paths and the services delivered to end-users. It is amazing how concerned CSPs are about how capacity congestion affects their subscribers, and yet most solutions today fail to look at capacity from an end-to-end, end-user perspective. Without an end-to-end view and an understanding of how different segments of the network path and services contribute to congestion, CSPs may be spending CapEx in portions of the network that temporarily relieve the symptoms of congestion rather than resolving the root cause.
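The end-to-end point has a simple quantitative core: a service path is only as healthy as its most congested segment, so per-device measurements must be rolled up along the path. A minimal sketch, with segment names and utilization figures as assumed illustrative data:

```python
# Hypothetical sketch: roll per-segment utilization up to an end-to-end view.
def path_congestion(path, segment_utilization):
    """Return (bottleneck_segment, utilization) for an ordered list of segments.

    The path's effective congestion is governed by its worst segment, which is
    where CapEx relief actually addresses the root cause.
    """
    worst = max(path, key=lambda seg: segment_utilization[seg])
    return worst, segment_utilization[worst]
```

Spending on any segment other than the bottleneck leaves the end-user experience unchanged, which is precisely the mis-targeted CapEx the text warns about.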
So as a CSP, the next time you face customer impact from congestion, ask yourself: Did I see it coming? Did I get the right warning signs that congestion was building up over time? Did I get a read on time to exhaustion that could have helped me plan added capacity before impacting my customers? Is my solution pinpointing where to target my CapEx? And finally, do I have a solution that can tell me whether my network can accommodate new subscribers or services and, if not, where the congestion hot-spots will occur and how much CapEx is needed?
If your current solution isn’t helping you answer any of the above questions, it is time to consider Real-Time Analytics for Capacity Management before your business gets swept away by the tides of capacity congestion.
Now, referring to the title, you may be thinking: That’s a rather cheeky thing to say given the high direct and indirect costs of errant data incurred by virtually all operators. You might cite the significant Opex penalty related to reworking designs and to service activation fallout. I get that. What about the millions of USD in stranded Capex most operators have in their networks? Check. My personal favorite comes from Larry English, a leading expert on information quality, who has ranked poor quality information as the second biggest threat to mankind after global warming. And here I was worried about a looming global economic collapse!
My point is actually that the discrepancies themselves have no business value. They are simply an indicator of things gone bad. The canary in the coal mine. These “things” are likely some combination of people, processes and system transactions, of course. Yet many operators make finding and reporting discrepancies the primary focus of their data quality efforts. Let’s face it, anyone with modest Excel skills can bash two data sets together with MATCH and VLOOKUP functions and bask in the glow of everything that doesn’t line up. Sound familiar?
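The Excel MATCH/VLOOKUP exercise described above boils down to computing set differences. A hypothetical sketch of that naive comparison, to show how little it takes (and why finding discrepancies is the easy part):

```python
# Naive network-vs-inventory comparison: the "anyone with Excel" version.
def find_discrepancies(network_ids, inventory_ids):
    """Return objects present in one data set but missing from the other."""
    network, inventory = set(network_ids), set(inventory_ids)
    return {
        "in_network_not_inventory": sorted(network - inventory),
        "in_inventory_not_network": sorted(inventory - network),
    }
```

Everything that follows in this post – normalization, prioritization, workflow, prevention – is what separates a real DIM program from this two-liner.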
For context, I am mostly referring to mismatches between the network and how the network is represented in back-office systems like Inventory—but the observations I will share can be applied to other domains. Data anomalies, for example, are all too common when attempting to align subscriber orders and billing records in the Revenue Assurance domain.
Too often, Data Integrity Management (DIM) programs start with gusto and end with a fizzle, placed on a shelf so that shinier (and easier!) objects can be chased. Why is this? Understanding that I am now on the spot to answer my own rhetorical question, let me give it a go.
The scourge of false positives: There are few things as frustrating as chasing one’s tail. Yet that is the feeling when you find that a high percentage of your “discrepancies” are not material discrepancies (i.e. an object in the Network but not in Inventory) but simply mismatches in naming conventions. A DIM solution must profile and normalize the data that are compared so as not to spew out a lot of noise.
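The profile-and-normalize step can be sketched as a canonicalization pass applied to both data sets before comparison. The normalization rules and port names below are illustrative assumptions, not a prescribed convention:

```python
import re

# Hypothetical sketch: normalize naming conventions before comparing, so that
# "GigE0/1" in the network and "gige-0/1" in Inventory do not surface as a
# false-positive discrepancy.
def normalize(name):
    """Lowercase and strip separators, keeping only letters, digits and '/'."""
    return re.sub(r"[^a-z0-9/]", "", name.lower())

def material_discrepancies(network_ids, inventory_ids):
    """Objects truly in the network but absent from Inventory, post-normalization."""
    network = {normalize(n) for n in network_ids}
    inventory = {normalize(n) for n in inventory_ids}
    return sorted(network - inventory)
```

With normalization in place, what remains in the output is far more likely to be a material discrepancy worth routing into correction workflow rather than noise.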
The allure of objects in the mirror that are closer than they appear: OK, not sure this aphorism works but I trust you to hang with me. I am referring to misplaced priorities: paying attention to one (closer, easier) set of discrepancies while ignoring another set that might yield a bigger business impact once corrected. Data quality issues must be prioritized, with priorities established based upon clear and measurable KPI targets. If you wish to move the needle on service activation fallout rates, for example, you need to understand the underlying root causes and be deliberate about going after those for correction. Clearly, you should not place as much value on finding “stranded” common equipment cards as on recovering high-value optics that can be provisioned for new services.
The tyranny of haphazard correction: I’m alluding here to the process and discipline of DIM. Filtered and prioritized discrepancies should be wrapped with workflow and case management in a repeatable and efficient manner. The goals are to reduce the cost and time related to correction of data quality issues. If data cleanse activities are unstructured and not monitored by rigorous reporting, the business targets for your DIM program are unlikely to be met.
The failure to toot one’s own horn: Let’s say that your data integrity efforts have met with some success. Do you have precise measurements of that success? What is the value of recovered assets? How many hours have been saved in reduced truck rolls related to on-demand audits? Have order cycle times improved? By how much? Ideally, can you show how your DIM program has improved metrics that appear on the enterprise scorecard? It is critical that the business stakeholders and the executive team have visibility to the value returned by the DIM program. Not only does this enable continued funding but it could set the stage for “self-funding” using a portion of the cost savings.
The bane of “one and done”: For a DIM program to succeed in the long run, I suggest drawing from forensic science and tracing bad data to underlying pathologies… i.e. people, process and/or system breakdowns. A formal data governance program that harnesses analytics to spotlight these breakdowns and foster preventive measures is highly recommended. The true power of DIM is in prevention of future data issues so that the current efforts to cleanse data will not simply be erased by the passage of time.
Identifying data discrepancies is a good first step. Correcting and preventing them is even better. Institutionalizing DIM via continuously measuring and reporting your successes… well, you get the idea.
As I interact with more and more service providers about their network capacity issues, I’ve become sure about one thing: what worked before isn’t really working anymore. The CapEx required for network equipment just to keep up with the exponential growth in data traffic (i.e., the Data Tsunami) is still not getting them ahead of significant congestion issues and customer-impacting events. Why? Traditional capacity management paradigms are not working.
Essentially, feedback from carriers of all sizes and types has exposed one of the most significant shifts in thinking about how to manage and plan for network capacity. They know that the rules are all changing and that today’s content demands are outpacing CSPs’ ability to keep pace. The first key question is how to get back in front of the capacity demand (we’ll talk about monetization next… stay tuned). So, why aren’t today’s processes scaling?
CSPs use a multitude of human resources and manual processes to manage network capacity. This may have scaled under slower and more predictable capacity growth curves, but thanks to services like YouTube & Netflix, entire network capacity is shifting in quantum leaps.
Solutions provided by equipment vendors are often platform specific, and reinforce a silo approach to Capacity Management when a holistic view is needed. Service demand congestion is a network phenomenon which doesn’t care about individual equipment vendors or devices.
CSP planning groups leverage data and make decisions based on systems that are 20-40% inaccurate compared to the actual capacity available in the network.
Today’s CSP solution approach is often homegrown, with 90% of the time spent acquiring and understanding raw data. As a whole, everyone is trying to answer the question of how to proactively eliminate the possibility of congestion, but most are still focused on addressing the symptoms rather than preventing the problem.
It is surprising to note that even top-tier technology leaders cannot accurately predict where and when capacity issues will impact their networks. This lack of visibility hurts CSPs considerably because, per our own studies, network events can account for up to 50% of customer churn in high-value mobile data services.
And the Capacity Management problem doesn’t really end there; in many ways it’s like a supply chain process. Marketing owns the function of forecasting where service uptake will drive capacity needs across the network. When Marketing underestimates service uptake, there is a real and significant impact on potential revenue: on average, it can take about 3 months from when capacity is fully tapped in a Central Office (CO) to when new capacity can be added to your network. During that time, customers expecting service availability become hugely frustrated and begin to churn. Engineering groups are pushed into panic mode, trying to react as fast as possible – often putting capacity in the wrong places due to inaccurate data – resulting in further congestion, service degradation, and inefficient use of capital.
The message from CXOs is crystal clear: there is an urgent and dire need to find new ways of monetizing the data crossing their networks. This need is exacerbated by OTT content and net neutrality. SLA- and authentication-based revenue models are absolutely dependent on knowing what types of content/services are traversing your network, how much capacity they consume, and how utilization is driven by your consumers’ interests and activities. This type of analysis requires a critical and intelligent binding of network and services data with business data to truly assess the financial impact to the CSP. Many Business Intelligence (BI) solution leaders will lay claim to abilities here, but actually fall very short of the mark. Instead, real experience suggests that solutions in the marketplace today either:
Can handle the financial aspects of your business but have no understanding of today’s network dynamics in terms of capacity issues and services;
Can handle parts of your network very deeply, but do not correlate or provide a holistic view at the service level; or,
Can collect some network and service level information, but have no ability to incorporate business data to understand the impact to the business – i.e., cost, subscriber behavior, propensities.
All the above challenges bring us to the inevitable question: what kind of approach does one take to tackle capacity management issues? How does one stop chasing traffic and focus on flattening the CapEx curve instead? To attain ‘Capacity Management Nirvana’, CSPs need to adopt a proactive and scalable approach – one that not only intelligently binds network and business strategies based on the Data Tsunami realities but also brings proactive and predictive capacity management to the table. At the end of the day, a CSP should have access to all their capacity, the ability to leverage real and immediate feedback on the change in capacity as service uptake increases, and finally, the right tools and intelligence to get in front of what’s coming.