SIX Swiss Exchange suffered an unfortunate outage last week, the worst in over a decade. To reduce outages, a good problem-management process, as part of an ITIL-based IT support model, is key. This article provides insight into:
- What a good problem-management process includes.
- What a problem incident report should include.
- When incident reports should be expected.
Editor: Antony Fung, Marketing Manager at CJC Ltd.
Unscheduled outages are every IT engineer’s worst nightmare and can slip through even the best-architected, most robust and resilient infrastructure. A good problem-management process is vital to reducing them.
Problem management is an overarching extension of incident management. Where incident management aims to restore service operations swiftly and minimise business impact, problem management focuses on investigating the root cause of Severity 1 and 2 incidents. It then recommends preventative measures to avoid recurrence.
Specialist Support for Capital Markets IT Infrastructures
CJC supports real-time market data infrastructures for the world’s leading Market Data Providers, Exchanges, Vendors, Investment Banks, Brokers, and buy-side institutions. Our hundreds of clients are supported by our global, 24x7 IT teams. CJC is powered by expertise, underpinned by a world-class IT Service Management framework.
All of CJC's technical staff are ITIL accredited. CJC monitors IT environments for more than 400 clients 24/7, with an average of 500,000 monitoring points per client - half a million things that could go wrong every day.
Coordinating with all parties, from end-users to vendors, CJC performs root cause analysis to identify the problem and recommend remedial solutions. Instances are tracked until closure, followed by a post-mortem report as required.
The problem management process includes:
- Correlating and organising incidents to create a Problem Record.
- Analysis of the Problem Record, including available history.
- Characterising and prioritising the Problem Record and determining the appropriate actions.
- Providing actionable recommendations to the client.
- Working with the client’s Change Management team to deploy a permanent fix.
- Providing root cause analysis for Severity 1 Incidents.
- Closing the Problem Record on resolution.
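As an illustration only, the first steps of this process (correlating incidents, prioritising the resulting Problem Records, and closing them) could be sketched in Python as below. The class names, the `component` correlation key, and the rule that a record takes the severity of its most severe incident are all assumptions made for the example, not a description of CJC's tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Incident:
    incident_id: str
    component: str  # hypothetical correlation key, e.g. the affected system
    severity: int   # 1 is the most severe


@dataclass
class ProblemRecord:
    component: str
    incidents: list = field(default_factory=list)
    status: str = "open"

    @property
    def priority(self) -> int:
        # Assumption: a record inherits the severity of its most severe incident.
        return min(i.severity for i in self.incidents)

    def close(self) -> None:
        # Called once the permanent fix has resolved the problem.
        self.status = "closed"


def correlate(incidents):
    """Group incidents that share a component into one Problem Record each."""
    grouped = defaultdict(list)
    for incident in incidents:
        grouped[incident.component].append(incident)
    records = [ProblemRecord(c, items) for c, items in grouped.items()]
    # Most severe problems first.
    return sorted(records, key=lambda r: r.priority)
```

In this sketch, two incidents against the same component collapse into a single Problem Record that carries their shared history, so the analysis looks at the pattern rather than at each incident in isolation.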
Formal and detailed incident reports are available to clients that experienced a Severity 1 incident where CJC is accountable for the root cause. The reports are designed to provide clients with complete transparency on significant incidents and inform clients of the proposed plan to reduce the risk of recurrence.
Each incident report provides a summary of the event, including:
- Details of the incident and impact.
- Description of the root cause.
- Details of the action plan to reduce recurrence.
CJC aims to deliver three successive reports within the following timeframes:
- Initial incident report on the same business day, or within 24 hours of the incident being resolved.
- Updated (root cause) incident report within 3 business days of the incident.
- Final (root cause and action plan) incident report within 5 business days of the incident.
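To make the timeframes concrete, here is a minimal Python sketch that derives the three target dates for a given incident. It rests on assumptions the article does not state: business days are Monday to Friday with no public holidays, the 24-hour window runs from the resolution time, and the 3- and 5-day windows are counted from the date the incident began. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri only)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def report_deadlines(incident_start: datetime, resolved_at: datetime) -> dict:
    """Target dates for the initial, updated and final incident reports."""
    return {
        # Assumption: the 24-hour window starts at resolution.
        "initial": resolved_at + timedelta(hours=24),
        # Assumption: business days are counted from the incident's start date.
        "updated": add_business_days(incident_start.date(), 3),
        "final": add_business_days(incident_start.date(), 5),
    }
```

For an incident that starts on a Thursday, the weekend is skipped, so the updated report falls due the following Tuesday and the final report the following Thursday.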
How CJC Can Help:
CJC is the leading market data technology consultancy and service provider for global financial markets. CJC provides multi-award-winning consultancy, managed services, cloud solutions, observability, and professional commercial management services for mission-critical market data systems. CJC is ISO 27001 certified, giving CJC’s partners the freedom to focus on their core business.
For more information, contact us or:
Email: [email protected]
Tel: +44(0) 203 328 7600