“Nothing has such power to broaden the mind as the ability to investigate systematically and truly all that comes under thy observation in life” - Marcus Aurelius
CJC support a selection of the capital markets’ most vital IT infrastructures. The low-latency, high-throughput arena of market data distribution is essentially a globally interconnected network of servers, all continually tuned and monitored 24x7, year in, year out.
To support these infrastructures, most firms use the well-known discipline of monitoring. CJC leverage a range of powerful IT monitoring platforms and tools to adhere to our strict SLA and KPI agreements with clients.
Monitoring is essentially a real-time speciality. If something goes wrong, the monitoring platform raises an alert (red), and when the issue is resolved the alert clears (green). You can view system behaviour and see how a CPU or application statistic changes in real time. However, when an engineer wants to review how an infrastructure has behaved over a period such as a week, month, or year, they are no longer monitoring. This is the discipline of observability, now commonly referred to as IT Analytics.
By storing the infrastructure, networking, operating system and application metrics from an IT monitoring platform, support teams gain an enhanced level of understanding. For an IT team using monitoring alone, even the most capable engineers could not answer the following question: “How have the server CPUs behaved over the last 6 months?” Without stored data, the engineer could only answer from memory.
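As a rough illustration of how stored metrics make that question answerable, the sketch below groups CPU samples by calendar month and reports the mean and peak for each. The function name, the sample data and the (timestamp, CPU percent) tuple layout are assumptions made for this example; this is not how mosaicOA or any particular monitoring platform stores or queries its data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_cpu_summary(samples):
    """Group (timestamp, cpu_percent) samples by calendar month.

    Returns {"YYYY-MM": {"mean": ..., "peak": ...}} in chronological order.
    """
    buckets = defaultdict(list)
    for timestamp, cpu_percent in samples:
        buckets[timestamp.strftime("%Y-%m")].append(cpu_percent)
    return {
        month: {"mean": round(mean(values), 1), "peak": max(values)}
        for month, values in sorted(buckets.items())
    }


# Hypothetical stored samples; a real metrics store would hold far more points.
samples = [
    (datetime(2021, 1, 4, 9, 0), 35.0),
    (datetime(2021, 1, 18, 14, 30), 45.0),
    (datetime(2021, 2, 2, 9, 0), 50.0),
    (datetime(2021, 2, 16, 14, 30), 70.0),
]
summary = monthly_cpu_summary(samples)
for month, stats in summary.items():
    print(month, stats)
```

Without the stored samples there is nothing to group, which is the gap between monitoring alone and observability.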
Many monitoring tools provide limited or no capabilities in this regard, or provide a RESTful API and leave the customer to build a solution themselves. True IT analytics needs long-term, granular storage of the data, for which many databases are not fit for purpose. It also demands a powerful query system and the capability to present the data effectively. Without these, the capable engineer is again let down when answering simple questions.
With good IT Analytics in place, such as the CJC mosaicOA platform, the engineer finally gains:
Data Visualisation: Understanding how infrastructure behaves under various application patterns, in order to determine the optimal behaviour of infrastructure and technology components, establish benchmarks and baselines, and identify over- or under-utilisation.
Real-time Application Behaviour Learning: Learns and correlates the behaviour of applications based on user patterns and the underlying infrastructure across various application patterns.
Capacity Management: Accurately predicts future system states across infrastructure, networking, and application.
Root Cause Analysis: With the critical and warning events created by the infrastructure or application stack stored long term, engineers can chronologically pinpoint root causes and compare historical pathologies in system behaviour.
Reporting: Creation and visualisation of service dependency maps, application/networking/architecture topologies for IT executives/engineers to review.
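To make the capacity management item above more concrete, here is a minimal sketch that fits a straight line to evenly spaced historical peaks and estimates how long until a limit is reached. It assumes a simple linear trend, which is a deliberate simplification for illustration and not a description of mosaicOA’s prediction method. The function names, the 80% limit and the sample values are hypothetical.

```python
def fit_trend(values):
    """Least-squares line through evenly spaced observations (x = 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def periods_until(values, limit):
    """Periods after the last observation until the trend line reaches `limit`.

    Returns None when the trend is flat or falling, and 0.0 when the trend
    line is already at or above the limit at the last observation.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical monthly peak CPU percentages for one server.
monthly_peak_cpu = [40.0, 44.0, 48.0, 52.0, 56.0, 60.0]
months_left = periods_until(monthly_peak_cpu, limit=80.0)
print(months_left)
```

The longer the stored history, the less a forecast like this depends on a single unusual month.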
Observability adds analytics capabilities that engineers have not been able to use before. Using mosaicOA as an example:
A server’s lifespan of CPU data
From the moment this server went live in April 2017 until its decommissioning in January 2021, every single moment has been stored and understood. The client achieved a stable system for almost 4 years and was always able to keep the server within tolerance levels by observing its behaviour through analytics.
Business insights from data-driven applications are a paramount requirement for capital markets support teams. Application statistics such as market data update rates are a key capacity indicator.
A server’s lifetime of market data volatility peaks
Observability of IT metrics is vital, as market data update rates and volatility are doubling every 2 years. Understanding how this affects your infrastructure can only be done via observation and analytics.
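The effect of a two-year doubling period can be worked through with a little arithmetic. The sketch below takes the doubling figure quoted above as its only input from this article; the starting rate, the capacity figure and the function names are hypothetical values chosen for illustration.

```python
import math


def projected_rate(current_rate, years, doubling_period_years=2.0):
    """Rate after `years`, assuming it doubles every `doubling_period_years`."""
    return current_rate * 2 ** (years / doubling_period_years)


def years_until_capacity(current_rate, capacity, doubling_period_years=2.0):
    """Years until a doubling rate reaches `capacity` (0.0 if already there)."""
    if current_rate <= 0 or capacity <= 0:
        raise ValueError("rates must be positive")
    if current_rate >= capacity:
        return 0.0
    return doubling_period_years * math.log2(capacity / current_rate)


current = 100_000  # hypothetical updates per second today
in_four_years = projected_rate(current, years=4)
headroom_years = years_until_capacity(current, capacity=800_000)
print(in_four_years, headroom_years)
```

At this growth rate, four years of service means a fourfold increase in update rate, so a server sized only for today’s load would run out of headroom well within a typical hardware lifespan.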
Understanding your system’s performance, both right now and over time, aids capacity planning and future strategy. Right now, firms are migrating and transforming their critical infrastructures to the public cloud to achieve improved costs and enablement. However, digital transformation can only succeed when the architecture is sized correctly. Many firms attempt to size based on a limited data snapshot built over small time periods. Many well-known retail banks have had notable cloud outages lasting weeks as a result of not sizing their infrastructure correctly.
The cloud provides great enablement and access to machine learning capabilities. IT analytics is one area where machine learning, AI engineering and data science are breaking new ground. Firms are looking for their IT systems to detect and fix issues, and to provide capacity measures, in an automated manner.
Firms can easily plug into tools such as Google BigQuery, Databricks or DataRobot. The mosaicOA platform comes with advanced techniques such as principal component analysis, which is used for capacity management performed continually on all client servers to highlight which ones are over- or under-utilised. It would take a vast amount of human capital to recreate these insights. Also available is the new mean shift anomaly detection tooling, where mosaicOA continually tracks a server’s or application’s ‘mean’ behaviour and escalates if that behaviour increases over time.
Observability does not replace monitoring; the two are complementary tools that together provide the ultimate stability, uptime, and future-proofing for your infrastructure.
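The idea of tracking ‘mean’ behaviour and escalating when it rises can be sketched with a simple comparison between a baseline window and a recent window. This is a stand-in chosen for illustration: it is not mosaicOA’s implementation, and it is not the mean shift clustering algorithm that shares the name. The function name, window sizes, threshold of 3 standard deviations and sample values are all assumptions.

```python
from statistics import mean, stdev


def mean_shift(values, baseline_size, recent_size, threshold=3.0):
    """Compare the mean of the most recent samples with a baseline mean.

    Returns (shifted, score), where score is the upward move of the recent
    mean measured in baseline standard deviations.
    """
    if baseline_size < 2 or recent_size < 1:
        raise ValueError("baseline needs 2+ samples and recent needs 1+")
    if len(values) < baseline_size + recent_size:
        raise ValueError("not enough samples for both windows")
    baseline = values[:baseline_size]
    recent = values[-recent_size:]
    difference = mean(recent) - mean(baseline)
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any rise at all counts as a shift.
        score = float("inf") if difference > 0 else 0.0
    else:
        score = difference / spread
    return score > threshold, score


# Hypothetical CPU percentages: a stable baseline followed by a step up.
cpu = [30, 32, 31, 29, 33, 30, 31, 32, 41, 43, 42, 44]
shifted, score = mean_shift(cpu, baseline_size=8, recent_size=4)
print(shifted, round(score, 2))
```

Only upward moves are flagged here, matching the escalation on increasing behaviour described above; a production system would also need to handle seasonality and a baseline that is refreshed over time.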
If you would like to discuss our services and products further – please contact [email protected]