October 20, 2023
Monitoring Azure Kubernetes, Confluent Kafka with ITRS – A Proof of Concept
Cloud Solutions, Managed Services, Observability

Ensuring Next Generation Tooling & Expertise

Steve Moreton

In the fast-paced world of capital markets, where timely and accurate information can make or break investment decisions, the role of real-time financial data cannot be overlooked. This article outlines CJC’s Proof-of-Concept (POC) initiative to build a Confluent Kafka-based setup in Microsoft Azure Kubernetes Service (AKS) with ITRS Geneos monitoring. Key points covered:

    1. The Evolution of Market Data Distribution.
    2. POC Goals and Implementation.
    3. ITRS and Prometheus Integration.
    4. How CJC Can Help.

Contributors (below) | Editor: Kelly Wong, Marketing Executive at CJC.

 

The Evolution of Market Data Distribution

The capital markets are built on market data, or to be specific, real-time financial information, often delivered with low latency and at high throughput. Traditionally, market data distribution relied on specialist platforms designed by vendors such as Refinitiv, an LSEG business. However, just over a decade ago, there was a rise in ‘off-the-shelf’ systems, with many companies, including the notable example of RBCCM, moving to Solace.

In recent years, there has been a significant shift towards embracing cloud and open-source technologies, with 94% of enterprises using cloud services. Many exchanges have transitioned to Kafka, an open-source messaging platform gaining traction in various global industries.

Spearheaded by the global engineering team, CJC is actively collaborating with clients on Kafka-related projects. This article describes how the team set out to build a cloud-based Kafka setup and complement it with the flagship monitoring tool, ITRS Geneos, within an allocated two-month period.

POC Goals & Implementation

After careful review, CJC decided to build a Confluent Kafka-based infrastructure in Azure Kubernetes Service (AKS) and establish an integrated ITRS monitoring environment within the AKS cluster. The ITRS plugin will be used to monitor AKS for insights into hardware, networking, and OS metrics. A JMX plugin will connect to Confluent Kafka, allowing the team to inspect the topics and queues inside. Real-time data will be simulated using Wikipedia update logs.

POC Success Criteria

  • Build a stable AKS environment running Confluent Kafka.
  • Configure a Kafka cluster mirroring a real-time market data environment.
  • Connect ITRS Geneos.
  • Connect ITRS to AKS – for Hardware / OS / Networking / top-level Kafka statistics.
  • Enable ITRS to access Confluent Kafka via JMX for insight into applications, queues, and topics.
  • Ensure system stability.
  • Confirm whether the setup meets the high standards of CJC’s managed services.

Azure, AKS, and Confluent Setup

CJC engineers began by setting up the cloud infrastructure to host a Kafka environment in Microsoft Azure. This started with an internal ticket requesting budget approvals, which were granted upon review. Within the Azure subscription, a resource group named ‘ConfluentTest’ was created to compartmentalise the essential Azure services required for the POC.

With the initial requirements in place, the focus shifted to configuring AKS. The AKS cluster was named ‘ConfluentTestCluster’ and was created with default values. To streamline management within AKS, the namespace ‘Confluent’ was created to host all Kubernetes pods associated with Confluent.

Source: CJC - Screenshot of Confluent for Kubernetes Control Center.

Source: CJC - Screenshot of ITRS connecting to Azure AKS and Confluent Kafka Environment.

The Confluent for Kubernetes package was deployed using Helm, a package manager that automates the creation, packaging, configuration, and deployment of Kubernetes applications. A noteworthy challenge arose when the team discovered that the memory requirements exceeded the capacity of the initial 5 AKS nodes. Consequently, the node count was increased to 10 to stabilise the environment.
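
The sizing problem described above comes down to simple arithmetic: the memory requested by all pods has to fit within the memory the nodes can actually allocate. The following is a minimal sketch of that calculation. The function name, the headroom fraction, and every figure in it are hypothetical and chosen for illustration only; they are not measurements from the POC.

```python
import math

def nodes_required(pod_memory_requests_gib, allocatable_per_node_gib, headroom=0.2):
    """Return the smallest node count whose combined usable memory covers all requests.

    headroom is the fraction of each node held back for system pods and bursts.
    This is a lower bound: it ignores whether each individual pod fits on one node.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = allocatable_per_node_gib * (1 - headroom)
    total_requested = sum(pod_memory_requests_gib)
    return math.ceil(total_requested / usable_per_node)

# Hypothetical per-pod memory requests in GiB (one entry per pod), for illustration only.
example_requests_gib = [4, 4, 4, 2, 2, 2, 1, 1, 1, 4, 2]
print(nodes_required(example_requests_gib, allocatable_per_node_gib=5))
```

A calculation like this only gives a starting point; the scheduler also has to place each pod whole on a single node, so the real node count can be higher.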

Once stabilised, CJC engineers proceeded to establish a connection with the Confluent for Kubernetes Control Center. Here, they meticulously configured and tested the Kafka cluster, ensuring that it closely mirrored setups employed by clients utilising similar market data systems such as RTDS/TREP or Solace.

As part of the comprehensive testing approach, a Wikimedia change handler and publisher was devised. This custom solution, built using IntelliJ IDEA 2022.3.2, was designed to continuously capture recent changes from Wikimedia’s data stream and publish this data into the Kafka cluster. This setup made it possible to publish to multiple partitions and then subscribe to those partitions, broadening the testing capabilities.
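
The source of CJC's handler is not shown in this article, so the following Python is only an illustrative sketch of the same idea, not the POC code. It assumes the feed is Wikimedia's public Server-Sent Events stream, parses each event, and hands a keyed record to a `publish` callable. The topic name, the choice of the `wiki` field as the key, and the `publish(topic, key, value)` signature are all assumptions; in a real run `publish` would wrap whichever Kafka client is in use.

```python
import json

# Assumed endpoint: Wikimedia's public recent-changes feed (Server-Sent Events).
STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"

def iter_sse_data(lines):
    """Yield the decoded JSON payload of each Server-Sent Event in an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line ends one event; multi-line data is joined with newlines.
            yield json.loads("\n".join(buffer))
            buffer = []

def to_record(change):
    """Map one change event to a (key, value) pair of bytes.

    Keying by wiki is an assumption; with key-hash partitioning it keeps
    each wiki's changes together on one partition.
    """
    key = change.get("wiki", "unknown").encode("utf-8")
    value = json.dumps(change, separators=(",", ":")).encode("utf-8")
    return key, value

def publish_changes(lines, publish, topic="wikimedia.recentchange"):
    """Send every parsed change to publish(topic, key, value) and return the count sent."""
    count = 0
    for change in iter_sse_data(lines):
        key, value = to_record(change)
        publish(topic, key, value)
        count += 1
    return count
```

Passing the publisher in as a callable keeps the parsing logic testable without a broker: a test can collect records in a list, while a live run can supply a function that forwards them to Kafka.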

ITRS & Prometheus Integration

ITRS Setup

Source: CJC - Screenshot of Dynamic Entities

The introduction of Geneos into the Kubernetes environment revealed a need for increased memory in the nodes. To avoid horizontally scaling the nodes, the Kafka cluster was modified from an enterprise environment to a development environment. It’s worth noting that the plugins used had only been released in Q1 2023, and ITRS provided exceptional support throughout the project.

The default 3 Kafka brokers, 3 ZooKeeper nodes, and 3 Schema Registry instances were each reduced to a single instance. This was achieved by modifying the YAML file provided by Confluent, with specific configuration overrides set so that the Kafka cluster operated with just one broker.
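
The article does not list the overrides that were applied. As a hedged illustration of why overrides are needed at all: a broker cannot hold more replicas of a partition than there are brokers, so replication-related settings have to shrink with the cluster. The sketch below derives a plausible set of standard Apache Kafka broker properties from a broker count. The helper names and the cap of three replicas are assumptions, and the properties shown are not confirmed to be the ones used in the POC.

```python
def replication_overrides(broker_count):
    """Build broker properties whose replication settings fit the given broker count."""
    if broker_count < 1:
        raise ValueError("at least one broker is required")
    factor = min(3, broker_count)   # never ask for more replicas than there are brokers
    min_isr = max(1, factor - 1)    # tolerate one offline replica when there is more than one
    return {
        "default.replication.factor": factor,
        "offsets.topic.replication.factor": factor,
        "transaction.state.log.replication.factor": factor,
        "transaction.state.log.min.isr": min_isr,
        "min.insync.replicas": min_isr,
    }

def as_properties(overrides):
    """Render the overrides as sorted key=value lines."""
    return [f"{name}={value}" for name, value in sorted(overrides.items())]

# Single-broker development cluster, as in the reduced POC environment.
for line in as_properties(replication_overrides(1)):
    print(line)
```

With one broker every value collapses to 1, which removes redundancy entirely; that is acceptable for a development environment but not for production.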

Source: CJC - Prometheus plugin

To achieve a dynamic state, the base configuration was loaded onto all the nodes as part of a ‘DaemonSet’ during startup. This configuration encompassed crucial elements, such as Gateway connection parameters, plugin selections, monitored namespaces, and, most importantly, the ‘dynamic entity’ mapping type specification. This mapping type was connected to a series of individual mappings and samplers, all of which are referenced and embedded within the actual manifest used by the containers at runtime.

These individual mappings define how the data is read and manipulated for viewing within the Geneos Active Console. In the Prometheus plugin example above, the plugin is used to filter on a particular string and map various data items to either samples, attributes, or entities, which can then be viewed in the Console, enhancing monitoring and analysis capabilities.
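
To make the idea of filtering and mapping concrete, here is a minimal Python sketch that reads metrics in the Prometheus text format, keeps those whose name contains a given string, and sorts each one into an entity, its attributes, and its samples. It does not reproduce the Geneos mapping syntax. The function name, the choice of label used as the entity, and the metric names in the usage example are hypothetical.

```python
import re

# One sample line of the Prometheus text format: name{labels} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(?P<key>[A-Za-z_][A-Za-z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')

def map_metrics(text, name_filter, entity_label):
    """Group filtered metrics by entity.

    Returns {entity: {"attributes": {...}, "samples": {metric_name: value}}}.
    Simplification: a later sample with the same name for the same entity
    overwrites the earlier one.
    """
    result = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None or name_filter not in match.group("name"):
            continue
        labels = {
            item.group("key"): item.group("val")
            for item in LABEL.finditer(match.group("labels") or "")
        }
        entity = labels.pop(entity_label, "unlabelled")
        entry = result.setdefault(entity, {"attributes": {}, "samples": {}})
        entry["attributes"].update(labels)  # remaining labels describe the entity
        entry["samples"][match.group("name")] = float(match.group("value"))
    return result

# Hypothetical metric names, for illustration only.
exposition = (
    '# HELP kafka_demo_messages_in_total Demo counter\n'
    'kafka_demo_messages_in_total{pod="kafka-0",topic="wiki"} 42\n'
    'jvm_demo_heap_bytes{pod="kafka-0"} 1024\n'
)
print(map_metrics(exposition, name_filter="kafka_", entity_label="pod"))
```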

Other Tooling - Prometheus

CJC encountered a significant volume of system logs generated by Confluent Kafka, necessitating monitoring, storage, and review. The team recognised ITRS as the ideal solution for core monitoring and alerting of AKS and Confluent Kafka; however, these system logs also needed to be stored for review. From experience, CJC knew that these are best stored in a time series database and decided to complement ITRS with Prometheus.

Source: CJC - Screenshot of ITRS connecting to Prometheus from POC

Prometheus, a widely used monitoring and alerting tool, was used to monitor Confluent Kafka running on Azure AKS. It is purpose-built for scraping and aggregating metrics from multiple platforms and can scrape hundreds of endpoints. Like ITRS, Prometheus collects metrics from processes with JMX exporters. Additionally, Prometheus can be integrated with Grafana and other open telemetry tools (including CJC’s observability tooling, mosaicOA) to visualise the collected metrics.

In this scenario, CJC connected the ITRS FKM plugin to Prometheus and used its powerful regex queries to effectively review system logs. The engineering team were impressed with the setup, as it proved to be entirely dynamic and exposed all the metrics provided by Confluent via the JMX Prometheus exporter.
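
As a rough illustration of regex-driven log review, the Python sketch below classifies log lines by pattern. It does not reproduce FKM configuration. The line layout is assumed to be "[timestamp] LEVEL message", and the rule set and severity names are hypothetical.

```python
import re

# Assumed layout: "[timestamp] LEVEL message".
LOG_LINE = re.compile(r'^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$')

# Hypothetical rules, checked in order; the first match decides the severity.
RULES = [
    ("critical", re.compile(r'\b(FATAL|OutOfMemoryError)\b')),
    ("warning", re.compile(r'\b(ERROR|WARN)\b')),
]

def classify(line):
    """Return (severity, timestamp, message) for a matching log line, else None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # not in the assumed layout
    for severity, pattern in RULES:
        if pattern.search(line):
            return severity, match.group("ts"), match.group("message")
    return None  # well-formed line that no rule flags

sample = "[2023-10-20 09:15:02,118] ERROR Demo failure (demo.Logger)"
print(classify(sample))
```

Ordering the rules from most to least severe means a line that matches several patterns is reported once, at its highest severity.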

How CJC Can Help

Ensuring the compatibility of tooling and skills with next-generation technology is a vital research and development endeavour for CJC. These projects and technical journeys expose the team to new systems and challenges.

CJC’s key findings revealed that once the environments for Confluent and ITRS were correctly sized, the system maintained its stability. ITRS had the necessary plugins and capabilities for seamless connection to, and monitoring of, AKS and Confluent Kafka, and it excelled in providing insights into the system’s behaviour. To manage the vast system logs, the team integrated Prometheus, while ITRS continued to be used to review this data.

As an enterprise-class, globally recognised platform, and a core skillset, ITRS serves as CJC’s gateway to client infrastructure. It was gratifying to see this platform seamlessly supporting the next generation of real-time data delivery and working alongside other tooling. CJC looks forward to assisting clients in moving their infrastructure and middleware to the cloud, ensuring that these systems meet the high standard of managed services.

About CJC:

CJC is the leading market data technology consultancy and service provider for global financial markets. CJC provides multi-award-winning consultancy, managed services, cloud solutions, observability, and professional commercial management services for mission-critical market data systems. CJC is vendor-neutral and ISO 27001 certified, giving CJC’s partners the freedom to focus on their core business.

For more information, contact us or:

Email: [email protected]
Tel: +44(0) 203 328 7600
