A look at the importance and history of time-series data, CJC's capabilities, and how we take time-series databases to the next level.
Steven Moreton, Global Head of Product Management, CJC
Time-series databases are in a period of incredible demand, which shows little sign of slowing. Firms need to store hundreds of thousands of metrics per second for analysis, be it market or non-traditional data. At CJC, we provide significant skillsets around the notable industry time-series databases, from support to migration tooling, not to mention our own powerful IT analytics tool, mosaicOA. However, it would be slightly incongruous to write a post on time-series databases without detailing some of their rich history in the capital markets.
Perhaps we need to go back as far as the work of Charles Henry Dow, who of course, with Edward D. Jones, formed Dow Jones and Company in 1882. Dow was a pioneer in technical analysis and between 1884 and 1896 created and refined the Dow Jones Industrial Average. Since its introduction, the index has continually been measured against time, from the Klondike gold rush and the moon landings to the global events happening throughout the world in 2020.
In the 80s and 90s, the users of the earliest market data terminals wanted to see market data visualized. CJC reached out to Glyn Bradney and Edward Birch, pioneers of graphics and time-series databases for Reuters at this time. Glyn and Edward revealed that relational databases were never an option. The first DOS and Windows 3.1 terminals achieved graphics with a flat-file time-series database called the DBU (Data Backup Unit). Over time, these DBUs moved from the user’s desktop and were centralized into far more powerful, shared time-series databases, located either at the end-user firm or hosted at the vendor’s data centres. Much work has been done over the years to collate and store this incredible historical time-series data. Refinitiv/Google now provides tick data going back to the 1990s.
In the modern technology era, clients began to store market data at the full, unconflated tick level using various database types. Tick data could come from market data vendors or from internal sources. It is frequently used for testing future or historical trading strategies, and firms potentially need to store it at full granularity for years. As before, traditional relational databases, such as SQL databases, quickly become bloated and overwhelmed. Market data presents databases with the combined challenge of storage and continuous queries in real time; this is where time-series solutions have remained the database of choice.
Time-series database expertise is a core skillset at CJC due to its importance in the capital markets. Our teams provide world-class support and expertise, looking after well-known platforms that are powered by leading open source and enterprise time-series solutions. At CJC, we’ve also built our own powerful IT analytics tool. Let’s look in some detail at our proven capabilities…
Our clients’ systems demand a wide range of database expertise. SQL, MySQL, PostgreSQL, NoSQL, Sybase, and Oracle are a few of the traditional databases under management. However, when it comes to storage of market data, KDB+ is a popular time-series choice and a skillset found at CJC.
A common requirement for CJC is ETL – Extract/Transform/Load. Commonly, firms may use one store for tick capture (such as flat files or Oracle) and need to move this data into a more suitable time-series database, such as KDB+. Our tried and tested ETL processes extract the data, transform it, and load it into its new database format. CJC has done this for countless clients over the last 10 years.
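To make the three ETL stages concrete, here is a minimal sketch in Python. It is an illustration only, not CJC's tooling: the CSV column names, the sample ticks, and the function names are all assumptions, and SQLite is used purely as a stand-in target so the example runs anywhere (a real migration would load into a time-series database such as KDB+).

```python
import csv
import io
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical flat-file tick capture; columns and values are assumptions.
RAW_TICKS = """symbol,time,price,size
VOD.L,2020-06-01T08:00:00.250Z,132.40,500
VOD.L,2020-06-01T08:00:00.100Z,132.38,1200
BARC.L,2020-06-01T08:00:00.175Z,bad,300
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def extract(text):
    """Extract: read every row of the flat file as a dictionary."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: parse types, skip malformed rows, order by timestamp."""
    ticks = []
    for row in rows:
        try:
            ts = datetime.strptime(row["time"], "%Y-%m-%dT%H:%M:%S.%fZ")
            ts = ts.replace(tzinfo=timezone.utc)
            # Integer microseconds since the epoch avoids float rounding.
            epoch_us = (ts - EPOCH) // timedelta(microseconds=1)
            ticks.append((epoch_us, row["symbol"],
                          float(row["price"]), int(row["size"])))
        except (KeyError, ValueError):
            continue  # a real pipeline would log or quarantine bad rows
    return sorted(ticks)


def load(conn, ticks):
    """Load: write the cleaned ticks to the target and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticks "
        "(ts_us INTEGER, symbol TEXT, price REAL, size INTEGER)"
    )
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", ticks)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in target database
    print(load(connection, transform(extract(RAW_TICKS))))
```

Keeping the stages as separate functions means the transform logic can be tested on its own before any data is written to the target.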
At its heart, ETL is about taking control of your data and getting it into a format and location that best suits your business. With firms looking to leverage the scale and specialist technologies offered by public clouds for big data, we’re seeing increased interest in the creation of and migration to cloud data warehouses.
CJC uses advanced cloud-native tooling, such as Kubernetes, to enable and accelerate cloud adoption of complex workloads, including database-driven systems. In short, we can reduce the timeframe of ETL from weeks to days, or from days to hours.
From an operational standpoint, a market-data-fed time-series database needs to handle vast amounts of unconflated data from exchange/vendor sources. Every tick sent must be captured. On paper, databases such as InfluxDB and KDB+ can handle hundreds of thousands of updates per second, which equates to billions of updates per day. To achieve this in practice, CJC will architect a resilient database schema from a software and hardware perspective, be it on physical server technology or in private/public cloud. The market data sources, the database, and the underlying technology must all be resilient and robust to avoid data gaps.
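The step from "hundreds of thousands per second" to "billions per day" is simple arithmetic, sketched below in Python. The function names, the 8-hour session, and the 50-byte update size are assumptions chosen for illustration, not measurements from any particular database.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def updates_per_day(updates_per_second, active_seconds=SECONDS_PER_DAY):
    """Total updates captured at a sustained rate over the active period."""
    return updates_per_second * active_seconds


def storage_bytes_per_day(updates_per_second, bytes_per_update,
                          active_seconds=SECONDS_PER_DAY):
    """Rough uncompressed storage estimate for one day of capture."""
    return updates_per_day(updates_per_second, active_seconds) * bytes_per_update


if __name__ == "__main__":
    rate = 100_000  # assumed sustained updates per second
    print(updates_per_day(rate))               # around the clock
    print(updates_per_day(rate, 8 * 60 * 60))  # assumed 8-hour session
    print(storage_bytes_per_day(rate, 50))     # assumed 50 bytes per update
```

At a sustained 100,000 updates per second, a full 24 hours comes to 8.64 billion updates, and even an 8-hour session comes to 2.88 billion, so sizing the hardware and storage up front matters.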
Once the system is set up, CJC has expertise in the monitoring of time-series database platforms. This can be best demonstrated with visualizations from mosaicOA, our product powered by a time-series database. Proactive monitoring is required to ensure that the system is up and running and inside tolerance. CJC monitor exchange and market data feeds to ensure they are up and not suffering from gaps.
The relationship between a market data source (publisher) and the database (subscriber) requires a messaging layer (pub/sub). For market data, this could be TREP or Solace.
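The publisher/subscriber relationship can be pictured with a small in-process sketch in Python. This is not the TREP or Solace API: the Broker and TickStore classes, the topic name, and the message shape are all hypothetical, and a production messaging layer would add networking, persistence, and failover.

```python
from collections import defaultdict


class Broker:
    """Toy messaging layer: routes each message to a topic's subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of subscribers that received it


class TickStore:
    """Stand-in for the database side: appends every tick it receives."""

    def __init__(self):
        self.rows = []

    def on_tick(self, tick):
        self.rows.append(tick)


if __name__ == "__main__":
    broker = Broker()
    store = TickStore()
    broker.subscribe("ticks/VOD.L", store.on_tick)  # hypothetical topic name
    broker.publish("ticks/VOD.L", {"ts": 1, "price": 132.38})
    print(len(store.rows))
```

The publisher never refers to the database directly, so either side can be replaced or scaled without changing the other.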
Using the time-series database to visualize the number of updates coming into the database
Monitoring and visualizing the past and present state of a time-series database with a time-series database
CJC monitor thousands of database-related IT metrics, like the above, including the number of updates per second, write queues, and DB size, to name a few. We collect these IT metrics from one time-series database and store them in another time-series database, indefinitely.
Taking time-series databases to the next level
Our expertise around time-series databases led CJC to build our own product, mosaicOA, a big data IT analytics visualization platform powered by the time-series database InfluxDB. Every day, world-leading capital markets banks send billions of IT metrics from their market data/microservices infrastructure for visualization and indefinite storage.
To create this, CJC works alongside The Data Analysis Bureau (T-DAB) (t-dab.com), a leading data science and data engineering innovation partner specializing in machine learning.
Our team enhances the data from its raw storage level, whilst the T-DAB team brings in a data-driven strategy and machine-learning-led analytics (https://t-dab.com/cjc-big-data-visualisation-with-machine-learning/), using techniques such as principal component analysis.
The product stores every raw update from the source system. At the point of ingestion, it also aggregates the data into 15-minute and daily summaries. This allows the precision of the data returned to be matched to the scope of the request.
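The idea of summarizing at ingestion and matching precision to the scope of the request can be sketched as follows. This is a simplified in-memory illustration in Python, not mosaicOA's or InfluxDB's implementation: the class names, the summary fields, and the 6-hour and 30-day thresholds are assumptions.

```python
from collections import defaultdict

FIFTEEN_MINUTES = 15 * 60    # bucket width in seconds
ONE_DAY = 24 * 60 * 60


class Summary:
    """Running count, mean, minimum, and maximum for one time bucket."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count


class MetricStore:
    def __init__(self):
        self.raw = []  # every update, as (timestamp_seconds, value)
        self.rollups = {FIFTEEN_MINUTES: defaultdict(Summary),
                        ONE_DAY: defaultdict(Summary)}

    def ingest(self, ts, value):
        """Store the raw update and fold it into both summaries at once."""
        self.raw.append((ts, value))
        for width, buckets in self.rollups.items():
            buckets[ts - ts % width].add(value)

    def choose_resolution(self, start, end):
        """Return a bucket width for the request, or None for raw data."""
        span = end - start
        if span <= 6 * 60 * 60:    # assumed threshold: up to 6 hours
            return None
        if span <= 30 * ONE_DAY:   # assumed threshold: up to 30 days
            return FIFTEEN_MINUTES
        return ONE_DAY

    def query(self, start, end):
        """Return (timestamp, value) pairs at the chosen precision."""
        width = self.choose_resolution(start, end)
        if width is None:
            return [(ts, v) for ts, v in self.raw if start <= ts < end]
        buckets = self.rollups[width]
        return [(b, buckets[b].mean) for b in sorted(buckets)
                if start <= b < end]
```

With this shape, a request covering a few hours reads raw updates, while a request covering months reads only a handful of daily summaries, which keeps long-range queries cheap without discarding any raw data.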
I hope you have enjoyed this insight into the history of time-series databases in the capital markets, their uses, and the expertise that CJC provides to the community. CJC are located in major financial locations: New York, London, Hong Kong, and Singapore. If you have a challenge or requirement around time-series databases, please get in touch.
Steve and CJC would like to kindly thank Glyn Bradney and Edward Birch for their contributions to this post.