How to reconcile Real-Time and Batch processing using In-Memory technology

Software AG GS: This post was originally published on Fabien Sanglier's personal blog.

As you might remember, we(*) participated in a “Plugfest”(**) earlier this year in San Diego. Here is the summary post of what we built for that occasion: http://fsanglier.blogspot.com/2013/04/my-2013-afcea-san-diego-plugfest.html.

This time around, we entered the plugfest competition as a technology provider at the AFCEA Cyber Symposium, which happened last week (June 25-27, 2013) in Baltimore. We not only provided technology components and data feeds to the challengers (San Diego State University, GMU, Army PEO C3T milSuite), but also built a very cool fraud detection and money laundering demo, which was one of the plugfest use cases for this cyber event.

Our demo was centered around a fundamental "Big Data" question: how can you detect fraud on hundreds of thousands of transactions per second in real time (which is absolutely critical if you don't want to lose lots of $$$$ to fraud) while efficiently incorporating into that real-time process data from external systems (e.g. data warehouses or Hadoop clusters)?
Or, more generally: how do you reconcile real-time processing and batch processing when dealing with large amounts of data?

To answer this question, we put together a demo centered around Terracotta's In-Genius intelligence platform (http://terracotta.org/products/in-genius), which provides a highly scalable, low-latency in-memory layer capable of "reconciling" real-time processing needs (ultra-low latency on large volumes of new transactions) with traditional batch processing needs (100s of TB/PB processed in asynchronous background jobs), all bundled in a simple software package deployable on any commodity hardware.

Here is the solution we assembled:

Cyber Plugfest software architecture: how to reconcile real-time and batch processing

A quick view at how it all works:

  1. A custom transaction simulator generates pseudo-random fictional credit card transactions and publishes them onto a JMS topic (Terracotta Universal Messaging bus).
  2. Each JMS message is delivered through pub/sub messaging to both the real-time and batch tracks:
    1. The Complex Event Processing (CEP) engine, which identifies fraud in real time through the use of continuous queries.
      1. See “Real-Time fraud detection route”
    2. Apache Flume, an open-source platform that efficiently and reliably routes all the messages into HDFS for further batch processing.
      1. See “Batch Processing Route”
  3. Batch Processing Route:
    1. Apache Hadoop collects and stores all the transaction data in its powerful batch-optimized file system (HDFS).
    2. Map-Reduce jobs compute transaction trends (a simplified rolling average in this demo) over the full transaction history for each vendor, customer, and purchase type.
    3. The output of the map-reduce jobs is stored in the Terracotta BigMemory Max in-memory platform.
  4. Real-Time fraud detection route:
    1. The CEP fraud-detection queries fetch the Hadoop-calculated averages (from 3.2) out of Terracotta BigMemory Max (microsecond latency under load) and correlate them with each incoming transaction to detect anomalies (potential fraud) in real time.
    2. Mashzone, a mashup and data-aggregation tool, provides visualization of the detected fraud data.
    3. So that other plugfest challengers and technology providers could use our data, all our data feeds were also available in REST, SOAP, and WebSocket formats (and were used by ESRI, Visual Analytics, and others).
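The "simplified rolling average" from step 3.2 can be sketched in plain Java. In the demo this logic runs inside a MapReduce job over all of a vendor's transactions in HDFS; the exponential-moving-average formula below is one common simplification (the exact formula used in the demo may differ), shown here only to illustrate the kind of per-vendor trend the batch track produces:

```java
// Illustrative sketch of a per-vendor rolling average, as a MapReduce
// reducer might compute it. The smoothing factor 'alpha' weights the
// newest transaction against the accumulated history.
public class RollingAverage {
    private double avg;
    private boolean seeded;
    private final double alpha; // weight given to the newest transaction

    public RollingAverage(double alpha) {
        this.alpha = alpha;
    }

    // Fold one transaction amount into the running average.
    public void add(double amount) {
        if (!seeded) {
            avg = amount;       // first transaction seeds the average
            seeded = true;
        } else {
            avg = alpha * amount + (1 - alpha) * avg;
        }
    }

    public double average() {
        return avg;
    }

    public static void main(String[] args) {
        RollingAverage ra = new RollingAverage(0.2);
        for (double amount : new double[] {100, 110, 90, 105}) {
            ra.add(amount);
        }
        System.out.println(ra.average()); // close to the vendor's typical spend
    }
}
```

In the real pipeline this result is written into BigMemory Max (step 3.3) rather than printed, so the real-time track can read it with microsecond latency.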

As I hope this post shows, having a scalable and powerful in-memory layer acting as the middle man between Hadoop and CEP is the key to providing true real-time analysis while still taking advantage of all the powerful computing capabilities Hadoop has to offer.
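The "middle man" pattern from step 4.1 can be sketched as follows. All names here are hypothetical, and a `ConcurrentHashMap` stands in for the BigMemory Max cache: the batch side publishes precomputed averages into the shared store, and the real-time side does a fast lookup per incoming transaction to flag outliers:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the real-time fraud check: a shared in-memory store bridges
// the batch track (which writes averages) and the CEP track (which reads
// them for every incoming transaction).
public class FraudCheck {
    // Stand-in for the BigMemory Max cache populated by the Hadoop jobs.
    private final Map<String, Double> avgByVendor = new ConcurrentHashMap<>();

    // Called from the batch side with the output of the rolling-average job.
    public void publishAverage(String vendor, double avg) {
        avgByVendor.put(vendor, avg);
    }

    // Called from the CEP side for every incoming transaction: flag any
    // amount more than 'threshold' times the vendor's historical average.
    public boolean isSuspicious(String vendor, double amount, double threshold) {
        Double avg = avgByVendor.get(vendor);
        if (avg == null) {
            return false; // no history for this vendor yet: let it pass
        }
        return amount > avg * threshold;
    }

    public static void main(String[] args) {
        FraudCheck check = new FraudCheck();
        check.publishAverage("acme-corp", 120.0); // from the Hadoop job
        System.out.println(check.isSuspicious("acme-corp", 150.0, 3.0)); // false
        System.out.println(check.isSuspicious("acme-corp", 900.0, 3.0)); // true
    }
}
```

The point of the pattern is that the read path never touches Hadoop: the batch job refreshes the store asynchronously, while the per-transaction check is a single in-memory lookup.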

In further posts, I'll explain in more detail all the components and code (code and configs are available on GitHub at https://github.com/lanimall/cyberplugfest).

Notes:

(*) “we” = The SoftwareAG Government Solutions team, which I’m part of…
(**) “Plugfest” = “collaborative competitive challenge where industry vendors, academic, and government teams work towards solving a specific set of ‘challenges’ strictly using the RI2P industrial best practices (agile, open standard, SOA, cloud, etc.) for enterprise information system development and deployment.” (source: http://www.afcea.org/events/west/13/plugfest.asp)

Here’s a screen cap of the dashboard shown during the event.



About Ryan Kamauff

Ryan Kamauff is a senior analyst with Crucial Point LLC. He produces technology focused content for CTOvision.com and reports on analytical megatrends at the new analysis focused Analyst One.