I recently reviewed Patrick Keefe’s new book Chatter .... I won’t describe that book here, but I did post a critical review of it elsewhere. The book did make me think about the term "chatter" and how it has come to be used to describe the activity of terror groups. How do we think about the changes in such signals intelligence over time, determine when changes are significant, and consider the causal effects that influence the changes?
Security agencies think about chatter in terms of terror level and how we should react, but many other industries are starting to have access to data streams that can be analyzed in terms of signals and their changes over time. Point-of-sale systems, call centers, the increasing use of sensors, video-image analysis and RFID tagging are producing an avalanche of rich data that can be studied in real time. Even the notion of ‘real-time’ has started to evolve. Where in the past analysis of commercial data may have been performed monthly or quarterly, transactions can now be tapped directly at the cash register or RFID scanner, so the data can be analyzed in much smaller time slices.
So how do we know when an observed measure has changed? In the 1920s Walter Shewhart of Bell Telephone Labs developed the control chart, a statistical means of observing such streams and determining the significance of changes. Deming was later to lionize control charts as a way of understanding the operation of processes. These methods are starting to re-emerge as a means for understanding data streams. They are powerful techniques, but they also rely on statistical assumptions that may not hold for new forms of data.
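To make the control chart idea concrete, here is a minimal sketch in Python of a Shewhart-style individuals chart. It estimates a center line and three-sigma limits from a baseline period, then flags later readings that fall outside those limits. The function names and the sample numbers are made up for illustration, and the sketch assumes the baseline readings are roughly independent and stable, which is exactly the kind of assumption that may not hold for newer data streams.

```python
from statistics import mean

def control_limits(baseline):
    """Return (lower, center, upper) limits for an individuals chart.

    Sigma is estimated from the average moving range between
    consecutive readings, divided by the standard constant 1.128
    (d2 for subgroups of size 2).
    """
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = mean(moving_ranges) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(stream, lower, upper):
    """Return (index, value) pairs for readings outside the limits."""
    return [(i, x) for i, x in enumerate(stream) if x < lower or x > upper]

# Hypothetical hourly counts: a quiet baseline, then new readings.
baseline = [10, 12, 11, 13, 12, 11, 10, 12]
lower, center, upper = control_limits(baseline)
flagged = out_of_control([11, 12, 19, 10, 5], lower, upper)
print(flagged)
```

With these made-up numbers the limits come out near 7.6 and 15.2, so the readings 19 and 5 are flagged as significant changes while the others are treated as ordinary variation.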
How do we determine what has caused changes in a data stream? Intelligence communities have to worry that terrorists may be manipulating the chatter level. In the same way, we need to understand what causal elements may be influencing any set of signals we are observing.
Also important is the idea that processes are not just about a single measure. They often depend on multiple streams of information, with varying time dimensions, accuracy and availability. Newly purposed Bayesian techniques are now being used to model and analyze such multiple data-stream systems.
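As a rough illustration of the Bayesian idea, the Python sketch below combines alerts from several streams into a single probability that the underlying process has changed. It is a simplified naive Bayes update, not a description of any particular vendor's method: it assumes the streams are conditionally independent, and the hit rates and false-alarm rates are hypothetical numbers standing in for each stream's accuracy. A stream that is unavailable is simply skipped.

```python
import math

def posterior_change(prior, observations):
    """Posterior probability that the process has changed.

    observations is a list of (alert, hit_rate, false_alarm_rate):
      alert            True, False, or None if the stream is unavailable
      hit_rate         P(alert | change)      (assumed known)
      false_alarm_rate P(alert | no change)   (assumed known)
    Streams are assumed conditionally independent given the state.
    """
    log_odds = math.log(prior / (1 - prior))
    for alert, hit_rate, false_alarm_rate in observations:
        if alert is None:
            continue  # no reading from this stream, so no evidence
        if alert:
            log_odds += math.log(hit_rate / false_alarm_rate)
        else:
            log_odds += math.log((1 - hit_rate) / (1 - false_alarm_rate))
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical streams: an accurate one that alerts, a noisy one that
# also alerts, and one that is currently offline.
streams = [(True, 0.9, 0.1), (True, 0.6, 0.4), (None, 0.8, 0.2)]
print(posterior_change(0.05, streams))
```

Working in log-odds keeps the update a simple sum, so each stream contributes evidence in proportion to how reliable it is assumed to be.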
Companies are emerging to address these problems. Here is a recent article on some vendor activity in this area. Notable is the University of Illinois startup Riverglass, which "... has developed software that merges data from multiple, disparate data streams--including unstructured text and numeric data--and applies real-time data modeling and analysis techniques to those streams. The goal is to detect patterns in the data to identify potential investment risks and opportunities ..."
Chatter is now all around us, and as we continue to develop new means to generate data through sensor networks, we will need to understand what it all means and link it to decisions we can ultimately make. There is much work to be done.
We're in an era of After-The-Fact and Just-In-Time everything. Folksonomy navigation a la Flickr is one example. Google is another (get the links, then rank them, then shuffle them again when queried).
The same "chatter extractors" that will work for intelligence will do just fine for seeing what people think of tonight's O.C. or the new soap I just launched.
I want it personally to help me find my way through the hundreds of listservs I subscribe to on a political campaign (maxed out my gmail account so that extra gig will come in handy), the abundance of messaging (email, blog, wiki, IM, phone) within and among my projects, and the conversations taking place on my 1000 favorite blogs.
You might look at software from Burning Glass (no relation to Riverglass, I think) that uses rules and genetic algorithms to match people to jobs. It was based on a system they built to detect credit card fraud from the smallest clues. http://burningglass.com/
Posted by: Phil Wolff | April 04, 2005 at 09:37 PM
Phil,
Thanks for the thoughts. I will check out your suggestions; I had not heard of that particular vendor before.
Posted by: Franz | April 04, 2005 at 10:01 PM