I recently reviewed Patrick Keefe’s new book Chatter .... I won’t describe that book here, but I did post a critical review of it elsewhere. The book did make me think about the term ‘chatter’ and how it has come to describe the activity of terror groups. How do we think about changes in such signals intelligence over time, determine when those changes are significant, and consider the causes behind them?
Security agencies think about chatter in terms of terror level and how we should react, but many other industries now have access to data streams that can be analyzed as signals that change over time. Point-of-sale systems, call centers, sensors, video-image analysis and RFID tagging are producing an avalanche of rich data that can be studied in real time. Even the notion of ‘real-time’ has started to evolve. Where analysis of commercial data may once have been performed monthly or quarterly, transactions can now be tapped directly at the cash register or RFID scanner, allowing much smaller time slices.
So how do we know when an observed measure has changed? In the 1920s Walter Shewhart of Bell Telephone Labs developed the Control Chart, a statistical means of observing such streams and determining the significance of changes. W. Edwards Deming later championed control charts as a way to understand how processes operate. These methods are starting to re-emerge as a means for understanding data streams. They are powerful techniques, but they also rely on statistical assumptions that may not hold for new forms of data.
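To make the control chart idea concrete, here is a minimal sketch of an individuals chart in Python. It is only an illustration: the function names and the sample numbers are invented for this post, and it assumes the baseline observations are independent and roughly normally distributed, which is exactly the kind of assumption that may not hold for newer data streams.

```python
from statistics import mean


def control_limits(baseline):
    """Return (lower, center, upper) 3-sigma limits for an individuals chart.

    Sigma is estimated from the average moving range between consecutive
    baseline observations, a common choice for this type of chart.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 1.128 is the standard d2 constant for moving ranges of size 2.
    sigma = mean(moving_ranges) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(stream, limits):
    """Return the indices of observations that fall outside the limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(stream) if x < lower or x > upper]


# Hypothetical data: a stable baseline period, then new observations.
baseline = [50, 52, 49, 51, 50, 48, 51, 50, 49, 50]
limits = control_limits(baseline)
new_points = [51, 49, 58, 50, 41]
flagged = out_of_control(new_points, limits)
print(limits, flagged)
```

With these made-up numbers the limits come out at roughly 45.3 and 54.7, so the readings of 58 and 41 are flagged while ordinary fluctuation is ignored. A fuller treatment would also look for runs and trends inside the limits, not just single points outside them.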
How do we determine what has caused changes in a data stream? Intelligence communities have to worry that terrorists may be manipulating the chatter level. In the same way, we need to understand which causal factors may be influencing any set of signals we observe.
Also important is the idea that processes are rarely about a single measure. They often depend on multiple streams of information, with varying time dimensions, accuracy and availability. Newly purposed Bayesian techniques are now being used to model and analyze such multiple data-stream systems.
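As a toy illustration of the Bayesian idea, and not the specific techniques vendors are using, the sketch below combines readings of the same quantity from several streams of differing accuracy. It assumes Gaussian noise with a known variance for each stream; the function name and all numbers are hypothetical.

```python
def fuse(prior_mean, prior_var, observations):
    """Combine a Gaussian prior with readings from several streams.

    observations is a list of (value, variance) pairs, one per stream that
    reported in this time slice. Streams that are unavailable are simply
    left out. Returns the posterior (mean, variance).
    """
    if prior_var <= 0:
        raise ValueError("prior variance must be positive")
    precision = 1.0 / prior_var
    weighted_sum = prior_mean / prior_var
    for value, var in observations:
        if var <= 0:
            raise ValueError("observation variance must be positive")
        # More accurate streams (smaller variance) get more weight.
        precision += 1.0 / var
        weighted_sum += value / var
    posterior_var = 1.0 / precision
    return weighted_sum * posterior_var, posterior_var


# Hypothetical example: a vague prior, one accurate stream, one noisy stream.
post_mean, post_var = fuse(100.0, 100.0, [(110.0, 4.0), (90.0, 100.0)])
print(post_mean, post_var)
```

The estimate lands much closer to the accurate stream's reading than to the noisy one, and its variance is smaller than that of any single source. Handling streams that arrive on different time scales would need a time-aware model, which this sketch does not attempt.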
Companies are emerging to address these problems. Here is a recent article on some vendor activity in this area. Notable is the University of Illinois startup Riverglass, which "... has developed software that merges data from multiple, disparate data streams--including unstructured text and numeric data--and applies real-time data modeling and analysis techniques to those streams. The goal is to detect patterns in the data to identify potential investment risks and opportunities ..."
Chatter is now all around us, and as we continue to develop new means to generate data through sensor networks, we will need to understand what it all means and link it to the decisions we ultimately make. There is much work to be done.