I am currently mildly obsessed with the idea of multivariate information decomposition. It is a computationally difficult problem which, when solved or even successfully approximated, can be of great use in the analysis of complex systems.
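To make the difficulty concrete, consider the classic XOR example often used to motivate information decomposition (a standard textbook construction, not something from this post): neither input alone shares any information with the output, yet together the inputs determine it completely. This "synergistic" information is precisely what decomposition methods try to tease apart. A minimal sketch, estimating plain mutual information from samples:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)                    # joint distribution p(x, y)
    px = Counter(x for x, _ in pairs)       # marginal p(x)
    py = Counter(y for _, y in pairs)       # marginal p(y)
    return sum(c / n * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# XOR truth table, sampled uniformly: y = x1 XOR x2.
samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

print(mutual_information([(x1, y) for x1, x2, y in samples]))        # → 0.0
print(mutual_information([(x2, y) for x1, x2, y in samples]))        # → 0.0
print(mutual_information([((x1, x2), y) for x1, x2, y in samples]))  # → 1.0
```

Each input carries zero bits about the output on its own, but jointly they carry one full bit; no pairwise analysis can see that bit, which is why the multivariate problem is genuinely harder than running pairwise measures.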
The idea is relatively straightforward. Let's say you have access to a bunch of signals describing a system, such as stock prices on an exchange. The activity of a single signal will be due to a mix of deterministic and novel (sometimes irrational) actions.
Deterministic actions from the perspective of a single stock price include all automated trades based on trading algorithms. The inputs to such an automated trading system do not need to be deterministic for the analogy to hold. The point is that trading algorithms process information, and in doing so transfer it from other sources to the stock price of interest.
Novel sources of information include investors acting on a whim. In some sense, the mix of emotion and thought behind such actions resulted from a combination of other signals in the system, but tracing that combination is computationally intractable.
Most of the activity on a stock exchange will be due to regurgitation of information in complex, extensive feedback loops. Real sources of original information, such as economic forecasts or other news events, are often rather limited. Furthermore, inputs that can be argued to have some rational bearing on the long-term prospects of a company, such as annual reports or special announcements, comprise a small subset of all data on which investors and algorithms typically act.
Imagine it were possible to trace all the intricate, interacting sources of information that end up affecting a single stock price. Such a deconvolution would allow us to draw a directed graph whose structure can be expected to include various feedback loops. If the scope were sufficiently broad, we could hope to see the actions of, and interactions between, automated trading algorithms as cycles in the graph. By examining the hierarchy of such a graph, we could potentially identify nodes which broadcast information to the system but are relatively unaffected by other nodes inside the scope investigated. These are the real handles on the system, and their information carries real predictive relevance.
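As a rough numerical illustration of this graph idea, the sketch below scores directed coupling between signals using time-lagged mutual information. This is a simplified stand-in for transfer entropy, the measure usually used for this purpose (true transfer entropy also conditions on the target's own past). The three signals and their names are hypothetical: one "source" node drives two followers with a one-step delay, and ranking nodes by outgoing minus incoming information should surface the broadcaster:

```python
import math
import random
from collections import Counter

def lagged_mi(src, dst, lag=1, bins=4):
    """Crude directed-coupling proxy: mutual information between src[t]
    and dst[t + lag], after equal-width binning.  (Transfer entropy
    proper would also condition on dst's own past.)"""
    def binned(xs):
        lo, hi = min(xs), max(xs)
        w = (hi - lo) / bins or 1.0
        return [min(int((x - lo) / w), bins - 1) for x in xs]
    a, b = binned(src[:-lag]), binned(dst[lag:])
    n = len(a)
    pab, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum(c / n * math.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

# Hypothetical system: a broadcast node drives two followers with delay 1.
random.seed(0)
source = [random.gauss(0, 1) for _ in range(2000)]
follower1 = [0.0] + [0.8 * s + 0.2 * random.gauss(0, 1) for s in source[:-1]]
follower2 = [0.0] + [0.6 * s + 0.4 * random.gauss(0, 1) for s in source[:-1]]
signals = {"source": source, "f1": follower1, "f2": follower2}

# Net information flow per node: outgoing minus incoming coupling.
score = {name: 0.0 for name in signals}
for i in signals:
    for j in signals:
        if i != j:
            flow = lagged_mi(signals[i], signals[j])
            score[i] += flow
            score[j] -= flow

print(sorted(score, key=score.get, reverse=True))  # 'source' should rank first
```

On this toy system the true broadcaster tops the ranking; on real data, estimation bias, hidden common drivers, and the feedback loops discussed above make the inference far less clean, which is part of what makes the full decomposition problem hard.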
Provided the structure of information flow does not change significantly over time, these high-ranking nodes can be expected to remain useful for sequence prediction. In the case of the stock market, the structure of the information transfer network cannot be expected to remain stable, so it is of little use for forecasting, except perhaps over short periods by exploiting knowledge of the typical delays of expected knock-on effects. However, there is a broad range of potential applications where the structure is invariant and the interactions intricate, such as chemical plants and human physiology. In such systems, even a crude approximation to the most important sources of novel information can be extremely useful, in applications ranging from fault detection and diagnosis to online monitoring and even the development of new diagnostic sensors.
In future posts, I will explore recent advances in this field and dig into some numerical examples. In the meantime, I recommend this introductory lecture on the topic:
A conceptual viewpoint on information decomposition