Total system entropy can serve as a single metric describing the overall activity in a complex, interacting system, such as a cryptocurrency exchange:
Suppose you have a set of time series vectors describing a system. The values of the time series are stored as single-precision floats, each occupying 32 bits. Sampling such a signal at one-second intervals therefore produces a raw data rate of 32 bits per second (bps) per variable.
The streaming data rate is not the same as the information rate of the signal. If the signals are all constant, each new sample is not adding any new information about the system, except perhaps that nothing has changed (which could also be a valuable data point in some scenarios).
Information theory defines the information of a signal as a measure of uncertainty. The more unpredictable a signal, the less we are sure about its next value, and the more information each new sample brings. Perceiving all signals encountered in your environment from this perspective can bring a bit of enlightenment.
Consider an unchanging message. Instead of recording a new floating point number for every timestamp, we can record the number only once, together with a rule stating that it remained constant between a given starting and ending point in time. Recording this rule requires significantly less storage space than the set of raw samples. Finding repeating sequences and describing their occurrence is the fundamental principle behind compression.
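The constant-message idea above can be sketched with run-length encoding, the simplest scheme exploiting repetition. The helper below is a hypothetical illustration, not taken from any particular library:

```python
# Sketch: a constant signal compresses to almost nothing.
def run_length_encode(samples):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return runs

# One hour of one-second samples of an unchanging reading:
raw = [21.5] * 3600                  # 3600 floats, ~115,200 bits at 32 bits each
compressed = run_length_encode(raw)  # collapses to a single [value, count] pair
```

A constant hour of samples collapses to one pair, while a signal that changed at every step would not shrink at all, which is exactly the sense in which unpredictability carries information.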
Shannon introduced the idea that the theoretical limit of a signal's compressibility describes its information content. The math works very well for finite-state systems fully described by discrete signals. For continuous processes with an infinite number of possible states up to an arbitrarily small resolution, such as the global weather system or a chemical plant, approximations are required, involving numerous caveats.
Analytical solutions exist for the entropy of signals conforming to particular statistical distributions, including the Gaussian. My goal is to discover as much as possible about general sampled systems without enforcing any a priori assumptions of linearity or of conformance to a specific distribution, so I will use these solutions only as references.
Without going into the details, the Kozachenko-Leonenko estimator is considered one of the most suitable estimators of the entropy of a sampled continuous signal. I will make use of the implementation in the excellent Java Information Dynamics Toolkit unless otherwise mentioned.
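To make the estimator concrete, here is a minimal numpy/scipy sketch of the Kozachenko-Leonenko nearest-neighbour estimator. This is not the JIDT implementation the post relies on; the function name, the choice of k=4 neighbours, and the sanity check against the analytic Gaussian entropy are all my own illustrative assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(samples, k=4):
    """Kozachenko-Leonenko k-nearest-neighbour estimate of differential
    entropy (in nats) for an (N, d) array of samples."""
    x = np.atleast_2d(np.asarray(samples, dtype=float))
    if x.shape[0] == 1:
        x = x.T                       # treat a 1-D signal as (N, 1)
    n, d = x.shape
    # Distance from each point to its k-th nearest neighbour (excluding itself).
    r = cKDTree(x).query(x, k=k + 1)[0][:, k]
    # Log volume of the d-dimensional unit ball.
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(r))

# Sanity check against the analytic entropy of a standard Gaussian.
rng = np.random.default_rng(0)
est = kl_entropy(rng.standard_normal(5000))
true_h = 0.5 * np.log(2 * np.pi * np.e)   # ~1.4189 nats
```

For a well-behaved distribution like this the estimate lands close to the analytic value; the pathologies come later, with real data.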
A practical example
Consider the historical prices of a select few cryptocurrencies on a famous exchange relative to Bitcoin (BTC):
We immediately note that plotting absolute values on a single universal scale makes it difficult to visually inspect the patterns and relationships of currencies with arbitrarily different unit values. Try double-clicking on the Ripple (XRP) label in the legend to plot it in isolation: the shared scale hid all this activity.
Real-life signals often require pre-processing before any meaningful analysis metrics can be computed. Typical operations performed during preprocessing include scaling and detrending, introducing the first round of caveats.
Standardisation is a common scaling strategy involving mean-centering and scaling by the standard deviation. Many researchers apply it rather blindly, although this particular form of scaling is almost always only justified if every signal's "excitement" shares some common standard. Scaling warrants a discussion of its own.
Without offering any explanation, I have decided to scale the crypto prices by subtracting the median from each signal and scaling by the largest difference between the median and the 84th and 16th percentiles, respectively, resulting in the following significantly more visually pleasing chart:
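The scaling described above could be sketched as follows; the function name `robust_scale` is my own, and this is only my reading of the recipe (median-centering, then dividing by the larger of the gaps between the median and the 84th and 16th percentiles, a robust one-sigma proxy):

```python
import numpy as np

def robust_scale(x):
    """Centre on the median and scale by the larger of the gaps between
    the median and the 84th / 16th percentiles."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    p16, p84 = np.percentile(x, [16, 84])
    scale = max(p84 - med, med - p16)   # zero only for a (near-)constant signal
    return (x - med) / scale

# Toy stand-in for a heavy-tailed price series.
prices = np.random.default_rng(1).lognormal(size=1000)
scaled = robust_scale(prices)
```

After scaling, every signal has median zero and its wider percentile gap normalised to one, regardless of its original unit value.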
Long-term trends in data can influence results if you are mainly interested in modelling more instantaneous behaviour. Targeting pre-processing towards sensitivity to a particular band of frequencies is a topic for another day.
In this case, we are interested in the instantaneous activity of the crypto exchange, and one-step differencing is a simple but effective technique that meets our needs.
Taking the value of our scaled data at every timestamp to be the difference between it and the value at the previous sampled time yields:
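One-step differencing is a one-liner in numpy; the series below is a toy stand-in for one of the scaled price signals:

```python
import numpy as np

# One-step differencing: each sample becomes its change since the previous one.
scaled = np.array([0.0, 0.2, 0.5, 0.4, 0.4])  # toy stand-in for a scaled series
diffed = np.diff(scaled)                      # [0.2, 0.3, -0.1, 0.0]
```

Note that differencing drops one sample, so the differenced series is one element shorter than the original.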
Individual signal entropies
Although calculating a single entropy value per signal might be interesting, it is significantly more useful to calculate it over multiple finely overlapping time bins and obtain a distribution of results. Most analysis exercises warrant some stability analysis, including simple correlation. More on this in a future post.
To observe trends in the averaged differential entropy, we calculate it for a thousand overlapping bins each spanning two days:
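The binning scheme can be sketched as below. The helper `overlapping_bins` is hypothetical, and the sizes are toy numbers rather than the thousand two-day bins used on the actual data; any per-bin statistic (such as an entropy estimator) can be swapped in for the variance used here as a placeholder:

```python
import numpy as np

def overlapping_bins(x, n_bins, bin_len):
    """Yield n_bins equally spaced, overlapping windows of length bin_len."""
    x = np.asarray(x)
    starts = np.linspace(0, len(x) - bin_len, n_bins).astype(int)
    for s in starts:
        yield x[s:s + bin_len]

signal = np.sin(np.linspace(0, 20, 500))
bins = list(overlapping_bins(signal, n_bins=100, bin_len=50))
per_bin_stat = [b.var() for b in bins]   # swap in an entropy estimate per bin
```

Evenly spacing the window starts over the full record keeps neighbouring bins heavily overlapped, which is what smooths the resulting trend.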
Negative values, yup. Estimating the entropy of continuous sampled systems is hard, and convergence is not guaranteed. The differential entropy calculation failed for Ripple (XRP) in every single time bin, returning negative infinity. Future posts will address this detail, including alternative, parameter-sensitive estimators. For now, accept that when the calculation converges, and assuming our scaling approach is sensible, higher values indicate more activity and lower values indicate less uncertainty.
System (multivariate) entropy
Entropy calculations rely on probability density function (PDF) estimates. Applying the entropy estimate functional to a joint distribution calculates the combined entropy of a set of signals. Perhaps surprisingly, this is often more stable.
The differential entropy of all signals under consideration over time produces a trend that can be visually related to the activity of the detrended, scaled signals:
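A useful reference point for joint entropy is the multivariate Gaussian, for which a closed form exists: h = ½ log((2πe)^d det Σ). The sketch below, with a covariance matrix I chose for illustration, shows the key property that makes the system-wide metric meaningful: the joint entropy of correlated signals is less than the sum of their individual entropies, the gap being their mutual information.

```python
import numpy as np

def gaussian_joint_entropy(cov):
    """Closed-form joint differential entropy (nats) of a multivariate
    Gaussian with covariance matrix cov."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.log(np.linalg.det(cov)))

# Two correlated unit-variance signals (correlation 0.8).
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
joint = gaussian_joint_entropy(cov)
marginals = sum(gaussian_joint_entropy([[v]]) for v in np.diag(cov))
redundancy = marginals - joint   # mutual information shared by the pair
```

This is why the joint calculation is the more honest system-wide metric: summing individual entropies double-counts whatever the signals share.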