Sirius is an open-source Python package and web application that supports a novel method for analyzing high-dimensional data using mutual information feature networks. This project is a collaboration between the University of Vermont Complex Systems Center and the MassMutual Data Visualization team. Check out and clone the project on GitHub!
Mutual information scores let us find feature pairs that are strongly dependent: the higher the score, the more knowing one feature reduces uncertainty about the other. Feature pairs with high mutual information scores are connected by edges in the resulting network graph.
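As a minimal sketch of the idea (not Sirius's actual code), the snippet below estimates mutual information between two discrete features from their empirical joint distribution, I(X;Y) = Σ p(x,y) log₂[p(x,y) / (p(x)p(y))], and connects every feature pair whose score exceeds a cutoff. The function names and the `threshold` parameter are hypothetical; a fixed cutoff is only a stand-in for Sirius's own edge selection.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Each observed (a, b) cell contributes p(a,b) * log2(p(a,b) / (p(a) p(b)))
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mi_network(features, threshold):
    """Return {(f, g): score} for every feature pair whose MI exceeds a hypothetical cutoff."""
    edges = {}
    for f, g in combinations(sorted(features), 2):
        score = mutual_information(features[f], features[g])
        if score > threshold:
            edges[(f, g)] = score
    return edges
```

For example, with `features = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}` and `threshold=0.5`, only the pair `("a", "b")` is connected, with a score of 1 bit, because `c` is independent of both other features.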
The tool is designed to handle both continuous and discrete features; mutual information can be calculated for homogeneous pairings (continuous with continuous, discrete with discrete) as well as heterogeneous ones (continuous with discrete).
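The post does not say which estimator Sirius uses for continuous or mixed pairings. One simple approach, shown here only as a hedged sketch, is to discretize continuous features into equal-width bins and then apply the discrete formula I(X;Y) = H(X) + H(Y) − H(X,Y). The helpers `equal_width_bins` and `pair_mi` and the default `n_bins=4` are hypothetical; practical tools often prefer k-nearest-neighbour estimators, which avoid the sensitivity to bin count.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a discrete sequence."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def equal_width_bins(values, n_bins):
    """Map continuous values to integer bin labels 0..n_bins-1 (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # constant column: everything lands in one bin
    # The maximum value would fall into bin n_bins, so clamp it into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def pair_mi(x, y, x_is_continuous, y_is_continuous, n_bins=4):
    """Score any pairing (continuous/continuous, discrete/discrete, or mixed)."""
    if x_is_continuous:
        x = equal_width_bins(x, n_bins)
    if y_is_continuous:
        y = equal_width_bins(y, n_bins)
    return mutual_information(x, y)
```

For instance, a continuous feature `x = [i / 100 for i in range(100)]` paired with a discrete label that is `"hi"` when `x >= 0.5` and `"lo"` otherwise scores 1 bit: the four bins separate the two labels perfectly.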
Unlike correlation, which measures only linear association, mutual information can detect non-linear dependence between features. Consider the ‘Datasaurus Dozen’, a set of datasets commonly used to show that data with nearly identical summary statistics can have visibly different structure: mutual information, itself a summary statistic, adds nuance to our understanding of feature relationships that means, variances, and correlation alone miss.
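A toy illustration of the difference (not the Datasaurus data itself): if `y = x²` on values symmetric around zero, the Pearson correlation is essentially zero, yet `y` is completely determined by `x`, so the mutual information is large. The helper names below are hypothetical, and MI is estimated with the plug-in formula for discrete values.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a discrete sequence."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = list(range(-10, 11))
y = [v * v for v in x]  # deterministic, but non-linear and non-monotonic
print(f"Pearson r = {pearson(x, y):.3f}")
print(f"MI = {mutual_information(x, y):.3f} bits")
```

Here the correlation rounds to zero, while the mutual information is about 3.44 bits, which equals the full entropy of `y` because knowing `x` removes all uncertainty about it.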
Above: example charts output by the application. The top row shows feature pairs with high mutual information; the bottom row shows feature pairs with low mutual information.
A backbone extraction method is then applied to the feature network to keep only the most statistically significant edges, which are displayed in the web-based application.
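The post does not name the backbone method Sirius uses. As one well-known example, the disparity filter (Serrano, Boguñá and Vespignani, 2009) keeps an edge when its weight is unexpectedly large compared with a null model in which a node's total strength is spread uniformly at random over its k edges: for normalized weight p = w / s, the edge's p-value at that node is (1 − p)^(k − 1). The sketch below assumes the network is a dict of undirected edges weighted by MI scores; the function name and the `alpha=0.05` default are illustrative choices.

```python
from collections import defaultdict


def disparity_filter(edges, alpha=0.05):
    """Keep undirected weighted edges that are significant at either endpoint.

    edges: dict mapping (u, v) node pairs to positive weights (e.g. MI scores).
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for (u, v), w in edges.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    def p_value(node, w):
        k = degree[node]
        if k < 2:
            return 1.0  # a degree-1 node offers no evidence on its own
        p = w / strength[node]
        return (1.0 - p) ** (k - 1)

    return {
        (u, v): w
        for (u, v), w in edges.items()
        if min(p_value(u, w), p_value(v, w)) < alpha
    }
```

In a star where node `A` links to `B` with weight 10 and to `C`, `D`, and `E` with weight 0.1 each, only the `A`–`B` edge survives. In a triangle with equal weights, no edge stands out, so none is kept.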
Stay tuned for the paper!