Abstract
While brain imaging tools like functional magnetic resonance imaging (fMRI) afford measurements of whole-brain activity, it remains unclear how best to interpret patterns found amid the data’s apparent self-organization. To clarify how patterns of brain activity support brain function, one might identify metric spaces that optimally distinguish brain states across experimentally defined conditions. Therefore, the present study considers the relative capacities of several metric spaces to disambiguate experimentally defined brain states. One fundamental metric space interprets fMRI data topographically, that is, as the vector of amplitudes of a multivariate signal, changing with time. Another perspective compares the brain’s functional connectivity, that is, the similarity matrix computed between signals from different brain regions. More recently, metric spaces that consider the data’s topology have become available. Such methods treat data as a sample drawn from an abstract geometric object. To recover the structure of that object, topological data analysis detects features that are invariant under continuous deformations (such as coordinate rotation and nodal misalignment). Moreover, the methods explicitly consider features that persist across multiple geometric scales. While each brain dynamics metric space certainly has strengths and weaknesses, we find that those that track topological features optimally distinguish experimentally defined brain states.
Author Summary
Time-varying functional connectivity interprets brain function as time-varying patterns of coordinated brain activity. While many questions remain regarding how brain function emerges from multiregional interactions, advances in the mathematics of topological data analysis (TDA) may provide new insights. One tool from TDA, “persistent homology,” observes the occurrence and persistence of n-dimensional holes in a sequence of simplicial complexes extracted from a weighted graph. In the present study, we compare the use of persistent homology versus more traditional metrics at the task of segmenting brain states that differ across experimental conditions. We find that the structures identified by persistent homology more accurately segment the stimuli, more accurately segment high versus low performance levels under common stimuli, and generalize better across volunteers.
INTRODUCTION
One of the perennial questions in neuroscience concerns how neuronal signaling generates time-varying experiences. One foundation from which to address this question asserts that brain function emerges from neuronal communication within the context of multiscale neuronal networks. Having access to high-quality whole-brain imaging data, the field of time-varying functional connectivity (TVFC, or chronnectomics; Calhoun, Miller, Pearlson, & Adalı, 2014) offers an empirical approach to characterizing time-varying patterns of mesoscopic neuronal communication (Hansen, Battaglia, Spiegler, Deco, & Jirsa, 2015; Hutchison et al., 2013).
Early computational analysis of brain imaging data observed changes in vectors describing brain topography across conditions. Functional connectivity (FC) instead defines a geometry among brain regions by computing pairwise similarities from their long-term spontaneous activity measures (Biswal, Zerrin Yetkin, Haughton, & Hyde, 1995). While the similarity between regions is often calculated using the Pearson correlation among spontaneous neuroimaging signals (Biswal et al., 1995; Buckner, Krienen, Castellanos, Diaz, & Yeo, 2011; Stoodley, Valera, & Schmahmann, 2010), in general, the idea of brain connectivity can apply to other methods of computing pairwise edges between nodes in the brain. For instance, the present study defines TVFC using instantaneous coherence.
But is the overt geometry of brain imaging data an optimal set of features through which to view and compare brain dynamics? Or, does FC geometry tend to be an idiosyncratic and volunteer-specific descriptor of the brain’s state (Finn et al., 2015)? An alternative perspective observes that an FC graph may be treated as a network. From here, the analyst may compute graph-theoretic summaries such as centrality, strength, small-worldness, and so forth (Bullmore & Sporns, 2009; Farahani, Karwowski, & Lighthall, 2019). However, network properties do not necessarily become more apparent as the brain is segmented into more parcels. Rather, the observation of important network properties may require a precise parcellation schema (Gordon et al., 2016).
A more complete picture of neuronal dynamics should account for multiple scales of functional connectivity. One way to gain this perspective is to consider data as an approximate sampling of an underlying, typically low-dimensional, geometric object, that is, as a topological space. In this framework, we may describe the potentially many-body interactions between points or regions of interest using simplices. In the simplest and most abstract definition, a k-simplex σ = [p0, p1, …, pk] is a set of (k + 1) points pi with an ordering. The topology of a space is defined by collections of simplices, called simplicial complexes, that are closed under intersection (i.e., X is a simplicial complex if, for all σ, σ′ ∈ X, also σ ∩ σ′ ∈ X). Disconnected holes and cavities are described by the homology groups Hk of the simplicial complex: H0 describes the connected components of the complex, H1 its one-dimensional cycles, H2 its three-dimensional cavities, and so on for higher k.
Topological data analysis (TDA) attempts to reconstruct the data’s underlying abstract topological space by quantifying the presence and persistence of homological features across different scales (e.g., distances between points, or intensity of correlation between different regions in FC graphs). Such features may include connected regions of a topological space, and its holes in various dimensions, from one-dimensional cycles to higher dimensional cavities (Battiston et al., 2020; Phinyomark, Ibanez-Marcelo, & Petri, 2017). TDA has been described as “exactly that branch of mathematics which deals with qualitative geometric information” (Carlsson, 2009, p. 2). In practice, one does not focus on a single complex X but rather on a filtration 𝕏 = [X0, X1, X2, …, Xn], a sequence of nested simplicial complexes, such that Xi ⊆ Xi+1, which approximates the topological structure at different scales. In this case, the analogues of homology groups are persistent homology groups, which capture not only the presence or absence of a hole, but also at what scale it appears and at what scale—if any—it disappears. In this way, persistent homology generates topological summaries, called persistence diagrams, that can then be used to compute topologically informed distances between datasets (see Methods).
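As a concrete illustration of what a persistence diagram records, the following minimal sketch (a toy example, not part of the study pipeline) uses the Ripser package referenced in the Methods to recover the single long-lived one-dimensional cycle of a noisily sampled circle:

```python
import numpy as np
from ripser import ripser

# Toy point cloud sampled noisily from a circle: the underlying space
# has one connected component and one 1-dimensional cycle.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 60)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(60, 2))

dgms = ripser(X, maxdim=1)['dgms']
# dgms[0]: H0 birth/death pairs (connected components)
# dgms[1]: H1 birth/death pairs; one point sits far from the diagonal,
#          i.e., a cycle that persists over a wide range of scales.
print(dgms[1])
```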
Rethinking the more traditional brain dynamics metric spaces from the perspective of topology, values for nodal activity, edge weight, degree strength, and so on are properties that decorate k-simplices. Thus, we can consider more traditional metrics as adopting a “simplicial approach,” while a “topological approach” focuses on topological features associated with sequences of simplicial complexes. To compare simplicial and topological spaces of brain dynamics, we leverage preexisting rest and task fMRI data from 18 volunteers (Gonzalez-Castillo et al., 2015). We compare instantaneous brain images using each of six metric spaces—three simplicial metrics, and three topological metrics. Metric spaces are embedded onto two dimensions to facilitate statistical tests relating clusters of brain images with common experimental conditions (for more details, see Figure 1 and Methods). In part A of Figure 2, we report an instance of the embeddings output from the six brain dynamics metric spaces, that is, the metric spaces from differential node topography, differential edge geometry, and differential degree strength, and also the three topological distances between homology groups in dimensions 1, 2, and 3 (the homology groups H0, H1, and H2). Points often form dense regions associated with certain experimental stimuli. After 256 bootstrap samples of the embedding process, we find that the topological approach excels at distinguishing experimentally distinct brain states.
RESULTS
Volunteer-Wise Representation
As an initial test of the quality of each embedding space, we ask how well the clusters in each embedding generalize across volunteers. To do so, we count the number of points falling into clusters wherein between one and six volunteers contributed a not-insignificant number of points to each cluster. Figure 3 displays the results of this count as percentages with respect to the total number of time points in the test embedding. Following the subsampling and bootstrapping schema described in the Methods section, volunteer-wise generalizability was assessed over 256 independently reinitialized embeddings. Bold lines in Figure 3 display the mean, while shaded regions show the 95% confidence interval. A right-skewed distribution indicates increased generalizability, because it means that the densest watershed regions are significantly populated with many volunteers. A left-skewed distribution indicates that most watershed regions are specific to one or few volunteers, that is, that observed brain dynamics are idiosyncratically related to specific volunteers.
Overall, topological metric spaces offer embeddings that generalize better across volunteers than the other metrics we consider. Not only does homology present right-skewed distributions in Figure 3, this category of metrics also aggregates significantly more points into embedding clusters that are general for all six volunteers.
It may be possible for metric spaces to generalize too well. For instance, the metric space of differential node activity agglomerates the largest percentage of time points into bins having between four and six represented volunteers. However, as will become clear in the next section, this state generalizability comes at the cost of the capacity to distinguish between experimental conditions. Indeed, it appears that the node metric space produces embeddings with a single dense core, plus a few distant outliers.
Stimulus Segmentation
A central indicator of embedding quality is the degree to which time points colocalize when belonging to the same stimulus condition. Part B of Figure 2 shows an example result of testing watershed clusters against the hypothesis that a significant number of within-cluster points corresponds to any of the five experimental conditions. For each stimulus type, Figure 4 shows the percentage of points from that stimulus residing in clusters significantly associated with that stimulus (blue boxes). Here again, we report the result as a distribution after 256 independently reinitialized embeddings. Larger percentages of significantly colocalizing points indicate increased capacity to identify brain states associated with experimental stimuli.
For comparison, we offer two null models computed from randomly permuted point labels. The first null distribution (yellow boxes) permutes point labels among the significant clusters defined previously. It reflects the expected number of points that would randomly collect into the preidentified set of significant clusters. The inclusion of this null model is motivated by the fact that some embeddings clump more points than others into the same watershed region, and would thus hold a larger percentage of points from any experimental condition by default. The effect size (Cohen’s d) between this null distribution and the real distribution provides an indication of how well each embedding isolates brain states induced by distinct experimental stimuli. The second null distribution simply permutes point labels before attempting to find watershed clusters having a significant number of points from any of the five experimental conditions (black boxes). This second null distribution provides a good check on the rate of false positives.
Here again, the homology-based embeddings perform very well compared with embeddings constructed from simplicial overlap. This is especially the case for the H0 metric space, which tends to present, over all stimuli, the highest effect sizes. The second-highest effect size is found from the H1 metric space, and the third from the strength metric space.
It is interesting to note that, of all the homology-based metrics, the embeddings using Wasserstein distances in H2 provide the worst segmentation over stimuli. While this may indicate that aspects of TVFC topology are restricted to very low dimensions, the computationally motivated coarsening of voxelwise information into 328 brain regions also limits the appearance of high-dimensional homologies.
The embeddings over nodes produce states that are highly generalizable across volunteers, but that are very poor at distinguishing experimental conditions. In direct contrast, the embeddings over edges are the least generalizable across volunteers, but produce embeddings wherein many time points are found in watershed clusters with correctly labeled experimental conditions.
Task Performance
Assuming that differences in performance should be detectable as different brain states under common stimuli, we expect to see large differences between measures of brain dynamics during task time points in which volunteers made fewer or more correct responses. We can test this because the experimental design includes performance metrics for each task, especially the percentage of correct responses for each task block. To do this, we computed “mean performance graphs” for each task and each valenced performance level (see Methods). Within each task, performance was valenced as having either more correct responses, or fewer correct responses, with respect to a mean split of the performance characteristics for that task from the entire dataset.
Part B of Figure 5 displays distances between pairs of mean graphs (across metric spaces and performance levels). Of particular note are the distances computed across the valenced performance levels, but within the same category of metric space (Figure 5, white annotations). These values directly measure the sensitivity of each metric space to distinguishing different brain states under common stimuli. Overall, the distance between valenced mean graphs is largest with respect to the topological metric spaces. This is especially true from the perspective of the Jaccard distance (part B of Figure 5, lower triangles). From the perspective of the Wasserstein distance in H0 (upper triangles), the strength metric also demonstrates strong cross-valence differences.
The values in part B of the figure should be compared against summary statistics in part A, and to Table 1. Displaying the root mean square (RMS) and standard deviation of the set of distances between each mean graph and their component TVFC graphs provides some indication of the diversity of brain dynamics at times with common stimuli and response characteristics. Compared with Table 1, the RMS edge distance between mean graphs and component TVFC graphs is below the average edge distance between all TVFC graphs. By contrast, the RMS Wasserstein distance in H0 between mean graphs and component TVFC graphs approaches the maximum H0 distance across all TVFC graphs. Through the lens of a simplicial approach, mean graphs localize centrally among all TVFC graphs. By contrast, through the lens of the Wasserstein distance in H0, mean graphs are very different from all other TVFC graphs. This observation confirms that the simplicial approach and the topological approach are observing very different features of the same dataset.
Visualization of Homological Information
Finally, having identified the high utility of brain dynamics metric spaces developed from homology to disambiguate group-general brain states, we wanted to gain some insights into the features of TVFC that homology resolves. Owing to the optimal performance of the H0 metric space, in Figure 6, we present a visualization of topological features of a mean performance graph, and also of an instantaneous TVFC graph. Parts A and B of the figure display the H0 and H1 homology groups at a single threshold. However, we would like to emphasize that persistent homology considers the topology of point clouds over a complete filtration across thresholds. Part C of the figure gives a sense of the multiscale properties of the topological lens. Each point in the persistence diagram marks the thresholds at which a homological feature of the point cloud appears and disappears. Interestingly, the observed homology groups in the mean performance graph are shifted to less coherent birth distances compared with the homology groups from the sample TVFC graph. Both distributions of birth and death times are above the threshold for significant wavelet coherence distance, 0.6, as defined relative to an AR1 model of the input data (see part B of Supplemental Figure 0.1). This shift indicates the loss of highly coherent edges among mean graphs.
DISCUSSION
Whereas brain function is believed to emerge from extensive coordination among brain regions, what quantifiable features best typify state-specific brain organization remains a subject of intense and ongoing research (Battaglia & Brovelli, 2020; Lurie et al., 2020). To better understand the correspondence between the methods used to describe brain dynamics, and the quality of the eventual descriptions, we compared the performance of two broad classes of TVFC metric spaces: one based upon overlap distances between decorated k-simplices, and the other based upon k-dimensional homological structures. The results of the present study provide evidence that the homology of coherence-based TVFC effectively disambiguates experimentally defined brain states in the population-general brain. By contrast, approaches based on network and simplicial overlap generally performed worse at distinguishing population-general and experimentally relevant brain states (see Figures 3 and 4).
Mapping Brain Dynamics
Given a good space for representing brain dynamics, it is possible to map relationships between stereotypical brain states and subtly different conditions. Utilizing the same dataset as the present study, Saggar et al. (2018) computed distances between node activities to visualize two-dimensional mappings of within-volunteer temporal similarity. In the majority of cases, the visualization depicts smooth transitions across time points. Smooth transitions over short distances are clearly depicted during the resting state. Smooth transitions are also a feature of most temporally adjacent transitions during task states. However, for some volunteers, the mapping depicts disjoint transitions within the context of a single experiment.
Using a complementary dataset, Billings et al. (2017) also computed maps of node activity distances. Distances were mapped across a population of volunteers. Even at the group level, a general trend was observed of variable activity punctuated by moments of clear transitions between focal brain states. Similarly, a sample of the node-based embedding shown in Figure 2 contains a single densely populated region, with several peripheral clusters.
It is interesting to note that, whereas all three simplicial approaches depict embeddings having several disjoint clusters, embeddings utilizing topology depict a more continuously varying state space. Given the improved capacity of the topological approach to segment experimentally defined states, it is interesting to consider that the topology-based embeddings may establish maps of brain states wherein transitions across the embedding space relate directly to trajectories through a latent space of brain dynamics.
Towards a Topological View
While studies implementing simplicial metrics provide evidence that brains select conserved dynamical patterns towards the production of brain function, the empirical and theoretical support for emphasizing homological and other topological descriptors has prompted several authors to reinterpret neuronal dynamics from a topological perspective (Curto, 2017; Giusti, Ghrist, & Bassett, 2016; Lerda, 2016; Rasetti, 2017; Reimann et al., 2017; A. E. Sizemore, Phillips-Cremins, Ghrist, & Bassett, 2019; Stolz, 2014). A. E. Sizemore et al. (2018) show that cliques and homological cavities in the mesoscopic space of structural brain images reflect known brain networks. Further evidence that cliques and homologies encode microscopic interactions among neuronal circuits has been discovered within the hippocampal place field (Basso, Arai, & Dabaghian, 2016; Dabaghian, Brandt, & Frank, 2014; Giusti, Pastalkova, Curto, & Itskov, 2015) and in the somatomotor representation of the head (Chaudhuri, Gerçek, Pandey, Peyrache, & Fiete, 2019). The present results provide further support for the utility of the topological approach to discern the evolution of brain states through time, and thus may improve our comprehension of the brain’s multiscale self-organization.
As a quantitative tool, persistent homology is tailor-made for defining topological similarities among metric spaces (Carlsson, 2009). Indeed, fMRI studies have implemented persistent homology to discern group-level FC differences in task performance (Ibáñez-Marcelo, Campioni, Phinyomark, Petri, & Santarcangelo, 2019), and with respect to pharmacological treatments (Petri et al., 2014). Similar findings are observed in MEG data (Duman et al., 2019). Segmentation of brain states was also achieved from homological features in H0 for eight-channel EEG TVFC as volunteers engaged in a visuo-motor task (Yoo, Kim, Ahn, & Ye, 2016).
Visualizing Topology
Certainly, functional connectivity describes a multiscale process. And while there are ongoing questions regarding the pathways through which otherwise structurally distributed brain networks form TVFC networks (Damoiseaux & Greicius, 2009), the development of data-driven functions that operate over spectral and spatial features of complex networks may drive new insights. The view from homology may be especially useful when topological features are expected to be important, that is, when one expects multiple scales of patterned connectivity among clusters in H0, and/or higher order (dis)connected cycles in H1 and above. The present observation of meaningful homology in H0 may relate to the standard description of brains as functioning through (clustered) functional brain networks. Given the theoretical significance of homology in H0 (e.g., multiscale clustering), and its computational advantage relative to computing homology in H1 and above, it appears worthwhile to use persistent homology in H0 as a general tool for describing and comparing brain states.
Limitations and Future Directions
Future research should strive to make a more detailed catalogue of the homologies that commonly appear among brain regions. While the present study resorted to a very coarse brain parcellation, it is not clear that 333 parcels provide a maximal resolution of brain dynamics. In theory, a more fine-grained sampling of brain signals from different brain regions should enhance the capacity for persistent homology to distinguish brain states, albeit up to some plateau. By contrast, element-wise operations over simplicial decorations benefit from clustering (Glasser et al., 2016; Gordon et al., 2016) and unmixing (Kunert-Graf et al., 2019; Smith et al., 2009). Future work should utilize TDA’s capacity to make good use of the intrinsically fine-grained information contained in fMRI data to catalogue the stability of topological features across multiple scales of parcellation. Similar comments could be made regarding the use of the data’s intrinsically multispectral coherence in place of the power-weighted coherence (see Methods).
Another limitation of the present study is the reliance on clustering in a low-dimensional embedding space. Even while low-dimensional embeddings provide an efficient means for visualizing data, there is always some loss of information. For instance, the UMAP (Uniform Manifold Approximation and Projection) method for embedding point cloud data transduces an explicit nearest neighbor approximation of the high-dimensional simplicial complex into the low-dimensional space. This nearest neighbor approximation may run into problems when temporally adjacent brain states are much more similar to one another than to states from other volunteers (see, e.g., Supplemental Figure 0.2). And while there is some evidence to suggest that metric spaces utilizing an edge distance depict volunteer-specific “fingerprints” (Finn et al., 2015), the present study pursues extensive subsampling to avoid idiosyncratic and autocorrelated similarities. Partial alleviation of idiosyncratic information might also be achieved by deconvolving each scan with a volunteer-specific hemodynamic response function. Moreover, future work that biases the low-dimensional embedding in a more appropriate way—perhaps by learning a transductive vector embedding as in Bai et al. (2019)—may offer some additional improvements. In any case, approaches that circumvent dimensionality reduction entirely by operating in the native high-dimensional space may offer the most general solution to the loss of information during low-dimensional embedding.
Finally, it is always interesting to consider more concise multispectral decompositions than those provided by Morlet wavelet kernels. Perhaps kernels that imitate the canonical hemodynamic response function would offer a more compact representation of fMRI data. Also, while the Morlet wavelet is roughly symmetric, it may be useful to implement asymmetric filters that place more emphasis on information from more recent time points.
In Conclusion
To understand the dynamic self-organization of complex systems like the brain, it helps to view system dynamics through lenses that highlight the presence and the structure of complexes. And because TDA understands data in terms of complexes of simplices, it makes sense to utilize TDA to understand brain function. Given the kinds of weighted graphs typical of TVFC analysis, persistent homology is well suited for interpreting complexes of brain regions. The view from homology outperforms more traditional graph metrics—such as the activity measures of zero-dimensional nodes and the weights of one-dimensional edges—at the task of segmenting experimentally defined brain states into features that generalize well across multiple volunteers. The observed utility of the topological approach presents a novel and enticing lens through which to understand the complex network architecture of human brain dynamics.
METHODS
As described in Figure 1, our procedure unfolds across four steps:
1. Acquire task and resting-state BOLD fMRI data from a group. Apply minimal preprocessing.
2. Compute TVFC as instantaneous coherence.
3. Differentiate instantaneous brain dynamics via each of six metrics:
   - (a) Euclidean distance between node topographies,
   - (b) weighted Jaccard distance between edge geometries,
   - (c) weighted Jaccard distance between the weighted degree strength of networks,
   - (d) sliced-Wasserstein distance between topographic persistence diagrams in H0,
   - (e) sliced-Wasserstein distance between topographic persistence diagrams in H1, and
   - (f) sliced-Wasserstein distance between topographic persistence diagrams in H2.
4. Embed brain dynamics metric spaces onto two dimensions for visualization and statistical analysis.
Data Acquisition and Preprocessing
To discern the relative capacities of a range of distance metrics to disambiguate dynamical brain states induced by stimuli, for the present study, we adopt a dataset acquired during the presentation of multiple experimentally defined tasks. Study methods benefit from scans acquired continuously over relatively long time spans, as the process of spectral filtration requires complete overlap between the signal and the filtration kernel so as to avoid effects at the undefined edges of the time series. And, because we are interested in signals in the low-frequency fluctuation range (periods of roughly 100 s, i.e., ∼0.01 Hz), we require scans to be longer than 200 s.
The data acquired by Gonzalez-Castillo et al. (2015) met these criteria. These data were publicized as an open-access dataset through the XNAT neuroimaging database (https://central.xnat.org; project ID: FCStateClassif). Here, we briefly summarize the dataset as follows: 18 volunteers were scanned continuously over 25.5 min (7 Tesla, 32-element coil, gre-EPI, TR = 1.5 s, TE = 25 ms, 2 mm isotropic). Preprocessing was performed to transform individual datasets into a common MNI space and to remove artifacts from slice timing, motion, linear trends, quadratic trends, white matter signals, and CSF signals. Data were spatially smoothed using a 4-mm FWHM Gaussian filter. They were temporally band-pass filtered to between 0.009 Hz and 0.08 Hz. Finally, images were downsampled to 3 mm isotropic, and normalized to common (MNI) coordinates. Data were acquired in compliance with a protocol approved by the Institutional Review Board of the National Institute of Mental Health in Bethesda, Maryland. For complete preprocessing details, please refer to Saggar et al. (2018). In addition to the aforementioned steps, voxelwise data were spatially aggregated onto an atlas of 333 brain regions (Gordon et al., 2016). Up to five brain regions contained no information from some volunteers, and were excluded from all datasets for the remainder of the analysis (numbers 133, 296, 299, 302, and 304, indexed from 0; see also the missing patches in Figure 1, part A). Thus, the finest granularity of study results is over 333 − 5 = 328 brain regions. During the scan, volunteers interacted with three block-design tasks and one rest stimulus. Each task was presented twice. Each task presentation lasted 3 min, and was preceded by a 12-s instruction block. Tasks included “video,” watching videos of a fish tank while responding to a visual target; “math,” computing algebra problems; and “memory,” a two-back memory task with abstract shapes. A “rest” stimulus was also included, and entailed the presentation of a fixation cross for 3 min. Stimuli were presented in a pseudorandom order that was fixed across volunteers. For each task block, performance metrics were collected, including the percentage of correct responses.
Time-Varying Connectivity
Following Torrence and Compo (1998), symmetric wavelets produce similar coherence values. And without strong support for any particular wavelet kernel, we adopt the complex Morlet wavelet as the CWT kernel. The filter is a plane wave modified by a Gaussian, ψ(t) = exp(iω0t/s) · exp(−t²/(2s²)). We set the base frequency to ω0 = 6. Following Farge (1992), an ω0 ≥ 6 ensures that the function’s nonzero average falls below machine precision. Spectral selectivity increases with increasing ω0, at the expense of decreased temporal selectivity (e.g., sharper filters require more temporal support). Thus, the base frequency ω0 = 6 retains maximal temporal resolution while satisfying this criterion.
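A minimal sketch of this kernel and of a scale-wise convolution, assuming the study’s 1.5-s TR; the support length and the omitted normalization constant are illustrative choices, and the smoothing that yields coherence (Equation 1) is not shown:

```python
import numpy as np

def morlet_kernel(s, omega0=6.0, dt=1.5, n_cycles=8.0):
    """Complex Morlet wavelet psi(t) = exp(i*omega0*t/s) * exp(-t^2/(2*s^2)).

    `dt` matches the 1.5-s TR; the support length (n_cycles) and the
    omitted normalization constant are illustrative assumptions.
    """
    t = np.arange(-n_cycles * s, n_cycles * s + dt, dt)
    return np.exp(1j * omega0 * t / s) * np.exp(-t**2 / (2.0 * s**2))

def cwt(signal, scales, **kwargs):
    """Convolve a regional BOLD signal with the kernel at each scale."""
    return np.array([np.convolve(signal, morlet_kernel(s, **kwargs), mode='same')
                     for s in scales])
```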
To account for the cone of influence at the temporal edges of the wavelet filtration, as well as the loss of precision at the temporal and spectral edges of the smoothed coherence data, the outside 120 time points and the outside 2 scales are dropped before taking the summation in Equation 1. The removed time points include one whole “rest” block, and one whole “video” block. Coherence graphs are thus available for the middle 777 images of the scan, and for 11 spectral scales between 0.0095 and 0.1 Hz.
Distance Metrics Comparing Brain Dynamics
Theory.
Having constructed TVFC graphs for all included time points and for all volunteers, we pursue two broad alternatives for comparing brain dynamics. The first is related to element-wise differences between the decorations (e.g., weights) applied to graphs. The second relates to shared topological structure. To describe in detail these two views, it is useful to supply some definitions.
A graph G = (V, E) represents a set of V nodes interconnected by E edges. Nodes and edges may be decorated with properties such as value, weight, directionality, sign, layer, degree centrality, degree strength, and so on. A collection of k completely interconnected nodes forms a clique, C. In the following, we identify cliques with geometric primitives called “simplices” in standard fashion (Petri et al., 2014; Petri, Scolamiero, Donato, & Vaccarino, 2013); that is, to a clique of k + 1 nodes we associate the corresponding k-simplex, σk. For instance, two connected nodes form a 2-clique. The surface enclosing a 2-clique is a 1-simplex, that is, an “edge.” A 2-simplex formed by a clique of three connected nodes is a “filled triangle,” and so forth for higher order simplices.
Formally, a simplicial complex is a topological space, 𝒦, composed of all σk and their subfaces. Along the same lines, a clique complex, Cl(G), is a simplicial complex formed from an unweighted graph G by promoting every k-clique into a (k − 1)-simplex. Holes in dimension k may develop within the boundaries established by closed chains of (k − 1)-simplices. Such holes are called homologies.
The topological approach, TDA, includes methods for identifying topological features of an abstract geometric object represented by a data sampling. By contrast, the more traditional approach to comparing brain dynamics constitutes a simplicial approach that directly compares the decorations applied to sets of simplices.
Homology.
The boundary of a homology is termed a “homological cycle” or “generator.” To illustrate the concept, consider the case of four nodes connected in a cycle such that each node has exactly two edges. The nodes form neither a 4-clique nor a 3-dimensional simplex because there are two missing edges. Rather, these nodes form a connected cycle that is the boundary of a two-dimensional hole. This void space is also called a homology in dimension 1 (i.e., formed by a set of 1-d edges). The kth homology group, Hk(𝒦), describes the (k + 1)-dimensional holes bounded by chains of k-simplices. For example, the H1 homology group comprises the holes bounded by edges in 𝒦, H2 the voids bounded by filled triangles, and so on.
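The four-node example can be checked directly. In the following toy sketch (using the Ripser package adopted later in the Methods, not part of the study pipeline), the Rips filtration over the corners of a unit square yields exactly one H1 class, born when the four sides enter the filtration (distance 1) and dying when the diagonals (length √2) fill the hole:

```python
import numpy as np
from ripser import ripser

# Four nodes on the corners of a unit square: each node touches exactly
# two edges, forming a connected cycle around a hole.
square = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
dgms = ripser(square, maxdim=1)['dgms']
print(dgms[1])   # one H1 class: born at 1.0, dies at sqrt(2) ~ 1.414
```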
The term homology follows from the Greek homo, the same, and logos, relation, to indicate that the hole belongs to an equivalence class that is categorically the same across continuous deformations that neither break the boundary nor create new simplices spanning the boundary (e.g., inflation, compression, rotation, and translation). Different representative cycles may therefore exist that describe the same homological cycle. For instance, a very elastic coffee cup could be continuously contracted into the shape of a donut, as they share the same toroidal topology. For the sake of convenience, a homological cycle is often represented as the minimal representative cycle (Guerra, De Gregorio, Fugacci, Petri, & Vaccarino, 2020; Petri et al., 2014).
Simplicial distances.
One simplicial distance treats the weights decorating each TVFC graph’s edges as an ordered vector and computes the weighted Jaccard distance between these edge vectors at different points in time. Further, we compute distances between the explicit 0-dimensional values decorating each node; for example, with respect to the signal activity of each node. Specifically, for each point in time, we treat the absolute values of multispectral wavelet coefficients from all brain regions as an ordered vector. We then compute the Euclidean distance between vectors from different points in time.
The third distance is inspired by previous work on relations between graph networks and homological cycles. Lord et al. (2016) demonstrate that the nodes’ weighted degree (also called strength) is significantly correlated with the frequency and the intensity with which nodes participate in the shortest representatives of homological cycles. The third distance is thus the weighted Jaccard distance between vectors of the node-wise weighted degree, also called the strength, of each TVFC graph.
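The three simplicial distances may be sketched as follows; the array names are hypothetical stand-ins for the multispectral node vectors and instantaneous coherence graphs described above, and the weighted (Ruzicka) form of the Jaccard distance is an assumption of this sketch:

```python
import numpy as np

def weighted_jaccard_distance(x, y):
    """1 - sum(min(x, y)) / sum(max(x, y)), for nonnegative weight vectors."""
    den = np.maximum(x, y).sum()
    return 1.0 - np.minimum(x, y).sum() / den if den > 0 else 0.0

# Hypothetical stand-ins: node vectors of |wavelet coefficients| and
# instantaneous coherence graphs at two time points.
rng = np.random.default_rng(1)
n = 328
node_t1, node_t2 = rng.random(n), rng.random(n)
G_t1, G_t2 = rng.random((n, n)), rng.random((n, n))
G_t1, G_t2 = (G_t1 + G_t1.T) / 2, (G_t2 + G_t2.T) / 2
np.fill_diagonal(G_t1, 0.0)
np.fill_diagonal(G_t2, 0.0)

iu = np.triu_indices(n, k=1)                                      # edge vector indices
d_node = np.linalg.norm(node_t1 - node_t2)                        # (a) node topography
d_edge = weighted_jaccard_distance(G_t1[iu], G_t2[iu])            # (b) edge geometry
d_strength = weighted_jaccard_distance(G_t1.sum(0), G_t2.sum(0))  # (c) degree strength
```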
Homological distances.
While many TVFC studies regard only the graph’s connectivity as the feature of primary import, TDA provides a suite of tools to further develop network properties into conserved higher order structures in point cloud data (Carlsson, 2009; Edelsbrunner, Letscher, & Zomorodian, 2002; Patania, Vaccarino, & Petri, 2017) and in weighted networks (Chung, Lee, Di Christofano, Ombao, & Solo, 2019; Petri et al., 2013; A. Sizemore, Giusti, & Bassett, 2017).
Homology is defined on simplicial complexes. In the case of persistent homology of weighted graphs, simplices are added to the complex incrementally, and appear at and beyond some threshold. Varying this threshold allows us to track how homological features appear and persist across thresholds (Petri et al., 2013). A complete representation of homological features within some range of thresholds is called a filtration. By observing topological features over a filtration, “persistent homology” allows us to take a multiscale view of the data that accounts for both the explicit connectivity structure of the system, as well as the relative importance of ensembles of connections that emerge over some range of scales.
Formally, we define the Vietoris-Rips simplicial complex 𝒦r = Rips(G(E < r)) as the clique complex of the weighted graph G composed after removing all edges, E, longer than r. From this, we may recover the complex’s k-dimensional homology group, Hk(𝒦r). Within the boundaries of thresholds a and b, let [ra, …, r − ϵ, r, …, rb] be the longest series wherein Hk(𝒦r) and Hk(𝒦r−ϵ) are not identical. The ordered set [Hk(𝒦)] defines a filtration over G. A homology class α ∈ Hk is said to be born at radius u if a class of homotopy equivalent homologies is not supported in 𝒦r for any r < u. The homology class α is said to die going into 𝒦v if v is the lowest index wherein at least one (k + 1)-clique is established within the boundary of the homology. Persistent homology was computed using version 0.4.1 of the Ripser package as bundled with the Scikit-TDA toolbox for Python (Tralie, Saul, & Bar-On, 2018). Ripser finds it is faster to compute cohomology, the contravariant counterpart of homology. Thus the algorithm computes cocycles in Hk that track the disappearance of σk+1 along the reversed filtration (De Silva, Morozov, & Vejdemo-Johansson, 2011).
The persistent homology of a filtration over G is summarized by collecting the birth/death pairs of k-dimensional homology classes as points (u, v) in a “persistence diagram.” It is naturally possible to compute a persistence diagram for each simplicial dimension up to the maximum dimension of the simplicial complex. But because the computational load to calculate persistent homology increases exponentially with the homology dimension, we limit the present study to the investigation of persistent homology in dimensions 0, 1, and 2. The case of 0-dimensional persistence diagrams—corresponding to 0-dimensional holes, that is, disjoint sets of connected nodes—is particularly interesting as the homological classes are slices through an agglomerative clustering among nodes when using the “single” linkage distance.
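The correspondence with single-linkage agglomeration can be illustrated with a toy check (not part of the study pipeline): the finite H0 death times returned by Ripser coincide with the merge heights of a single-linkage dendrogram over the same point cloud.

```python
import numpy as np
from ripser import ripser
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))                       # toy point cloud

h0 = ripser(X, maxdim=0)['dgms'][0]
deaths = np.sort(h0[np.isfinite(h0[:, 1]), 1])     # drop the one infinite bar

merges = np.sort(linkage(pdist(X), method='single')[:, 2])
print(np.allclose(deaths, merges))                 # True: same merge scales
```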
Persistence diagrams can, themselves, be endowed with a metric structure. This means that it is possible to measure distances between persistence diagrams. Such distances encode how different the homological structures of two TVFC graphs are. One such distance is a multidimensional analogue of the earth-mover distance, known as the sliced-Wasserstein distance (Carriere, Cuturi, & Oudot, 2017). The sliced-Wasserstein distance between persistence diagrams is bounded from above by the total distance between the associated topological spaces (Mileyko, Mukherjee, & Harer, 2011). In the present study, for each pair of persistence diagrams of a given dimension, we calculate the average Wasserstein distance over 20 slices (see Carriere et al., 2017 for details). That is, for all pairs Gi ≠ Gj we compute d(Hk(𝒦i), Hk(𝒦j)).
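Putting these pieces together, the following is a hedged sketch of the homological distances using the Ripser and persim packages from the Scikit-TDA toolbox; the toy distance matrices are hypothetical stand-ins for distances derived from TVFC graphs, and dropping infinite bars before the comparison is an assumption of the sketch rather than a documented step of the study:

```python
import numpy as np
from ripser import ripser
from persim import sliced_wasserstein

rng = np.random.default_rng(3)

def toy_distance_matrix(n=40):
    """Hypothetical stand-in for a distance matrix derived from a TVFC graph."""
    A = rng.random((n, n))
    D = (A + A.T) / 2.0
    np.fill_diagonal(D, 0.0)
    return D

def diagrams(D, maxdim=2):
    """Persistence diagrams of the Rips filtration, with infinite bars removed."""
    dgms = ripser(D, maxdim=maxdim, distance_matrix=True)['dgms']
    return [d[np.isfinite(d).all(axis=1)] for d in dgms]

dgms_i, dgms_j = diagrams(toy_distance_matrix()), diagrams(toy_distance_matrix())
for k in range(3):
    if len(dgms_i[k]) == 0 or len(dgms_j[k]) == 0:
        continue   # skip dimensions with no finite features in this toy example
    print(f"H{k} sliced-Wasserstein:", sliced_wasserstein(dgms_i[k], dgms_j[k], M=20))
```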
Visualization/Output
Having developed metric spaces to compare simplicial and homological brain dynamics, we want to assess their relative capacities to represent apparent brain states. To this end, we embed each metric space onto a two-dimensional manifold using the Uniform Manifold Approximation and Projection (UMAP) algorithm (McInnes, Healy, & Melville, 2020). As illustrated in Figure 2, the embedding process facilitates state-space visualization and segmentation. UMAP approximates a metric space’s n-dimensional manifold in three steps. First, the algorithm calculates the k-nearest neighbors of each point. Second, each neighborhood is promoted to a local simplicial complex. Third, the algorithm searches for the n-dimensional distribution of points that best approximates the original simplicial complex. This search is conducted over successive iterations, with the initial position of low-dimensional points derived from a random distribution.
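A minimal sketch of the embedding step, assuming a precomputed matrix of pairwise distances between time points (any of the six metrics above); apart from the two output dimensions and the random initialization noted in the text, the settings shown are library defaults rather than documented study parameters:

```python
import numpy as np
import umap

# D: hypothetical precomputed (n_timepoints x n_timepoints) distance matrix,
# e.g., sliced-Wasserstein distances between H0 persistence diagrams.
rng = np.random.default_rng(4)
A = rng.random((600, 600))
D = (A + A.T) / 2.0
np.fill_diagonal(D, 0.0)

embedding = umap.UMAP(
    n_components=2,        # embed onto two dimensions
    metric="precomputed",  # use the supplied distances directly
    init="random",         # random initial positions, as described in the text
).fit_transform(D)         # shape: (600, 2)
```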
To better understand the distribution of points in the resulting embedding spaces, we smoothed point clouds with a Gaussian kernel density estimate and segmented clusters via a watershed transform. An illustration of watershed clustering is found in part B of Figure 2. The Gaussian grid size was initially set to 256 × 256. The number of grid points in the dimension having the smaller range was trimmed to maintain the aspect ratio of the embedding. The Gaussian kernel bandwidth factor was set to 0.08. The watershed transform marks local density maxima as cluster centers, then grows clusters by adding adjacent pixels whose directed gradient is maximal in the direction of the cluster center.
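A sketch of the density estimation and watershed segmentation, assuming SciPy’s Gaussian kernel density estimator and scikit-image’s watershed; the aspect-ratio trimming of the grid described above is omitted for brevity:

```python
import numpy as np
from scipy.stats import gaussian_kde
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def watershed_clusters(embedding, grid=256, bw_factor=0.08):
    """Label watershed clusters over a KDE of a 2-D embedding (see text)."""
    kde = gaussian_kde(embedding.T, bw_method=bw_factor)
    x = np.linspace(embedding[:, 0].min(), embedding[:, 0].max(), grid)
    y = np.linspace(embedding[:, 1].min(), embedding[:, 1].max(), grid)
    xx, yy = np.meshgrid(x, y)
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

    # Local density maxima seed the clusters; the watershed then floods
    # the inverted density surface outward from each seed.
    peaks = peak_local_max(density)
    markers = np.zeros(density.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    return watershed(-density, markers), density
```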
Subsampling and Bootstrapping
In the present study, we were concerned with resolving two-dimensional embeddings that generalize across volunteers, while also segmenting experimental stimuli. One challenge in the way of resolving this ideal embedding is that brain states tend to change slowly through time. An example of this issue is shown in Supplemental Figure 0.2 for the metric between TVFC edges. Temporal similarities draw the distance between adjacent time points closer than the distance between two different volunteers experiencing the same stimuli. For dimensionality-reduction algorithms like UMAP and tSNE that leverage nearest neighbor approximations, the attractive force between temporally adjacent time points can force the embedding to overemphasize information about the order of the scanning sessions when attempting to resolve population-wise brain states (see, e.g., Supplemental Figure 0.2).
To help disentangle graphs representing intrinsically similar brain states from those that are simply autocorrelated, we subsampled our dataset in several ways. Statistics over the results could then be generated via bootstrapping, with 256 random permutations of data subsamplings.
Volunteer-wise scans were split into three equal groups. The first group supplied data to train the UMAP embedding. The second group supplied data to segment the space of the embedding into watershed clusters. The third group supplied data to test how metric spaces segment brain states during contrasting experimental conditions.
The data were also split in time. To balance the number of time points from each experimental condition, during each permutation, one of each of the repeated mathematics and memory tasks was removed, at random, from each volunteer’s dataset. In addition, embeddings were trained using a subsample of 6*100 time points from the remaining 6*537 time points contributed by the six training-group volunteers. Each of the six batches of 100 training points was selected to emphasize maximal temporal separation within each scan.
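A minimal sketch of the volunteer- and time-wise subsampling; the group sizes and counts follow the text, while the helper names and the use of evenly spaced indices as a proxy for maximal temporal separation are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def split_volunteers(n_volunteers=18):
    """Shuffle volunteers into three equal groups: train, cluster, test."""
    order = rng.permutation(n_volunteers)
    return np.array_split(order, 3)

def temporal_subsample(n_retained=537, n_train=100):
    """Pick n_train indices with (approximately) maximal temporal separation."""
    return np.unique(np.round(np.linspace(0, n_retained - 1, n_train)).astype(int))

train_group, cluster_group, test_group = split_volunteers()
train_idx = temporal_subsample()   # applied within each training volunteer's scan
```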
Statistical Analysis
Watershed clusters provide a data-driven basis for hypothesis testing over the likelihood that certain metadata labels—that is, volunteer number, stimulus type, and valenced performance level—were more or less likely to be found in a given embedding region. For all statistical tests, we generated null distributions by randomly permuting the labels of cluster points (e.g., volunteer number, experimental condition) 300 times. This procedure yielded a mean and standard deviation indicating the number of each label we should expect to find by chance in any given watershed cluster. The significance threshold was always set to α = 0.05. Bonferroni correction was applied relative to the number of simultaneous tests performed. The total number of clusters was O(100) in each embedding.
Tests related to volunteer colocalization calculated significant volunteer-wise underrepresentation in each cluster (left-tail test, Bonferroni correction equal to the number of volunteers [six] times the number of clusters per embedding (O(100))). Tests related to stimulus colocalization identified clusters that were significantly more likely than chance to contain time periods from each stimulus condition (right-tail test, Bonferroni correction equal to the number of stimulus conditions [five] times the number of clusters in each embedding (O(100))). Tests related to task performance were conducted for each task condition independently, and were confined only to the clusters that were significantly more likely to contain points from the task being tested (two-tailed test, null distribution is the mean and standard deviation of task performance, Bonferroni correction equal to the number of clusters showing significantly many within-condition time points (O(10))).
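The following sketch illustrates the right-tailed stimulus test; the same permutation machinery, with the tail and the Bonferroni factor adjusted as described above (here corrected only over clusters), would serve the volunteer and performance tests. Function and variable names are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def enriched_clusters(cluster_ids, labels, target, n_perm=300, alpha=0.05):
    """Clusters holding significantly more `target`-labeled points than chance.

    The null is built by permuting point labels n_perm times and recording
    per-cluster counts; observed counts are converted to right-tail p-values
    against the permuted mean/std and Bonferroni-corrected over clusters.
    """
    rng = np.random.default_rng(6)
    clusters = np.unique(cluster_ids)
    count = lambda lab: np.array([np.sum((cluster_ids == c) & (lab == target))
                                  for c in clusters])
    observed = count(labels)
    null = np.array([count(rng.permutation(labels)) for _ in range(n_perm)])
    z = (observed - null.mean(axis=0)) / (null.std(axis=0) + 1e-12)
    pvals = norm.sf(z)                        # right tail
    return clusters[pvals < alpha / len(clusters)]
```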
Secondary Statistics Over Mean Graphs
It is possible to generate mean FC matrices from select time points of TVFC graphs. For instance, the mean TVFC graph over all time points reveals the average coherence between regions. Condition-dependent mean graphs such as that over all rest conditions may also be calculated. In the present study, we were particularly interested in mean graphs calculated with respect to within-task performance levels.
Given the identification of clusters significantly associated with task performance, for each task, and for each cluster associated with the task, we tested whether the task-specific points within that cluster contained significantly more or fewer correct responses than the mean percentage of correct responses for all of that task’s time points (no Bonferroni correction). For each task, every time point from clusters having significantly more correct responses is stored into a task-specific list. The same process occurs for clusters showing fewer correct responses. The mean TVFC graph from each list constitutes a “mean performance graph.” Mean performance graphs may be compared with one another to measure a difference between apparent brain states.
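A short sketch of the averaging step; tvfc and the index lists are hypothetical stand-ins for the stack of instantaneous coherence graphs and for the pooled time points from clusters with significantly more (or fewer) correct responses:

```python
import numpy as np

# tvfc: (n_timepoints, n_regions, n_regions) stack of instantaneous coherence graphs
# high_idx, low_idx: pooled time points from clusters with significantly more
# (respectively fewer) correct responses for a given task (hypothetical names)
def mean_performance_graph(tvfc, idx):
    """Average the TVFC graphs over the selected time points."""
    return tvfc[np.asarray(idx)].mean(axis=0)

# Example comparison of the two valenced mean graphs for one task, e.g., using
# the weighted Jaccard helper from the earlier sketch on their edge vectors:
# d = weighted_jaccard_distance(mean_performance_graph(tvfc, high_idx)[iu],
#                               mean_performance_graph(tvfc, low_idx)[iu])
```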
ACKNOWLEDGMENTS
This work could not have proceeded without the insights from Dr. Alessio Medda on wavelet theory. Nor would this study have been possible without the essential work of the many who participate in this global community.
SUPPORTING INFORMATION
Supporting information for this article is available at https://doi.org/10.1162/netn_a_00190.
AUTHOR CONTRIBUTIONS
Jacob Billings: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Software; Validation; Visualization; Writing – original draft; Writing – review & editing. Manish Saggar: Conceptualization; Data curation; Methodology; Validation; Writing – review & editing. Jaroslav Hlinka: Funding acquisition; Investigation; Methodology; Resources; Supervision; Validation; Writing – review & editing. Shella Keilholz: Conceptualization; Methodology; Resources; Validation; Writing – review & editing. Giovanni Petri: Conceptualization; Funding acquisition; Methodology; Project administration; Resources; Supervision; Validation; Writing – original draft; Writing – review & editing.
TECHNICAL TERMS
- Topography:
The vector of a multivariate signal measuring a system at a given instant.
- Geometry:
The study of distance functions.
- Graph:
A finite set of nodes, equipped with a finite set of edges.
- Network:
A graph wherein edges convey that the property “interacts with.”
- Topological space:
A totality of two elements: a set of points, and a topology on this set.
- Simplex:
The k-dimensional convex hull of a clique of k + 1 nodes.
- Topology:
A collection of subsets of a set.
- Simplicial complex:
A collection of multiple simplices.
- Homology:
A k-dimensional hole bounded by cyclically connected (k − 1)-dimensional simplices.
- Filtration:
Varying the threshold parameter of a weighted graph to resolve simplicial complexes with altered homology.
- Clique:
A set of k completely interconnected nodes.
- Functor:
A function between categories that maps objects to objects and morphisms to morphisms. Functors exist in both covariant and contravariant types.
REFERENCES
Author notes
Competing Interests: The authors have declared that no competing interests exist.
Handling Editor: Andrew Zalesky