6  Analysis of metastatic disease profiles

I proceed with the multivariate analysis of the metastatic disease profiles of dnMBC and rMBC. We agreed to characterize the differences only for the more recent cases, from 2015 onwards. You will first see the original analysis, performed on the whole cohort (from 2000), and then a secondary analysis on the sub-cohort from 2015. In each case, we first characterize the prevalence of the single sites and then their co-occurrence.

For the co-occurrence we then proceed to a multivariate analysis: multiple correspondence analysis and hierarchical clustering on principal components.

6.0.1 Relative frequencies of site occurrence

The following figures represent the relative frequencies of occurrence of each site for rM0 and dnM1 patients.

relative frequency for rM0

relative frequency for dnM1

The following figure represents the comparison of the occurrence sites between rM0 and dnM1.

The following table summarizes the differences between rM0 and dnM1 in terms of the number of sites involved in the metastatic disease and the individual sites where metastases occurred.

M0 (N=189)  M1 (N=180)  Overall (N=369)
Number of sites
1 111 (58.7%) 86 (47.8%) 197 (53.4%)
2 48 (25.4%) 48 (26.7%) 96 (26.0%)
3 20 (10.6%) 30 (16.7%) 50 (13.6%)
4 9 (4.8%) 12 (6.7%) 21 (5.7%)
5 1 (0.5%) 2 (1.1%) 3 (0.8%)
6 0 (0%) 1 (0.6%) 1 (0.3%)
8 0 (0%) 1 (0.6%) 1 (0.3%)
bones
no 62 (32.8%) 42 (23.3%) 104 (28.2%)
yes 127 (67.2%) 138 (76.7%) 265 (71.8%)
abdomen_extrahepatic
no 156 (82.5%) 134 (74.4%) 290 (78.6%)
yes 33 (17.5%) 46 (25.6%) 79 (21.4%)
liver
no 159 (84.1%) 156 (86.7%) 315 (85.4%)
yes 30 (15.9%) 24 (13.3%) 54 (14.6%)
lymph_nodes
no 150 (79.4%) 134 (74.4%) 284 (77.0%)
yes 39 (20.6%) 46 (25.6%) 85 (23.0%)
pleura
no 177 (93.7%) 165 (91.7%) 342 (92.7%)
yes 12 (6.3%) 15 (8.3%) 27 (7.3%)
reproductive_organs
no 175 (92.6%) 165 (91.7%) 340 (92.1%)
yes 14 (7.4%) 15 (8.3%) 29 (7.9%)
brain_nonleptomeningeal
no 181 (95.8%) 178 (98.9%) 359 (97.3%)
yes 8 (4.2%) 2 (1.1%) 10 (2.7%)
lungs
no 179 (94.7%) 171 (95.0%) 350 (94.9%)
yes 10 (5.3%) 9 (5.0%) 19 (5.1%)
leptomeningeal
no 181 (95.8%) 180 (100%) 361 (97.8%)
yes 8 (4.2%) 0 (0%) 8 (2.2%)
pericard
no 188 (99.5%) 179 (99.4%) 367 (99.5%)
yes 1 (0.5%) 1 (0.6%) 2 (0.5%)
skin
no 179 (94.7%) 161 (89.4%) 340 (92.1%)
yes 10 (5.3%) 19 (10.6%) 29 (7.9%)
spleen
no 187 (98.9%) 180 (100%) 367 (99.5%)
yes 2 (1.1%) 0 (0%) 2 (0.5%)
retroperitoneum
no 184 (97.4%) 179 (99.4%) 363 (98.4%)
yes 5 (2.6%) 1 (0.6%) 6 (1.6%)
adrenal
no 185 (97.9%) 175 (97.2%) 360 (97.6%)
yes 4 (2.1%) 5 (2.8%) 9 (2.4%)
muscle
no 187 (98.9%) 175 (97.2%) 362 (98.1%)
yes 2 (1.1%) 5 (2.8%) 7 (1.9%)
biochemical
no 187 (98.9%) 180 (100%) 367 (99.5%)
yes 2 (1.1%) 0 (0%) 2 (0.5%)
thyroid
no 188 (99.5%) 178 (98.9%) 366 (99.2%)
yes 1 (0.5%) 2 (1.1%) 3 (0.8%)
orbita
no 189 (100%) 177 (98.3%) 366 (99.2%)
yes 0 (0%) 3 (1.7%) 3 (0.8%)
eye
no 189 (100%) 179 (99.4%) 368 (99.7%)
yes 0 (0%) 1 (0.6%) 1 (0.3%)
bone_marrow
no 189 (100%) 171 (95.0%) 360 (97.6%)
yes 0 (0%) 9 (5.0%) 9 (2.4%)
bladder
no 189 (100%) 179 (99.4%) 368 (99.7%)
yes 0 (0%) 1 (0.6%) 1 (0.3%)
mediastinum
no 189 (100%) 178 (98.9%) 367 (99.5%)
yes 0 (0%) 2 (1.1%) 2 (0.5%)

We can then operate on a co-occurrence matrix. This matrix displays the absolute frequency of the co-occurrence of metastases in two different sites, for each site-pair.
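As an illustration of how such a matrix can be built, the following is a minimal sketch in Python. It assumes one record per patient with a 0/1 indicator per site. The toy records and the function name `cooccurrence` are hypothetical and are not taken from the analysis code.

```python
from itertools import combinations

# Hypothetical toy data: one dict per patient, 1 = metastasis present at that site.
patients = [
    {"bones": 1, "liver": 0, "lymph_nodes": 1},
    {"bones": 1, "liver": 1, "lymph_nodes": 0},
    {"bones": 1, "liver": 0, "lymph_nodes": 1},
    {"bones": 0, "liver": 1, "lymph_nodes": 0},
]
sites = ["bones", "liver", "lymph_nodes"]

def cooccurrence(patients, sites):
    """Return counts[a][b] = number of patients with metastases at both a and b."""
    counts = {a: {b: 0 for b in sites} for a in sites}
    for p in patients:
        present = [s for s in sites if p.get(s, 0) == 1]
        for s in present:
            counts[s][s] += 1  # diagonal: total count for the site
        for a, b in combinations(present, 2):
            counts[a][b] += 1
            counts[b][a] += 1  # keep the matrix symmetric
    return counts

counts = cooccurrence(patients, sites)
print(counts["bones"]["lymph_nodes"])  # absolute frequency for this site pair
```

The diagonal is filled with the total count of each site, which makes the same matrix reusable for conditional probabilities and similarity indices.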

We can display these complex matrices as undirected graphs. In each graph, a node is a site of metastasis and an edge represents the co-occurrence of two sites. The more often two sites occur together, the wider the edge connecting the two nodes. The layout places nodes that are more tightly connected (or part of the same “neighborhood”) closer together; it minimizes edge crossings and tries to make edge lengths uniform, revealing clusters or central hubs in the network. The size of a node represents its degree, that is, the number of unique connections the node has. A larger node means that the site co-occurs with a wider variety of other sites; a small node has fewer unique co-occurrences. The color of the nodes encodes the same quantity.
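The node degree described above, and the groups of nodes that end up connected to each other, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the pair counts are hypothetical, and the minimum count of 3 for drawing an edge is an assumption about how the plotting threshold is applied.

```python
# Hypothetical co-occurrence counts for unordered site pairs.
pair_counts = {
    ("bones", "liver"): 12,
    ("bones", "lymph_nodes"): 9,
    ("liver", "lymph_nodes"): 2,
    ("brain_nonleptomeningeal", "leptomeningeal"): 4,
}

def adjacency(pair_counts, min_count=3):
    """Keep only edges with at least min_count co-occurrences (assumed threshold)."""
    adj = {}
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def node_degrees(adj):
    """Degree = number of distinct sites a node is connected to."""
    return {site: len(partners) for site, partners in adj.items()}

def connected_components(adj):
    """Group nodes that can reach each other through retained edges."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components

adj = adjacency(pair_counts)
degrees = node_degrees(adj)
components = connected_components(adj)
print(degrees)
print(components)
```

In this toy example the two CNS sites form their own component, separate from the bones, liver and lymph_nodes group. Sites with no retained edge do not appear at all.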

The most striking feature of the M0 network is its fragmentation. There is a distinct, isolated sub-network in the top left consisting of brain_nonleptomeningeal and leptomeningeal metastases. This suggests that, in the recurrent setting, central nervous system (CNS) progression often occurs independently of the visceral metastatic burden. In contrast, the M1 network presents as a single, highly connected component. There are no isolated islands; every metastatic site is linked to the main cluster. This suggests that de novo metastasis is a more systemic event in which disease burden is correlated across all affected sites simultaneously.

The gray connecting lines (edges) in the M1 network appear thicker and more numerous than in M0. This is consistent with the Jaccard analysis reported below: sites in M1 co-occur more frequently and with stronger association than in M0.

bone_marrow and muscle appear as distinct nodes in the M1 network but are absent or below the plotting threshold (3) in M0. The retroperitoneum is visible in the visceral cluster of M0 but is not a labeled node in this view of M1, suggesting it may be a more specific site of failure in recurrent disease rather than a primary site in de novo presentation.

We can also compare the co-occurrence of metastases by looking at the matrices of conditional probability, or at the matrix that reports the contrasts between the two groups. Each cell of the matrix defines a conditional probability of occurrence: ‘conditional on having a metastasis at site A, what is the probability of also having a metastasis at site B?’. In the upper left triangle, the conditioning sites are reported on the Y-axis. In the lower right triangle, the conditioning sites are reported on the X-axis.
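The conditional probabilities described above can be read directly off a co-occurrence matrix whose diagonal holds the site totals. The following is a minimal Python sketch; the two toy matrices are hypothetical, and the direction of the contrast (M1 minus M0) is an assumption.

```python
# Hypothetical co-occurrence counts (rows/columns: bones, liver, lymph_nodes).
# Diagonal = site totals, off-diagonal = patients with both sites.
counts_m0 = [
    [3, 1, 2],
    [1, 2, 0],
    [2, 0, 2],
]
counts_m1 = [
    [4, 2, 3],
    [2, 2, 1],
    [3, 1, 3],
]

def conditional_probabilities(counts):
    """Return probs[a][b] = P(site b | site a) = counts[a][b] / counts[a][a]."""
    n = len(counts)
    probs = []
    for a in range(n):
        total_a = counts[a][a]
        row = []
        for b in range(n):
            if a == b or total_a == 0:
                row.append(None)  # trivial or undefined cell
            else:
                row.append(counts[a][b] / total_a)
        probs.append(row)
    return probs

def contrast(probs_m1, probs_m0):
    """Cell-wise difference M1 - M0 (assumed direction); None where undefined."""
    return [
        [None if p1 is None or p0 is None else p1 - p0 for p1, p0 in zip(r1, r0)]
        for r1, r0 in zip(probs_m1, probs_m0)
    ]

probs_m0 = conditional_probabilities(counts_m0)
probs_m1 = conditional_probabilities(counts_m1)
diff = contrast(probs_m1, probs_m0)
print(probs_m0[0][1], probs_m0[1][0])  # P(liver | bones) vs P(bones | liver)
```

Note that the matrix of conditional probabilities is not symmetric: P(liver | bones) and P(bones | liver) use different denominators, which is why the two triangles of the plot carry different information.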

Finally, we can display the strength of association using the Jaccard index. I perform a comparative analysis of the Jaccard similarity between the two conditions (M0 and M1). The analysis calculates how much the sites overlap with one another in each of the two datasets, visualizes those overlaps, and finally plots the change in similarity between the two conditions.

The big matrices of co-occurrence are structured as follows: off-diagonal cells \((i, j)\) contain the count of the intersection (overlap) between site \(i\) and site \(j\). The diagonal \((i, i)\) contains the total count for site \(i\) (since \(|A \cap A| = |A|\)). To get the Jaccard index we compute \(J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}\), so every term can be read off the co-occurrence matrix.
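A minimal Python sketch of this computation is given below, assuming a square co-occurrence matrix with the site totals on the diagonal. The toy counts and the function name `jaccard_matrix` are hypothetical.

```python
# Hypothetical co-occurrence counts (rows/columns: bones, liver, lymph_nodes).
counts = [
    [3, 1, 2],
    [1, 2, 0],
    [2, 0, 2],
]

def jaccard_matrix(counts):
    """J[i][j] = |i and j| / |i or j|, with |i or j| = C[i][i] + C[j][j] - C[i][j]."""
    n = len(counts)
    jac = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = counts[i][i] + counts[j][j] - counts[i][j]
            # A union of zero means neither site was ever observed.
            jac[i][j] = counts[i][j] / union if union > 0 else 0.0
    return jac

jac = jaccard_matrix(counts)
print(jac[0][2])  # bones vs lymph_nodes: 2 / (3 + 2 - 2)
```

Unlike the conditional probabilities, the Jaccard index is symmetric, so one triangle of the matrix is enough to display it.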

The following representations keep the lower triangle of the matrix to avoid duplicate visual information, remove the diagonal (self-similarity is always 1), and remove values \(> 0.8\). Removing the highest values increases contrast for the lower and mid-range values, preventing the color scale from being “washed out” by near-perfect matches. In the graphs, purple means high similarity and white means low similarity. I then represent the difference in the Jaccard similarities between M0 and M1. In the difference plot:

  • Red (High/Positive): The value is positive, meaning similarity increased in M1 (sites are more alike).

  • Blue (Low/Negative): The value is negative, meaning similarity decreased in M1 (sites are diverging).

  • White: No change in similarity.
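The masking and the difference described above can be sketched as follows. This is a minimal Python illustration with hypothetical Jaccard matrices. It assumes that the difference is computed as M1 minus M0 on the unmasked values, and that the 0.8 cap is applied only to the per-condition heatmaps.

```python
# Hypothetical Jaccard matrices for the same three sites in M0 and M1.
jac_m0 = [
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.5],
    [0.9, 0.5, 1.0],
]
jac_m1 = [
    [1.0, 0.4, 0.3],
    [0.4, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]

def lower_triangle_masked(jac, cap=0.8):
    """Keep cells below the diagonal; mask (None) everything else and values > cap."""
    n = len(jac)
    return [
        [jac[i][j] if (i > j and jac[i][j] <= cap) else None for j in range(n)]
        for i in range(n)
    ]

def jaccard_difference(jac_m1, jac_m0):
    """Lower-triangle difference M1 - M0: positive = more similar in M1."""
    n = len(jac_m0)
    return [
        [jac_m1[i][j] - jac_m0[i][j] if i > j else None for j in range(n)]
        for i in range(n)
    ]

masked_m0 = lower_triangle_masked(jac_m0)
diff = jaccard_difference(jac_m1, jac_m0)
print(masked_m0)
print(diff)
```

With this sign convention, a positive cell would be drawn in red, a negative cell in blue, and a cell equal to zero in white.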

6.1 Multiple correspondence analysis

To move beyond pairwise associations and identify broader systemic patterns of metastasis, Multiple Correspondence Analysis (MCA) was performed on the binary presence/absence data of major metastatic sites (Skin, Reproductive Organs, Lymph Nodes, Lungs, Liver, Bones, Extrahepatic Abdomen).

  • Dimensionality Reduction: The multidimensional clinical data was reduced to principal components (dimensions) to visualize the underlying structure of the data.

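  • Inertia Correction: Standard MCA on an indicator matrix inflates the total inertia and therefore understates the share explained by the leading dimensions. To address this, a Benzécri correction was applied to the raw eigenvalues: only eigenvalues above \(1/K\), with \(K\) the number of variables, are retained and rescaled. This yielded a recalibrated estimate of the explained variance (inertia) for each dimension, so that interpretation focuses on the dimensions that carry more than the average inertia.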

  • Variable Contribution: The association between specific metastatic sites and the resulting dimensions was assessed using the correlation ratio (\(\eta^2\)), identifying which organ sites were the primary drivers of variance in the dataset.
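The inertia correction and the correlation ratio listed above can be sketched in a few lines of Python. The eigenvalues and coordinates below are hypothetical and do not come from the analysis; only \(K = 7\) reflects the seven sites entered in the MCA. The explained shares are normalized by the sum of the corrected eigenvalues, which is one common convention and an assumption here.

```python
def benzecri(eigenvalues, n_variables):
    """Benzécri correction: keep eigenvalues above 1/K and rescale them as
    ((K / (K - 1)) * (lambda - 1/K)) ** 2. Returns corrected values and shares."""
    k = n_variables
    threshold = 1.0 / k
    corrected = [((k / (k - 1)) * (lam - threshold)) ** 2
                 for lam in eigenvalues if lam > threshold]
    total = sum(corrected)
    shares = [c / total for c in corrected] if total > 0 else []
    return corrected, shares

def correlation_ratio(coords, labels):
    """eta^2 = between-category sum of squares / total sum of squares
    of the patient coordinates on one dimension."""
    n = len(coords)
    grand = sum(coords) / n
    total_ss = sum((x - grand) ** 2 for x in coords)
    groups = {}
    for x, g in zip(coords, labels):
        groups.setdefault(g, []).append(x)
    between_ss = sum(len(v) * (sum(v) / len(v) - grand) ** 2
                     for v in groups.values())
    return between_ss / total_ss if total_ss > 0 else 0.0

# Hypothetical raw eigenvalues from an MCA on K = 7 binary variables.
raw = [0.25, 0.18, 0.15, 0.14, 0.12, 0.09, 0.07]
corrected, shares = benzecri(raw, n_variables=7)
print(corrected, shares)

# Hypothetical coordinates of four patients on one dimension, with one site's status.
coords = [1.0, 1.2, -1.0, -1.2]
eta2_strong = correlation_ratio(coords, ["yes", "yes", "no", "no"])
eta2_none = correlation_ratio(coords, ["yes", "no", "yes", "no"])
print(eta2_strong, eta2_none)
```

In the toy example only three of the seven raw eigenvalues exceed \(1/7\), so only three dimensions are retained after correction. A site whose presence separates the patients along a dimension gets \(\eta^2\) close to 1, and a site unrelated to it gets \(\eta^2\) close to 0.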

6.1.1 Patient Clustering (HCPC)

To classify patients into distinct clinical phenotypes based on their metastatic profiles:

  • Hierarchical Clustering: Hierarchical Clustering on Principal Components (HCPC) was applied to the coordinates derived from the MCA.

  • Subgroup Identification: The algorithm partitioned the patient population into four distinct clusters based on the similarity of their metastatic spread.
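To make the clustering step concrete, the following is a naive Python sketch of agglomerative clustering with Ward's criterion applied to patient coordinates, cut at four clusters. It is not the implementation used for this report: the coordinates are hypothetical, the function name `ward_clusters` is illustrative, and implementations of HCPC often add a k-means consolidation step that is omitted here.

```python
def ward_clusters(points, n_clusters):
    """Merge clusters greedily by the smallest Ward cost until n_clusters remain.
    Returns one integer label per point."""
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    def ward_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels

# Hypothetical patient coordinates on the first two MCA dimensions.
coords = [
    (0.0, 0.0), (0.1, 0.0),
    (5.0, 5.0), (5.1, 5.0),
    (0.0, 5.0), (0.0, 5.1),
    (5.0, 0.0), (5.0, 0.1),
]
labels = ward_clusters(coords, n_clusters=4)
print(labels)
```

The search over all cluster pairs at every step is cubic in the number of patients, which is acceptable for a sketch but not for large cohorts.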

6.1.2 M0

This plot represents the percentage of variance explained by the first three dimensions. The first dimension appears to explain most of the variance.

The following plot represents the correlation between each dimension and each site.

The following interactive plot represents the multidimensional space of the three principal components, to display the association between metastatic sites.

The following interactive plot represents the projection of the patients in the multidimensional space defined by the principal components. Patients are colored according to the cluster identified.

We describe the profiles identified by considering the prevalence of each site within each cluster.
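The prevalence of each site within a cluster is the fraction of patients in that cluster who have a metastasis at that site. A minimal Python sketch, with hypothetical patient records and cluster labels:

```python
def prevalence_by_cluster(rows, clusters, sites):
    """Return prevalence[cluster][site] = share of the cluster's patients
    with a metastasis at that site."""
    sizes = {}
    positives = {}
    for row, c in zip(rows, clusters):
        sizes[c] = sizes.get(c, 0) + 1
        site_counts = positives.setdefault(c, {s: 0 for s in sites})
        for s in sites:
            site_counts[s] += row[s]
    return {c: {s: positives[c][s] / sizes[c] for s in sites} for c in sizes}

# Hypothetical data: four patients, two sites, two clusters.
rows = [
    {"bones": 1, "liver": 0},
    {"bones": 1, "liver": 1},
    {"bones": 0, "liver": 1},
    {"bones": 0, "liver": 1},
]
clusters = [1, 1, 2, 2]
prevalence = prevalence_by_cluster(rows, clusters, ["bones", "liver"])
print(prevalence)
```

Any other 0/1 column can be passed in the list of sites, so the same function also gives the share of a patient group within each cluster.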

6.1.3 M1

The same analysis was performed for M1.

6.1.4 Merged

The following analysis was conducted on a merged matrix (M0 and M1 together). The association between M status and the clusters identified is described through the prevalence of the sites within each cluster. In addition, the prevalence of M1 was computed in each cluster to see whether specific clusters contain more M1 patients. No differences were found.

6.2 Same analysis, but from 2015

6.2.1 Relative frequencies of site occurrence

relative frequency for rM0

relative frequency for dnM1

M0 (N=113)  M1 (N=104)  Overall (N=217)
Number of sites
1 71 (62.8%) 46 (44.2%) 117 (53.9%)
2 25 (22.1%) 30 (28.8%) 55 (25.3%)
3 10 (8.8%) 14 (13.5%) 24 (11.1%)
4 6 (5.3%) 10 (9.6%) 16 (7.4%)
5 1 (0.9%) 2 (1.9%) 3 (1.4%)
6 0 (0%) 1 (1.0%) 1 (0.5%)
8 0 (0%) 1 (1.0%) 1 (0.5%)
bones
no 34 (30.1%) 21 (20.2%) 55 (25.3%)
yes 79 (69.9%) 83 (79.8%) 162 (74.7%)
abdomen_extrahepatic
no 91 (80.5%) 76 (73.1%) 167 (77.0%)
yes 22 (19.5%) 28 (26.9%) 50 (23.0%)
liver
no 96 (85.0%) 93 (89.4%) 189 (87.1%)
yes 17 (15.0%) 11 (10.6%) 28 (12.9%)
lymph_nodes
no 92 (81.4%) 70 (67.3%) 162 (74.7%)
yes 21 (18.6%) 34 (32.7%) 55 (25.3%)
pleura
no 104 (92.0%) 95 (91.3%) 199 (91.7%)
yes 9 (8.0%) 9 (8.7%) 18 (8.3%)
reproductive_organs
no 105 (92.9%) 94 (90.4%) 199 (91.7%)
yes 8 (7.1%) 10 (9.6%) 18 (8.3%)
brain_nonleptomeningeal
no 108 (95.6%) 103 (99.0%) 211 (97.2%)
yes 5 (4.4%) 1 (1.0%) 6 (2.8%)
leptomeningeal
no 109 (96.5%) 104 (100%) 213 (98.2%)
yes 4 (3.5%) 0 (0%) 4 (1.8%)
pericard
no 112 (99.1%) 103 (99.0%) 215 (99.1%)
yes 1 (0.9%) 1 (1.0%) 2 (0.9%)
lungs
no 109 (96.5%) 100 (96.2%) 209 (96.3%)
yes 4 (3.5%) 4 (3.8%) 8 (3.7%)
spleen
no 112 (99.1%) 104 (100%) 216 (99.5%)
yes 1 (0.9%) 0 (0%) 1 (0.5%)
muscle
no 111 (98.2%) 99 (95.2%) 210 (96.8%)
yes 2 (1.8%) 5 (4.8%) 7 (3.2%)
biochemical
no 112 (99.1%) 104 (100%) 216 (99.5%)
yes 1 (0.9%) 0 (0%) 1 (0.5%)
skin
no 110 (97.3%) 95 (91.3%) 205 (94.5%)
yes 3 (2.7%) 9 (8.7%) 12 (5.5%)
thyroid
no 112 (99.1%) 102 (98.1%) 214 (98.6%)
yes 1 (0.9%) 2 (1.9%) 3 (1.4%)
adrenal
no 112 (99.1%) 101 (97.1%) 213 (98.2%)
yes 1 (0.9%) 3 (2.9%) 4 (1.8%)
retroperitoneum
no 112 (99.1%) 103 (99.0%) 215 (99.1%)
yes 1 (0.9%) 1 (1.0%) 2 (0.9%)
bone_marrow
no 113 (100%) 97 (93.3%) 210 (96.8%)
yes 0 (0%) 7 (6.7%) 7 (3.2%)
orbita
no 113 (100%) 103 (99.0%) 216 (99.5%)
yes 0 (0%) 1 (1.0%) 1 (0.5%)
bladder
no 113 (100%) 103 (99.0%) 216 (99.5%)
yes 0 (0%) 1 (1.0%) 1 (0.5%)
mediastinum
no 113 (100%) 102 (98.1%) 215 (99.1%)
yes 0 (0%) 2 (1.9%) 2 (0.9%)

6.3 Multiple correspondence analysis

6.3.1 M0

6.3.2 M1

6.3.3 Merged