ILC Project

Author

Giacomo Biganzoli

Published

November 20, 2025

0.1 Preliminary Statistical Analysis Plan

As the ILC data are not yet available, I focused on the metadata file that was sent on November 12th. This report formalizes the aims of the project and proposes a preliminary analysis plan to be discussed together.

Tip

A few instructions on how to read this report (and the ones that will follow…)

This report was built in HTML to exploit its features.

  • First off, above every object displayed (tables, figures or results) the R code is reported for reproducibility. You just need to click on the arrow before the word ‘Code’ to view it. You can also copy it by clicking on the icon in the top right corner of the folding box.

  • Below each table you can find a button to download the table to your laptop, for instance to perform additional analyses. You can also download a specific figure by right-clicking on it and choosing ‘Save image as’, or view it at full size by choosing ‘Open image in a new tab’.

  • If you hover over hyperlinks a preview of the object will appear.

  • I enabled the possibility to highlight and comment on the report directly on the page using Hypothes.is. I just need your usernames to add you to a common group.

The metadata for M0 patients is reported in Table 1.

Table 1
Variable Group Entry field Explanation
Patient specific factors patient_ID pseudonymized ID
NA date_of_diagnosis date in dd/mm/yyyy of first malignant biopsy; if unknown, date of first contact regarding invasive lobular cancer
NA method_of_detection was the tumor found by population based screening or did the patient present with symptoms
NA date_of_birth date in dd/mm/yyyy of birth
NA age_at_diagnosis (y) age of patient in years at the time of primary diagnosis
NA age_category category of age at the time of primary diagnosis (<40, 40 - 49, 50 - 59, 60 - 69, 70 - 79, ≥80)
NA gender M= male; F= female
NA height (m) in meters, at time of diagnosis primary
NA weight (kg) in kg, at time of diagnosis primary
NA BMI calculated, at diagnosis primary (weight (kg)/height (m)^2)
NA BMI_category category of calculated BMI ( <25, 25 - 30, >30)
NA menopausal_status pre or postmenopausal at timing of diagnosis. If in transition, to be considered as premenopausal (categories = pre- and perimenopausal, postmenopausal).
NA body_surface_area calculated, at diagnosis primary (0.20247 x height (m)^0.725 x weight (kg)^0.425)
NA smoking is the patient a present smoker (= active), past smoker (= former) or did he/she never smoke (= no) at the time of first diagnosis
NA alcohol_abuse is there a history of alcohol abuse reported (yes/no) at the time of diagnosis
NA hypertension is there a personal history of arterial hypertension at primary diagnosis (yes/no)
NA hyperlipidemia is there a personal history of hyperlipidemia at time of diagnosis (yes/no)
NA diabetes is there a personal history of diabetes + type at the time of diagnosis (type 1, type 2 or no)
NA oral_anticonceptive_use has the patient ever used oral contraceptives or are they still using them at the time of diagnosis (former, active, no)?
NA pregnancy_A number of abortus/miscarriage at the time of diagnosis
NA pregnancy_P number of partus/child-births at the time of diagnosis
NA pregnancy_G number of gravidus/pregnancies at the time of diagnosis (=A+P)
NA Age.FFTP age at first child birth, if applicable (if no child birth = nulliparous)
NA Interval.1st.FTP age difference between age diagnosis and age first child birth (if no child birth = NA)
NA hormone_replacement has the patient ever used hormone replacement therapy or are they still using hormone replacement therapy (former, active, no)
NA familial_history_breast_ovary is there a history of breast or ovarian cancer in the family?
NA familial_history_breast_ovary_line are the affected relatives first (1), second (2) or third (3) degree relatives? If no history: NA
NA germline_mutation_testing_performed was there a test done to see if there are any germline mutations present associated with higher breast cancer risk?
NA germline_mutation_testing_year_most_recent_test if applicable year that the most recent test was performed, if not applicable NA
NA germline_mutation_testing_result if applicable gene that was mutated, if tested but no mutation = negative, if not tested = NA
Radiology visible_on_mammogram was the primary tumor seen on mammogram (yes/no), NA if no mammogram was performed
NA diameter_mammogram_at_diagnosis diameter in mm of largest focus on mammogram, if not seen on mammogram NA
NA number_of_suspected_foci_mammogram number of foci seen on mammogram numerical, if not described properly but multifocal = multiple, if not seen on mammogram = NA
NA breast_density_score_mammogram Birads-score of breast density on mammogram (fatty breast = Type A, dense breast = Type D), not seen on mammogram = NA, not reported by the radiologist = NA
NA diameter_ultrasound_at_diagnosis diameter in mm of largest focus on ultrasound, if not seen on ultrasound NA
NA Number_of_adenopathies_expected_on_ultrasound number of adenopathies seen on ultrasound numerical, if not described properly but >1 = multiple, if not seen on ultrasound = NA
NA MRI_breast_performed Was an MRI of the breast performed at diagnosis (yes/no)
NA diameter_MRI_at_diagnosis diameter in mm of largest focus on MRI, if not performed/reported NA
NA number_of_suspected_foci_MRI number of foci seen on MRI numerical, if not described properly but multifocal = multiple, if not performed/reported = NA
NA breast_density_score_MRI Birads-score of breast density on mammogram (fatty breast = Type A, dense breast = Type D), not performed/reported = NA
NA Number_of_adenopathies_expected_on_MRI_breast number of adenopathies seen on MRI numerical, if not described properly but >1 = multiple, if not performed/reported = NA
Primary tumor, pre-treatment primary_laterality left or right or bilateral
NA TNM_cT_at_diagnosis clinical T-classification according to TNM classification of malignant tumors
NA TNM_cN_at_diagnosis clinical N-classification according to TNM classification of malignant tumors
NA TNM_cM_at_diagnosis clinical M-classification according to TNM classification of malignant tumors
NA diameter_radiology_at_diagnosis (mm) of largest focus, in mm (largest reported diameter on mammogram/ultrasound/MRI)
Neoadjuvant therapy neo_adjuvant_therapy did the patient receive any kind of therapy (endocrine, immunotherapy, chemotherapy) prior to surgery (yes/no)
Surgery surgery_date date in dd/mm/yyyy of the surgery (first surgery of the primary tumor), no surgery performed = NA
NA surgery_type_breast mastectomy vs tumorectomy (= breast conserving surgery), no surgery performed = NA
NA surgery_type_axilla sentinel lymph node biopsy (SLN) vs axillary clearance (ALN) or SN followed by ALN in same or subsequent surgery (SLN + ALN), no surgery performed = NA
Pathology resection specimen TNM_pT_resection_specimen T-classification according to TNM classification of malignant tumors
NA TNM_pN_resection_specimen N-classification according to TNM classification of malignant tumors
NA diameter_pathology_resection_specimen (mm) of largest focus, in mm, if no surgery performed = NA, if surgery performed externally and no reports available = unknown
NA tumor_grade_resection_specimen histological grade (bloom-score) reported on resection specimen. If multifocal, then largest focus is considered
NA resection_margin_resection_specimen was the tumor completely resected? If no tumor in resection margins = negative, if doubt = dubious (< 1 mm), if tumor in resection margins = positive, if no surgery/not reported = NA
NA ER_Interpretation Is the estrogen receptor expression considered positive or negative on the largest focus
NA PR_Interpretation Is the progesterone receptor expression considered positive or negative on the largest focus
NA HER2_Interpretation Is the expression of HER2 considered positive or negative on the largest focus
NA Ki67_resection_specimen (%) value of Ki67 if available in pathology report
NA presence_DCIS_resection_specimen is DCIS present in the resection specimen (yes/no)
NA presence_LCIS_resection_specimen is LCIS present in the resection specimen + type (yes, classical LCIS; yes, non-classical LCIS; no; unknown; NA)
NA total_ALN_removed total number of lymph nodes removed (sentinel included)
NA positive_ALN total number of positive lymph nodes (sentinel included)
NA Micro_vs_macrometastases if positive lymph nodes are present are they micro- or macro-invaded?
NA ALN_maxdiameter maximal diameter in mm of metastasis to lymph node if applicable
NA HER2_FISH_resection_specimen HER2-FISH status (amplification/no amplification/NA) on resection specimen, if applicable; if not repeated on resection, state the score of the biopsy
NA HER2_ratio_resection_specimen HER2-FISH ratio on resection specimen, if applicable; if not repeated on resection, state the score of the biopsy
Adjuvant therapy radiotherapy radiotherapy performed at site of surgery? (yes/no)
NA adjuvant_chemotherapy was chemotherapy given in adjuvant setting
NA adjuvant_HER2 was HER2-therapy given in adjuvant setting
NA adjuvant_endocrinetherapy did the patient get post-surgery endocrine therapy?
Metastatic disease meta_brain_nonleptomeningeal_first_metastases occurrence of brain metastases with the exception of leptomeningeal disease at first diagnosis of metastatic setting (yes/no)
NA meta_leptomeningeal_first_metastases occurrence of leptomeningeal metastases at first diagnosis of metastatic setting (yes/no)
NA meta_bones_first_metastases occurrence of bone metastases at first diagnosis of metastatic setting (yes/no)
NA meta_skin_first_metastases occurrence of skin metastases at first diagnosis of metastatic setting (yes/no)
NA meta_lungs_first_metastases occurrence of lung metastases at first diagnosis of metastatic setting (yes/no)
NA meta_liver_first_metastases occurrence of liver metastases at first diagnosis of metastatic setting (yes/no)
NA meta_abdomen_extrahepatic_first_metastases occurrence of abdominal (extrahepatic) metastases at first diagnosis of metastatic setting (yes/no)
NA meta_reproductive_organs_first_metastases occurrence of metastases in reproductive organs at first diagnosis of metastatic setting (yes/no)
NA meta_lymph_nodes_first_metastases occurrence of lymph node metastases at first diagnosis of metastatic setting (yes/no)
NA meta_other_first_metastases occurrence of other metastases at first diagnosis of metastatic setting, + site (e.g. yes: pleura)
NA Systemic_treatment_firstline Type of treatment if applicable
NA surgery_1st_line_metastatic Surgery (resection) performed of a metastasis at first metastatic diagnosis?
NA radiotherapy_1st_line_metastatic Radiotherapy performed of a metastasis at first metastatic diagnosis?
NA first_progression_distant_disease_metastatic Was there already progression of the metastatic disease reported (yes/no)?
NA date_first_progression_metastatic if applicable date first progression was reported (dd/mm/yyyy)
NA meta_brain_nonleptomeningeal_atfirstprogression progression of brain metastases with the exception of leptomeningeal disease at first progression of metastatic setting (yes/no)
NA meta_leptomeningeal_atfirstprogression progression of leptomeningeal metastases at first progression of metastatic setting (yes/no)
NA meta_bones_atfirstprogression progression of bone metastases at first progression of metastatic setting (yes/no)
NA meta_skin_atfirstprogression progression of skin metastases at first progression of metastatic setting (yes/no)
NA meta_lungs_atfirstprogression progression of lung metastases at first progression of metastatic setting (yes/no)
NA meta_liver_atfirstprogression progression of liver metastases at first progression of metastatic setting (yes/no)
NA meta_abdomen_extrahepatic_atfirstprogression progression of abdominal (extrahepatic) metastases at first progression of metastatic setting (yes/no)
NA meta_reproductive_organs_atfirstprogression progression of metastases in reproductive organs at first progression of metastatic setting (yes/no)
NA meta_lymph_nodes_atfirstprogression progression of lymph node metastases at first progression of metastatic setting (yes/no)
NA meta_other_atfirstprogression progression of other metastases at first progression of metastatic setting, + site (e.g. yes: pleura)
NA systemic_treatment_secondline Type of treatment if applicable
NA radiotherapy_2nd_line_metastatic Radiotherapy performed of a metastasis at first progression?
NA number_of_lines_metastatic Number of treatments a patient received due to progression at the time of last database update
Recurrences and death date_last_update_file_database cut off date of last update (dd/mm/yyyy)
NA locoregional_recurrence did patient present with locoregional recurrence (i.e. ipsilateral breast and/or axilla)? yes or no
NA date_locoregional_recurrence dd/mm/yyyy if applicable
NA recurrence_contralateral_breast did patient present with contralateral recurrence (i.e. contralateral breast and/or axilla)? yes or no
NA date_recurrence_contralateral_breast dd/mm/yyyy if applicable
NA distant_recurrence occurrence of metastasis after surgery for primary (=cM0 at diagnosis). Yes or no?
NA date_distant_recurrence dd/mm/yyyy if applicable
NA death is the patient deceased? Yes or no
NA date_of_death date in dd/mm/yyyy of death if applicable otherwise NA
NA cause_of_death is the death related to breast cancer or not? (breast cancer related, not breast cancer related, unknown or NA)
NA date_last_FU date in dd/mm/yyyy of last follow up (in own center, with other physician or in other centre with clear communication to your center)
NA date_last_FU_Leuven date in dd/mm/yyyy of last follow up for breast cancer in UZ Leuven

The metadata for M1 patients is reported in Table 2.

Table 2
Variable Group Entry field Explanation
Patient specific factors patient_ID pseudonymized ID
NA date_of_diagnosis date in dd/mm/yyyy of first malignant biopsy; if unknown, date of first contact regarding invasive lobular cancer
NA method_of_detection was the tumor found by population based screening or did the patient present with symptoms (of primary or metastases not distinguished in the database)
NA date_of_birth date in dd/mm/yyyy of birth
NA age_at_diagnosis (y) age of patient in years at the time of primary diagnosis
NA age_category category of age at the time of primary diagnosis (<40, 40 - 49, 50 - 59, 60 - 69, 70 - 79, ≥80)
NA gender M= male; F= female
NA height (m) in meters, at time of diagnosis primary
NA weight (kg) in kg, at time of diagnosis primary
NA BMI calculated, at diagnosis primary (weight (kg)/height (m)^2)
NA BMI_category category of calculated BMI ( <25, 25 - 30, >30)
NA menopausal_status pre or postmenopausal at timing of diagnosis. If in transition, to be considered as premenopausal (categories = pre- and perimenopausal, postmenopausal).
NA body_surface_area calculated, at diagnosis primary (0.20247 x height (m)^0.725 x weight (kg)^0.425)
NA smoking is the patient a present smoker (= active), past smoker (= former) or did he/she never smoke (= no) at the time of first diagnosis
NA alcohol_abuse is there a history of alcohol abuse reported (yes/no) at the time of diagnosis
NA hypertension is there a personal history of arterial hypertension at primary diagnosis (yes/no)
NA hyperlipidemia is there a personal history of hyperlipidemia at time of diagnosis (yes/no)
NA diabetes is there a personal history of diabetes + type at the time of diagnosis (type 1, type 2 or no)
NA oral_anticonceptive_use has the patient ever used oral contraceptives or are they still using them at the time of diagnosis (former, active, no)?
NA pregnancy_A number of abortus/miscarriage at the time of diagnosis
NA pregnancy_P number of partus/child-births at the time of diagnosis
NA pregnancy_G number of gravidus/pregnancies at the time of diagnosis (=A+P)
NA Age.FFTP age at first child birth, if applicable (if no child birth = nulliparous)
NA Interval.1st.FTP age difference between age diagnosis and age first child birth (if no child birth = NA)
NA hormone_replacement has the patient ever used hormone replacement therapy or are they still using hormone replacement therapy (former, active, no)
NA familial_history_breast_ovary is there a history of breast or ovarian cancer in the family?
NA familial_history_breast_ovary_line are the affected relatives first (1), second (2) or third (3) degree relatives? If no history: NA
NA germline_mutation_testing_performed was there a test done to see if there are any germline mutations present associated with higher breast cancer risk?
NA germline_mutation_testing_year_most_recent_test if applicable year that the most recent test was performed, if not applicable NA
NA germline_mutation_testing_result if applicable gene that was mutated, if tested but no mutation = negative, if not tested = NA
Radiology visible_on_mammogram was the primary tumor seen on mammogram (yes/no), NA if no mammogram was performed
NA diameter_mammogram_at_diagnosis diameter in mm of largest focus on mammogram, if not seen on mammogram NA
NA number_of_suspected_foci_mammogram number of foci seen on mammogram numerical, if not described properly but multifocal = multiple, if not seen on mammogram = NA
NA breast_density_score_mammogram Birads-score of breast density on mammogram (fatty breast = Type A, dense breast = Type D), not seen on mammogram = NA, not reported by the radiologist = NA
NA diameter_ultrasound_at_diagnosis diameter in mm of largest focus on ultrasound, if not seen on ultrasound NA
NA Number_of_adenopathies_expected_on_ultrasound number of adenopathies seen on ultrasound numerical, if not described properly but >1 = multiple, if not seen on ultrasound = NA
NA MRI_breast_performed Was an MRI of the breast performed at diagnosis (yes/no)
NA diameter_MRI_at_diagnosis diameter in mm of largest focus on MRI, if not performed/reported NA
NA number_of_suspected_foci_MRI number of foci seen on MRI numerical, if not described properly but multifocal = multiple, if not performed/reported = NA
NA Number_of_adenopathies_expected_on_MRI_breast number of adenopathies seen on MRI numerical, if not described properly but >1 = multiple, if not performed/reported = NA
Primary tumor, pre-treatment primary_laterality left or right or bilateral
NA TNM_cT_at_diagnosis clinical T-classification according to TNM classification of malignant tumors
NA TNM_cN_at_diagnosis clinical N-classification according to TNM classification of malignant tumors
NA TNM_cM_at_diagnosis clinical M-classification according to TNM classification of malignant tumors
NA diameter_radiology_at_diagnosis (mm) of largest focus, in mm (largest reported diameter on mammogram/ultrasound/MRI)
Pathology biopsy tumor_grade_biopsy_breast histological grade (bloom-score) reported on biopsy primary tumor. If multifocal, then largest focus is considered
NA ER_Interpretation_biopsy_breast Is the estrogen receptor expression considered positive or negative on the largest focus
NA PR_Interpretation_biopsy_breast Is the progesterone receptor expression considered positive or negative on the largest focus
NA HER2_Interpreation_biopsy_breast Is the expression of HER2 considered positive or negative on the largest focus
NA HER2_FISH_biopsy_breast HER2-FISH status (amplification/no amplification/NA) on biopsy of primary tumor if applicable
NA HER2_ratio_biopsy_breast HER2-FISH ratio on biopsy of primary tumor if applicable
Surgery surgery_performed_primary_tumor_breast did the patient get any type of surgery of the primary tumor/axilla? (Yes/no)
NA surgery_date date in dd/mm/yyyy of the surgery (first surgery of the primary tumor), no surgery performed = NA
NA surgery_type_breast mastectomy vs tumorectomy (= breast conserving surgery), no surgery performed = NA
NA surgery_type_axilla sentinel lymph node biopsy (SLN) vs axillary clearance (ALN) or SN followed by ALN in same or subsequent surgery (SLN + ALN), no surgery performed = NA
Other treatment primary radiotherapy radiotherapy performed at site of the primary tumor or axillary lymph nodes? (yes/no)
Metastatic disease meta_brain_nonleptomeningeal_first_metastases occurrence of brain metastases with the exception of leptomeningeal disease at first diagnosis of metastatic setting (yes/no)
NA meta_leptomeningeal_first_metastases occurrence of leptomeningeal metastases at first diagnosis of metastatic setting (yes/no)
NA meta_bones_first_metastases occurrence of bone metastases at first diagnosis of metastatic setting (yes/no)
NA meta_skin_first_metastases occurrence of skin metastases at first diagnosis of metastatic setting (yes/no)
NA meta_lungs_first_metastases occurrence of lung metastases at first diagnosis of metastatic setting (yes/no)
NA meta_liver_first_metastases occurrence of liver metastases at first diagnosis of metastatic setting (yes/no)
NA meta_abdomen_extrahepatic_first_metastases occurrence of abdominal (extrahepatic) metastases at first diagnosis of metastatic setting (yes/no)
NA meta_reproductive_organs_first_metastases occurrence of metastases in reproductive organs at first diagnosis of metastatic setting (yes/no)
NA meta_lymph_nodes_first_metastases occurrence of lymph node metastases at first diagnosis of metastatic setting (yes/no)
NA meta_other_first_metastases occurrence of other metastases at first diagnosis of metastatic setting, + site (e.g. yes: pleura)
NA Systemic_treatment_firstline Type of treatment if applicable
NA surgery_1st_line_metastatic Surgery (resection) performed of a metastasis at first metastatic diagnosis?
NA radiotherapy_1st_line_metastatic Radiotherapy performed of a metastasis at first metastatic diagnosis?
NA first_progression_distant_disease_metastatic Was there already progression of the metastatic disease reported (yes/no)?
NA date_first_progression_metastatic if applicable date first progression was reported (dd/mm/yyyy)
NA meta_brain_nonleptomeningeal_atfirstprogression progression of brain metastases with the exception of leptomeningeal disease at first progression of metastatic setting (yes/no)
NA meta_leptomeningeal_atfirstprogression progression of leptomeningeal metastases at first progression of metastatic setting (yes/no)
NA meta_bones_atfirstprogression progression of bone metastases at first progression of metastatic setting (yes/no)
NA meta_skin_atfirstprogression progression of skin metastases at first progression of metastatic setting (yes/no)
NA meta_lungs_atfirstprogression progression of lung metastases at first progression of metastatic setting (yes/no)
NA meta_liver_atfirstprogression progression of liver metastases at first progression of metastatic setting (yes/no)
NA meta_abdomen_extrahepatic_atfirstprogression progression of abdominal (extrahepatic) metastases at first progression of metastatic setting (yes/no)
NA meta_reproductive_organs_atfirstprogression progression of metastases in reproductive organs at first progression of metastatic setting (yes/no)
NA meta_lymph_nodes_atfirstprogression progression of lymph node metastases at first progression of metastatic setting (yes/no)
NA meta_other_atfirstprogression progression of other metastases at first progression of metastatic setting, + site (e.g. yes: pleura)
NA systemic_treatment_secondline Type of treatment if applicable
NA radiotherapy_2nd_line_metastatic Radiotherapy performed of a metastasis at first progression?
NA number_of_lines_metastatic Number of treatments a patient received due to progression at the time of last database update
Recurrences and death date_last_update_file_database cut off date of last update (dd/mm/yyyy)
NA death is the patient deceased? Yes or no
NA date_of_death date in dd/mm/yyyy of death if applicable otherwise NA
NA cause_of_death is the death related to breast cancer or not? (breast cancer related, not breast cancer related, unknown or NA)
NA date_last_FU date in dd/mm/yyyy of last follow up (in own center, with other physician or in other centre with clear communication to your center)
NA date_last_FU_Leuven date in dd/mm/yyyy of last follow up for breast cancer in UZ Leuven
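
As a quick consistency check on the derived fields listed in both tables, the formulas for BMI, body_surface_area and the two category variables could be sketched as follows. This is only an illustration in Python: the function names are mine and not part of the database, and I assume that age is recorded in completed years.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """BMI as defined in the metadata: weight (kg) / height (m)^2."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Categories from the metadata: <25, 25 - 30, >30."""
    if value < 25:
        return "<25"
    if value <= 30:
        return "25 - 30"
    return ">30"


def body_surface_area(weight_kg: float, height_m: float) -> float:
    """Formula from the metadata: 0.20247 x height (m)^0.725 x weight (kg)^0.425."""
    return 0.20247 * height_m ** 0.725 * weight_kg ** 0.425


def age_category(age_years: float) -> str:
    """Categories from the metadata: <40, 40 - 49, ..., 70 - 79, >=80."""
    if age_years < 40:
        return "<40"
    if age_years >= 80:
        return "≥80"
    lower = int(age_years // 10) * 10  # decade the age falls in
    return f"{lower} - {lower + 9}"
```

Recomputing these fields from height, weight and age, and comparing them with the stored values, would flag data-entry inconsistencies before any modeling.
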
Warning

For M0 patients we have the dates of recurrence (locoregional or distant) and the date of progression on the first treatment they received for metastatic disease. For M1 patients we only have the date of progression after their first treatment, so we don’t have a PFS2 for them.

Tip

I suggest adding a ‘variable label’ column that assigns a unique, common name to each variable ID, and a ‘format’ column that codifies the categories of the categorical variables.
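
To make the suggestion concrete, here is a minimal sketch of what such a data dictionary could look like and how it could be used to check raw entries. The labels are hypothetical examples of mine; the categories are copied from the tables above, and the helper `is_valid` is only illustrative.

```python
from typing import Optional

# Hypothetical excerpt of a data dictionary: one 'label' and one 'format' per variable.
DATA_DICTIONARY = {
    "smoking": {
        "label": "Smoking status at first diagnosis",
        "format": ["active", "former", "no"],
    },
    "diabetes": {
        "label": "Diabetes at diagnosis",
        "format": ["type 1", "type 2", "no"],
    },
    "age_at_diagnosis (y)": {
        "label": "Age at primary diagnosis (years)",
        "format": "numeric",
    },
}


def is_valid(variable: str, value: Optional[str]) -> bool:
    """Check one raw entry against the 'format' column; NA is always accepted."""
    if value is None or value.strip() == "NA":
        return True
    expected = DATA_DICTIONARY[variable]["format"]
    if expected == "numeric":
        try:
            float(value)
            return True
        except ValueError:
            return False
    return value.strip() in expected
```

With the categories codified once in the ‘format’ column, free-text variants (for example ‘Former’ or ‘ex-smoker’) can be detected automatically instead of by eye.
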

0.2 Aims & Objectives

Primary Aim 1: Epidemiological Trends. To analyze the temporal trends in the prevalence of ILC (vs. all BC), of M1 ILC (vs. all ILC), and of M1 ILC (vs. all M1 BC) diagnosed at UZ Leuven over a defined period (2000-2023).

  • Hypothesis 1a: The relative percentage of ILC diagnoses has increased over time.

  • Hypothesis 1b: The relative percentage of de novo M1 ILC diagnoses has increased over time, correlated with the implementation of enhanced imaging and pathological diagnostics.

Warning

The total number of BCs diagnosed at UZ Leuven in the years 2000-2023 is included in a different dataset (MBC data by Chantal). For now, we can certainly analyze the temporal trend in the prevalence of patients diagnosed with M1 ILC among all ILC patients over the same period (2000-2023).

Primary Aim 2: Baseline Characteristics. To compare the baseline clinical and pathological characteristics of patients diagnosed with M0 ILC versus de novo M1 ILC.

  • Objective: To identify key differences (e.g., age, tumor size, grade, hormone receptor status) and compare these findings to documented differences in the general M0/M1 BC population.
Note

We need to define together which baseline variables are of interest for this analysis. The ones I have identified so far are age_at_diagnosis; gender; BMI; menopausal_status; body_surface_area; smoking; alcohol_abuse; hypertension; hyperlipidemia; diabetes… others?

Primary Aim 3: Metastatic Disease Profile. To compare the nature (e.g., site, distribution, burden) of metastatic disease between two cohorts:

  1. Patients diagnosed with de novo M1 ILC.

  2. Patients initially diagnosed with M0 ILC who later developed a first recurrence (M0 -> M1).

Note

We will consider these variables to characterize the site, also for de novo M1 patients, right? meta_brain_nonleptomeningeal_first_metastases, meta_leptomeningeal_first_metastases, meta_bones_first_metastases, meta_skin_first_metastases, meta_lungs_first_metastases, meta_liver_first_metastases, meta_abdomen_extrahepatic_first_metastases, meta_reproductive_organs_first_metastases, meta_lymph_nodes_first_metastases, meta_other_first_metastases, Systemic_treatment_firstline, surgery_1st_line_metastatic, radiotherapy_1st_line_metastatic, first_progression_distant_disease_metastatic, date_first_progression_metastatic, meta_brain_nonleptomeningeal_atfirstprogression, meta_leptomeningeal_atfirstprogression, meta_bones_atfirstprogression, meta_skin_atfirstprogression, meta_lungs_atfirstprogression, meta_liver_atfirstprogression, meta_abdomen_extrahepatic_atfirstprogression, meta_reproductive_organs_atfirstprogression, meta_lymph_nodes_atfirstprogression, meta_other_atfirstprogression

Primary Aim 4: Survival Outcomes. To compare survival outcomes between patients diagnosed with M0 ILC and M1 ILC.

  • Objective: To analyze PFS1 (time to first recurrence/progression), PFS2 (time from first to second recurrence/progression), and Overall Survival (OS).
  • Findings will be benchmarked against existing literature (e.g., Mouabbi et al.).
Note

We will compare PFS between de novo M1 patients and M1 patients who were initially diagnosed as M0. Mouabbi et al. discuss data about metastatic breast cancer (MBC). We won’t analyze PFS2, as part of the survival experience is missing for these patients.
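
As an illustration of how a time-to-event outcome can be derived from the date fields in the metadata, here is a minimal Python sketch for overall survival. It assumes that dates follow the dd/mm/yyyy format stated in the tables, that missing dates are coded as NA, and that patients who are alive are censored at date_last_FU; the function names and the conversion of days to months are my own choices.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def parse_date(text: Optional[str]) -> Optional[date]:
    """Parse a dd/mm/yyyy string; empty strings and 'NA' are treated as missing."""
    if text is None or text.strip() in ("", "NA"):
        return None
    return datetime.strptime(text.strip(), "%d/%m/%Y").date()


def overall_survival(date_of_diagnosis: str, death: str,
                     date_of_death: Optional[str],
                     date_last_FU: Optional[str]) -> Optional[Tuple[float, int]]:
    """Return (time in months, event indicator), or None if a required date is missing.

    The event indicator is 1 for a death and 0 for a patient censored at last follow-up.
    """
    start = parse_date(date_of_diagnosis)
    died = death.strip().lower() == "yes"
    end = parse_date(date_of_death) if died else parse_date(date_last_FU)
    if start is None or end is None:
        return None
    months = (end - start).days / 30.4375  # average month length in days
    return months, int(died)
```

The same pattern (start date, event date, censoring date) would apply to PFS1, with the event dates to be agreed upon for each cohort.
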

0.3 Preliminary Statistical Analysis Plan

0.3.2 Modeling the prevalence

For both Hypothesis 1a and 1b, the target is a prevalence, defined as \(\frac{\text{number of specific cases}}{\text{total number of cases}}\). First off, a raw representation of the prevalence (Y-axis) against time (X-axis) will be presented, with a time unit of one year. This non-parametric representation is purely exploratory. It will be useful to mark the landmark time points at which specific diagnostic techniques were introduced, to visually assess whether they have a clear impact on the prevalence.
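
The raw yearly prevalence described above can be computed with a simple tabulation. The sketch below is in Python and assumes one record per patient with the year of diagnosis and a flag for the case of interest (for example de novo M1 among all ILC); the function name and the input layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def yearly_prevalence(records: Iterable[Tuple[int, bool]]) -> Dict[int, Tuple[int, int, float]]:
    """Map each year to (number of specific cases, total number of cases, prevalence)."""
    totals: Counter = Counter()
    cases: Counter = Counter()
    for year, is_case in records:
        totals[year] += 1
        if is_case:
            cases[year] += 1
    return {
        year: (cases[year], totals[year], cases[year] / totals[year])
        for year in sorted(totals)
    }
```

Keeping the numerator and the denominator next to the prevalence is convenient, because the same two counts are the response and the offset of the Poisson model.
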

To model the prevalence we could opt for either a log-binomial model or a Poisson model. The literature suggests that the Poisson approach may be more robust, especially with a low proportion of cases. With a high proportion of cases, the log-binomial and the Poisson models perform almost identically.

The Poisson model is structured as follows. Taking Hypothesis 1a as an example, the prevalence is linked to a linear combination of parameters (the effects of the covariates) through a log link function, whose inverse is the exponential:

\[\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}} = \exp\bigl[\beta_0 + \beta_1Y_2 + \beta_2Y_3 + \dots + \epsilon \bigr]\]

where \(Y_1, Y_2, \dots, Y_n\) are indicator variables (dummy variables) of the specific year of diagnosis; Year 1 is the reference category, so \(Y_1\) does not enter the model. The model can also be expressed as:

\[\log\bigl[\text{Number of ILC diagnoses}\bigr] = \beta_0 + \beta_1Y_2 + \beta_2Y_3 + \dots + \log\bigl[\text{Total number of BC diagnoses}\bigr] + \epsilon\]

It follows that \(\beta_1\) is the regression coefficient associated with \(Y_2\), \(\beta_2\) with \(Y_3\), and so on. For example, the estimated prevalence of ILC diagnoses at Year 2 (where \(Y_2 = 1\) and all other indicators are 0) is obtained from: \[\log\bigl[\text{Number of ILC diagnoses}\bigr] = \beta_0 + \beta_1Y_2 +\log\bigl[\text{Total number of BC diagnoses}\bigr]\] that is \[\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}} = \exp\bigl[\beta_0 + \beta_1Y_2 \bigr]\]

The exponentiated coefficients give the prevalence ratio for a specific year relative to the reference, Year 1. Say we want to compare the prevalence at Year 2 with the prevalence at Year 1 through a prevalence ratio; the contrast is defined as follows:

\[\frac{\left(\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}}\right)_{\text{Year 2}}}{\left(\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}}\right)_{\text{Year 1}}} = \frac{\exp\bigl[\beta_0 + \beta_1Y_2 \bigr]}{\exp\bigl[\beta_0 \bigr]}\] \[\log\biggl[\frac{\left(\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}}\right)_{\text{Year 2}}}{\left(\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}}\right)_{\text{Year 1}}}\biggr] = \beta_0+\beta_1Y_2-\beta_0 = \beta_1\]

Formal hypothesis tests can be conducted. As an example, the Wald test has a null hypothesis of \(\beta_1 = 0\), that is, the prevalence at Year 2 is not different from the prevalence at Year 1 (prevalence ratio = 1). Naturally, other prevalence ratios can be obtained by choosing a different year as the reference.
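
For the model with one indicator per year and the log-total offset, the fitted prevalences coincide with the observed ones, so the prevalence ratio and its Wald test can be written in closed form. The Python sketch below illustrates this for a single contrast, using the model-based Poisson standard error \(\sqrt{1/a_{\text{ref}} + 1/a_{\text{year}}}\), where \(a\) denotes the number of cases; the function name and the counts in the usage note are hypothetical, and a robust (sandwich) variance would give a different standard error.

```python
import math
from typing import Tuple


def prevalence_ratio_wald(cases_ref: int, total_ref: int,
                          cases_year: int, total_year: int) -> Tuple[float, float, float]:
    """Return (prevalence ratio, Wald z statistic, two-sided p-value) versus the reference year."""
    # beta_1 is the log of the ratio between the two observed prevalences.
    beta_1 = math.log((cases_year / total_year) / (cases_ref / total_ref))
    # Model-based Poisson standard error of beta_1.
    se = math.sqrt(1 / cases_ref + 1 / cases_year)
    z = beta_1 / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(beta_1), z, p_value
```

For instance, with made-up counts of 40 cases out of 400 diagnoses in the reference year and 60 out of 400 in the comparison year, `prevalence_ratio_wald(40, 400, 60, 400)` gives a prevalence ratio of 1.5.
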

In the modeling framework displayed above we are considering ‘Year of diagnosis’ as a categorical variable. Its main advantage is straightforward interpretability. However, it might be of interest to display estimates of the prevalence of ILC or de novo M1 diagnoses as a smooth function of time. To this end we could treat the time of diagnosis as a numerical covariate and fit a model that allows its association with the prevalence to be non-linear. Modeling a non-linear effect is important, as otherwise we would assume that the log of the proportion of cases changes linearly with time.

This has an advantage as it shows how the proportion of diagnoses varies smoothly with time, but also it is convenient as it can display how the prevalence ratio varies as a function of time, considering a specific landmark point (month/year of the introduction of a specific diagnostic technique). The non-linear effect of time can be modeled with simple restricted cubic splines.
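
To show what a restricted cubic spline of time looks like in practice, here is a minimal Python sketch of the basis in the truncated power form popularized by Harrell, which is constrained to be linear beyond the outer knots. The knot positions in the usage note are purely hypothetical; in the analysis they would be chosen from the data (for example at quantiles of the year of diagnosis).

```python
from typing import List, Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline basis at x: the linear term plus k - 2 non-linear terms."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")

    def cube_plus(u: float) -> float:
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2  # normalization keeps the terms on the scale of x
    last_gap = t[-1] - t[-2]
    basis = [float(x)]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / last_gap
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / last_gap
        )
        basis.append(term / scale)
    return basis
```

With four knots, `rcs_basis(2010, [2002, 2008, 2014, 2021])` returns three columns that replace the year indicators in the linear predictor, so the time effect costs three parameters instead of one per year.
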

Naturally, the model has to be extended to adjust for the introduction of new diagnostic techniques and for the time elapsed between their introduction and the diagnosis. We can use a binary covariate for the introduction of a new technique (NDT), equal to 0 for the years before introduction and 1 for the years after, together with continuous covariates that quantify the ‘time passed between the introduction of the techniques and the diagnosis’ (TFI).

In its basic form the model will be:

\[\log\bigl[\text{Number of ILC diagnoses}\bigl] = \beta_0 + \beta_1Y_2 + \beta_2Y_3 ...+ \beta_nNDT + \beta_{n+1} TFI+ \log\bigl[\text{Total number of BC diagnoses }\bigl] + \epsilon\]

Here, the Wald test on \(\beta_n\) assesses whether the introduction of a new diagnostic technique (NDT) has an independent effect on the prevalence of a specific diagnosis, whereas the test on \(\beta_{n+1}\) assesses whether this prevalence keeps changing in the long term with the time from the introduction of the technique (TFI).
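A rough sketch of how NDT and TFI could be coded and entered in the model is given below. It assumes hypothetical data from two centres (`A` and `B`) that introduced the technique in different years. This assumption matters: within a single yearly series, NDT and TFI would be exact functions of the year indicators and could not be estimated separately from them. All names and numbers are illustrative.

```r
# Hypothetical counts from two centres with different introduction years
set.seed(2)
dat <- expand.grid(year = 2005:2016, centre = c("A", "B"))
intro <- c(A = 2009, B = 2012)                 # assumed introduction years
intro_year <- intro[as.character(dat$centre)]

dat$NDT  <- as.integer(dat$year >= intro_year) # 0 before, 1 after introduction
dat$TFI  <- pmax(dat$year - intro_year, 0)     # years since introduction
dat$n_bc <- 800
dat$n_ilc <- rpois(nrow(dat),
                   dat$n_bc * 0.10 * exp(0.15 * dat$NDT + 0.02 * dat$TFI))

# Categorical year effects plus NDT and TFI, with the usual offset
fit <- glm(n_ilc ~ factor(year) + NDT + TFI + offset(log(n_bc)),
           family = poisson, data = dat)

# Estimates and Wald tests for the two coefficients of interest
est <- coef(summary(fit))[c("NDT", "TFI"), ]
print(est)
print(exp(est[, "Estimate"]))  # prevalence ratios for NDT and per year of TFI
```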

0.3.3 Primary Aim 2: Baseline Characteristics

To compare the baseline clinical and pathological characteristics of patients diagnosed with M0 ILC versus M1 ILC.

We can adopt a case-control study structure for this aim. First, we can explore the data by comparing the distributions of the clinical and pathological variables between the two groups, using relative frequency plots and mosaic plots for the categorical variables and empirical cumulative distribution function (ECDF) plots and/or histograms for the numerical variables.

As an exploratory analysis, we could then study the joint association of the clinical and pathological variables using multivariate analysis techniques. For numerical covariates we could use principal component analysis (PCA) and project the M0 ILC and M1 ILC patients onto the principal component space to see whether they separate clearly. The same could be done for the categorical variables using Multiple Correspondence Analysis (MCA).
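The PCA step could look like the following sketch, which uses base R `prcomp()` on simulated numerical covariates. The variables `age`, `bmi` and `tumor_size` are hypothetical stand-ins and are not taken from the metadata file.

```r
# Simulated numerical covariates for M0 and M1 patients (hypothetical)
set.seed(3)
n   <- 100
grp <- factor(rep(c("M0", "M1"), each = n / 2))
X <- data.frame(
  age        = rnorm(n, mean = ifelse(grp == "M1", 65, 60), sd = 10),
  bmi        = rnorm(n, mean = 26, sd = 4),
  tumor_size = rlnorm(n, meanlog = ifelse(grp == "M1", 3.4, 3.0), sdlog = 0.5)
)

# PCA on centred and scaled variables
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first two components, labelled by group
scores <- data.frame(pca$x[, 1:2], group = grp)

# Group means of the scores give a first impression of separation
print(aggregate(cbind(PC1, PC2) ~ group, data = scores, FUN = mean))
```

A scatter plot of `PC1` against `PC2` coloured by `group` would be the natural graphical counterpart.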

Depending on the available information, we could opt for machine learning (ML) models to assess the multivariable relationships between clinical and pathological characteristics (predictors) and the type of ILC (M1 vs M0, the response). ML models are best suited to settings where the sample size is large and there are no strong prior hypotheses about the relationship between the predictors and the response. In this case, they can automatically detect interaction and non-linear effects. Their drawback is that they are not easily interpretable, although there are some exceptions such as Random Forests and Multivariate Adaptive Regression Splines.

Otherwise, we could opt for a classic logistic regression framework in which the relationship between the predictors and the response is specified manually. The logistic model describes the (log-)odds of being a case (say M1 ILC), \(\frac{P(ILC = M1)}{1-P(ILC=M1)}\), as a function of the covariates \(\textbf{X}\). Namely,

\[ \log\biggl[{\frac{P(ILC = M1)}{1-P(ILC=M1)}}\biggl] = \beta_0 + \beta^t\textbf{X} + \epsilon \]

where \(\beta\) is the vector of coefficients associated with the covariates in the matrix \(\textbf{X}\). As a simplified example, suppose we want to assess the effect of age (<50 vs ≥50 years). We can do this by specifying the model:

\[\log\biggl[{\frac{P(ILC = M1)}{1-P(ILC=M1)}}\biggl] = \beta_0 + \beta_1(Age \ge 50) + \epsilon\]

The effect of \(Age \ge 50\) is obtained considering the ratio of the odds of being a case for a hypothetical patient with \(Age \ge 50\) vs a patient with \(Age < 50\).

\[\log\biggl[\frac{\frac{P(ILC = M1| Age \ge 50)}{1-P(ILC=M1| Age \ge 50)}}{\frac{P(ILC = M1| Age < 50)}{1-P(ILC=M1| Age < 50)}}\biggl] = \beta_0 + \beta_1(Age \ge 50) - \beta_0 = \beta_1\]

It follows that by exponentiating \(\beta_1\) we obtain the effect of age in terms of the odds ratio. Formal hypothesis tests can be performed: again, the Wald test has the null hypothesis \(\beta_1 = 0\), that is, being 50 years or older has no effect on the odds of being an M1 ILC case.
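The age example can be illustrated with a logistic regression fitted to a hypothetical 2x2 table. The counts and the names `tab`, `m1`, `m0` and `age50` are invented for illustration only.

```r
# Hypothetical 2x2 table: M1 and M0 ILC counts by age group
tab <- data.frame(
  age50 = c(0, 1),     # 0: Age < 50, 1: Age >= 50
  m1    = c(20, 60),   # cases (M1 ILC)
  m0    = c(180, 240)  # controls (M0 ILC)
)

# Logistic regression on grouped data: cbind(successes, failures)
fit <- glm(cbind(m1, m0) ~ age50, family = binomial, data = tab)

# Odds ratio for Age >= 50 vs Age < 50: (60/240) / (20/180) = 2.25
or <- exp(coef(fit)[["age50"]])

# Wald 95% confidence interval and Wald p-value for H0: beta_1 = 0
ci     <- exp(confint.default(fit)["age50", ])
p_wald <- coef(summary(fit))["age50", "Pr(>|z|)"]
print(c(OR = or, ci, p = p_wald))
```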

0.3.4 Primary Aim 3: Metastatic Disease Profile

For this aim I propose to adopt the same analysis plan as for the previous aim, with some additions. For example, it would be of interest to assess whether there is a pattern in the site, burden or distribution of the metastases, and whether this pattern is associated with being de novo M1 or developing M1 disease after an M0 diagnosis. We could adopt a hierarchical clustering approach to see whether relevant profiles of site, distribution and burden emerge, and then link these profiles to the two types of patients.
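One possible form of the clustering step is sketched below on simulated binary indicators of metastatic site. The site names, the Jaccard-type distance (`dist(method = "binary")`), average linkage and the choice of three clusters are all assumptions made for illustration, not decisions of the analysis plan.

```r
# Simulated site indicators for M1 patients (hypothetical)
set.seed(4)
n     <- 60
sites <- c("bone", "liver", "lung", "peritoneum")
M <- matrix(rbinom(n * length(sites), 1, 0.4), nrow = n,
            dimnames = list(NULL, sites))
M[rowSums(M) == 0, "bone"] <- 1  # every M1 patient has at least one site

# Hypothetical label: de novo M1 vs M1 after an M0 diagnosis
m1_type <- factor(sample(c("de novo M1", "M0-M1"), n, replace = TRUE))

# Distance between site profiles and agglomerative clustering
d  <- dist(M, method = "binary")
hc <- hclust(d, method = "average")
cl <- cutree(hc, k = 3)

# Cross-tabulate the clusters with the type of M1 patient
tab <- table(cluster = cl, type = m1_type)
print(tab)
```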

0.3.5 Primary Aim 4: Survival Outcomes

As a first exploratory step, we use the Kaplan-Meier estimator to compute the survival functions, overall and within the categories of the categorical covariates considered in the analysis. These display the proportion of patients still alive (for Overall Survival) or free of the first recurrence (for PFS1) over time. This initial exploratory analysis is carried out separately for de novo M1 patients and M0 patients.
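A minimal sketch of this exploratory step uses `survfit()` from the `survival` package on simulated data. The covariate `age50`, the event rates and the 5-year time point are hypothetical.

```r
library(survival)

# Simulated follow-up data for one cohort (hypothetical)
set.seed(6)
n       <- 100
age50   <- factor(sample(c("<50", "50+"), n, replace = TRUE))
t_event <- rexp(n, rate = ifelse(age50 == "50+", 0.15, 0.10))
t_cens  <- runif(n, 2, 10)
time    <- pmin(t_event, t_cens)           # observed time
status  <- as.integer(t_event <= t_cens)   # 1 = event, 0 = censored

# Kaplan-Meier curves within the categories of the covariate
km <- survfit(Surv(time, status) ~ age50)

# Estimated survival at 5 years with 95% confidence limits, by category
s5 <- summary(km, times = 5, extend = TRUE)
print(data.frame(stratum = s5$strata, surv = s5$surv,
                 lower = s5$lower, upper = s5$upper))
```

Calling `plot(km)` would draw the corresponding curves.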

Important

For M1 patients who were M0 at the time of diagnosis, time zero will be the time of first distant recurrence; for de novo M1 patients, it will be the time of diagnosis. We are assuming that the M0 history (the time spent free from distant metastasis) does not affect the subsequent dynamics of progression (PFS1). We will also add a comparison between de novo M1 patients, M0-M1 patients and M0 patients who progressed to an event other than distant recurrence.

For the direct comparison of de novo M1 and M0-M1 patients, we will model the progression-free survival function with pseudo-observations, conditional on being a de novo M1 patient or not and adjusting for the other prognostic factors.

Pseudo-observations are derived using a jackknife (leave-one-out) statistical procedure. For a specific quantity of interest, \(\theta(t)\), such as the survival probability at a fixed time \(\tau\), \(\theta(\tau) = S(\tau)\), the process is as follows:

  1. First, compute the non-parametric estimate \(\hat{\theta}(\tau)\) using the full dataset of \(n\) subjects. For the survival function, this would be the Kaplan-Meier estimate, \(\hat{S}_{KM}(\tau)\).

  2. Next, for each subject \(i=1, \dots, n\), temporarily remove that subject from the dataset and re-calculate the estimate on the remaining \(n-1\) subjects, yielding \(\hat{\theta}_{-i}(\tau)\).

  3. The pseudo-observation for subject \(i\) at time \(\tau\) is then defined as: \[ \hat{\theta}_i(\tau) = n \cdot \hat{\theta}(\tau) - (n-1) \cdot \hat{\theta}_{-i}(\tau) \]

In the absence of censoring, \(\hat{\theta}_i(\tau)\) would simply be the observed outcome for subject \(i\) (e.g., the indicator variable \(I(T_i > \tau)\)). The pseudo-observations thus serve as a complete-data substitute for the potentially unobserved outcome in the presence of censoring. These generated values can then be used as the response variable in a Generalized Linear Model (GLM) or, more robustly, a Generalized Estimating Equation (GEE) framework to estimate the effect of covariates. This allows for direct modeling of the survival probability at one or more time points without any proportional hazards assumption.
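The jackknife procedure can be written out in a few lines of base R. The helper names `km_at` and `pseudo_surv`, the simulated data and the identity-link regression at the end are illustrative assumptions. In practice a dedicated package and a sandwich (GEE-type) variance would be preferable.

```r
# Kaplan-Meier estimate of S(tau): product over event times t_j <= tau
km_at <- function(time, status, tau) {
  ev_times <- sort(unique(time[status == 1 & time <= tau]))
  s <- 1
  for (t in ev_times) {
    d <- sum(time == t & status == 1)  # events at t
    r <- sum(time >= t)                # at risk just before t
    s <- s * (1 - d / r)
  }
  s
}

# Leave-one-out pseudo-observations: n * theta_hat - (n - 1) * theta_hat_(-i)
pseudo_surv <- function(time, status, tau) {
  n    <- length(time)
  full <- km_at(time, status, tau)
  loo  <- vapply(seq_len(n),
                 function(i) km_at(time[-i], status[-i], tau), numeric(1))
  n * full - (n - 1) * loo
}

# Without censoring the pseudo-observations reduce to I(T_i > tau)
t_unc  <- c(1, 2, 3, 4, 5)
po_unc <- pseudo_surv(t_unc, rep(1, 5), tau = 2.5)  # 0 0 1 1 1

# Simulated censored data (hypothetical, not ILC project data)
set.seed(5)
n       <- 80
de_novo <- rbinom(n, 1, 0.5)
t_event <- rexp(n, rate = ifelse(de_novo == 1, 0.30, 0.20))
t_cens  <- runif(n, 1, 8)
time    <- pmin(t_event, t_cens)
status  <- as.integer(t_event <= t_cens)
po      <- pseudo_surv(time, status, tau = 3)

# Identity-link regression: the de_novo coefficient estimates the difference
# in S(3) between groups; model-based standard errors are not valid here
fit <- glm(po ~ de_novo, family = gaussian)
print(coef(fit))
```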

At least for M0 patients, we could analyze the event history within a multi-state framework. M0 patients enter follow-up in State 1 (M0, distant metastasis-free). From there they can move to State 2 (M1, first-line therapy) through a distant recurrence, or to death (State 4). Patients in State 2 can move to State 3 (M1, second-line therapy) through a progression, or again to death.

With this framework we can model the probability of being in a given state (state probability) and the transition probability from state A to state B as a function of the follow-up time and prognostic information.

| Feature | Transition Intensity | Transition Probability | State Probability |
|---|---|---|---|
| Measures | Instantaneous rate of a specific transition (i.e., the hazard) | Probability of a specific transition | Probability of being in a specific state |
| Time aspect | At a point in time (instantaneous) | Over an interval of time (from \(s\) to \(t\)) | At a point in time |
| Value range | 0 to \(\infty\) (it’s a rate) | 0 to 1 (it’s a probability) | 0 to 1 (it’s a probability) |
| Depends on… | Current time \(t\) (and covariates) | Start state \(i\), end state \(j\), start time \(s\), end time \(t\) (calculated from intensities) | Initial population distribution and all transition probabilities into state \(j\) |
| Example question | What is the instantaneous risk of progression at time \(t\), given that it was not seen before? | What is the probability of progression at 5 years, given that at \(t = 0\) the patient is without progression? | What is the probability that a random person from the study is sick at 5 years? |
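For reference, the three quantities in the table can be written in standard multi-state notation. The symbols are introduced here only for illustration and are not used elsewhere in this report: \(X(t)\) denotes the state occupied at time \(t\), \(\alpha_{ij}\) the transition intensity, \(P_{ij}\) the transition probability and \(\pi_j\) the state probability. The first two definitions are written for a Markov process, an assumption that would need to be checked.

\[\alpha_{ij}(t) = \lim_{\Delta t \to 0} \frac{P\bigl(X(t+\Delta t) = j \mid X(t) = i\bigr)}{\Delta t}, \quad i \neq j\]

\[P_{ij}(s,t) = P\bigl(X(t) = j \mid X(s) = i\bigr), \quad s \le t\]

\[\pi_j(t) = P\bigl(X(t) = j\bigr) = \sum_i \pi_i(0)\,P_{ij}(0,t)\]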

Figure: multi-state model. State 1: M0, distant metastasis-free (start for M0 patients). State 2: M1, 1st-line therapy (start for de novo M1 patients). State 3: M1, 2nd-line therapy. State 4: death (absorbing). Transitions: State 1 -> State 2 (distant recurrence, end of DMFS); State 1 -> State 4 (death, competing risk); State 2 -> State 3 (progression, end of PFS1); State 2 -> State 4 (death, competing risk); State 3 -> State 4 (death).