| Variable Group | Entry field | Explanation |
|---|---|---|
| Patient specific factors | patient_ID | pseudonymized ID |
| NA | date_of_diagnosis | date (dd/mm/yyyy) of first malignant biopsy; if unknown, date of first contact regarding invasive lobular cancer |
| NA | method_of_detection | was the tumor found by population based screening or did the patient present with symptoms |
| NA | date_of_birth | date in dd/mm/yyyy of birth |
| NA | age_at_diagnosis (y) | age of patient in years at the time of primary diagnosis |
| NA | age_category | category of age at the time of primary diagnosis (<40, 40 - 49, 50 - 59, 60 - 69, 70 - 79, ≥80) |
| NA | gender | M= male; F= female |
| NA | height (m) | in meters, at time of primary diagnosis |
| NA | weight (kg) | in kg, at time of primary diagnosis |
| NA | BMI | calculated, at primary diagnosis (weight (kg)/height (m)^2) |
| NA | BMI_category | category of calculated BMI ( <25, 25 - 30, >30) |
| NA | menopausal_status | pre or postmenopausal at timing of diagnosis. If in transition, to be considered as premenopausal (categories = pre- and perimenopausal, postmenopausal). |
| NA | body_surface_area | calculated, at primary diagnosis (0.20247 x height (m)^0.725 x weight (kg)^0.425) |
| NA | smoking | is the patient a present smoker (= active), past smoker (= former) or did he/she never smoke (= no) at the time of first diagnosis |
| NA | alcohol_abuse | is there a history of alcohol abuse reported (yes/no) at the time of diagnosis |
| NA | hypertension | is there a personal history of arterial hypertension at primary diagnosis (yes/no) |
| NA | hyperlipidemia | is there a personal history of hyperlipidemia at time of diagnosis (yes/no) |
| NA | diabetes | is there a personal history of diabetes + type at the time of diagnosis (type 1, type 2 or no) |
| NA | oral_anticonceptive_use | has the patient ever used oral contraceptives or are they still using them at the time of diagnosis (former, active, no)? |
| NA | pregnancy_A | number of abortus/miscarriage at the time of diagnosis |
| NA | pregnancy_P | number of partus/child-births at the time of diagnosis |
| NA | pregnancy_G | number of gravidus/pregnancies at the time of diagnosis (=A+P) |
| NA | Age.FFTP | age at first child birth, if applicable (if no child birth = nulliparous) |
| NA | Interval.1st.FTP | age difference between age diagnosis and age first child birth (if no child birth = NA) |
| NA | hormone_replacement | has the patient ever used hormone replacement therapy or are they still using hormone replacement therapy (former, active, no) |
| NA | familial_history_breast_ovary | is there a history of breast or ovarian cancer in the family? |
| NA | familial_history_breast_ovary_line | are the affected relatives first (1), second (2) or third (3) degree relatives? If no history: NA |
| NA | germline_mutation_testing_performed | was there a test done to see if there are any germline mutations present associated with higher breast cancer risk? |
| NA | germline_mutation_testing_year_most_recent_test | if applicable year that the most recent test was performed, if not applicable NA |
| NA | germline_mutation_testing_result | if applicable gene that was mutated, if tested but no mutation = negative, if not tested = NA |
| Radiology | visible_on_mammogram | was the primary tumor seen on mammogram (yes/no), NA if no mammogram was performed |
| NA | diameter_mammogram_at_diagnosis | diameter in mm of largest focus on mammogram, if not seen on mammogram NA |
| NA | number_of_suspected_foci_mammogram | number of foci seen on mammogram numerical, if not described properly but multifocal = multiple, if not seen on mammogram = NA |
| NA | breast_density_score_mammogram | Birads-score of breast density on mammogram (fatty breast = Type A, dense breast = Type D), not seen on mammogram = NA, not reported by the radiologist = NA |
| NA | diameter_ultrasound_at_diagnosis | diameter in mm of largest focus on ultrasound, if not seen on ultrasound NA |
| NA | Number_of_adenopathies_expected_on_ultrasound | number of adenopathies seen on ultrasound numerical, if not described properly but >1 = multiple, if not seen on ultrasound = NA |
| NA | MRI_breast_performed | Was an MRI of the breast performed at diagnosis (yes/no) |
| NA | diameter_MRI_at_diagnosis | diameter in mm of largest focus on MRI, if not performed/reported NA |
| NA | number_of_suspected_foci_MRI | number of foci seen on MRI numerical, if not described properly but multifocal = multiple, if not performed/reported = NA |
| NA | breast_density_score_MRI | Birads-score of breast density on MRI (fatty breast = Type A, dense breast = Type D), not performed/reported = NA |
| NA | Number_of_adenopathies_expected_on_MRI_breast | number of adenopathies seen on MRI numerical, if not described properly but >1 = multiple, if not performed/reported = NA |
| Primary tumor, pre-treatment | primary_laterality | left or right or bilateral |
| NA | TNM_cT_at_diagnosis | clinical T-classification according to TNM classification of malignant tumors |
| NA | TNM_cN_at_diagnosis | clinical N-classification according to TNM classification of malignant tumors |
| NA | TNM_cM_at_diagnosis | clinical M-classification according to TNM classification of malignant tumors |
| NA | diameter_radiology_at_diagnosis (mm) | of largest focus, in mm (largest reported diameter on mammogram/ultrasound/MRI) |
| Neoadjuvant therapy | neo_adjuvant_therapy | did the patient receive any kind of therapy (endocrine therapy, immunotherapy, chemotherapy) prior to surgery (yes/no) |
| Surgery | surgery_date | date in dd/mm/yyyy of the surgery (first surgery of the primary tumor), no surgery performed = NA |
| NA | surgery_type_breast | mastectomy vs tumorectomy (= breast conserving surgery), no surgery performed = NA |
| NA | surgery_type_axilla | sentinel lymph node biopsy (SLN) vs axillary clearance (ALN) or SN followed by ALN in same or subsequent surgery (SLN + ALN), no surgery performed = NA |
| Pathology resection specimen | TNM_pT_resection_specimen | T-classification according to TNM classification of malignant tumors |
| NA | TNM_pN_resection_specimen | N-classification according to TNM classification of malignant tumors |
| NA | diameter_pathology_resection_specimen (mm) | of largest focus, in mm, if no surgery performed = NA, if surgery performed externally and no reports available = unknown |
| NA | tumor_grade_resection_specimen | histological grade (bloom-score) reported on resection specimen. If multifocal, then largest focus is considered |
| NA | resection_margin_resection_specimen | was the tumor completely resected? If no tumor in resection margins = negative, if doubt = dubious (< 1 mm), if tumor in resection margins = positive, if no surgery/not reported = NA |
| NA | ER_Interpretation | Is the estrogen receptor expression considered positive or negative on the largest focus |
| NA | PR_Interpretation | Is the progesterone receptor expression considered positive or negative on the largest focus |
| NA | HER2_Interpretation | Is the expression of HER2 considered positive or negative on the largest focus |
| NA | Ki67_resection_specimen (%) | value of Ki67 if available in pathology report |
| NA | presence_DCIS_resection_specimen | is DCIS present in the resection specimen (yes/no) |
| NA | presence_LCIS_resection_specimen | is LCIS present in the resection specimen + type (yes, classical LCIS; yes, non-classical LCIS; no; unknown; NA) |
| NA | total_ALN_removed | total number of lymph nodes removed (sentinel included) |
| NA | positive_ALN | total number of positive lymph nodes (sentinel included) |
| NA | Micro_vs_macrometastases | if positive lymph nodes are present are they micro- or macro-invaded? |
| NA | ALN_maxdiameter | maximal diameter in mm of metastasis to lymph node if applicable |
| NA | HER2_FISH_resection_specimen | HER2-FISH status (amplification/no amplification/NA) on resection specimen, if applicable; if not repeated on resection, state the score of the biopsy |
| NA | HER2_ratio_resection_specimen | HER2-FISH ratio on resection specimen, if applicable; if not repeated on resection, state the score of the biopsy |
| Adjuvant therapy | radiotherapy | radiotherapy performed at site of surgery? (yes/no) |
| NA | adjuvant_chemotherapy | was chemotherapy given in adjuvant setting |
| NA | adjuvant_HER2 | was HER2-therapy given in adjuvant setting |
| NA | adjuvant_endocrinetherapy | did the patient get post-surgery endocrine therapy? |
| Metastatic disease | meta_brain_nonleptomeningeal_first_metastases | occurrence of brain metastases with the exception of leptomeningeal disease at first diagnosis of metastatic setting (yes/no) |
| NA | meta_leptomeningeal_first_metastases | occurrence of leptomeningeal metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_bones_first_metastases | occurrence of bone metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_skin_first_metastases | occurrence of skin metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_lungs_first_metastases | occurrence of lung metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_liver_first_metastases | occurrence of liver metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_abdomen_extrahepatic_first_metastases | occurrence of abdominal (extrahepatic) metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_reproductive_organs_first_metastases | occurrence of metastases in reproductive organs at first diagnosis of metastatic setting (yes/no) |
| NA | meta_lymph_nodes_first_metastases | occurrence of lymph node metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_other_first_metastases | occurrence of other metastases at first diagnosis of metastatic setting, + site (e.g. yes: pleura) |
| NA | Systemic_treatment_firstline | Type of treatment if applicable |
| NA | surgery_1st_line_metastatic | Surgery (resection) of a metastasis performed at first metastatic diagnosis? |
| NA | radiotherapy_1st_line_metastatic | Radiotherapy of a metastasis performed at first metastatic diagnosis? |
| NA | first_progression_distant_disease_metastatic | Was there already progression of the metastatic disease reported (yes/no)? |
| NA | date_first_progression_metastatic | if applicable date first progression was reported (dd/mm/yyyy) |
| NA | meta_brain_nonleptomeningeal_atfirstprogression | progression of brain metastases with the exception of leptomeningeal disease at first progression of metastatic setting (yes/no) |
| NA | meta_leptomeningeal_atfirstprogression | progression of leptomeningeal metastases at first progression of metastatic setting (yes/no) |
| NA | meta_bones_atfirstprogression | progression of bone metastases at first progression of metastatic setting (yes/no) |
| NA | meta_skin_atfirstprogression | progression of skin metastases at first progression of metastatic setting (yes/no) |
| NA | meta_lungs_atfirstprogression | progression of lung metastases at first progression of metastatic setting (yes/no) |
| NA | meta_liver_atfirstprogression | progression of liver metastases at first progression of metastatic setting (yes/no) |
| NA | meta_abdomen_extrahepatic_atfirstprogression | progression of abdominal (extrahepatic) metastases at first progression of metastatic setting (yes/no) |
| NA | meta_reproductive_organs_atfirstprogression | progression of metastases in reproductive organs at first progression of metastatic setting (yes/no) |
| NA | meta_lymph_nodes_atfirstprogression | progression of lymph node metastases at first progression of metastatic setting (yes/no) |
| NA | meta_other_atfirstprogression | progression of other metastases at first progression of metastatic setting, + site (e.g. yes: pleura) |
| NA | systemic_treatment_secondline | Type of treatment if applicable |
| NA | radiotherapy_2nd_line_metastatic | Radiotherapy of a metastasis performed at first progression? |
| NA | number_of_lines_metastatic | Number of treatments a patient received due to progression at the time of last database update |
| Recurrences and death | date_last_update_file_database | cut off date of last update (dd/mm/yyyy) |
| NA | locoregional_recurrence | did patient present with locoregional recurrence (i.e. ipsilateral breast and/or axilla)? yes or no |
| NA | date_locoregional_recurrence | dd/mm/yyyy if applicable |
| NA | recurrence_contralateral_breast | did patient present with contralateral recurrence (i.e. contralateral breast and/or axilla)? yes or no |
| NA | date_recurrence_contralateral_breast | dd/mm/yyyy if applicable |
| NA | distant_recurrence | occurrence of metastasis after surgery for primary (=cM0 at diagnosis). Yes or no? |
| NA | date_distant_recurrence | dd/mm/yyyy if applicable |
| NA | death | is the patient deceased? Yes or no |
| NA | date_of_death | date in dd/mm/yyyy of death if applicable otherwise NA |
| NA | cause_of_death | is the death related to breast cancer or not? (breast cancer related, not breast cancer related, unknown or NA) |
| NA | date_last_FU | date in dd/mm/yyyy of last follow up (in own center, with other physician or in other centre with clear communication to your center) |
| NA | date_last_FU_Leuven | date in dd/mm/yyyy of last follow up for breast cancer in UZ Leuven |
ILC Project
0.1 Preliminary Statistical Analysis Plan
As the ILC data are not available for the moment, I focused on the metadata file that was sent on November 12th. This report formalizes the aims of the project and proposes a preliminary analysis plan to be discussed together.
A few instructions on how to read this report (and the ones to come…)
This report was built in HTML to exploit its features.
First off, the R code is reported above every object displayed (tables, figures or results) for reproducibility. Just click the arrow before the word ‘Code’ to view it. You can also copy it by clicking the icon in the top right corner of the folding box.
Below each table you can find a button to download the table and save it on your laptop, for instance to perform additional analyses. You can also download a specific figure by right-clicking on it and choosing ‘Save image as’, or view it at full size by choosing ‘Open image in a new tab’.
If you hover over a hyperlink, a preview of the object will appear.
I enabled the possibility to highlight and comment the report directly on the page using Hypothes.is. I just need your usernames to add you to a common group.
The metadata for M0 patients is reported in Table 1.
The metadata for M1 patients is reported in Table 2.
| Variable Group | Entry field | Explanation |
|---|---|---|
| Patient specific factors | patient_ID | pseudonymized ID |
| NA | date_of_diagnosis | date (dd/mm/yyyy) of first malignant biopsy; if unknown, date of first contact regarding invasive lobular cancer |
| NA | method_of_detection | was the tumor found by population based screening or did the patient present with symptoms (symptoms of the primary tumor or of metastases are not distinguished in the database) |
| NA | date_of_birth | date in dd/mm/yyyy of birth |
| NA | age_at_diagnosis (y) | age of patient in years at the time of primary diagnosis |
| NA | age_category | category of age at the time of primary diagnosis (<40, 40 - 49, 50 - 59, 60 - 69, 70 - 79, ≥80) |
| NA | gender | M= male; F= female |
| NA | height (m) | in meters, at time of primary diagnosis |
| NA | weight (kg) | in kg, at time of primary diagnosis |
| NA | BMI | calculated, at primary diagnosis (weight (kg)/height (m)^2) |
| NA | BMI_category | category of calculated BMI ( <25, 25 - 30, >30) |
| NA | menopausal_status | pre or postmenopausal at timing of diagnosis. If in transition, to be considered as premenopausal (categories = pre- and perimenopausal, postmenopausal). |
| NA | body_surface_area | calculated, at primary diagnosis (0.20247 x height (m)^0.725 x weight (kg)^0.425) |
| NA | smoking | is the patient a present smoker (= active), past smoker (= former) or did he/she never smoke (= no) at the time of first diagnosis |
| NA | alcohol_abuse | is there a history of alcohol abuse reported (yes/no) at the time of diagnosis |
| NA | hypertension | is there a personal history of arterial hypertension at primary diagnosis (yes/no) |
| NA | hyperlipidemia | is there a personal history of hyperlipidemia at time of diagnosis (yes/no) |
| NA | diabetes | is there a personal history of diabetes + type at the time of diagnosis (type 1, type 2 or no) |
| NA | oral_anticonceptive_use | has the patient ever used oral contraceptives or are they still using them at the time of diagnosis (former, active, no)? |
| NA | pregnancy_A | number of abortus/miscarriage at the time of diagnosis |
| NA | pregnancy_P | number of partus/child-births at the time of diagnosis |
| NA | pregnancy_G | number of gravidus/pregnancies at the time of diagnosis (=A+P) |
| NA | Age.FFTP | age at first child birth, if applicable (if no child birth = nulliparous) |
| NA | Interval.1st.FTP | age difference between age diagnosis and age first child birth (if no child birth = NA) |
| NA | hormone_replacement | has the patient ever used hormone replacement therapy or are they still using hormone replacement therapy (former, active, no) |
| NA | familial_history_breast_ovary | is there a history of breast or ovarian cancer in the family? |
| NA | familial_history_breast_ovary_line | are the affected relatives first (1), second (2) or third (3) degree relatives? If no history: NA |
| NA | germline_mutation_testing_performed | was there a test done to see if there are any germline mutations present associated with higher breast cancer risk? |
| NA | germline_mutation_testing_year_most_recent_test | if applicable year that the most recent test was performed, if not applicable NA |
| NA | germline_mutation_testing_result | if applicable gene that was mutated, if tested but no mutation = negative, if not tested = NA |
| Radiology | visible_on_mammogram | was the primary tumor seen on mammogram (yes/no), NA if no mammogram was performed |
| NA | diameter_mammogram_at_diagnosis | diameter in mm of largest focus on mammogram, if not seen on mammogram NA |
| NA | number_of_suspected_foci_mammogram | number of foci seen on mammogram numerical, if not described properly but multifocal = multiple, if not seen on mammogram = NA |
| NA | breast_density_score_mammogram | Birads-score of breast density on mammogram (fatty breast = Type A, dense breast = Type D), not seen on mammogram = NA, not reported by the radiologist = NA |
| NA | diameter_ultrasound_at_diagnosis | diameter in mm of largest focus on ultrasound, if not seen on ultrasound NA |
| NA | Number_of_adenopathies_expected_on_ultrasound | number of adenopathies seen on ultrasound numerical, if not described properly but >1 = multiple, if not seen on ultrasound = NA |
| NA | MRI_breast_performed | Was an MRI of the breast performed at diagnosis (yes/no) |
| NA | diameter_MRI_at_diagnosis | diameter in mm of largest focus on MRI, if not performed/reported NA |
| NA | number_of_suspected_foci_MRI | number of foci seen on MRI numerical, if not described properly but multifocal = multiple, if not performed/reported = NA |
| NA | Number_of_adenopathies_expected_on_MRI_breast | number of adenopathies seen on MRI numerical, if not described properly but >1 = multiple, if not performed/reported = NA |
| Primary tumor, pre-treatment | primary_laterality | left or right or bilateral |
| NA | TNM_cT_at_diagnosis | clinical T-classification according to TNM classification of malignant tumors |
| NA | TNM_cN_at_diagnosis | clinical N-classification according to TNM classification of malignant tumors |
| NA | TNM_cM_at_diagnosis | clinical M-classification according to TNM classification of malignant tumors |
| NA | diameter_radiology_at_diagnosis (mm) | of largest focus, in mm (largest reported diameter on mammogram/ultrasound/MRI) |
| Pathology biopsy | tumor_grade_biopsy_breast | histological grade (bloom-score) reported on biopsy primary tumor. If multifocal, then largest focus is considered |
| NA | ER_Interpretation_biopsy_breast | Is the estrogen receptor expression considered positive or negative on the largest focus |
| NA | PR_Interpretation_biopsy_breast | Is the progesterone receptor expression considered positive or negative on the largest focus |
| NA | HER2_Interpreation_biopsy_breast | Is the expression of HER2 considered positive or negative on the largest focus |
| NA | HER2_FISH_biopsy_breast | HER2-FISH status (amplification/no amplification/NA) on biopsy of primary tumor if applicable |
| NA | HER2_ratio_biopsy_breast | HER2-FISH ratio on biopsy of primary tumor if applicable |
| Surgery | surgery_performed_primary_tumor_breast | did the patient get any type of surgery of the primary tumor/axilla? (Yes/no) |
| NA | surgery_date | date in dd/mm/yyyy of the surgery (first surgery of the primary tumor), no surgery performed = NA |
| NA | surgery_type_breast | mastectomy vs tumorectomy (= breast conserving surgery), no surgery performed = NA |
| NA | surgery_type_axilla | sentinel lymph node biopsy (SLN) vs axillary clearance (ALN) or SN followed by ALN in same or subsequent surgery (SLN + ALN), no surgery performed = NA |
| Other treatment primary | radiotherapy | radiotherapy performed at site of the primary tumor or axillary lymph nodes? (yes/no) |
| Metastatic disease | meta_brain_nonleptomeningeal_first_metastases | occurrence of brain metastases with the exception of leptomeningeal disease at first diagnosis of metastatic setting (yes/no) |
| NA | meta_leptomeningeal_first_metastases | occurrence of leptomeningeal metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_bones_first_metastases | occurrence of bone metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_skin_first_metastases | occurrence of skin metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_lungs_first_metastases | occurrence of lung metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_liver_first_metastases | occurrence of liver metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_abdomen_extrahepatic_first_metastases | occurrence of abdominal (extrahepatic) metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_reproductive_organs_first_metastases | occurrence of metastases in reproductive organs at first diagnosis of metastatic setting (yes/no) |
| NA | meta_lymph_nodes_first_metastases | occurrence of lymph node metastases at first diagnosis of metastatic setting (yes/no) |
| NA | meta_other_first_metastases | occurrence of other metastases at first diagnosis of metastatic setting, + site (e.g. yes: pleura) |
| NA | Systemic_treatment_firstline | Type of treatment if applicable |
| NA | surgery_1st_line_metastatic | Surgery (resection) of a metastasis performed at first metastatic diagnosis? |
| NA | radiotherapy_1st_line_metastatic | Radiotherapy of a metastasis performed at first metastatic diagnosis? |
| NA | first_progression_distant_disease_metastatic | Was there already progression of the metastatic disease reported (yes/no)? |
| NA | date_first_progression_metastatic | if applicable date first progression was reported (dd/mm/yyyy) |
| NA | meta_brain_nonleptomeningeal_atfirstprogression | progression of brain metastases with the exception of leptomeningeal disease at first progression of metastatic setting (yes/no) |
| NA | meta_leptomeningeal_atfirstprogression | progression of leptomeningeal metastases at first progression of metastatic setting (yes/no) |
| NA | meta_bones_atfirstprogression | progression of bone metastases at first progression of metastatic setting (yes/no) |
| NA | meta_skin_atfirstprogression | progression of skin metastases at first progression of metastatic setting (yes/no) |
| NA | meta_lungs_atfirstprogression | progression of lung metastases at first progression of metastatic setting (yes/no) |
| NA | meta_liver_atfirstprogression | progression of liver metastases at first progression of metastatic setting (yes/no) |
| NA | meta_abdomen_extrahepatic_atfirstprogression | progression of abdominal (extrahepatic) metastases at first progression of metastatic setting (yes/no) |
| NA | meta_reproductive_organs_atfirstprogression | progression of metastases in reproductive organs at first progression of metastatic setting (yes/no) |
| NA | meta_lymph_nodes_atfirstprogression | progression of lymph node metastases at first progression of metastatic setting (yes/no) |
| NA | meta_other_atfirstprogression | progression of other metastases at first progression of metastatic setting, + site (e.g. yes: pleura) |
| NA | systemic_treatment_secondline | Type of treatment if applicable |
| NA | radiotherapy_2nd_line_metastatic | Radiotherapy of a metastasis performed at first progression? |
| NA | number_of_lines_metastatic | Number of treatments a patient received due to progression at the time of last database update |
| Recurrences and death | date_last_update_file_database | cut off date of last update (dd/mm/yyyy) |
| NA | death | is the patient deceased? Yes or no |
| NA | date_of_death | date in dd/mm/yyyy of death if applicable otherwise NA |
| NA | cause_of_death | is the death related to breast cancer or not? (breast cancer related, not breast cancer related, unknown or NA) |
| NA | date_last_FU | date in dd/mm/yyyy of last follow up (in own center, with other physician or in other centre with clear communication to your center) |
| NA | date_last_FU_Leuven | date in dd/mm/yyyy of last follow up for breast cancer in UZ Leuven |
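The derived fields in the metadata (BMI, BMI_category, body_surface_area, age_category) follow directly from the formulas and cut-offs given in the Explanation column. A minimal Python sketch of those derivations is given here for illustration; the function names are hypothetical, and the handling of values exactly equal to 25 or 30 in BMI_category is an assumption, since the metadata does not say where those boundaries fall.

```python
def bmi(weight_kg, height_m):
    """BMI = weight (kg) / height (m)^2, as in the metadata."""
    return weight_kg / height_m ** 2


def body_surface_area(weight_kg, height_m):
    """BSA = 0.20247 x height (m)^0.725 x weight (kg)^0.425, as in the metadata."""
    return 0.20247 * height_m ** 0.725 * weight_kg ** 0.425


def bmi_category(bmi_value):
    # Assumption: 25 and 30 themselves belong to the middle category.
    if bmi_value < 25:
        return "<25"
    if bmi_value <= 30:
        return "25 - 30"
    return ">30"


def age_category(age_years):
    """Ten-year age bands: <40, 40 - 49, ..., 70 - 79, ≥80."""
    if age_years < 40:
        return "<40"
    if age_years >= 80:
        return "≥80"
    lower = int(age_years // 10) * 10
    return f"{lower} - {lower + 9}"
```

Recomputing these fields from height, weight and age is a cheap consistency check on the values entered in the database.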
For M0 patients we have the dates of recurrence (locoregional or distant) and the date of progression on the first treatment they received for metastatic disease. For M1 patients we only have the date of progression after their first treatment, so we don’t have a PFS2 for them.
I suggest adding a ‘variable label’ column, which assigns a unique common name to each variable ID, and a ‘format’ column, in particular to codify the categories of the categorical variables.
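To make the suggestion concrete, here is a small Python sketch of what such codebook rows could look like and how the ‘format’ column could be used to check entries. The labels are hypothetical wording; the categories are copied from the Explanation column of the metadata tables.

```python
# Hypothetical codebook rows: labels are illustrative, categories come
# from the Explanation column of the metadata tables.
CODEBOOK = {
    "smoking": {
        "variable_label": "Smoking status at first diagnosis",
        "format": ["active", "former", "no"],
    },
    "menopausal_status": {
        "variable_label": "Menopausal status at diagnosis",
        "format": ["pre- and perimenopausal", "postmenopausal"],
    },
    "diabetes": {
        "variable_label": "Diabetes at diagnosis",
        "format": ["type 1", "type 2", "no"],
    },
}


def invalid_fields(record, codebook=CODEBOOK):
    """Return the fields whose value is neither NA nor an allowed category."""
    problems = []
    for field, entry in codebook.items():
        value = record.get(field, "NA")
        if value != "NA" and value not in entry["format"]:
            problems.append(field)
    return problems
```

For example, `invalid_fields({"smoking": "yes"})` flags `smoking`, because ‘yes’ is not one of the three coded categories.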
0.2 Aims & Objectives
Primary Aim 1: Epidemiological Trends To analyze the temporal trends in the prevalence of ILC (vs. all BC), of M1 ILC (vs. all ILC) and of M1 ILC (vs. all M1 BC) diagnosed at UZ Leuven over a defined period of time (2000-2023).
Hypothesis 1a: The relative percentage of ILC diagnoses has increased over time.
Hypothesis 1b: The relative percentage of de novo M1 ILC diagnoses has increased over time, correlated with the implementation of enhanced imaging and pathological diagnostics.
The total number of BCs diagnosed at UZ Leuven in the years 2000-2023 is included in a different dataset (MBC data by Chantal). For now, we can certainly analyze the temporal trend in the prevalence of patients diagnosed with M1 ILC among all ILC patients over the defined period (2000-2023).
Primary Aim 2: Baseline Characteristics To compare the baseline clinical and pathological characteristics of patients diagnosed with M0 ILC versus de novo M1 ILC.
- Objective: To identify key differences (e.g., age, tumor size, grade, hormone receptor status) and compare these findings to documented differences in the general M0/M1 BC population.
We need to define together which baseline variables are of interest for this analysis. The ones I have identified for the moment are age_at_diagnosis; gender; BMI; menopausal_status; body_surface_area; smoking; alcohol_abuse; hypertension; hyperlipidemia; diabetes… others?
Primary Aim 3: Metastatic Disease Profile To compare the nature (e.g., site, distribution, burden) of metastatic disease between two cohorts:
Patients diagnosed with de novo M1 ILC.
Patients initially diagnosed with M0 ILC who later developed a first recurrence (M0 -> M1).
We will consider these variables to characterize the site, also for de novo M1 patients, right? meta_brain_nonleptomeningeal_first_metastases, meta_leptomeningeal_first_metastases, meta_bones_first_metastases, meta_skin_first_metastases, meta_lungs_first_metastases, meta_liver_first_metastases, meta_abdomen_extrahepatic_first_metastases, meta_reproductive_organs_first_metastases, meta_lymph_nodes_first_metastases, meta_other_first_metastases, Systemic_treatment_firstline, surgery_1st_line_metastatic, radiotherapy_1st_line_metastatic, first_progression_distant_disease_metastatic, date_first_progression_metastatic, meta_brain_nonleptomeningeal_atfirstprogression, meta_leptomeningeal_atfirstprogression, meta_bones_atfirstprogression, meta_skin_atfirstprogression, meta_lungs_atfirstprogression, meta_liver_atfirstprogression, meta_abdomen_extrahepatic_atfirstprogression, meta_reproductive_organs_atfirstprogression, meta_lymph_nodes_atfirstprogression, meta_other_atfirstprogression
Primary Aim 4: Survival Outcomes To compare survival outcomes between patients diagnosed with M0 ILC and M1 ILC.
- Objective: To analyze PFS1 (time to first recurrence/progression), PFS2 (time from first to second recurrence/progression), and Overall Survival (OS).
- Findings will be benchmarked against existing literature (e.g., Mouabbi et al.).
We will compare PFS between de novo M1 patients and M1 patients who were M0 at diagnosis. Mouabbi et al. discuss data about metastatic breast cancer (MBC). We won’t analyze PFS2, as part of the survival experience is missing for these patients.
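As a basis for discussion, the PFS time and event indicator could be derived from the dd/mm/yyyy date fields roughly as in the Python sketch below. Several choices here are assumptions, not decisions of the plan: the start date is taken to be date_of_diagnosis for de novo M1 patients and date_distant_recurrence for M0 patients who later became M1, the event is the first reported progression only (death is not counted), and patients without progression are censored at date_last_FU. The function names are hypothetical.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # dd/mm/yyyy, as in the metadata


def parse_date(text):
    """Parse a dd/mm/yyyy string; return None for missing or NA values."""
    if text is None or text.strip() in ("", "NA"):
        return None
    return datetime.strptime(text.strip(), DATE_FORMAT).date()


def pfs(start, first_progression, last_follow_up):
    """Return (time in months, event indicator), or None if dates are missing.

    Assumption: the event is first progression; patients without a
    progression date are censored at their last follow-up.
    """
    start_date = parse_date(start)
    progression_date = parse_date(first_progression)
    end_date = progression_date or parse_date(last_follow_up)
    if start_date is None or end_date is None:
        return None
    months = (end_date - start_date).days / 30.4375  # average month length
    return months, progression_date is not None
```

Whether death without documented progression should count as an event is a clinical choice that needs to be agreed before the analysis.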
0.3 Preliminary Statistical Analysis Plan
0.3.1 Primary Aim 1: Epidemiological Trends
We focus on three prevalences at UZ Leuven: ILC diagnoses over total BC diagnoses, de novo M1 diagnoses over total ILC diagnoses, and de novo M1 ILC diagnoses over de novo M1 diagnoses in general.
We don’t have the total number of diagnostic tests performed at UZ Leuven (only the BC or ILC diagnoses). Thus, we will only be able to model a trend in the proportion of patients with ILC or de novo M1. It will be critical to assess whether the denominator of this proportion is influenced by other factors (e.g., improved diagnostic testing for IDC, which would lower the proportion of ILC not because the ability to detect ILC decreased but because the ability to detect IDC improved).
0.3.2 Modeling the prevalence
For both hypotheses 1a and 1b, the target is a prevalence, defined as \(\frac{\text{number of specific cases}}{\text{total number of cases}}\). First off, a raw plot of the prevalence (Y-axis) against time (X-axis) will be presented, considering a time unit of one year. This non-parametric representation is purely exploratory; it would be useful to mark the landmark time points at which specific diagnostic techniques were introduced, to visually assess whether they have a clear impact on the prevalence.
To model the prevalence we could opt for either a log-binomial model or a Poisson model. The literature suggests that the Poisson approach might be more robust, especially with a low proportion of cases. With a high proportion of cases, the log-binomial model and the Poisson model perform almost identically.
The Poisson model is structured as follows. Say we are considering Hypothesis 1a: the prevalence is linked to a linear combination of parameters (the effects of the covariates) through a log link function, whose inverse is the exponential:
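The raw yearly prevalence is simply the ratio of two counts per calendar year. A minimal Python sketch of that computation follows; the function name is hypothetical and the counts are made up purely for illustration (they are not UZ Leuven data).

```python
def raw_prevalence(cases_by_year, totals_by_year):
    """Yearly prevalence = number of specific cases / total number of cases."""
    prevalence = {}
    for year in sorted(totals_by_year):
        total = totals_by_year[year]
        if total > 0:  # skip years without any diagnoses
            prevalence[year] = cases_by_year.get(year, 0) / total
    return prevalence


# Made-up counts for illustration only (not UZ Leuven data)
m1_ilc = {2000: 2, 2001: 3}
all_ilc = {2000: 40, 2001: 50, 2002: 0}
points = raw_prevalence(m1_ilc, all_ilc)  # one (year, prevalence) point per year
```

These points are what the exploratory plot would display, with vertical markers at the landmark years.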
\[\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}} = \exp\bigl[\beta_0 + \beta_1Y_2 + \beta_2Y_3 + \dots + \epsilon \bigr]\]
where \(Y_2, Y_3, \dots, Y_n\) are indicator variables (dummy variables) of the specific year of diagnosis, with Year 1 as the reference. The model can also be expressed as:
\[\log\bigl[\text{Number of ILC diagnoses}\bigr] = \beta_0 + \beta_1Y_2 + \beta_2Y_3 + \dots + \log\bigl[\text{Total number of BC diagnoses}\bigr] + \epsilon\]
It follows that \(\beta_1\) is the regression coefficient associated with \(Y_2\), \(\beta_2\) with \(Y_3\), and so on. For example, the estimated prevalence of ILC diagnoses at Year 2 is defined by \[\log\bigl[\text{Number of ILC diagnoses}\bigr] = \beta_0 + \beta_1Y_2 +\log\bigl[\text{Total number of BC diagnoses}\bigr]\] that is, \[\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}} = \exp\bigl[\beta_0 + \beta_1Y_2 \bigr]\]
The exponentiated coefficients give the prevalence ratio associated with a specific year, where the reference is Year 1. Say we want to compare the prevalence at Year 2 with the prevalence at Year 1 by computing a prevalence ratio; the contrast is defined as follows:
\[\frac{\left(\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}}\right)_{\text{Year 2}}}{\left(\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}}\right)_{\text{Year 1}}} = \frac{\exp\bigl[\beta_0 + \beta_1Y_2 \bigr]}{\exp\bigl[\beta_0 \bigr]}\] \[\log\Biggl[\frac{\left(\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}}\right)_{\text{Year 2}}}{\left(\frac{\text{Number of ILC diagnoses}}{\text{Total number of BC diagnoses}}\right)_{\text{Year 1}}}\Biggr] = \beta_0+\beta_1Y_2-\beta_0 = \beta_1 \quad \text{(since } Y_2 = 1 \text{ at Year 2)}\]
Formal hypothesis tests can be conducted. As an example, the Wald test has a null hypothesis of \(\beta_1 = 0\), that is, the prevalence at Year 2 does not differ from the prevalence at Year 1 (prevalence ratio = 1). Naturally, other prevalence ratios can be obtained by taking different years as the reference.
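The Year 2 vs Year 1 contrast can be sketched numerically. When year is the only (categorical) covariate, the Poisson model with the log total as offset is saturated, so the estimates have a closed form: \(\hat\beta_1\) is the log of the observed prevalence ratio, and its model-based Wald standard error is \(\sqrt{1/a_1 + 1/a_2}\), with \(a_1, a_2\) the ILC counts in the two years. The counts and the function name below are hypothetical, and the model-based (not robust) standard error is an assumption of this sketch.

```python
import math

def prevalence_ratio_wald(cases_ref, total_ref, cases_new, total_new):
    """Closed-form fit of log(cases) = b0 + b1*Y2 + log(total) for two years."""
    b0 = math.log(cases_ref / total_ref)            # log prevalence in the reference year
    b1 = math.log(cases_new / total_new) - b0       # log prevalence ratio
    se = math.sqrt(1.0 / cases_ref + 1.0 / cases_new)  # model-based Poisson SE of b1
    z = b1 / se                                     # Wald statistic for H0: b1 = 0
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided normal p-value
    ci95 = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))
    return {"b0": b0, "b1": b1, "pr": math.exp(b1), "se": se, "z": z, "p": p, "ci95": ci95}

# Hypothetical counts: ILC diagnoses and total BC diagnoses in Year 1 and Year 2
result = prevalence_ratio_wald(cases_ref=40, total_ref=400, cases_new=60, total_new=500)
print(result["pr"], result["ci95"], result["p"])
```

With more covariates the closed form no longer applies and the model would be fitted by maximum likelihood in a statistical package.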
In the modeling framework above, ‘Year of diagnosis’ is treated as a categorical variable. Its main advantage is straightforward interpretability. However, it might be of interest to display estimates of the prevalence of ILC diagnoses or de novo M1 patients as a smooth function of time. To that end, we could treat the time of diagnosis as a numerical covariate and allow its association with the prevalence to be non-linear. Modeling a non-linear effect is important because otherwise we would assume that the (log) prevalence changes linearly over time.
This approach shows how the proportion of diagnoses varies smoothly with time, and it can also display how the prevalence ratio varies as a function of time relative to a specific landmark point (the month/year of the introduction of a specific diagnostic technique). The non-linear effect of time can be modeled with restricted cubic splines.
Naturally, the model has to be extended: we should adjust for the introduction of new diagnostic techniques and for the time between the introduction of those techniques and the diagnosis. We can include a binary covariate for the introduction of a new technique (NDT), equal to 0 for the years before introduction and 1 for the years after introduction, as well as continuous covariates that quantify the ‘time passed between the introduction of the techniques and the diagnosis’ (TFI).
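To make the spline idea concrete, the sketch below builds a restricted cubic spline basis for a time covariate using the usual truncated-power construction, which is cubic between knots and constrained to be linear beyond the outer knots. The knot positions and the function name are hypothetical; in practice the knots would be chosen from the data (for example at quantiles of the diagnosis dates).

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots; each s_j is linear
    beyond the last knot and zero below the first knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least 3 knots are needed")

    def cube(u):
        return max(u, 0.0) ** 3        # truncated cubic (u)_+^3

    scale = (t[-1] - t[0]) ** 2        # keeps spline terms on the scale of x
    gap = t[-1] - t[-2]
    terms = [float(x)]
    for j in range(k - 2):
        s = (cube(x - t[j])
             - cube(x - t[-2]) * (t[-1] - t[j]) / gap
             + cube(x - t[-1]) * (t[-2] - t[j]) / gap)
        terms.append(s / scale)
    return terms

# Hypothetical knots on a calendar-year axis
knots = [2002, 2008, 2014, 2020]
row = rcs_basis(2011, knots)   # one design-matrix row: linear term + 2 spline terms
print(row)
```

Each row would replace the year dummies in the linear predictor, so the model estimates one coefficient per basis column instead of one per year.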
In its basic form the model will be:
\[\log\bigl[\text{Number of ILC diagnoses}\bigl] = \beta_0 + \beta_1Y_2 + \beta_2Y_3 ...+ \beta_nNDT + \beta_{n+1} TFI+ \log\bigl[\text{Total number of BC diagnoses }\bigl] + \epsilon\]
Now, testing \(\beta_n = 0\) assesses whether the introduction of a new diagnostic technique (NDT) has an independent effect on the prevalence of a specific diagnosis, whereas testing \(\beta_{n+1} = 0\) assesses whether the time from the introduction of the diagnostic technique (TFI) affects this proportion in the long term.
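As an illustration of how the two covariates could be coded, the sketch below derives NDT and TFI from the year of diagnosis and the year a technique was introduced. Two choices here are assumptions rather than part of the plan: the introduction year itself is counted as "after introduction", and TFI is set to 0 for diagnoses made before the introduction. The introduction year 2010 is hypothetical.

```python
def technique_covariates(year_of_diagnosis, year_introduced):
    """Return (NDT, TFI) for one diagnosis year."""
    # Assumption: the introduction year itself counts as "after introduction"
    ndt = 1 if year_of_diagnosis >= year_introduced else 0
    # Assumption: TFI is 0 before introduction, then years elapsed since it
    tfi = max(0, year_of_diagnosis - year_introduced)
    return ndt, tfi

# Hypothetical technique introduced in 2010
rows = [(year, *technique_covariates(year, 2010)) for year in range(2007, 2014)]
for year, ndt, tfi in rows:
    print(year, ndt, tfi)
```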
0.3.3 Primary Aim 2: Baseline Characteristics
To compare the baseline clinical and pathological characteristics of patients diagnosed with M0 ILC versus M1 ILC.
We can adopt a case-control study structure for this aim. First, we can explore the data by comparing the distributions of the clinical and pathological variables, using relative frequency plots and mosaic plots for the categorical variables and empirical cumulative distribution function (ECDF) plots and/or histograms for the numerical variables.
As an exploratory analysis, we could then study the joint association of the clinical and pathological variables using multivariate analysis techniques. For numerical covariates we could adopt a principal component analysis (PCA) to study the joint association of the variables and project the M0 ILC and M1 ILC patients into the multivariate space to see whether they show clear separation. The same could be done for the categorical variables, adopting a Multiple Correspondence Analysis framework.
Depending on the available information, we could opt for machine learning (ML) models to assess the multivariate relationships between clinical and pathological characteristics (predictors) and the type of ILC (M1 vs M0, the response). ML models work best when the amount of information (sample size) is large and there are no strong prior hypotheses about the relationship between the predictors and the response. In this case, they can automatically detect interaction effects and non-linear effects. However, they have the drawback of not being easily interpretable, although there are some exceptions such as Random Forests and Multivariate Adaptive Regression Splines.
Otherwise, we could opt for a classic logistic regression framework where the relationship of the predictors with the response is manually specified. The logistic model models the (log)-odds of being a case (say M1 ILC), \(\frac{P(ILC = M1)}{1-P(ILC=M1)}\) as a function of the covariates \(\textbf{X}\). Namely,
\[ \log\biggl[{\frac{P(ILC = M1)}{1-P(ILC=M1)}}\biggl] = \beta_0 + \beta^t\textbf{X} + \epsilon \]
where \(\beta\) is the vector of coefficients associated with the covariates in the matrix \(\textbf{X}\). To simplify, say we want to assess the effect of age (<50 vs ≥50); we can do this by specifying the model:
\[\log\biggl[{\frac{P(ILC = M1)}{1-P(ILC=M1)}}\biggl] = \beta_0 + \beta_1(Age \ge 50) + \epsilon\]
The effect of \(Age \ge 50\) is obtained considering the ratio of the odds of being a case for a hypothetical patient with \(Age \ge 50\) vs a patient with \(Age < 50\).
\[\log\biggl[\frac{\frac{P(ILC = M1| Age \ge 50)}{1-P(ILC=M1| Age \ge 50)}}{\frac{P(ILC = M1| Age < 50)}{1-P(ILC=M1| Age < 50)}}\biggl] = \beta_0 + \beta_1(Age \ge 50) - \beta_0 = \beta_1\]
It follows that by exponentiating \(\beta_1\) we obtain the effect of age in terms of the odds ratio. Formal hypothesis tests can be performed: again, the Wald test has a null hypothesis of \(\beta_1 = 0\), that is, being 50 years or older has no effect on the odds of being an M1 ILC case.
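For a single binary covariate such as the age example, the logistic regression estimate of \(\beta_1\) coincides with the log odds ratio of the 2x2 table, and its Wald standard error is the square root of the sum of the reciprocal cell counts. The sketch below uses that closed form; the counts and the function name are hypothetical.

```python
import math

def odds_ratio_wald(m1_exposed, m0_exposed, m1_unexposed, m0_unexposed):
    """Odds ratio and Wald test from a 2x2 table (exposed = Age >= 50)."""
    b1 = math.log((m1_exposed * m0_unexposed) / (m0_exposed * m1_unexposed))
    se = math.sqrt(1 / m1_exposed + 1 / m0_exposed
                   + 1 / m1_unexposed + 1 / m0_unexposed)
    z = b1 / se                                  # Wald statistic for H0: b1 = 0
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
    ci95 = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))
    return {"b1": b1, "or": math.exp(b1), "se": se, "z": z, "p": p, "ci95": ci95}

# Hypothetical counts: Age >= 50 -> 30 M1, 270 M0; Age < 50 -> 10 M1, 190 M0
fit = odds_ratio_wald(m1_exposed=30, m0_exposed=270, m1_unexposed=10, m0_unexposed=190)
print(fit["or"], fit["ci95"], fit["p"])
```

With several covariates in \(\textbf{X}\) there is no closed form, and the coefficients would be obtained by fitting the full logistic model.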
0.3.4 Primary Aim 3: Metastatic Disease Profile
For this aim I propose to adopt the same analysis plan as for the previous aim, with some additions. For example, it would be of interest to assess whether there is a pattern in the site, burden or distribution of the metastases, and whether this pattern is associated with de novo M1 patients or with patients who develop M1 disease afterwards. We could adopt a hierarchical clustering approach to see whether relevant profiles of site, distribution and burden emerge, and then link these patterns to the two types of patients.
0.3.5 Primary Aim 4: Survival Outcomes
As a first exploratory step, we adopt the Kaplan-Meier estimator to compute the survival functions, also conditional on the categories of the categorical covariates considered in the analysis. These display, over time, the proportion of patients still alive (for Overall Survival) or free of the first recurrence (for PFS1). This initial exploratory analysis is performed separately for de novo M1 patients and M0 patients.
We will start the clock at the time of first distant recurrence for M1 patients who were M0 at the time of diagnosis, and at the time of diagnosis for de novo M1 patients. We are assuming that the history of M0 (the time spent free from distant metastasis) does not affect the subsequent dynamics of progression (PFS1). We will also add a comparison between de novo M1 patients, M0-M1 patients and M0 patients who progressed to an event other than distant recurrence.
For the direct comparison of de novo M1 and M0-M1 patients, we will model the progression-free survival function with pseudo-observations, conditional on being a de novo M1 patient or not, adjusting for the other prognostic factors.
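A minimal sketch of the Kaplan-Meier product-limit computation is given below for reference. The follow-up times and event indicators are hypothetical; running the function once per covariate category gives the conditional curves mentioned above.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk         # product-limit step
            curve.append((t, surv))
        at_risk -= removed                         # events and censored leave the risk set
    return curve

# Hypothetical follow-up times (years) and event indicators
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```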
Pseudo-observations are derived using a jackknife (leave-one-out) statistical procedure. For a specific quantity of interest, \(\theta(t)\), such as the survival probability at a fixed time \(\tau\), \(\theta(\tau) = S(\tau)\), the process is as follows:
First, compute the non-parametric estimate \(\hat{\theta}(\tau)\) using the full dataset of \(n\) subjects. For the survival function, this would be the Kaplan-Meier estimate, \(\hat{S}_{KM}(\tau)\).
Next, for each subject \(i=1, \dots, n\), temporarily remove that subject from the dataset and re-calculate the estimate on the remaining \(n-1\) subjects, yielding \(\hat{\theta}_{-i}(\tau)\).
The pseudo-observation for subject \(i\) at time \(\tau\) is then defined as: \[ \hat{\theta}_i(\tau) = n \cdot \hat{\theta}(\tau) - (n-1) \cdot \hat{\theta}_{-i}(\tau) \]
In the absence of censoring, \(\hat{\theta}_i(\tau)\) would simply be the observed outcome for subject \(i\) (e.g., the indicator variable \(I(T_i > \tau)\)). The pseudo-observations thus serve as a complete-data substitute for the potentially unobserved outcome in the presence of censoring. These generated values can then be used as the response variable in a Generalized Linear Model (GLM) or, more robustly, a Generalized Estimating Equation (GEE) framework to estimate the effect of covariates. This allows for direct modeling of the survival probability at one or more time points without any proportional hazards assumption.
At least for M0 patients, we could analyze their event history within a multi-state framework. M0 patients enter follow-up in State 1 (disease free). M0 patients can then progress to M1 or to death. M1 patients can progress to a distant recurrence (State 3) or, again, to death.
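The jackknife steps above can be sketched as follows, using the Kaplan-Meier estimate of \(S(\tau)\) as \(\hat{\theta}(\tau)\). The data are hypothetical and the function names are illustrative; the implementation favors readability over speed.

```python
def km_survival(times, events, tau):
    """Kaplan-Meier estimate of S(tau) = P(T > tau)."""
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= tau})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
    return surv

def pseudo_observations(times, events, tau):
    """Jackknife pseudo-observations for S(tau), one per subject."""
    times, events = list(times), list(events)
    n = len(times)
    full = km_survival(times, events, tau)             # estimate on all n subjects
    pseudo = []
    for i in range(n):
        t_loo = times[:i] + times[i + 1:]              # leave subject i out
        e_loo = events[:i] + events[i + 1:]
        loo = km_survival(t_loo, e_loo, tau)
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo

# Hypothetical data with censoring (event = 0)
po = pseudo_observations([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1], tau=4)
print([round(v, 3) for v in po])
```

With fully observed data the function returns exactly the indicators \(I(T_i > \tau)\), which is a convenient sanity check before using the values as a GLM or GEE response.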
With this framework we can model the probability of being in a given state (state probability) and the transition probability from state A to state B as a function of the follow-up time and prognostic information.
| Feature | Transition Intensity | Transition Probability | State Probability |
|---|---|---|---|
| Measures | Instantaneous rate of a specific transition (i.e., the hazard) | Probability of a specific transition | Probability of being in a specific state |
| Time aspect | At a point in time (instantaneous) | Over an interval of time (from \(s\) to \(t\)) | At a point in time |
| Value range | 0 to \(\infty\) (it’s a rate) | 0 to 1 (it’s a probability) | 0 to 1 (it’s a probability) |
| Depends on… | Current time \(t\) (and covariates) | Start state \(i\), end state \(j\), start time \(s\), end time \(t\). (Calculated from intensities). | Initial population distribution and all transition probabilities into state \(j\). |
| Example Question | What is the instantaneous risk of progression at time \(t\), given that it has not occurred before? | What is the probability of progression at 5 years, given that at \(t = 0\) the patient is without progression? | What is the probability that a random person from the study is sick at 5 years? |
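
The relationship between transition probabilities and state probabilities in the table can be illustrated with a deliberately simplified discrete-time sketch. It assumes three states (disease free, M1, dead), yearly steps and the same hypothetical one-year transition matrix every year; the analysis itself would estimate these quantities in continuous time from the transition intensities.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(len(a))]

def transition_matrix(step_matrices, s, t):
    """P(s, t): product of the one-step matrices for steps s+1, ..., t."""
    size = len(step_matrices[0])
    p = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for step in step_matrices[s:t]:
        p = mat_mul(p, step)
    return p

def state_probabilities(initial, p):
    """State occupation probabilities: initial distribution times P(s, t)."""
    return [sum(initial[i] * p[i][j] for i in range(len(initial)))
            for j in range(len(p[0]))]

# Hypothetical one-year transition probabilities (rows sum to 1)
yearly = [
    [0.90, 0.07, 0.03],   # from disease free: stay, progress to M1, die
    [0.00, 0.75, 0.25],   # from M1: stay, die
    [0.00, 0.00, 1.00],   # death is absorbing
]
steps = [yearly] * 5                                   # assumption: same matrix each year
p05 = transition_matrix(steps, 0, 5)                   # transition probabilities from 0 to 5 years
probs = state_probabilities([1.0, 0.0, 0.0], p05)      # everyone starts disease free
print([round(v, 4) for v in probs])
```

Here `p05[0][1]` answers the transition-probability question (being in M1 at 5 years given disease free at 0), while `probs` answers the state-probability question for the whole starting population.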