Background: Performing a statistical test requires a null hypothesis. In cancer genomics, a key challenge is the fast generation of accurate somatic mutational landscapes that can be used as a realistic null hypothesis for making biological discoveries.
Results: Here we present SigProfilerSimulator, a powerful tool that is capable of simulating the mutational landscapes of thousands of cancer genomes at different resolutions within seconds. Applying SigProfilerSimulator to 2144 whole-genome sequenced cancers reveals: (i) that most doublet base substitutions are not due to two adjacent single base substitutions but likely occur as single genomic events; (ii) that an extended sequencing context of ±2 bp is required to more completely capture the patterns of substitution mutational signatures in human cancer; and (iii) the false-positive discovery rates of commonly used bioinformatics tools for detecting driver genes.
Conclusions: SigProfilerSimulator’s breadth of features allows one to construct a tailored null hypothesis and use it for evaluating the accuracy of other bioinformatics tools or for downstream statistical analysis for biological discoveries.
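As a rough illustration of the null-hypothesis reasoning behind finding (i), the sketch below asks how many adjacent pairs of single base substitutions would arise by chance alone. It assumes a deliberately simplified null in which mutations fall uniformly at random on a linear genome; this is not SigProfilerSimulator's method or API, which preserves mutational context, and the function names, genome length, and mutation counts are hypothetical.

```python
import random


def count_adjacent_pairs(positions):
    """Count pairs of mutated positions that sit at consecutive coordinates."""
    ordered = sorted(positions)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a == 1)


def expected_adjacent_pairs(n_mutations, genome_length):
    """Analytic expectation when n distinct positions are drawn uniformly.

    Each of the (L - 1) adjacent slots is fully mutated with probability
    n(n - 1) / (L(L - 1)), so the expectation is n(n - 1) / L.
    """
    return n_mutations * (n_mutations - 1) / genome_length


def simulate_adjacent_pairs(n_mutations, genome_length, n_sim=200, seed=0):
    """Monte Carlo estimate of the mean number of chance adjacent pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sim):
        # Hypothetical uniform null: every genomic position is equally likely.
        positions = rng.sample(range(genome_length), n_mutations)
        total += count_adjacent_pairs(positions)
    return total / n_sim


if __name__ == "__main__":
    n, length = 1000, 1_000_000  # illustrative values only
    print("analytic expectation:", expected_adjacent_pairs(n, length))
    print("simulated mean:", simulate_adjacent_pairs(n, length))
```

If the number of doublets observed in a sample greatly exceeds what such a null produces, chance adjacency of two independent substitutions becomes an unlikely explanation. A realistic null would additionally need to match each sample's mutational context, which is the role the tool described above plays.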
Background: Smoking has been associated with worse colorectal cancer patient survival and may potentially suppress the immune response in the tumor microenvironment. We hypothesized that the prognostic association of smoking behavior at colorectal cancer diagnosis might differ by lymphocytic reaction patterns in cancer tissue.
Methods: Using data from 1474 colon and rectal cancer patients within 2 large prospective cohort studies (Nurses’ Health Study and Health Professionals Follow-up Study), we characterized 4 patterns of histopathologic lymphocytic reaction: tumor-infiltrating lymphocytes (TILs), intratumoral periglandular reaction, peritumoral lymphocytic reaction, and Crohn’s-like lymphoid reaction. Using covariate data of 4420 incident colorectal cancer patients in total, we fitted an inverse probability weighted multivariable Cox proportional hazards regression model to adjust for selection bias due to tissue availability and for potential confounders, including tumor differentiation, disease stage, microsatellite instability status, CpG island methylator phenotype, long interspersed nucleotide element-1 methylation, and KRAS, BRAF, and PIK3CA mutations.
Results: The prognostic association of smoking status at diagnosis differed by TIL status. Compared with never smokers, the multivariable-adjusted colorectal cancer–specific mortality hazard ratio for current smokers was 1.50 (95% confidence interval = 1.10 to 2.06) in tumors with negative or low TIL and 0.43 (95% confidence interval = 0.16 to 1.12) in tumors with intermediate or high TIL (2-sided P_interaction = .009). No statistically significant interactions were observed in the other patterns of lymphocytic reaction.
Conclusions: The association of smoking status at diagnosis with colorectal cancer mortality may be stronger for carcinomas with negative or low TIL, suggesting a potential interplay of smoking and lymphocytic reaction in the colorectal cancer microenvironment.
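To make the arithmetic behind the stratum-specific hazard ratios concrete, the sketch below recovers an approximate standard error of each log hazard ratio from its reported 95% confidence interval and then compares the two strata with a simple Wald-type z statistic. This is an assumption-laden back-of-the-envelope check, not the authors' analysis: the reported interaction P value comes from an interaction term in the weighted Cox model, so the crude comparison here is not expected to reproduce it. All function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a 2-sided 95% interval


def log_hr_se(lower, upper, z=Z_95):
    """Approximate SE of log(HR), assuming the CI is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def heterogeneity_z(hr_a, ci_a, hr_b, ci_b):
    """Wald z statistic for the difference of two independent log hazard ratios."""
    se_a = log_hr_se(*ci_a)
    se_b = log_hr_se(*ci_b)
    diff = math.log(hr_a) - math.log(hr_b)
    return diff / math.hypot(se_a, se_b)


def two_sided_p(z):
    """Two-sided P value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hazard ratios and intervals as quoted in the Results above.
    low_til = (1.50, (1.10, 2.06))
    high_til = (0.43, (0.16, 1.12))
    z = heterogeneity_z(low_til[0], low_til[1], high_til[0], high_til[1])
    print("z =", round(z, 3), "approximate 2-sided P =", round(two_sided_p(z), 4))
```

The wide interval in the intermediate or high TIL stratum translates into a large standard error on the log scale, which is why the apparent reversal of the association should be read together with the formal interaction test rather than on its own.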
Epidemiological studies have identified many environmental agents that appear to significantly increase cancer risk in human populations. By analyzing tumor genomes from mice chronically exposed to 1 of 20 known or suspected human carcinogens, we reveal that most agents do not generate distinct mutational signatures or increase mutation burden, with most mutations, including driver mutations, resulting from tissue-specific endogenous processes. We identify signatures resulting from exposure to cobalt and vinylidene chloride and link distinct human signatures (SBS19 and SBS42) with 1,2,3-trichloropropane, a haloalkane and pollutant of drinking water, and find these and other signatures in human tumor genomes. We define the cross-species genomic landscape of tumors induced by an important compendium of agents with relevance to human health.
Colorectal cancer (CRC) is a heterogeneous disease of the intestinal epithelium that is characterized by the accumulation of mutations and a dysregulated immune response. Up to 90% of disease risk is thought to be due to environmental factors such as diet, which is consistent with a growing body of literature that describes an ‘oncogenic’ CRC-associated microbiota. Whether this dysbiosis contributes to disease or merely represents a bystander effect remains unclear. To prove causation, it will be necessary to decipher which specific taxa or metabolites drive CRC biology and to fully characterize the underlying mechanisms. Here we discuss the host–microbiota interactions in CRC that have been reported so far, with particular focus on mechanisms that are linked to intestinal barrier disruption, genotoxicity and deleterious inflammation. We further comment on unknowns and on the outstanding challenges in the field, and how cutting-edge technological advances might help to overcome these. More detailed mechanistic insights into the complex CRC-associated microbiota would potentially reveal avenues that can be exploited for clinical benefit.
Objective: The goal of this study is to use adjunctive classes to improve a predictive model whose performance is limited by the common problems of small numbers of primary cases, high feature dimensionality, and poor class separability. Specifically, our clinical task is to use mammographic features to predict whether ductal carcinoma in situ (DCIS) identified at needle core biopsy will be later upstaged or shown to contain invasive breast cancer.
Methods: To improve the prediction of pure DCIS (negative) versus upstaged DCIS (positive) cases, this study considers the adjunctive roles of two related classes: atypical ductal hyperplasia (ADH), a non-cancerous type of breast abnormality, and invasive ductal carcinoma (IDC), with 113 computer-vision-based mammographic features extracted from each case. To improve the baseline Model A's classification of pure vs. upstaged DCIS, we designed three different strategies (Models B, C, D) with different ways of embedding features or inputs.
Results: Based on ROC analysis, the baseline Model A performed with AUC of 0.614 (95% CI, 0.496-0.733). All three new models performed better than the baseline, with domain adaptation (Model D) performing the best with an AUC of 0.697 (95% CI, 0.595-0.797).
Conclusion: We improved the prediction performance of DCIS upstaging by embedding two related pathology classes in different training phases.
Significance: The three new strategies of embedding related class data all outperformed the baseline model, thus demonstrating not only feature similarities among these different classes, but also the potential for improving classification by using other related classes.
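For readers unfamiliar with how an AUC and its confidence interval can be obtained, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and attaches a percentile bootstrap interval. The abstract does not state how its intervals were computed, so the bootstrap is an assumption, and the scores shown are made-up placeholders rather than study data.

```python
import random


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling each class with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        boot_pos = rng.choices(scores_pos, k=len(scores_pos))
        boot_neg = rng.choices(scores_neg, k=len(scores_neg))
        stats.append(auc(boot_pos, boot_neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical model scores: upstaged DCIS (positive) vs. pure DCIS (negative).
    upstaged = [0.62, 0.48, 0.71, 0.55, 0.39, 0.66]
    pure = [0.41, 0.52, 0.33, 0.47, 0.58, 0.29, 0.44, 0.36]
    print("AUC:", auc(upstaged, pure))
    print("95% CI:", bootstrap_auc_ci(upstaged, pure))
```

With few positive cases the bootstrap interval is wide, which mirrors the small-sample problem that motivates borrowing information from related classes.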