Translational modeling of drug efficacy in cancer

  • Translationale Modellierung der Wirksamkeit von Medikamenten in Krebspatienten

Schätzle, Lisa-Katrin; Schuppert, Andreas (Thesis advisor); Wiechert, Wolfgang (Thesis advisor)

Aachen (2020)
Dissertation / PhD Thesis

Dissertation, Rheinisch-Westfälische Technische Hochschule Aachen, 2020


As a heterogeneous disease that involves a complex interplay of mechanisms across multiple molecular scales, cancer poses a challenge to the development of effective therapies. In the context of personalized medicine, significant effort is therefore put into capturing genomic patterns that can predict how a specific compound will affect a patient’s survival. Since cancer cell lines mirror many important aspects of human biology and can be analyzed in standardized high-throughput experiments, they frequently serve as test specimens in large-scale pharmacological screens. Alongside the resulting data sets, a variety of computational approaches exist to link disease-specific molecular profiles to drug responses. However, owing to the bias introduced by the developers’ fields of expertise, the restriction to isolated data sets, and the focus on individual model components rather than the complete modeling workflow, these models often lack robustness and fail to reproduce good results in other applications. Moreover, existing models rarely consider the translation of the observed in vitro mechanisms into the patient-relevant in vivo context. Within the scope of this thesis, an R package was developed to systematically investigate translational modeling routines that train regression models on the gene expression data of cancer cell lines and subsequently apply them to patient data of the same structure to evaluate their clinical relevance. The package was then used in various setups to scan a defined modeling space and identify robust settings among approaches for cell response transformation, homogenization, feature selection, feature preprocessing, and regression. The first part of the results section addresses the performance variation of translational models predicting diverse patient data sets, as well as the challenges that small data set sizes and inherent noise patterns pose for finding universal guidelines for promising model settings.
A direct comparison between translational models and pure cell line models exposes significant differences in the beneficial effects of model settings. Moreover, it highlights the extent of noise introduced by covariates that accompany the translation from cell lines to patients. The following sections expand the model training to other cell line databases to dilute the interfering impact of training data peculiarities on the performance of translational models. Differences among databases and experimental protocols, as well as differences among response measures within a database, are discussed to find common ground for subsequent modeling concepts. Since the quality disparities within the training data are reflected in the performance patterns of the resulting patient models, these sections illustrate the close link between data generation and data modeling. The subsequent systematic analysis of variance (ANOVA) supports the assumption that the overall model performance is affected by more factors than the choice of regression algorithm, even though the latter proves to be the major contributing factor. Furthermore, the ANOVA demonstrates the beneficial effects of simple model settings in translational workflows: binarizing the response values of the training data, homogenizing the in vitro and in vivo data with the RUV4 algorithm, applying a PCA or gene-wise z-score transformation to the gene expression features, and using penalized linear regression methods improve the prediction of patient survival independently of the drug being modeled. Finally, to counterbalance the confounding effects introduced by the training data resources, the last part of the investigations explores the potential of the consensus concept in translational models.
Instead of pitting models based on different cell line databases against each other, the main focus is on integrating them to approach the prediction problem from multiple perspectives. Although the results reveal that even consensus models suffer from robustness deficiencies if they are optimized for overly specific applications, stable consensus settings can be identified that reproducibly yield highly predictive performance. Taken together, the findings presented in this thesis deepen the understanding of translational model properties and their potential to predict the therapeutic outcomes of cancer patients. The developed concepts offer a robust strategy for obtaining predictions of remarkable accuracy, especially considering the direct translation from cell lines to patients without intermediate stages such as animal studies or comparable efforts. Thus, they can potentially guide future sensitivity models for anticancer agents towards increased clinical relevance.
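
To make the workflow described in the abstract more concrete, the following is a minimal, illustrative sketch of a translational model with a consensus step. It is not the thesis R package and does not reproduce its API. Everything in it is an assumption made for illustration: the function names, the toy numbers, the median split used for binarization (lower response taken as more sensitive), and ridge regression fitted by gradient descent as one example of a penalized linear method. Gene-wise z-scoring is applied separately to each data set; the RUV4 homogenization named in the abstract is not implemented here.

```python
from statistics import mean, median, pstdev


def binarize(responses):
    """Median split (assumed rule): 1 = response below the median, else 0."""
    cut = median(responses)
    return [1.0 if r < cut else 0.0 for r in responses]


def zscore_genes(rows):
    """Standardize each gene (column) within one data set."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant genes: avoid dividing by 0
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def fit_ridge(X, y, lam=0.1, lr=0.1, epochs=500):
    """L2-penalized linear regression fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        # Residuals of the current linear model on the training samples
        err = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
               for row, yi in zip(X, y)]
        # Gradient of mean squared error plus lam * ||w||^2 (intercept unpenalized)
        grad_w = [2 * sum(e * row[j] for e, row in zip(err, X)) / n + 2 * lam * w[j]
                  for j in range(p)]
        grad_b = 2 * sum(err) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(X, model):
    """Linear scores; higher means predicted more sensitive (not a probability)."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]


def consensus(predictions_per_model):
    """Average the per-patient scores of several models."""
    return [mean(scores) for scores in zip(*predictions_per_model)]


# Two hypothetical cell line "databases": (expression rows, drug responses).
# Rows are cell lines, columns are genes; all numbers are made up.
db_a = ([[1.0, 5.0], [2.0, 5.5], [3.0, 4.5], [4.0, 5.0]], [0.1, 0.2, 0.8, 0.9])
db_b = ([[1.5, 4.8], [2.5, 5.4], [3.5, 4.6], [4.5, 5.2]], [0.2, 0.3, 0.7, 0.6])
patients = [[1.2, 5.2], [3.8, 4.8], [2.5, 5.0]]  # same gene order as above

# Train one model per database on cell lines, then apply each to the patients
models = [fit_ridge(zscore_genes(X), binarize(y)) for X, y in (db_a, db_b)]
patient_features = zscore_genes(patients)
scores = consensus([predict(patient_features, m) for m in models])
```

In this sketch, `scores` holds one consensus score per patient, which could then be compared against observed outcomes. A plain average is only the simplest way to combine models trained on different databases; the abstract does not state which consensus rule the thesis uses.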