Mass spectrometric data processing for metabolomics and fluxomics : a flexible evaluation framework with quality awareness

  • Auswertung massenspektrometrischer Rohdaten für Metabolomics und Fluxomics : ein flexibles Framework mit besonderer Berücksichtigung der Auswertequalität

von Haugwitz, Max; Wiechert, Wolfgang (Thesis advisor); Schuppert, Andreas (Thesis advisor)

Aachen (2016)
Dissertation / PhD Thesis

Dissertation, RWTH Aachen, 2016


Metabolomics and fluxomics have found ubiquitous application in both applied and fundamental research disciplines, ranging from functional genomics to metabolic engineering. Important experimental methods in these fields use stable isotope labeling, in particular 13C, in combination with liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). The conversion of measurement data into biologically interpretable data is a major determinant of both the time required for analyses and their quality. This thesis investigates data processing concepts for the evaluation of mass chromatographic data from 13C-based metabolomics and fluxomics experiments acquired using multiple reaction monitoring, a particular LC-MS/MS measurement mode. As part of this thesis, a novel data evaluation workflow combining techniques from signal processing and pattern recognition is developed. The applicability of the workflow, implemented in the software framework MRMQuant developed in this thesis, is demonstrated on metabolomics, steady-state fluxomics (data set CG STAT), dynamic fluxomics (data set CG DYN), and proteomics data. For CG DYN, which contains 15,000 chromatograms, evaluation time is reduced from 1.5 work weeks to 1.5 work days compared to the previously established vendor solution Analyst™. A comparison of the results generated using MRMQuant, Analyst™, and another state-of-the-art solution reveals that, for CG STAT, the solutions agree well in the majority of cases with respect to labeling fractions (in 97% of cases the absolute deviation in labeling between solutions is smaller than 2%), but relative differences, in particular for peak areas, can be substantially higher (40% of cases show a relative deviation greater than 2.5% in peak areas).
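The contrast between small absolute deviations in labeling fractions and larger relative deviations in peak areas can be illustrated with a minimal Python sketch. It assumes that a labeling fraction is an isotopologue peak area divided by the sum of all isotopologue peak areas of the metabolite; this definition, the function names, and the peak areas are illustrative assumptions and are not taken from the thesis or from MRMQuant.

```python
def labeling_fractions(areas):
    """Normalize isotopologue peak areas (M+0, M+1, ...) so they sum to 1."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return [a / total for a in areas]


def max_abs_fraction_deviation(areas_a, areas_b):
    """Largest absolute difference between the labeling fractions of two evaluations."""
    fractions_a = labeling_fractions(areas_a)
    fractions_b = labeling_fractions(areas_b)
    return max(abs(x - y) for x, y in zip(fractions_a, fractions_b))


def relative_area_deviations(areas_a, areas_b):
    """Relative peak-area difference per isotopologue, with evaluation A as reference."""
    return [abs(a - b) / a for a, b in zip(areas_a, areas_b)]


# Hypothetical peak areas for one metabolite, integrated by two different tools.
tool_a = [1000.0, 500.0, 250.0]
tool_b = [1100.0, 540.0, 280.0]

print(max_abs_fraction_deviation(tool_a, tool_b))
print(relative_area_deviations(tool_a, tool_b))
```

In this made-up case the peak areas differ by 8 to 12% between the two tools, yet the labeling fractions differ by less than half a percentage point, because a roughly uniform over- or under-integration largely cancels out in the normalization.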
To judge the significance of these deviations, two user studies involving 10 operators each were carried out to investigate differences between integrations obtained by operators using manual integration techniques and those obtained using MRMQuant. The comparison reveals a strong variability of peak integration among human operators; in complicated cases, a scatter of 10-20% is no exception even for trained users. While this study demonstrates the limits of integration accuracy for complex chromatographic data, it also shows that integrations obtained using the software are comparable to those of human operators. Irrespective of the question of absolute correctness of results, the major factor hindering further speed-up of the data evaluation is found to be the manual verification of integrations needed to ensure a consistent data evaluation. For the first time, One-Class Support Vector Machines are used to identify spurious integrations. Sensitivity varies strongly with the stability of the chromatographic data, but for data sets with stable chromatographic conditions, specificity and sensitivity are above 90% in several cases. However, the measures have to be optimized further to also enable the detection of gradual integration errors. As a major result of this thesis, the MRMQuant framework, comprising 67,000 lines of C++ code, has replaced the previously available software and is now established in routine use at IBG-1.
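For readers unfamiliar with the two quality measures quoted above, the following minimal Python sketch shows how sensitivity and specificity are computed for a detector of spurious integrations. It does not reproduce the One-Class Support Vector Machine itself. Treating "spurious" as the positive class is an assumption, and the function name and the labels are hypothetical.

```python
def sensitivity_specificity(is_spurious, flagged):
    """Return (sensitivity, specificity) with 'spurious' as the positive class.

    is_spurious: ground-truth booleans, True if the integration is spurious.
    flagged:     detector output, True if the integration was flagged.
    """
    tp = fn = tn = fp = 0
    for truth, prediction in zip(is_spurious, flagged):
        if truth and prediction:
            tp += 1  # spurious integration correctly flagged
        elif truth:
            fn += 1  # spurious integration missed
        elif prediction:
            fp += 1  # correct integration wrongly flagged
        else:
            tn += 1  # correct integration left alone
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one spurious and one correct integration")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical verdicts for ten integrations (three of them truly spurious).
truth = [True, True, True, False, False, False, False, False, False, False]
flags = [True, True, False, False, False, False, False, False, False, True]

sensitivity, specificity = sensitivity_specificity(truth, flags)
print(sensitivity, specificity)
```

Sensitivity here is the share of truly spurious integrations that the detector flags, and specificity is the share of correct integrations that it leaves unflagged.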