Yanick Hagemeijer

I’m a PhD student at the University of Groningen (UG), where I work at the European Research Institute for the Biology of Ageing (ERIBA) and at the Groningen Research Institute of Pharmacy (GRIP). My two promotors, Peter Horvatovich from the Department of Analytical Biochemistry at the UG and Victor Guryev from the Group of Genome Structure and Ageing (University Medical Center Groningen, UMCG), are both part of the X-omics Netherlands Infrastructure Consortium as well. Currently, I am involved in the X-omics project with the goal of making a ‘portable’ pipeline to perform proteogenomics analysis from start to end.

I previously studied Bioinformatics at the Hogeschool van Arnhem & Nijmegen. During my BSc I did an internship at the bacterial genomics group at the Center for Molecular and Biomolecular Informatics (CMBI). Following this research experience, I did a second internship at the Hubrecht Institute in Utrecht, where Victor Guryev was my supervisor. After I graduated, I went to Wageningen University & Research centre (WUR), where I did my master's theses at the departments of Animal Breeding and Genetics (ABG) and Systems & Synthetic Biology (SSB). After finishing, I applied for a job at the department of Experimental Cardiology at the UMCG. There, I worked as a data analyst/bioinformatician focused on performing GWAS analyses in the context of medical/cardiac phenotypes.

I regretted focusing solely on genomics and transcriptomics and not learning more about proteomics and other omics layers. This PhD within the X-omics Consortium allows me to work at the crossroads of sequencing and mass spectrometry data as part of the data integration and stewardship pillar. I’m interested in bringing the complexities of genomics and transcriptomics to the proteomics field, in particular by integrating patient-specific protein variants into personalized protein sequence databases. These databases can be used to search large LC-MS/MS datasets and to identify patient-specific variants that have implications in complex diseases such as cancer and COPD. Currently, proteomics analysis pipelines rely on canonical sequences from (curated) public databases such as Ensembl and UniProt. Simply including all possible ‘translatable’ sequences leads to a large search space and low statistical power to identify protein variants. The goal of my PhD project is to provide a proteogenomics pipeline that uses genomics and/or transcriptomics data to build a protein database that is both small and accurate: it should contain all protein variants present in a clinical/biological proteomics sample, without including large numbers of hypothetical proteins for which there is no support in the genomics or transcriptomics data. If you have similar interests, let’s collaborate!

This research was (partially) funded by NWO, project 184.034.019.