X-omics online workshop series “Strategies to Overcome Your Challenges in Multi-omics Data Integration”
In June 2020, X-omics organized the first online workshop series. Different aspects of challenges and solutions related to multi-omics data integration were addressed in four workshops.
The first workshop, “Data Standards and Multi-omics Integration”, chaired by Jasmin Böhmer (UMC Utrecht, CMM), addressed standard data formats for -omics data.
The first invited speaker, Juan Antonio Vizcaíno (EMBL-EBI, Hinxton, Cambridge), gave a comprehensive overview of established standards for proteomics, genomics, and transcriptomics data, including formats for raw, processed, and archived data. He also introduced relevant standards initiatives such as the Proteomics Standards Initiative (PSI) and the Global Alliance for Genomics and Health (GA4GH). Finally, he shared his ideas on how new developments in standards for portable workflows, integrative data formats (e.g. for proteogenomics), and public databases for omics data and pathway-related information can help integrate different types of omics data.
The next speaker, Joeri van der Velde (UMCG, Groningen), introduced the Dutch FAIR Genomes initiative, which develops a connected semantic model for metadata to promote optimal (re)use of Next Generation Sequencing (NGS) data in research and healthcare.
Jasmin Böhmer (UMC Utrecht, CMM) addressed the requirements for the reuse of sensitive data. She explained how the implementation of Data Use Ontology (DUO) tags at the European Genome-phenome Archive (EGA) helps standardize data use conditions and querying available data, e.g. based on informed consent.
The workshop was concluded by an interactive quiz and a Q&A session.
Key take-home messages
- Use standard data formats
- Use controlled vocabularies to report metadata
- Deposit data in public repositories (open or controlled-access)
- FAIR metadata will facilitate (re)use of available data
- Publishing and archiving sensitive human genomics data via the EGA requires a Data Access Committee
- Informed consent conditions are crucial to enable and define future re-use by others
- Standardized data use conditions (e.g. DUO tags) make it easier to query and reuse available data
- In the future, reviews of data access applications should become automated
- X-omics will carefully consider implementing DUO and DUOS standards
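As a minimal sketch of how machine-readable data use conditions support querying, the following Python snippet filters a small catalogue by DUO tag. The dataset IDs and the set of tags accepted for the example project are assumptions made for illustration. The matching rule is deliberately simplified and does not reproduce the logic used by the EGA or DUOS.

```python
# DUO term identifiers from the Data Use Ontology.
GENERAL_RESEARCH_USE = "DUO:0000042"
HEALTH_MEDICAL_BIOMEDICAL = "DUO:0000006"
DISEASE_SPECIFIC = "DUO:0000007"

# Hypothetical catalogue entries; the dataset IDs are invented for illustration.
DATASETS = [
    {"id": "dataset-A", "duo": GENERAL_RESEARCH_USE},
    {"id": "dataset-B", "duo": DISEASE_SPECIFIC},
    {"id": "dataset-C", "duo": HEALTH_MEDICAL_BIOMEDICAL},
]


def datasets_with_tags(datasets, accepted_tags):
    """Return the IDs of datasets whose DUO tag is in accepted_tags."""
    return [d["id"] for d in datasets if d["duo"] in accepted_tags]


# Assumption for this sketch: a broad biomedical project may use data tagged
# as general research use or as health/medical/biomedical research.
biomedical_ok = {GENERAL_RESEARCH_USE, HEALTH_MEDICAL_BIOMEDICAL}
matches = datasets_with_tags(DATASETS, biomedical_ok)
print(matches)
```

Because every dataset carries a tag from the same controlled vocabulary, the filter is a simple set lookup rather than an interpretation of free-text consent forms.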
In the second workshop, “Linked Data in Practice: An RDF-based Approach with SPARQLing-genomics”, Jasmin Böhmer (UMC Utrecht, CMM) gave an introduction to linked data and FAIR data requirements. Roel Janssen (UMC Utrecht, CMM) gave a demo on how genomics data can be converted to the Resource Description Framework (RDF) and queried with the SPARQL Protocol and RDF Query Language (SPARQL). He showed how data can be harmonized by using ontological terms to capture metadata.
Key take-home messages
- We can create a knowledge graph of diverse omics data using RDF
- Ontologies help unify terminology across research domains
- We can transform a spreadsheet into ontologically defined data
- Flexible data management requires easy data transformations, which can be done with SPARQL
- SPARQLing Genomics is a starting point for interoperable omics data
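The idea of turning a spreadsheet into ontologically defined data can be sketched in a few lines of Python. This is an illustration only, not the SPARQLing-genomics implementation. The `http://example.org/` namespace, the column names, and the sample rows are assumptions, while the two ontology IRIs (NCBITaxon for Homo sapiens, UBERON for blood) are real terms. The output is in N-Triples syntax, which a triple store could load and expose to SPARQL queries.

```python
import csv
import io

# Hypothetical namespace for samples and predicates (an assumption of this sketch).
EX = "http://example.org/"

# Map free-text spreadsheet values to ontology term IRIs.
TERMS = {
    "human": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
    "blood": "http://purl.obolibrary.org/obo/UBERON_0000178",
}

# Invented spreadsheet content, inlined so the example is self-contained.
SPREADSHEET = "sample_id,organism,tissue\nS1,human,blood\nS2,human,blood\n"


def rows_to_ntriples(text):
    """Convert each spreadsheet cell into one RDF triple (N-Triples line)."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{EX}sample/{row['sample_id']}>"
        for column in ("organism", "tissue"):
            predicate = f"<{EX}{column}>"
            obj = f"<{TERMS[row[column]]}>"
            triples.append(f"{subject} {predicate} {obj} .")
    return triples


triples = rows_to_ntriples(SPREADSHEET)
print("\n".join(triples))
```

Replacing the free-text values "human" and "blood" with shared IRIs is what allows data from different sources to be joined in one knowledge graph.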
During the third workshop “Showcases of Multi-omics Data Integration”, chaired by Jenny van Dongen (Vrije Universiteit Amsterdam), Ayşe Demirkan (University of Surrey) talked about “Stories of Two manuscripts”*. She shared insights on the challenges the research team encountered while integrating different types of -omics data (focusing, in particular, on genomic, transcriptomic, methylomic, and metabolomic data) from different sources. Ayşe explained which tools and strategies helped with these challenges.
* Liu, J., Carnero-Montoro, E., van Dongen, J. et al. An integrative cross-omics analysis of DNA methylation sites of glucose and insulin homeostasis. Nat Commun 10, 2581 (2019).
* Liu, J., Lahousse, L., Nivard, M.G. et al. Integration of epidemiologic, pharmacologic, genetic and gut microbiome data in a drug–metabolite atlas. Nat Med 26, 110–117 (2020).
Key take-home messages
- Comprehensive data across many -omics layers in a single cohort are scarce and usually limited to small samples.
- BBMRI-NL brought together Dutch biobanks to allow for very large studies on single -omics layers such as metabolomics and genomics, and analysis across multiple omics (genomics, transcriptomics, methylomics and metabolomics) in a subset of cohorts.
- Reverse causation, confounding, and tissue-specificity are major challenges when analyzing relationships between disease traits and omics data such as epigenomics, transcriptomics, and metabolomics.
- Integration of information from multiple -omics layers can involve analyzing multiple omics data in one group of individuals, or combining information obtained on different omics layers in distinct groups of individuals/samples. The second approach allows for inferences about -omics layers and tissues that were not measured in a cohort of interest and enhances insight into disease pathways across omics layers. Furthermore, it can enhance power by using external data with larger sample sizes, and it can address questions regarding causality.
- Information obtained in distinct omics datasets (based on different individuals) can be integrated at the level of their results (summary statistics).
- Because genetic variants are generally not the consequence of a disease or a change in -omics profiles, information from omics Quantitative Trait Loci (omics QTLs) can be used to examine causal relationships between different omics layers and between omics layers and disease traits.
- Several analytic tools to integrate summary statistics are available, as are atlases of disease-omics associations (such as the GWAS atlas), and cross-omics associations (such as mQTL and eQTL databases).
- Challenges of such tools include differences between datasets in the exact set of measures available for a given omics layer (such as different metabolites being measured on different metabolomics platforms, and differences in genome build and in the SNPs present in genomic datasets that used different reference data for imputation), and small sample sizes of omics QTL studies of internal tissues.
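To illustrate how results from distinct datasets can be combined at the level of summary statistics, the sketch below uses Mendelian randomization with an inverse-variance weighted (IVW) estimate. This is one common approach and is not necessarily the method used in the studies presented. The effect sizes are invented. Each variant contributes its association with an omics measure (from a QTL study) and its association with a disease trait (from a separate GWAS), and all effects are assumed to be aligned to the same allele.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimate with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted mean of the Wald ratios."""
    total_weight = 0.0
    weighted_sum = 0.0
    for beta_exposure, beta_outcome, se_outcome in variants:
        ratio, se = wald_ratio(beta_exposure, beta_outcome, se_outcome)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * ratio
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)


# Invented summary statistics: (QTL effect, GWAS effect, GWAS standard error).
variants = [
    (0.10, 0.020, 0.005),
    (0.20, 0.040, 0.010),
    (-0.05, -0.010, 0.004),
]
estimate, se = ivw_estimate(variants)
print(f"IVW estimate: {estimate:.3f} (SE {se:.3f})")
```

No individual-level data are needed, so the QTL study and the GWAS can come from different cohorts. The sketch ignores the harmonization problems listed above, such as differing genome builds or variants missing from one dataset.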
During the fourth workshop, “Pitch Your Own Multi-omics Project”, chaired by Purva Kulkarni and Peter-Bram ’t Hoen (both Radboudumc, Nijmegen), three workshop participants, Arjan Hoogendijk (Sanquin), Tabea Riepe (Radboudumc, Nijmegen), and Fiona A. Hagenbeek (Vrije Universiteit Amsterdam), were selected to give a brief pitch on their own multi-omics research project. Afterwards, we discussed these projects and the challenges they posed in separate breakout sessions with the invited experts Yang Li (Radboudumc, Nijmegen; Helmholtz Centre for Infection Research, Hannover), Peter Horvatovich (University of Groningen), and Karlien Coene (Radboudumc, Nijmegen), together with other workshop participants.
We thank all the participants, the workshop organizing committee, the invited speakers and experts, and the X-omics project management team and look forward to future events.
Anna Niehues (workshop series coordinator) on behalf of the workshop series organizing committee (Jasmin Böhmer (UMC Utrecht, CMM), Jenny van Dongen (Vrije Universiteit Amsterdam), Victor Guryev (UMCG, Groningen), Yanick Paco Hagemeijer (University of Groningen), Peter-Bram ’t Hoen (Radboudumc, Nijmegen), Peter Horvatovich (University of Groningen), Purva Kulkarni (Radboudumc, Nijmegen), Anna Niehues (Radboudumc, Nijmegen), Gurnoor Singh (Radboudumc, Nijmegen))