Data analysis and FAIRification in X-omics – the ACTION demonstrator project
Anna Niehues5, Jenny van Dongen3, XiaoFeng Liao5, Naama Karu4, Alida Kindt4, Casper de Visser5, René Pool3, X-omics/ACTION FAIR and data analysis working group
The data analysis, integration and stewardship pillar of the Netherlands X-omics initiative aims to contribute to the realization of an integrated X-omics infrastructure and to facilitate multi-omics research by providing means for the creation, analysis and integration of FAIR -omics data. In addition to standardization of data and metadata, we envision a FAIR data cube that combines individual -omics data sets or pointers to these data sets with associated linked metadata. The FAIR data cube should provide an interface to query/search rich human- and machine-understandable metadata and extract relevant molecular data for subsequent analysis. Additionally, a ranking algorithm for identifying best matches based on a user-provided query is being considered. This will aid the integration of different types of omics data, and also promote the integration of -omics data from different sources, as well as facilitate submission to relevant data archives. The X-omics/NTR-ACTION demonstrator project is a first use case of our approach toward an integrated workflow combining data FAIRification and reproducible data analysis.
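The abstract only says that a ranking algorithm is being considered and does not specify one. As a purely illustrative sketch of how best matches for a user query could be ranked, the following assumes a simple token-overlap (Jaccard) score over free-text metadata descriptions. The function names, record identifiers, and scoring choice are all assumptions and are not taken from the FAIR data cube.

```python
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-case a string and split it into a set of whitespace-separated tokens."""
    return set(text.lower().replace(",", " ").split())


def rank_matches(query: str, records: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank metadata records by Jaccard similarity to the query.

    `records` maps a hypothetical record identifier to a free-text description.
    Returns (identifier, score) pairs, best match first; ties are broken by identifier.
    """
    query_tokens = tokenize(query)
    scored = []
    for record_id, description in records.items():
        record_tokens = tokenize(description)
        union = query_tokens | record_tokens
        # Jaccard similarity: shared tokens divided by all distinct tokens.
        score = len(query_tokens & record_tokens) / len(union) if union else 0.0
        scored.append((record_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical metadata descriptions, for illustration only.
example_records = {
    "a": "urine metabolomics amines",
    "b": "DNA methylation buccal",
    "c": "urine steroid hormones",
}
ranking = rank_matches("urine metabolomics", example_records)
```

A production system would more likely score against ontology terms in the linked metadata than against raw tokens; the sketch only shows the rank-by-score pattern.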
Our data analysis working group investigates the predictive value of multi-omics data for classification of aggressive behavior in children. The NTR-ACTION dataset comprises genomic data (SNP array data on DNA from buccal cells), epigenomic data (Illumina EPIC DNA methylation array data from buccal cells), and biomarker and urine metabolomics data (amines, organic acids, steroidal hormones). Samples were collected from approximately 1300 twins (mean age = 9.7 years, SD = 1.8, range 5-13). Data were collected by the Netherlands Twin Register (Ligthart et al. 2019) as part of the ACTION (Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies) project (Hagenbeek et al. 2020, van Dongen et al. 2021), with the goal of identifying biomarkers for childhood aggression. Twins were invited to participate in the biomarker study based on their longitudinal data on aggressive behavior at ages 3, 7, and/or 9/10 years. At or around these ages, parents of twins completed the Achenbach System of Empirically Based Assessment (ASEBA) Child Behavior Checklist (CBCL) and teachers of twins completed the ASEBA Teacher Rating Form (TRF). The study design selected twin pairs who had either concordant (low-low, high-high) or discordant (low-high) indices of childhood aggression.
For FAIRification of this multi-omics data set, we consider existing standards used in the respective communities of metabolomics and (epi)genomics, as well as the requirements of commonly used data archives (e.g. EBI MetaboLights). We make use of the ISA (Investigation, Study, Assay, https://isa-tools.org/) framework to capture experimental metadata, and re-use and extend the semantic metadata schema developed by the FAIR Genomes project.
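To make the Investigation, Study, Assay hierarchy concrete, here is a minimal sketch that models the three levels as plain Python dataclasses and serializes them to JSON. It does not use the ISA tools API and does not follow the official ISA-JSON schema. All class names, field names, identifiers, and file names are hypothetical placeholders chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Assay:
    """One measurement type applied to the samples of a study."""
    measurement_type: str
    technology_type: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """A unit of research that groups one or more assays."""
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """Top-level container that groups related studies."""
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def build_example() -> Investigation:
    # Identifiers, titles, and file names are hypothetical placeholders.
    methylation = Assay(
        measurement_type="DNA methylation profiling",
        technology_type="DNA methylation array",
        data_files=["methylation_placeholder.tsv"],
    )
    metabolomics = Assay(
        measurement_type="metabolite profiling",
        technology_type="unspecified (placeholder)",
        data_files=["amines_placeholder.tsv"],
    )
    study = Study("STUDY-1", "Example multi-omics study", [methylation, metabolomics])
    return Investigation("INV-1", "Example investigation", [study])


def to_json(investigation: Investigation) -> str:
    """Serialize the nested hierarchy to an indented JSON string."""
    return json.dumps(asdict(investigation), indent=2)


example_json = to_json(build_example())
```

In practice the same hierarchy would be expressed with the ISA framework's own formats, with each field annotated by ontology terms so that the metadata become machine-understandable.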
For our data analysis workflow, we follow recommendations for FAIR software (https://fair-software.nl/) and discuss approaches for containerization and workflow management to facilitate reuse with other X-omics partners in the X-omics workflow working group.
The FAIR working group develops the FAIR data cube based on the principle that data should be "as open as possible and as closed as necessary". By incorporating a FAIR Data Point (FDP) component internally, the metadata can be as open as possible and FAIR-at-the-source. The metadata contents (in our case, ISA metadata) are generated semi-automatically from the data source and exposed to user queries in the form of linked metadata (in our case, linked ISA). Resources (linked metadata about studies) are publicly accessible. Creating or editing resources requires administrative access; this user control is implemented as part of the FDP. By setting up an access control mechanism between the FDP and the data (e.g., measured molecular profiles), the data can be as closed as necessary, which makes it possible to properly address aspects of legislation, privacy, and ethics. The knowledge graph content is generated via existing and newly developed tools. The FAIR Data Cube is ongoing work hosted on GitHub (https://github.com/cmbi/FAIRDataCube).
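The "as open as possible and as closed as necessary" split can be sketched as a small in-memory model with three access rules: anyone may read metadata, only administrators may create or edit metadata, and only users with an explicit grant may read the underlying data. This is a toy illustration under those assumptions and is not the FAIR Data Point API or the FAIR Data Cube implementation. The class, method, and user names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class AccessDenied(Exception):
    """Raised when a user lacks permission for the requested operation."""


@dataclass
class DataCubeSketch:
    metadata: Dict[str, dict] = field(default_factory=dict)      # open to everyone
    data: Dict[str, List[float]] = field(default_factory=dict)   # access-controlled
    admins: Set[str] = field(default_factory=set)                # may create/edit metadata
    data_grants: Dict[str, Set[str]] = field(default_factory=dict)  # study id -> users

    def get_metadata(self, study_id: str) -> dict:
        # Metadata are public: no user check is performed.
        return self.metadata[study_id]

    def put_metadata(self, user: str, study_id: str, record: dict) -> None:
        # Creating or editing a resource requires administrative access.
        if user not in self.admins:
            raise AccessDenied(f"{user} may not edit metadata")
        self.metadata[study_id] = record

    def get_data(self, user: str, study_id: str) -> List[float]:
        # Data are released only to users with an explicit grant for this study.
        if user not in self.data_grants.get(study_id, set()):
            raise AccessDenied(f"{user} has no access to data of {study_id}")
        return self.data[study_id]


# Hypothetical users and values, for illustration only.
cube = DataCubeSketch(admins={"admin"}, data_grants={"STUDY-1": {"approved_user"}})
cube.put_metadata("admin", "STUDY-1", {"title": "Example study"})
cube.data["STUDY-1"] = [0.1, 0.2, 0.3]
```

Keeping the metadata store and the data store behind separate checks means a study stays findable through its public metadata even when the molecular profiles themselves cannot be shared.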
X-omics/ACTION FAIR and data analysis working group: Fernanda de Andrade1, Jasmin Böhmer2, Jenny van Dongen3, Dorret Boomsma3, Fiona Hagenbeek3, Peter-Bram 't Hoen5, Naama Karu4, Alida Kindt4, Purva Kulkarni5, XiaoFeng Liao5, Leon Mei6, Anna Niehues5, René Pool3, Dieuwke Roelofs-Prins1, Gurnoor Singh5, Morris Swertz1, Joeri van der Velde1, Casper de Visser5, Michael van Vliet4, Gerben van der Vries1
1UMCG, 2UMCU, 3VU, 4LU, 5Radboudumc, 6LUMC