MODifieR: an Ensemble R Package for Inference of Disease Modules from Transcriptomics Networks


Systems biology


Hendrik A. de Weerd [1,2,†], Tejaswi V. S. Badam [1,2,†], David Martínez-Enguita [2], Julia Åkesson [1,2], Daniel Muthas [3], Mika Gustafsson [2,†,*] and Zelmina Lubovac-Pilav [1,†,*]

[1] School of Bioscience, Systems Biology Research Center, Skövde 541 45, Sweden, [2] Department of Physics, Chemistry and Biology, Linköping University, Linköping 581 83, Sweden and [3] Translational Science and Experimental Medicine, Early Respiratory, Inflammation and Autoimmunity, BioPharmaceuticals R&D, AstraZeneca, Mölndal 43183, Sweden

*To whom correspondence should be addressed.

†The authors wish it to be known that, in their opinion, the first two authors and last two authors should be regarded as Joint Authors.

Associate Editor: Lenore Cowen

Received on December 4, 2019; revised on March 27, 2020; editorial decision on April 1, 2020; accepted on April 2, 2020

Abstract

Motivation: Complex diseases are due to the dense interactions of many disease-associated factors that dysregulate genes that in turn form the so-called disease modules, which have been shown to be a powerful concept for understanding pathological mechanisms. There exist many disease module inference methods that rely on somewhat different assumptions, but there is still no gold standard or best-performing method. Hence, there is a need for combining these methods to generate robust disease modules.

Results: We developed MODule IdentiFIER (MODifieR), an ensemble R package of nine disease module inference methods from transcriptomics networks. MODifieR uses standardized input and output, allowing individual modules generated by these methods to be combined into more robust disease-specific modules, contributing to a better understanding of complex diseases.

Availability and implementation: MODifieR is available under the GNU GPL license and can be freely downloaded from https://gitlab.com/Gustafsson-lab/MODifieR and as a Docker image from https://hub.docker.com/r/ddeweerd/modifier.

Contact: zelmina.lubovac@his.se or mika.gustafsson@liu.se

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Various algorithms have been proposed to infer groups of disease-associated genes using networks, i.e. disease modules. Recently, the DREAM community compared different module inference methods that were based on network topologies to identify disease-associated genes (Choobdar et al., 2019), which led to the tool MONET (Tomasoni et al., 2019). However, no similar tools exist for extracting disease-specific modules by integrating gene expression differences between patients and controls. In addition, related work demonstrates the benefits of a consensus approach that integrates results from individual network-based methods for determining gene–disease associations (Navlakha and Kingsford, 2010). We, therefore, propose the R package MODule IdentiFIER (MODifieR), which lets the user easily run nine popular module inference methods within a unified framework and inspect the result. Six methods are integrated from previous packages: DIAMOnD (Ghiassian et al., 2015), DiffCoEx (Tesson et al., 2010), MCODE (Bader and Hogue, 2003), MODA (Li et al., 2016), ModuleDiscoverer (Vlaic et al., 2018) and WGCNA (Langfelder and Horvath, 2012), while three methods have been packaged from our previous publications, namely Clique-Sum (Bruhn et al., 2014; Gustafsson et al., 2014) and Correlation-Clique (Gawel et al., 2019; Hellberg et al., 2016).

MODifieR processes transcriptomic data from RNA-Seq or microarrays to differentially expressed genes (DEGs) and normalized expression matrices, encapsulating them into standardized input objects. We illustrate the use of the tool by applying it to public gene expression datasets from the asthmatic cohorts of the U-BIOPRED project (Supplementary Material S1).

2 Software implementation

The main concept of MODifieR is illustrated in Figure 1. The general workflow is centered around two main objects: MODifieR_input objects, which contain preprocessed data from microarray or RNA-Seq, and MODifieR_module objects, which contain inferred modules and related data.

© The Author(s) 2020. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Bioinformatics, 36(12), 2020, 3918–3919. doi: 10.1093/bioinformatics/btaa235. Advance Access Publication Date: 9 April 2020. Applications Note

2.1 Input

Input objects are created from transcriptomic data from microarrays or RNA-Seq data. Differential expression analysis of microarray data is performed on probe level using LIMMA (Ritchie et al., 2015). In case of multiple probes mapped to the same gene, the lowest P-value is selected. For co-expression analysis, probes are collapsed into genes using the WGCNA collapseRows function (Miller et al., 2011). RNA-Seq differential expression analysis follows the standard edgeR workflow (Robinson et al., 2010). The preprocessing for co-expression-based analysis consists of variance-stabilizing transformation from the DESeq2 package (Love et al., 2014) and an optional quantile normalization. The objects contain DEGs, an expression matrix, grouping of samples and a list of parameters used when creating the object.
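The probe-to-gene step for differential expression can be sketched in a few lines. The Python snippet below is only an illustration of the "lowest P-value per gene" rule described above; it is not MODifieR code (the package is written in R), and the function name, probe identifiers and P-values are hypothetical.

```python
def lowest_p_per_gene(probe_pvalues, probe_to_gene):
    """Return {gene: smallest P-value among the probes mapped to that gene}.

    Both arguments are plain dicts; probes without a gene annotation are skipped.
    This mirrors the rule in the text, not the package's actual implementation.
    """
    best = {}
    for probe, p_value in probe_pvalues.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unannotated probe: ignored in this sketch
        if gene not in best or p_value < best[gene]:
            best[gene] = p_value
    return best


# Hypothetical example data: two probes map to GENE_A, one probe is unannotated.
probe_pvalues = {"probe_1": 0.04, "probe_2": 0.001, "probe_3": 0.2, "probe_4": 0.5}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

gene_pvalues = lowest_p_per_gene(probe_pvalues, probe_to_gene)
print(gene_pvalues)
```

For GENE_A, the smaller of the two probe-level P-values is retained; how unannotated probes are handled in the package itself is not stated here, so dropping them is an assumption of the sketch.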

To work with a pre-computed list of DEGs or a co-expression matrix, the data can be enclosed in an input object. If a method requires DEGs, it also requires a protein–protein interaction network to overlay the DEGs on, which is provided separately when calling the module inference function. For details, see the sub-chapter "Inference methods" in the vignette.

2.2 Module inference

All module inference methods take the input object described in Section 2.1 as their primary input. The same input object can be used for all inference methods.

The output is always a MODifieR_module object. All MODifieR_module objects contain at least the disease module genes and a list of parameters used to generate the object. Several inference methods offer post-processing functions, such as splitting the module into sub-modules.

2.3 Consensus modules

Two approaches for generating a consensus module by integrating individual modules are provided in MODifieR. The first is a simple count-based method where a consensus module is derived from a set of MODifieR modules by counting the frequency of each gene in the individual methods. Supplementary Material S1 provides an example of the inference of an asthma count-based consensus module. The second method is a network-based approach called S2B (double specific betweenness), relying on betweenness centrality, originally developed to find the overlap between gene sets associated with two diseases (Garcia-Vaquero et al., 2018). The integration methods also return a MODifieR_module object.
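As a minimal sketch of the count-based idea, the Python snippet below counts in how many individual modules each gene appears and keeps the genes that reach a threshold. It is not the package's R implementation; the function name, the `min_count` threshold parameter and the gene sets are assumptions made for illustration.

```python
from collections import Counter


def count_based_consensus(modules, min_count):
    """Return genes found in at least `min_count` of the given modules.

    `modules` maps a method name to an iterable of gene identifiers.
    Each method contributes at most one count per gene.
    """
    counts = Counter(
        gene for genes in modules.values() for gene in set(genes)
    )
    return sorted(gene for gene, n in counts.items() if n >= min_count)


# Hypothetical modules from three inference methods.
modules = {
    "method_1": ["GENE_A", "GENE_B", "GENE_C"],
    "method_2": ["GENE_B", "GENE_C", "GENE_D"],
    "method_3": ["GENE_C", "GENE_E", "GENE_E"],
}

consensus = count_based_consensus(modules, min_count=2)
print(consensus)
```

Raising `min_count` yields a smaller, more conservative consensus module; lowering it to 1 returns the union of all modules.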

2.4 Exporting objects

All objects can be exported to either a series of csv files or an Excel workbook with multiple sheets.
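The "series of csv files" layout can be pictured as one file per component of an object. The Python sketch below illustrates that idea only; the component names (`module_genes`, `settings`), the function name and the file naming are hypothetical and do not describe the files that MODifieR actually writes.

```python
import csv
import tempfile
from pathlib import Path


def export_module_to_csv(module, out_dir):
    """Write each component of `module` to its own CSV file in `out_dir`.

    `module` maps a component name to a list of rows (first row = header).
    Returns the list of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in module.items():
        path = out_dir / f"{name}.csv"
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written


# Hypothetical module object with genes and the parameters used to create it.
module = {
    "module_genes": [["gene"], ["GENE_A"], ["GENE_B"]],
    "settings": [["parameter", "value"], ["min_count", "2"]],
}

with tempfile.TemporaryDirectory() as tmp:
    paths = export_module_to_csv(module, tmp)
    exported_names = sorted(p.name for p in paths)
    gene_lines = (Path(tmp) / "module_genes.csv").read_text().split()

print(exported_names)
```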

3 Conclusion

MODifieR is a novel R package that provides a set of common disease module inference methods using standardized input and output. It helps to derive more robust disease-specific modules by combining results from these methods. MODifieR provides a valuable complement to existing tools for module-based biomarker discovery and thus contributes to understanding complex diseases.

Funding

This work was supported by the Knowledge Foundation, the Swedish Research Council and the Swedish Foundation for Strategic Research.

Conflict of Interest: none declared.

References

Bader,G.D. and Hogue,C.W. (2003) An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4, 2.

Bruhn,S. et al. (2014) A generally applicable translational strategy identifies S100A4 as a candidate gene in allergy. Sci. Transl. Med., 6, 218ra214.

Choobdar,S. et al.; The DREAM Module Identification Challenge Consortium. (2019) Assessment of network module identification across complex diseases. Nat. Methods, 16, 843–852.

Garcia-Vaquero,M.L. et al. (2018) Searching the overlap between network modules with specific betweenness (S2B) and its application to cross-disease analysis. Sci. Rep., 8, 11555.

Gawel,D.R. et al. (2019) A validated single-cell-based strategy to identify diagnostic and therapeutic targets in complex diseases. Genome Med., 11, 47.

Ghiassian,S.D. et al. (2015) A DIseAse MOdule Detection (DIAMOnD) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome. PLoS Comput. Biol., 11, e1004120.

Gustafsson,M. et al. (2014) Integrated genomic and prospective clinical studies show the importance of modular pleiotropy for disease susceptibility, diagnosis and treatment. Genome Med., 6, 17.

Hellberg,S. et al. (2016) Dynamic response genes in CD4+ T cells reveal a network of interactive proteins that classifies disease activity in multiple sclerosis. Cell Rep., 16, 2928–2939.

Langfelder,P. and Horvath,S. (2012) Fast R functions for robust correlations and hierarchical clustering. J. Stat. Softw., 46, i11.

Li,D. et al. (2016) MODA: MOdule differential analysis for weighted gene co-expression network. bioRxiv, 053496.

Love,M.I. et al. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol., 15, 550.

Miller,J.A. et al. (2011) Strategies for aggregating gene expression data: the collapseRows R function. BMC Bioinformatics, 12, 322.

Navlakha,S. and Kingsford,C. (2010) The power of protein interaction networks for associating genes with diseases. Bioinformatics, 26, 1057–1063.

Ritchie,M.E. et al. (2015) LIMMA powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res., 43, e47.

Robinson,M.D. et al. (2010) edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26, 139–140.

Tesson,B.M. et al. (2010) DiffCoEx: a simple and sensitive method to find differentially coexpressed gene modules. BMC Bioinformatics, 11, 497.

Tomasoni,M. et al. (2019) MONET: a toolbox integrating top-performing methods for network modularisation. bioRxiv, 611418.

Vlaic,S. et al. (2018) ModuleDiscoverer: identification of regulatory modules in protein–protein interaction networks. Sci. Rep., 8, 433.

Fig. 1. Workflow of MODifieR. (A) Creating input objects. (B) Disease modules are inferred by individual methods. (C) Consensus module is derived by integrating modules from individual methods
