The R markdown is available from the pulldown menu for Code at the upper-right, choose “Download Rmd”, or download the Rmd from GitHub.

1 Introduction

This tutorial will teach you how to use the DisGeNET Cytoscape App to retrieve data from DisGeNET [1] using the R programming language. A tutorial on how to use the DisGeNET Cytoscape App from Cytoscape is available here.

About DisGeNET

DisGeNET is a discovery platform containing one of the largest publicly available collections of genes and variants associated with human diseases [1]. DisGeNET integrates data from expert curated repositories, GWAS catalogues, animal models and the scientific literature. The data in the platform are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype–phenotype relationships. More information on the database contents, statistics and attributes is available here.

Requirements

To run the tutorial you will need to install RCy3:

if(!"RCy3" %in% installed.packages()){
    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("RCy3")
}
library(RCy3)

You will also need to install and launch Cytoscape:

cytoscapePing()

Finally, install the DisGENET App to access DisGeNET from within Cytoscape. This can be done from the Cytoscape App Store or, if you are running Cytoscape 3.7.0 or above, by simply running this command:

installApp('DisGeNET')  #only available in Cytoscape 3.7.0 and above

Then, install and load the library httr

install.packages("httr")
library(httr)

2 Tutorial

This tutorial will guide you on how to use the DisGeNET Cytoscape App to:

We will create an object for the REST calls to the DisGeNET automation module. The parameters are:

The function returns a string in url format (url), with the given parameters.

disgenetRestUrl<-function(netType,host="127.0.0.1",port=1234,version="v7"){
  if(is.null(netType)){
    print("Network type not specified.")
  }else{
    url<-sprintf("http://%s:%i/disgenet/%s/%s",host,port,version,netType)
  }
  return (url)
}

# Creating the url for GDAs
disgenetRestUrl(netType = "gene-disease-net")
## [1] "http://127.0.0.1:1234/disgenet/v7/gene-disease-net"

Next, we will create an object that will execute the REST calls to the DisGeNET automation module in Cytoscape and retrieve the operation results. The parameters are:

The function returns the object result, a list with the results of the operation.

disgenetRestCall<-function(netType,netParams){
  url<-disgenetRestUrl(netType)
  restCall<-POST(url, body = netParams, encode = "json")
  result<-content(restCall,"parsed")
  return(result)
}

2.1 Create a GDA network associated to a disease

In this example, we will create a gene-disease network for Asthma using the CURATED data in DisGeNET. Below, an example of the object netParams to create the GDA network

geneDisParams <- list(
  source = "CURATED",
  diseaseSearch = "Asthma",
  geneSearch = " ",
  initialScoreValue = "0.0",
  finalScoreValue = "1.0"
)

# Generate the gene-disease network
geneDisResult <- disgenetRestCall("gene-disease-net",geneDisParams)

The network has now been created in Cytoscape!

To change the layout, run the function layoutNetwork. This function receives as an input the network identifier, and a layout name. To investigate the names of the possible layouts, run the function getLayoutNames().

layoutNetwork("force-directed", network = as.numeric(geneDisResult$result$networkSUID))

You can also change the network title, for example:

title <- "my-asthma-GDA-network"
renameNetwork(title, network =   as.numeric(geneDisResult$result$networkSUID))

Create a GDA network associated to a disease using a keyword as input

In this example, we will create a gene-disease network for the diseases containing the keyword Asthma using DisGeNET CURATED data

geneDisParams <- list(
  source = "CURATED",
  diseaseSearch = "*Asthma*",
  geneSearch = " ",
  initialScoreValue = "0.1",
  finalScoreValue = "1.0"
)
geneDisResult <- disgenetRestCall("gene-disease-net",geneDisParams)
layoutNetwork("force-directed", network = geneDisResult$networkResult$networkName)

2.2 Create a GDA network associated to a list of genes

list <-  c("TBX21", "IL5", "IL13", "CCL11", "TGFB1", "ICAM1", "IL4", "IL6", "MMP9", "NOS2", "CCL2", "CCL5", "SCGB3A2", "ADRB2", "ALOX5",  "GSDMB",  "TSLP", "IL33")
geneDisParams <- list(
  source = "CURATED",
  geneSearch = paste(list, collapse = ";"),
  diseaseSearch=  "",
  initialScoreValue = "0.6",
  finalScoreValue = "1.0"
)

geneDisResult <- disgenetRestCall("gene-disease-net",geneDisParams)

2.3 Create a VDA network associated to a list of diseases

In this example, we will create a VDA network associated to:

  • Allergic asthma, C0155877
  • Adult onset asthma, C0741260
  • IgE-mediated allergic asthma, C1827849
variantDisParams <- list(
  source= "ALL",
  diseaseSearch= "Allergic asthma;Adult onset asthma;IgE-mediated allergic asthma",
  initialScoreValue= "0.0",
  finalScoreValue = "1.0",
  showGenes= "false"
)

variantDisResult <- disgenetRestCall("variant-disease-net",variantDisParams)
layoutNetwork("kamada-kawai" , network = variantDisResult$networkResult$networkName)

2.4 Create a VDA network associated to a list of variants

For the following examples, we will use the data from the publication Eighty-eight variants highlight the role of T cell regulation and airway remodeling in asthma pathogenesis. The publication reports a genome-wide association meta-analysis of 69,189 cases and 702,199 controls from Iceland and UK biobank. They find 88 asthma risk variants at 56 loci.

We have made the data available with this tutorial. To read the list of variants, run

variants <-read.csv("https://raw.githubusercontent.com/jpinero/disgenet-app-notebook/main/data/variants.tsv")
variants <- variants$x

There are 88 variants in total

Create a VDA network associated to a list of variants

We will create the network of all the variants from the GWAS study in DisGeNET.

variantDisParams <- list(
  source= "ALL",
  variantSearch=  paste(variants, collapse = ";"), 
  initialScoreValue= "0.0",
  finalScoreValue = "1.0",
  showGenes= "false"
)


variantDisResult <- disgenetRestCall("variant-disease-net",variantDisParams)

layoutNetwork('force-directed defaultSpringCoefficient=0.0001 defaultSpringLength=10')

# We will store the network identifiers to perform the enrichment later 
suid <- as.numeric(variantDisResult$result$networkSUID)

We can retrieve the information associated to the nodes like this:

nodes <-getTableColumns('node')

There are 64 variants from the dataset that are reported in DisGeNET, associated to 143 different diseases.

library(dplyr)
library(DT)
dt  <- nodes %>% filter(nodeType=="disease") %>%  arrange(desc(nrAssociatedVariants)) %>% 
  select(diseaseId,diseaseName, nrAssociatedVariants) %>% head(20)

datatable(dt, caption = "Table 1: Top 20 diseases associated to the list of variants" ,  rownames = FALSE, class = 'cell-border stripe', filter = 'top',
          options = list( pageLength = length(dt$diseaseName),  autoWidth = TRUE,   columnDefs = list(list(width = '100px',  targets = c(0, 1,2))) ))

We can also check which of the variants from the GWAS study are already described as related to asthma.

variantDisParams <- list(
  source= "ALL",
  diseaseSearch= "*asthma*",
  variantSearch=  paste(variants, collapse = ";"), 
  initialScoreValue= "0.0",
  finalScoreValue = "1.0",
  showGenes= "false"
)


variantDisResult <- disgenetRestCall("variant-disease-net",variantDisParams)

layoutNetwork('force-directed defaultSpringCoefficient=0.0001 defaultSpringLength=10')

We can retrieve the information associated to the network edges like this:

edges <-getTableColumns('edge')

Let’s explore the evidence (represented as the edges in the network). We will show only the first 10 rows.

dt  <- edges %>% select(score, sentence, pmid) %>% arrange(desc(score)) %>% unique() %>% head(10)
datatable(dt, caption = "Table 2: Evidences supporting the associations with Asthma" ,  rownames = FALSE, class = 'cell-border stripe', filter = 'top',
              options = list(   pageLength = 20, autoWidth = T, 
                                columnDefs = list(list(width = '100px', targets = c(0, 2)))))

Next we will create a VDA network for a list of variants filtering by disease class. We will retrieve the network of diseases belonging to the class Respiratory Tract Diseases associated to the variants from the GWAS study. This time, we will specify the parameter showGenes as true. Notice that the genes annotated to the variants are now shown.

variantDisParams <- list(
  source= "ALL",
  diseaseClass= "Respiratory Tract Diseases",
  variantSearch=  paste(variants, collapse = ";"), 
  initialScoreValue= "0.0",
  finalScoreValue = "1.0",
  showGenes= "true"
)


variantDisResult <- disgenetRestCall("variant-disease-net",variantDisParams)
layoutNetwork('force-directed defaultSpringCoefficient=0.0001 defaultSpringLength=10')

2.5 Perform a variant enrichment analysis

The variant-enrichment endpoint receives the identifiers of the network containing the variants of interest, and performs an enrichment analysis over the diseases in DisGeNET. The input list of variants should be identified with DBSNP identifiers. The function has other mandatory arguments:

  • networkId: SUID of the network
  • columnName: the name of the column containing the variants
  • source: the source database (by default, database = “CURATED”)
  • pvalueThreshold: a p-value cutoff to filter the results of the enrichment
  • minNumberOfVariants: the minimum number of variants annotated to the disease
  • newNetwork: if TRUE, a new network will be created with the results of the enrichment

The p-values resulting from the multiple Fisher tests are corrected for false discovery rate using the Benjamini-Hochberg method.

To retrieve the name of the column containing the variants

# this retrieves the name of the columns of the node table of the network 
colnames(getTableColumns('node', network = suid))
##  [1] "SUID"                    "shared name"            
##  [3] "name"                    "selected"               
##  [5] "nodeType"                "diseaseId"              
##  [7] "diseaseName"             "diseaseClass"           
##  [9] "diseaseClassName"        "variantId"              
## [11] "variantClass"            "chromosome"             
## [13] "coordinates"             "most_severe_consequence"
## [15] "DSI"                     "DPI"                    
## [17] "associatedGenes"         "nrAssociatedDiseases"   
## [19] "nrAssociatedVariants"    "styleName"              
## [21] "styleSize"

To perform the enrichment:

VarDisEnrichParams <- list(
  networkId = suid,
  columnName = "variantId",
  typeId = "DBSNP",  
  source =  "CURATED",
  pvalueThreshold = 0.05,
minNumberOfVariants  = 2,
  newNetwork = "true"
)

vda_enrich_result <- disgenetRestCall("variant-enrichment",VarDisEnrichParams)
layoutNetwork('force-directed defaultSpringCoefficient=0.0001 defaultSpringLength=10')

2.6 Perform a gene enrichment analysis

To perform a gene enrichment analysis, first it is necessary to create the network containing the genes of interest. We will use as an example the genes associated to the variants from the list of variants of the previous publication.
First, load the data into R:

edges <-read.csv("https://raw.githubusercontent.com/jpinero/disgenet-app-notebook/main/data/variant_gene_network.tsv",sep = "\t")

There are 57 genes in total

Next, we will create the nodes, and edges:

nodes <- data.frame( id = c(as.character(edges$target), as.character(edges$source)), 
                     group = c(rep("gene", length(edges$target)), rep("variant", length(edges$source))) )


net_res <- createNetworkFromDataFrames(nodes =nodes, edges = edges, title = "gene-variant-net")
layoutNetwork('circular')
setVisualStyle("BioPAX_SIF")
##                 message 
## "Visual Style applied."

Once the network has been created, we will retrieve the network identifier

suid <-as.numeric(net_res)

Now we are ready to perform the enrichment. The gene-enrichment endpoint receives an object with the following parameters:

  • networkId: SUID of the network containing the list of genes of interest
  • columnName: the name of the column containing the genes
  • typeId: the identifier of the gene: SYMBOL for gene symbols, or ENTREZID for NCBI Gene Identifiers.
  • source: the source database (by default, database = “CURATED”)
  • pvalueThreshold: a p-value cutoff to filter the results of the enrichment
  • minNumberOfGenes: the minimum number of genes annotated to the disease
  • newNetwork: if TRUE, a new network will be created with the results of the enrichment

The p-values resulting from the multiple Fisher tests are corrected for false discovery rate using the Benjamini-Hochberg method.

library(dplyr)
getTableColumnNames(table = "node", network = suid)
## [1] "SUID"        "shared name" "name"        "selected"    "id"         
## [6] "group"
genDisEnrichParams <- list(
  networkId = suid,
  columnName = "id",
  typeId = "SYMBOL",  
  source =  "CURATED",
  newNetwork = "true"
)

gda_enrich_result <- disgenetRestCall("gene-enrichment",genDisEnrichParams)
top_nodes <- getTableColumns('node', network = gda_enrich_result$result$newNetworkProperties$SUID)

top_nodes  <- top_nodes %>% filter(nrAssociatedGenes>4) %>%  arrange(desc(nrAssociatedGenes)) %>% select(diseaseId, diseaseName, nrAssociatedGenes)
datatable(top_nodes, caption = "Table 3: Top diseases resulting from the variant enrichment" ,  rownames = FALSE, class = 'cell-border stripe', filter = 'top',
              options = list(   pageLength = 10, autoWidth = T ))

3 References

  1. Piñero, Janet and Ramírez-Anguita, Juan Manuel and Saüch-Pitarch, Josep and Ronzano, Francesco and Centeno, Emilio, Sanz, Ferran and Furlong, Laura I, {The DisGeNET knowledge platform for disease genomics: 2019 update , Nucleic Acids Research, 48, Issue D1, D845–D85508 (2020)

  2. Olafsdottir, T.A., Theodors, F., Bjarnadottir, K. et al.Eighty-eight variants highlight the role of T cell regulation and airway remodeling in asthma pathogenesis. Nat Commun 11, 393 (2020).