logo The R markdown is available from the pulldown menu for Code at the upper-right, choose “Download Rmd”, or download the Rmd from GitHub.

1 Introduction

This tutorial will teach you how to use the DISGENET Cytoscape App to retrieve data from DISGENET [1] using the R programming language. To run this tutorial, you need to register for a free trial account at DISGENET, and retrieve you API

Once you have completed the registration process, go to your user profile…

… and retrieve your API key

Copy your API key, and go to the Cytoscape Apps menu, click on DISGENET -> Set API key, and paste it. You would need to do this process just the first time.

About DISGENET

DISGENET is a discovery platform containing one of the largest publicly available collections of genes and variants associated with human diseases [1]. DISGENET integrates data from expert curated repositories, Clinical trials, GWAS catalogs, animal models and the biomedical literature. The data in the platform are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype–phenotype relationships. More information on the database contents, statistics and attributes is available at https://www.disgenet.com.

Requirements

To run the tutorial you will need to install RCy3:

if(!"RCy3" %in% installed.packages()){
    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("RCy3")
}
library(RCy3)
## Warning: package 'RCy3' was built under R version 4.2.1

You will also need to install and launch Cytoscape:

Finally, install the DISGENET App to access DISGENET from within Cytoscape. This can be done from the Cytoscape App Store.

Then, install and load the library httr

if (!requireNamespace("httr", quietly = TRUE)) {
  install.packages("httr")
}
library(httr)

Load other necessary libraries.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(DT)
## Warning: package 'DT' was built under R version 4.2.3

2 Tutorial

This tutorial will guide you on how to use the DISGENET Cytoscape App to:

We will create an object for the REST calls to the DISGENET automation module. The parameters are:

The function returns a string in url format (url), with the given parameters.

disgenetRestUrl<-function(netType,host="127.0.0.1",port=1234,version="v8"){
  if(is.null(netType)){
    print("Network type not specified.")
  }else{
    url<-sprintf("http://%s:%i/disgenet/%s/%s",host,port,version,netType)
  }
  return (url)
}

# Creating the url for GDAs
disgenetRestUrl(netType = "gene-disease-net")
## [1] "http://127.0.0.1:1234/disgenet/v8/gene-disease-net"

Next, we will create an object that will execute the REST calls to the DISGENET automation module in Cytoscape and retrieve the operation results. The parameters are:

The function returns the object result, a list with the results of the operation.

disgenetRestCall<-function(netType,netParams){
  url<-disgenetRestUrl(netType)
  restCall<-POST(url, body = netParams, encode = "json")
  result<-content(restCall,"parsed")
  return(result)
}

2.1 Create a GDA network associated to a disease

In this example, we will create a gene-disease network for Asthma (Concept C0004096) using the CURATED data in DISGENET. Below, an example of the object netParams to create the GDA network. Notice that we are also filtering by the DISGENET score (0.8-1).

geneDisParams <- list(
  source = "CURATED",
  diseaseSearch = "C0004096",
  geneSearch = " ",
  geneProteinClass = "Any",
  diseaseClass = "Any",
  initialScoreValue = "0.8",
  finalScoreValue = "1.0"
)

# Generate the gene-disease network
geneDisResult <- disgenetRestCall("gene-disease-net",geneDisParams)

The network has now been created in Cytoscape!

To change the layout, run the function layoutNetwork. This function receives as an input the network identifier, and a layout name. To investigate the names of the possible layouts, run the function getLayoutNames().

layoutNetwork("force-directed", network = as.numeric(geneDisResult$result$networkSUID))

You can also change the network title, for example:

title <- "my-asthma-GDA-network"
renameNetwork(title, network =   as.numeric(geneDisResult$result$networkSUID))

Create a GDA network associated to a disease using a keyword as input

In this example, we will create a gene-disease network for the diseases containing the keyword Asthma using DISGENET CURATED data. Notice that 3 different concepts related to Asthma subtypes are retrieved.

geneDisParams <- list(
  source = "CURATED",
  diseaseSearch = "*Asthma*",
  geneSearch = " ",
   geneProteinClass = "Any",
  diseaseClass = "Any",
  initialScoreValue = "0.5",
  finalScoreValue = "1.0"
)
geneDisResult <- disgenetRestCall("gene-disease-net",geneDisParams)
layoutNetwork("force-directed", network = geneDisResult$networkResult$networkName)

2.2 Create a GDA network associated to a list of genes

In this example, we will perform a query to DISGENET data (only the Clinvar subset) with a list of gehes. The list of genes is separated using ,, but ; are also accepted as separators.

list <-  c( "PSEN1", "PSEN2", "TARDBP", "PRNP")
geneDisParams <- list(
  source = "CLINVAR",
  geneSearch = paste(list, collapse = ";"),
  diseaseSearch=  "",
     geneProteinClass = "Any",
  diseaseClass = "Any",
  initialScoreValue = "0",
  finalScoreValue = "1.0"
)

geneDisResult <- disgenetRestCall("gene-disease-net",geneDisParams)

2.3 Create a VDA network associated to a list of diseases

In this example, we will create a VDA network associated to:

  • Allergic asthma, C0155877
  • Adult onset asthma, C0741260
  • IgE-mediated allergic asthma, C1827849
variantDisParams <- list(
  source= "CURATED",
  diseaseSearch= "Allergic asthma;Adult onset asthma;IgE-mediated allergic asthma",
  initialScoreValue= "0.7",
  finalScoreValue = "1.0",
  showGenes= "false"
)

variantDisResult <- disgenetRestCall("variant-disease-net",variantDisParams)
layoutNetwork("kamada-kawai" , network = variantDisResult$networkResult$networkName)

2.4 Create a VDA network associated to a list of variants

For the following examples, we will use the data from the publication Eighty-eight variants highlight the role of T cell regulation and airway remodeling in asthma pathogenesis. The publication reports a genome-wide association meta-analysis of 69,189 cases and 702,199 controls from Iceland and UK biobank. They find 88 asthma risk variants at 56 loci.

We have made the data available with this tutorial. To read the list of variants, run

variants <-read.csv("https://raw.githubusercontent.com/jpinero/disgenet-app-notebook/main/data/variants.tsv")
variants <- variants$x

There are 88 variants in total

Create a VDA network associated to a list of variants

We will create the network of all the variants from the GWAS study in DisGeNET.

variantDisParams <- list(
  source= "ALL",
  variantSearch=  paste(variants, collapse = ","), 
  initialScoreValue= "0.0",
  finalScoreValue = "1.0",
  showGenes= "false"
)


variantDisResult <- disgenetRestCall("variant-disease-net",variantDisParams)

layoutNetwork('force-directed defaultSpringCoefficient=0.0001 defaultSpringLength=10')

# We will store the network identifiers to perform the enrichment later 
suid <- as.numeric(variantDisResult$result$networkSUID)

We can retrieve the information associated to the nodes like this:

nodes <-getTableColumns('node')

There are 88 variants from the dataset that are reported in DisGeNET, associated to 194 different diseases.

dt  <- nodes %>% filter(nodeType=="disease") %>%  arrange(desc(nrAssociatedVariants)) %>% 
  select(diseaseId,diseaseName, nrAssociatedVariants) %>% head(20)

datatable(dt, caption = "Table 1: Top diseases associated to the list of variants" ,  rownames = FALSE, class = 'cell-border stripe', filter = 'top',
          options = list( pageLength = length(dt$diseaseName),  autoWidth = TRUE,   columnDefs = list(list(width = '100px',  targets = c(0, 1,2))) ))

We can also check which of the variants from the GWAS study are already described as related to asthma.

variantDisParams <- list(
  source= "ALL",
  diseaseSearch= "*asthma*",
  variantSearch=  paste(variants, collapse = ", "), 
  initialScoreValue= "0.0",
  finalScoreValue = "1.0",
  showGenes= "false"
)


variantDisResult <- disgenetRestCall("variant-disease-net",variantDisParams)

layoutNetwork('force-directed defaultSpringCoefficient=0.0001 defaultSpringLength=10')

We can retrieve the information associated to the network edges like this:

edges <-getTableColumns('edge')

Let’s explore the evidence (represented as the edges in the network). We will show only the first 10 rows.

dt  <- edges %>% select(score, `evidence index`, `initial year`,  `final year`) %>% arrange(desc(score)) %>% unique() %>% head(10)
dt$score <- round(dt$score, digits = 2)
datatable(dt, caption = "Table 2: Details supporting the associations with Asthma" ,  rownames = FALSE, class = 'cell-border stripe', filter = 'top',
              options = list(   pageLength = 20, autoWidth = T, 
                                columnDefs = list(list(width = '100px', targets = c(0, 2)))))

Next we will create a VDA network for a list of variants filtering by disease class. We will retrieve the network of diseases belonging to the class Respiratory Tract Diseases associated to the variants from the GWAS study. This time, we will specify the parameter showGenes as true. Notice that the genes annotated to the variants are now shown.

variantDisParams <- list(
  source= "ALL",
  diseaseClass= "Respiratory Tract Diseases",
  variantSearch=  paste(variants, collapse = ", "), 
  initialScoreValue= "0.0",
  finalScoreValue = "1.0",
  showGenes= "true"
)


variantDisResult <- disgenetRestCall("variant-disease-net",variantDisParams)
layoutNetwork('force-directed defaultSpringCoefficient=0.0001 defaultSpringLength=10')

2.5 Perform a variant enrichment analysis

The variant-enrichment endpoint receives the identifiers of the network containing the variants of interest, and performs an enrichment analysis over the diseases in DisGeNET. The input list of variants should be identified with DBSNP identifiers. The function has other mandatory arguments:

  • networkId: SUID of the network
  • columnName: the name of the column containing the variants
  • source: the source database (by default, database = “CURATED”)
  • pvalueThreshold: a p-value cutoff to filter the results of the enrichment
  • minNumberOfVariants: the minimum number of variants annotated to the disease
  • newNetwork: if TRUE, a new network will be created with the results of the enrichment

The p-values resulting from the multiple Fisher tests are corrected for false discovery rate using the Benjamini-Hochberg method.

To retrieve the name of the column containing the variants

# this retrieves the name of the columns of the node table of the network 
colnames(getTableColumns('node', network = suid))
##  [1] "SUID"                    "shared name"            
##  [3] "name"                    "selected"               
##  [5] "nodeType"                "diseaseId"              
##  [7] "diseaseName"             "diseaseClass"           
##  [9] "diseaseClassName"        "variantId"              
## [11] "variantClass"            "chromosome"             
## [13] "coordinates"             "most_severe_consequence"
## [15] "DSI"                     "DPI"                    
## [17] "associatedGenes"         "nrAssociatedDiseases"   
## [19] "nrAssociatedVariants"    "styleName"              
## [21] "styleSize"

To perform the enrichment:

VarDisEnrichParams <- list(
  networkId = suid,
  columnName = "variantId",
  typeId = "DBSNP",  
  source =  "ALL",
  pvalueThreshold = 0.05,
minNumberOfVariants  = 25,
  newNetwork = "true"
)

vda_enrich_result <- disgenetRestCall("variant-enrichment",VarDisEnrichParams)
layoutNetwork('force-directed defaultSpringCoefficient=0.0001 defaultSpringLength=10')

2.6 Perform a gene enrichment analysis

To perform a gene enrichment analysis, first it is necessary to create the network containing the genes of interest. We will use as an example the genes associated to the variants from the list of variants of the previous publication.
First, load the data into R:

edges <-read.csv("https://raw.githubusercontent.com/jpinero/disgenet-app-notebook/main/data/variant_gene_network.tsv",sep = "\t")

There are 57 genes in total

Next, we will create the nodes, and edges:

nodes <- data.frame( id = c(as.character(edges$target), as.character(edges$source)), 
                     group = c(rep("gene", length(edges$target)), rep("variant", length(edges$source))) )


net_res <- createNetworkFromDataFrames(nodes =nodes, edges = edges, title = "gene-variant-net")
layoutNetwork('circular')
setVisualStyle("BioPAX_SIF")
##                 message 
## "Visual Style applied."

Once the network has been created, we will retrieve the network identifier

suid <-as.numeric(net_res)

Now we are ready to perform the enrichment. The gene-enrichment endpoint receives an object with the following parameters:

  • networkId: SUID of the network containing the list of genes of interest
  • columnName: the name of the column containing the genes
  • typeId: the identifier of the gene: SYMBOL for gene symbols, or ENTREZID for NCBI Gene Identifiers.
  • source: the source database (by default, database = “CURATED”)
  • pvalueThreshold: a p-value cutoff to filter the results of the enrichment
  • minNumberOfGenes: the minimum number of genes annotated to the disease
  • newNetwork: if TRUE, a new network will be created with the results of the enrichment

The p-values resulting from the multiple Fisher tests are corrected for false discovery rate using the Benjamini-Hochberg method.

getTableColumnNames(table = "node", network = suid)
## [1] "SUID"        "shared name" "name"        "selected"    "id"         
## [6] "group"
genDisEnrichParams <- list(
  networkId = suid,
  columnName = "id",
  typeId = "SYMBOL",  
  source =  "CURATED",
    pvalueThreshold = 0.05,
  minNumberOfGenes  = 5,
  newNetwork = "true"
)

gda_enrich_result <- disgenetRestCall("gene-enrichment",genDisEnrichParams)
print(gda_enrich_result)
## $message
## [1] "Operation completed succesfully"
## 
## $result
## $result$newNetworkProperties
## $result$newNetworkProperties$SUID
## [1] 84486
## 
## $result$newNetworkProperties$`Number of genes`
## [1] 23
## 
## $result$newNetworkProperties$`Number of diseases`
## [1] 7
# top_nodes <- getTableColumns('node')
# print(top_nodes)


# top_nodes  <- top_nodes %>% filter(nrAssociatedGenes>4) %>%  arrange(desc(nrAssociatedGenes)) %>% select(diseaseId, diseaseName, nrAssociatedGenes)
# datatable(top_nodes, caption = "Table 3: Top diseases resulting from the variant enrichment" ,  rownames = FALSE, class = 'cell-border stripe', filter = 'top',
#               options = list(   pageLength = 10, autoWidth = T ))

3 References

  1. Piñero, Janet and Ramírez-Anguita, Juan Manuel and Saüch-Pitarch, Josep and Ronzano, Francesco and Centeno, Emilio, Sanz, Ferran and Furlong, Laura I, {The DisGeNET knowledge platform for disease genomics: 2019 update , Nucleic Acids Research, 48, Issue D1, D845–D85508 (2020)

  2. Olafsdottir, T.A., Theodors, F., Bjarnadottir, K. et al.Eighty-eight variants highlight the role of T cell regulation and airway remodeling in asthma pathogenesis. Nat Commun 11, 393 (2020).