NCGs Help Page

NCGs is a web resource to analyze duplicability, orthology and network properties of cancer genes. Cancer genes are defined in three ways:
  1. The list of cancer genes derived from the manually curated Cancer Gene Census (375 CGC-genes), from Futreal PA et al. (2004)
  2. Four lists of genes that were recently found mutated in breast, colorectal and pancreatic cancers and in glioblastoma (380 CAN-genes)
  3. The list of genes mutated in lung adenocarcinoma, as part of the Tumor Sequencing Project (26 TSP-genes), from Ding L et al.



Search Page
From the homepage the user may query the database in three ways:
  1. Quick Gene Search: a single gene or a user-defined list of genes
  2. Simple Query Using Gene List: a simple query to be chosen among the available ones
  3. Complex Query Using Gene Lists: a query that lets the user select filters based on duplicability, gene appearance and network properties

Search Gene

The user may enter a single gene identifier or a list of gene identifiers, using one of four identifier types:
  1. Gene symbol: to retrieve several genes at once, use * as a wildcard (e.g. *RAS will display 3 genes: HRAS, KRAS, NRAS)
  2. Entrez identifier (e.g. 3265)
  3. RefSeq protein identifier (e.g. NP_001123914)
  4. Ensembl protein identifier (e.g. ENSP00000309845)
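The four identifier types and the * wildcard can be illustrated with a short sketch. This is not the NCGs implementation: the regular expressions are assumptions inferred from the example identifiers above, the function names and the symbol list are hypothetical, and the wildcard is assumed to behave like a shell-style glob.

```python
import re
from fnmatch import fnmatchcase

# Patterns inferred from the examples on this page (assumptions, not NCGs rules).
ID_PATTERNS = [
    ("entrez", re.compile(r"\d+")),               # e.g. 3265
    ("refseq_protein", re.compile(r"NP_\d+")),    # e.g. NP_001123914
    ("ensembl_protein", re.compile(r"ENSP\d{11}")),  # e.g. ENSP00000309845
]


def classify_identifier(query):
    """Guess which of the four identifier types a query string uses."""
    for id_type, pattern in ID_PATTERNS:
        if pattern.fullmatch(query):
            return id_type
    # Anything else is treated as a gene symbol, possibly containing *.
    return "gene_symbol"


def match_symbols(pattern, symbols):
    """Return the symbols matching a pattern where * stands for any characters."""
    return sorted(s for s in symbols if fnmatchcase(s, pattern))


# Hypothetical symbol list, for illustration only.
SYMBOLS = ["HRAS", "KRAS", "NRAS", "TP53", "BRCA1"]

print(classify_identifier("NP_001123914"))
print(match_symbols("*RAS", SYMBOLS))
```

Note that `fnmatchcase` also gives special meaning to `?` and `[...]`, which the search form may not support.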

Browse Gene Lists

The user may choose among several pre-compiled queries, which extract genes that share mutations in the same cancer types.

Complex Query Using Gene Lists

The gene lists for the complex queries allow the user to analyze lists of cancer genes with similar properties. Four types of filters are implemented:
  1. Cancer genes: the choice is between all cancer genes, CGC-genes or CAN-genes
  2. Duplicability: duplicable or singleton genes
  3. Protein Interaction Network (PIN): hubs or non-hubs. Hubs are defined as the top 5% most connected nodes in the human PIN, which in our case corresponds to genes with degree higher than 40
  4. Appearance in evolution: genes that appeared in the same evolutionary period
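Combining the four filters amounts to keeping the genes that pass every filter the user has set. The sketch below shows one way to express that; the `Gene` record, the `complex_query` function and the example genes are all hypothetical, and only the degree cutoff of 40 comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional

HUB_DEGREE_CUTOFF = 40  # from the text: hubs have degree higher than 40


@dataclass
class Gene:
    symbol: str
    cancer_list: str   # "CGC" or "CAN"
    duplicable: bool   # False means singleton
    degree: int        # number of interactions in the human PIN
    appearance: str    # label of the evolutionary period


def complex_query(genes, cancer_list=None, duplicable=None, hub=None, appearance=None):
    """Keep the genes that pass every filter that is set; None means no filter."""
    selected = []
    for gene in genes:
        if cancer_list is not None and gene.cancer_list != cancer_list:
            continue
        if duplicable is not None and gene.duplicable != duplicable:
            continue
        if hub is not None and (gene.degree > HUB_DEGREE_CUTOFF) != hub:
            continue
        if appearance is not None and gene.appearance != appearance:
            continue
        selected.append(gene)
    return selected


# Invented records, for illustration only.
EXAMPLE_GENES = [
    Gene("GENE_A", "CGC", False, 55, "period_1"),
    Gene("GENE_B", "CGC", True, 12, "period_1"),
    Gene("GENE_C", "CAN", False, 70, "period_2"),
]

for gene in complex_query(EXAMPLE_GENES, cancer_list="CGC", hub=True):
    print(gene.symbol)
```

Leaving `cancer_list` unset corresponds to the "all cancer genes" choice.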


Results Page
The results page contains four sections for each gene:
  1. Gene description
  2. Duplicability properties
  3. Orthology properties
  4. Network properties

Gene Description

This section includes general information about the analyzed gene: symbol, description, the cancer gene lists it belongs to, and links to external databases such as Entrez, HPRD, OMIM, RefSeq and Ensembl.

Duplicability properties

Duplicability is defined as in Rambaldi D et al. (2008): it is measured by aligning the protein sequence of each gene directly to the human genome, using the BLAST-like Alignment Tool (BLAT). We define as duplicates all additional genomic matches covering at least 60% of the query length. Singletons are genes that have no additional hit covering at least 60% of the query length.
The Duplicated loci button opens a new page that describes all the duplicated loci of the gene under study.
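The 60% rule can be written as a small predicate. This is a minimal sketch, not the pipeline of Rambaldi D et al.: it assumes the aligned length of each additional hit has already been extracted from the BLAT output, and the function and parameter names are hypothetical.

```python
def is_duplicable(additional_hit_lengths, query_length, cutoff_percent=60):
    """Return True if any additional genomic hit covers at least
    cutoff_percent of the query protein.

    additional_hit_lengths: aligned lengths (in residues) of every genomic
    match other than the gene's own locus.
    """
    # Integer arithmetic avoids floating-point issues at the 60% boundary.
    return any(
        100 * hit_length >= cutoff_percent * query_length
        for hit_length in additional_hit_lengths
    )


# A 500-residue protein with one additional hit of 300 residues (60%).
print(is_duplicable([300, 100], 500))
# No additional hit reaches 60%, so the gene is a singleton.
print(is_duplicable([299], 500))
```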

Orthology properties

The appearance of a gene is defined as the deepest taxonomic branch of the tree of life where an ortholog can be detected. Orthology relationships are retrieved from eggNOG.
Seven branches of the tree of life are defined.
The Orthologs button opens a new page that describes all the orthology relationships of the gene of interest in detail.
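The definition of appearance can be sketched as picking the most ancient branch in which an ortholog is found. The branch names below are placeholders, because the seven branches are not listed on this page, and the function name is hypothetical.

```python
# Placeholder names for the seven branches, ordered from the most recent
# (branch_1) to the most ancient (branch_7). These labels are assumptions.
BRANCHES = ["branch_{}".format(i) for i in range(1, 8)]


def appearance(branches_with_ortholog):
    """Return the deepest (most ancient) branch where an ortholog is
    detected, or None if no ortholog is detected in any branch."""
    found = [branch for branch in BRANCHES if branch in branches_with_ortholog]
    return found[-1] if found else None


# Orthologs detected in branches 1 to 3 only: the gene appeared in branch_3.
print(appearance({"branch_1", "branch_2", "branch_3"}))
```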

Network properties

Two types of network properties are reported: three quantitative measures (degree, clustering coefficient and betweenness) and two classifications derived from them (hub and centrality):
  • Degree, which measures the number of connections of each node in the network;
  • Clustering Coefficient, which measures the interconnectivity of the neighbors surrounding a node, and has values that range between 0 (the neighbors are not connected) and 1 (all the neighbors are connected), as defined in Watts DJ and Strogatz SH (1998);
  • Betweenness, which measures the centrality of a protein in a protein-protein interaction network, and is defined as the number of shortest paths between all other pairs of nodes that pass through a given node, as in Goh K-I et al. (2002);
  • Hub: top 5% of the most connected proteins in the network;
  • Centrality: proteins with the top 5% values of betweenness.
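The three quantitative measures can be computed on a toy network as follows. This is a self-contained sketch, not the code behind NCGs: the adjacency dictionary and function names are hypothetical, and betweenness uses the standard Brandes algorithm, in which a node receives a fractional score when several equally short paths exist.

```python
from collections import deque
from itertools import combinations


def degree(adj, node):
    """Number of connections of a node."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                        # nodes in order of increasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}     # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair of endpoints was visited from both ends.
    return {v: s / 2 for v, s in score.items()}


# Toy network: a triangle A-B-C with D attached to C.
ADJ = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}

print(degree(ADJ, "C"))
print(clustering_coefficient(ADJ, "A"))
print(betweenness(ADJ))
```

In this toy network only C lies on shortest paths between other nodes (A to D and B to D), so it is the only node with non-zero betweenness.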
The network properties are derived from several databases of PINs:

Dataset   Version                 Nodes   Interactions   Publications
BioGRID   2.0.49 (Feb 1st 2009)    7163          23588           8815
IntAct    Jan 23rd 2009            7066          22119           1374
MINT      Feb 5th 2009             5151          12653           1210
DIP       Jan 26th 2009            1108           1326            739
HPRD      Sep 1st 2007             8697          34938          17770
Total     -                       11988          68498          19886
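The Total row is smaller than the sum of the individual rows (the five datasets add up to 29185 nodes and 94624 interactions), which is what one would expect if proteins and interactions shared by several datasets are counted once. A minimal sketch of such a merge is given below; the function name and the toy data are hypothetical, and the actual integration procedure is not described on this page.

```python
def merge_interactions(datasets):
    """Merge undirected interactions from several datasets.

    datasets: mapping of dataset name to an iterable of (protein_a, protein_b)
    pairs. Returns the sets of unique nodes and unique interactions.
    """
    edges = set()
    for pairs in datasets.values():
        for a, b in pairs:
            # Sorting makes (a, b) and (b, a) the same undirected interaction.
            edges.add(tuple(sorted((a, b))))
    nodes = {protein for edge in edges for protein in edge}
    return nodes, edges


# Toy data: the P1-P2 interaction is reported by both datasets.
TOY_DATASETS = {
    "db1": [("P1", "P2"), ("P2", "P3")],
    "db2": [("P2", "P1"), ("P3", "P4")],
}

nodes, edges = merge_interactions(TOY_DATASETS)
print(len(nodes), len(edges))
```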

The button Network opens a new page which describes all the network properties of the gene of interest in detail.


Duplicability Page

Duplicability is defined as in Rambaldi D et al. (2008): it is measured by aligning the protein sequence of each gene directly to the human genome, using the BLAST-like Alignment Tool (BLAT). We define as duplicates all additional genomic matches covering at least 60% of the query length. Singletons are genes that have no additional hit covering at least 60% of the query length.
Four types of Hit are defined, depending on the genomic location of the duplicated locus:
  • Best Hit, which corresponds to the original gene locus;
  • Other Gene Hits, which include other gene loci where the gene of interest is duplicated (either one gene hit or more than one);
  • Genomic, which includes loci with no known genes mapped (no genes are defined by the UCSC Genome Browser, but mRNAs or ESTs may be present).
The default cutoff to display genomic hits is 60% of the original length, but the user can choose a different cutoff, from 10% to 100% of the query length, using the widget next to the table.
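The effect of the cutoff widget can be sketched as a filter over the rows of the table. The record layout and function name are hypothetical, and the sketch assumes the cutoff is applied to the coverage of every hit shown in the table.

```python
def visible_hits(hits, cutoff_percent=60):
    """Return the hits whose coverage reaches the chosen cutoff.

    hits: list of dicts with a "type" and a "coverage_percent" key.
    The cutoff must lie in the 10 to 100 range offered by the widget.
    """
    if not 10 <= cutoff_percent <= 100:
        raise ValueError("cutoff must be between 10 and 100")
    return [hit for hit in hits if hit["coverage_percent"] >= cutoff_percent]


# Invented hits, for illustration only.
EXAMPLE_HITS = [
    {"type": "Best Hit", "coverage_percent": 100},
    {"type": "Other Gene Hits", "coverage_percent": 75},
    {"type": "Genomic", "coverage_percent": 40},
]

print(len(visible_hits(EXAMPLE_HITS)))      # default cutoff of 60%
print(len(visible_hits(EXAMPLE_HITS, 10)))  # most permissive cutoff
```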


Orthology Page
The orthology relationships are derived from eggNOG.

Tree of life

The displayed Tree of Life includes all the genes that are orthologs of the gene of interest. For each branch, the number of organisms and the number of genes found are displayed, which gives an idea of how many duplications the gene has undergone during evolution.

Orthology table

The Orthology Table describes the relationship between the analyzed human gene and its orthologs, which can be:
  • 1 to 1: no duplications occurred;
  • 1 to N (one-to-many relationship): a duplication occurred in a lineage that is not Primates;
  • N to 1 (many-to-one relationship): a duplication occurred in Human, but not in the other lineages;
  • N to N (many-to-many relationship): duplications occurred early in evolution and have been kept;
  • 0: no orthologs are found in a particular lineage.
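The relationship labels follow from two counts: the number of human genes in the orthologous group and the number of genes in the other lineage. The sketch below assumes those counts are available; the function name is hypothetical.

```python
def orthology_relationship(n_human, n_other):
    """Label the relationship between human genes and their orthologs
    in another lineage, using the labels of the Orthology Table."""
    if n_human < 1:
        raise ValueError("at least one human gene is required")
    if n_other == 0:
        return "0"  # no orthologs found in this lineage
    human_side = "1" if n_human == 1 else "N"
    other_side = "1" if n_other == 1 else "N"
    return "{} to {}".format(human_side, other_side)


print(orthology_relationship(1, 3))  # duplication in the other lineage
print(orthology_relationship(2, 1))  # duplication in Human
```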


Network Page
The network is displayed using Medusa.
Network visualization

The first-level network for the gene of interest (which is in the center of the image) is displayed. The primary interactions (i.e. the interactions between the gene of interest and the other genes) are colored in green, while the secondary interactions (i.e. the interactions among the interactors of the gene of interest) are colored in pink.
The thickness of the lines representing the interactions is based on the number of experiments that support the interaction: thinner lines represent interactions found in a single experiment, while thicker lines represent interactions found in more than one experiment.
Singleton genes are colored in red, while duplicable genes are colored in blue.
The shape of the nodes depends on whether the genes belong to the lists of cancer genes: squares represent CGC-genes, triangles represent CAN-genes, rhombi represent TSP-genes, and circles represent the other human genes.
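The visual encoding described above can be summarized as two lookup functions. This sketch only restates the legend; it does not use the Medusa API, and the function names and dictionary keys are hypothetical.

```python
NODE_SHAPES = {"CGC": "square", "CAN": "triangle", "TSP": "rhombus"}


def node_style(cancer_list, duplicable):
    """Shape encodes the cancer gene list, color encodes duplicability."""
    return {
        "shape": NODE_SHAPES.get(cancer_list, "circle"),  # circle: other human genes
        "color": "blue" if duplicable else "red",
    }


def edge_style(primary, n_experiments):
    """Color encodes primary or secondary interactions, thickness the
    number of supporting experiments."""
    return {
        "color": "green" if primary else "pink",
        "width": "thin" if n_experiments == 1 else "thick",
    }


print(node_style("CGC", duplicable=False))
print(edge_style(primary=True, n_experiments=3))
```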

Network Table

The Table includes all the duplicability, evolutionary and network properties of the genes that interact with the gene of interest, ordered by network degree. The second column indicates whether a gene belongs to the lists of cancer genes, while the last column lists the experiments (PubMed IDs) that have determined the interaction.
Fields for which no information is available are labelled N/A.
