Divining Biological Pathway Knowledge from High-throughput ...

EGAN Tutorial: A Basic Use-case
July 2010
Jesse Paquette
Biostatistics and Computational Biology Core, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco (AKA BCBC HDFCCC UCSF)

Preamble

This was made using EGAN version 1.3. The EGAN graphical user interface is evolving:

- Icons may change
- Menus may change
- Button/widget placement may change
- New functionality/data will be added

This document probably won't change as quickly. Please contact the developers if you notice a major discrepancy between this document and the latest version of EGAN.

Overview

This document represents a brief demonstration of EGAN functionality; you will learn to

- Select a gene list of interest using experiment results
- Visualize a gene list of interest
- Link out to external web resources and literature
- Calculate enrichment scores for gene sets
- Visualize enriched gene sets as association nodes
- Export files and screenshots

EGAN is a sandbox, which means there's a lot more you can do:

- Import gene sets from published gene lists
- Investigate/compare gene lists from multiple experiments
- Characterize pre-defined gene sets, e.g. all targets of miR-9 or all PPI gene neighbors of PPARG
- Use a gene list from an experiment to construct a network module

OK, let's begin.

Launch the EGAN demo from the website: http://akt.ucsf.edu/EGAN/downloads.php

If you have any questions/comments:

- http://akt.ucsf.edu/EGAN/contact.php
- Post a question/comment in the EGAN discussion forum

Welcome to EGAN! Here's a brief overview of the user interface.

This is the Network View.

This is the Bird's-eye View. This is the Node Table, currently displaying Entrez Gene nodes.

This is the Node Types Table, with the Entrez Gene row selected. Let's examine the Entrez Gene Node Table. Click on the divider and drag the edge of the Node Table all the way to the left of the screen.

Most experiments in EGAN are represented by three columns: statistic, sign and p-value. It's up to you to know how these values pertain to your experiment.

This example experiment represents differential expression between basal-type and luminal-type breast cancer cell lines. This statistic column displays a linear coefficient where + values indicate higher expression in luminal-type and - values indicate higher expression in basal-type. This p-value column displays the unadjusted p-value for the coefficient. Click on the p-value column header to sort the gene rows.

Next, left-click on the sign column header to sort the Entrez Gene Node Table. The gene rows will be sorted into two sets, - and +, with p-value providing a secondary sort within each set (since we clicked on that header just before). So semantically, we're now looking at genes with higher expression in basal-type breast cancer cell lines, sorted by p-value.
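The two-click sort described above behaves like two successive stable sorts: the later sort (sign) becomes the primary key and the earlier one (p-value) survives as the tie-breaker. The sketch below illustrates that idea only; it is not EGAN's implementation, and the gene names and values are invented placeholders.

```python
# Hypothetical gene rows: (symbol, sign, p_value). These values are invented
# for illustration and are not taken from the EGAN demo data.
rows = [
    ("GENE_A", "+", 0.004),
    ("GENE_B", "-", 0.020),
    ("GENE_C", "-", 0.001),
    ("GENE_D", "+", 0.0005),
]

# First click: sort by p-value, smallest first.
by_p = sorted(rows, key=lambda r: r[2])

# Second click: sort by sign, "-" before "+". Python's sort is stable, so
# rows sharing a sign keep the p-value order from the previous sort.
by_sign_then_p = sorted(by_p, key=lambda r: r[1] != "-")

for symbol, sign, p_value in by_sign_then_p:
    print(symbol, sign, p_value)
```

The "-" block comes first, and within each block the rows stay ordered by p-value, which matches the secondary-sort behaviour described in the text.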

Selecting genes is easy; just click on the top row (MSN) and drag the cursor downwards until you reach a specific p-value cutoff.

The Node Types Table shows that there are now 24 genes selected in the Entrez Gene Node Table. That was the simple way to select genes. However, we want to select genes by a combination of coefficient and p-value.

Click the D button in the Entrez Gene row of the Node Types Table. This will deselect all Entrez Gene nodes. Now, click the M button at the top of the Entrez Gene Node Table. This will bring up a dialog that will allow us to use multiple criteria to

select genes. If you have multiple experiments visible, you can use this dialog to specify gene selection criteria across multiple experiments. Fill out the appropriate values and click the Select Nodes button.
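Selecting by a combination of coefficient and p-value amounts to keeping only the genes that pass every criterion at once. The following is a minimal sketch of that logic outside EGAN. The `GeneResult` type, the `select_genes` function, the thresholds and the example rows are all hypothetical; the tutorial does not state which cutoffs were entered in the dialog.

```python
from typing import NamedTuple, Optional


class GeneResult(NamedTuple):
    symbol: str
    statistic: float  # linear coefficient; sign gives the direction
    p_value: float    # unadjusted p-value for the coefficient


def select_genes(results, max_p: float, min_abs_statistic: float,
                 sign: Optional[str] = None):
    """Return symbols of genes that satisfy all criteria simultaneously."""
    selected = []
    for r in results:
        if r.p_value > max_p:
            continue  # not significant enough
        if abs(r.statistic) < min_abs_statistic:
            continue  # effect size too small
        if sign == "-" and r.statistic >= 0:
            continue  # wrong direction
        if sign == "+" and r.statistic <= 0:
            continue  # wrong direction
        selected.append(r.symbol)
    return selected


# Invented example rows, not EGAN demo data.
results = [
    GeneResult("GENE_A", -1.8, 0.0004),
    GeneResult("GENE_B", -0.3, 0.0001),
    GeneResult("GENE_C", 2.1, 0.002),
    GeneResult("GENE_D", -1.2, 0.2),
]

# Hypothetical cutoffs: p <= 0.01, |coefficient| >= 1.0, negative sign only.
print(select_genes(results, max_p=0.01, min_abs_statistic=1.0, sign="-"))
```

Leaving `sign` unset keeps genes in both directions, which corresponds to selecting on p-value and effect size alone.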

The Node Types Table indicates that there are 40 genes selected in the Entrez Gene Node Table. OK, we're ready to continue with the analysis. Click the Hide node table button.

You can always bring the Node Table back by clicking the Show node table button. Before we show the selected genes on the Network View, let's save them to a group. Click the G button.

Enter a descriptive name for this group and click OK. Next, left-click on the Custom Node row in the Node Types Table below and show the Custom Node Node Table.

We've saved this group of 40 genes so we can return to it more quickly in the future. Now, hide the Node Table so we can see the Network View. Click the button to the right. Whenever you have nodes selected, this button will show them on the

Network View. Then click the green layout button above. The Node Types Table shows that there are 40 genes now visible in the Network View.

De-select these nodes either by clicking the D button to the right or by clicking the D button in the Entrez Gene row of the Node Types Table. The Network View shows our 40 selected genes connected by protein-protein interactions, literature co-occurrence and chromosomal adjacency edges. Note that your actual layout of nodes will look slightly different; the layout algorithm is non-deterministic.

Navigating/manipulating the Network View

Panning
- Right-click on empty space and then drag the cursor to pan around

Zooming
- Scroll the mouse wheel to zoom in and out
- If you don't have a mouse wheel, use the + and - buttons at the top of the Network View

Node selection
- Left-click on a node to select it
- Node selection is shared between the Network View and the Node Table
- For incremental node selection, hold the shift key while you select nodes
- If you left-click on empty space then drag, the red rectangle will define an area of selection
- Left-clicking in empty space will deselect all visible nodes

Moving nodes
- Left-click on a node and drag
- You will move that node and all other selected nodes in the Network View

Tool tip information
- Hover the cursor over a node
- You will see a tool tip showing information about that node

Node context menu
- Right-click on a node
- It's the same menu that you will see if you right-click on that node's row in the Node Table

Most gene-gene edges in EGAN are supported by literature references. To investigate these references, right-click on an edge to bring up the edge context menu. The article will be shown in your default web browser.

And most nodes in EGAN are backed by external web references. To investigate these references, right-click on a node to bring up the node context menu. The reference will be shown in your default web browser.

Now it's time to calculate gene set enrichment scores for these visible genes.

Click on the E button below and choose Association visible enrichment. EGAN will calculate hypergeometric enrichment statistics for all loaded gene sets using the visible genes. Click on the KEGG row below in the Node

Types Table, then click the button to the right to show the KEGG Node Table. The KEGG Node Table now has two extra columns in yellow. Visible Neighbors shows the number of genes in each KEGG pathway that are also visible in the Network View.

Visible Enrichment shows the over-representation statistic for each KEGG pathway calculated using the hypergeometric distribution. Click on the header of the Visible Enrichment column to sort the KEGG Node Table. The enrichment statistics show us that the Pyruvate

metabolism pathway is enriched, because 2 of our genes in our visible set of 40 are also in that pathway. The important question is: which genes? Click the checkbox in the Visible column to make Pyruvate metabolism visible as an association node in

the Network View. Then click on the divider to the left and drag it back so the Network View and Node Table share the horizontal space.
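To make the Visible Enrichment statistic more concrete, here is a minimal sketch of a hypergeometric over-representation calculation. It assumes the statistic is the upper-tail probability of seeing at least the observed overlap by chance; the tutorial only says the hypergeometric distribution is used, so the exact quantity EGAN reports may differ. The function name and the background and pathway sizes in the example call are hypothetical; only the 2-of-40 overlap comes from the text.

```python
from math import comb


def hypergeom_upper_tail(population: int, in_set: int, drawn: int, overlap: int) -> float:
    """P(X >= overlap) for a hypergeometric draw without replacement.

    population: total number of genes considered (background)
    in_set:     genes in the background that belong to the gene set
    drawn:      number of visible genes
    overlap:    visible genes that belong to the gene set
    """
    total = comb(population, drawn)
    upper = min(in_set, drawn)
    favourable = sum(
        comb(in_set, i) * comb(population - in_set, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# 2 of 40 visible genes fall in the pathway (from the text). The background
# size (20000) and the pathway size (40) are assumptions for illustration.
p = hypergeom_upper_tail(population=20000, in_set=40, drawn=40, overlap=2)
print(p)
```

A small probability means the overlap is larger than expected by chance, which is why a pathway with only two visible genes can still rank as enriched when the pathway itself is small.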

Click the * button at the top to perform an automatic layout, then zoom and pan the Network View to focus on the Pyruvate metabolism association node. You could alternatively click the @ button next to Pyruvate metabolism in the KEGG Node Table; it will center the graph on that node.

Note how the Pyruvate metabolism association node has edges connecting it to LDHB and AKR1B1. These edges indicate that those genes belong to the Pyruvate metabolism pathway. All gene sets in EGAN are visualized as association nodes in this way; it provides the advantage of allowing

multiple overlapping gene sets to be visualized together. Right-click on the Pyruvate metabolism association node and use the pop-up menu to show its web reference at KEGG.

When you link out from a visible KEGG association node, visible genes that belong to that pathway will be highlighted in red. To produce an informative hypergraph for our visible set of 40 genes, we just have to

add more enriched association nodes of different types. Click the checkbox in the Visible column of the Node Table for each node you want to add. Note how here I have opted to add only some Gene Ontology Process nodes, and

not all nodes enriched beyond a specific cutoff. I chose to add cell migration, but not cell motility, because those two gene sets are mostly redundant; adding both would not improve interpretability. Producing an informative-but-concise

hypergraph takes some practice. Focus on how each node fits in the context of your experiment. Click this button to maximize the Network View.
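The redundancy judgment mentioned above (cell migration versus cell motility) can be checked outside EGAN by measuring how much two gene sets overlap. The sketch below uses the Jaccard index, which is one common choice; the tutorial does not say EGAN computes it. The gene members are invented placeholders, not the real contents of those Gene Ontology terms.

```python
def jaccard(a, b) -> float:
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder gene sets for illustration only.
set_one = {"GENE_A", "GENE_B", "GENE_C"}
set_two = {"GENE_B", "GENE_C", "GENE_D"}

print(jaccard(set_one, set_two))
```

A value near 1 suggests the two sets carry nearly the same information, so showing only one of them keeps the hypergraph concise.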

This hypergraph was constructed by selectively adding enriched association nodes of different types. The layout was produced by a combination of automatic and manual steps. The enrichment score suffixes were shown by selecting Nodes -> Node labels -> Suffix -> Visible Enrichment from the D menu to the left.

A few last things and then we're done. You can export this graph to PDF using the PDF button to the left. You can also save a snapshot of this graph for future EGAN analysis using the ! button to the right.

You may want to save the Visible Enrichment column in each type's Node Table as a permanent column; this way you can do other enrichment analyses in the future and your previous statistics will be preserved. Click the Save Visible Enrichment statistics option in the E menu below.

Make sure to use a descriptive name! Finally, you can export gene set enrichment statistics to a spreadsheet (tab-delimited text) by clicking the TXT button at the top of each Node Table.
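Because the export is tab-delimited text, it can be read by any spreadsheet or script. The following is a minimal sketch of loading such a file with Python's standard `csv` module. The header names `Name`, `Visible Neighbors` and `Visible Enrichment` and the sample rows are assumptions for illustration; check the header line of your own export and adjust the keys to match.

```python
import csv
import io

# Stand-in for an exported file. The headers and values are hypothetical.
sample_export = (
    "Name\tVisible Neighbors\tVisible Enrichment\n"
    "Pathway X\t2\t0.01\n"
    "Pathway Y\t1\t0.30\n"
)


def load_enrichment(handle):
    """Parse tab-delimited enrichment rows into a list of dictionaries."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = []
    for row in reader:
        table.append({
            "name": row["Name"],
            "visible_neighbors": int(row["Visible Neighbors"]),
            "visible_enrichment": float(row["Visible Enrichment"]),
        })
    return table


# For a real file, pass open(path, newline="") instead of io.StringIO.
table = load_enrichment(io.StringIO(sample_export))
for entry in table:
    print(entry["name"], entry["visible_neighbors"], entry["visible_enrichment"])
```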

In review

This document represents a brief demonstration of EGAN functionality; you learned to

- Select a gene list of interest using experiment results
- Visualize a gene list of interest
- Link out to external web resources and literature
- Calculate enrichment scores for gene sets
- Visualize enriched gene sets as association nodes
- Export files and screenshots

If you have any questions/comments:

- http://akt.ucsf.edu/EGAN/contact.php
- Post a question/comment in the EGAN discussion forum
