A myGrid Project Tutorial
Dr Mark Greenwood, University of Manchester
With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe, Carole Goble and the rest of the myGrid team.

Open Source Upper Middleware for Bioinformatics
(Web) service-based architecture
Targeted at tool developers, bioinformaticians and service providers
Sites: Newcastle, Sheffield, Manchester, Nottingham, Hinxton, Southampton

myGrid People
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji and Andy Brass, St Mary's Hospital, Manchester, UK.
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire.
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK).
Collaborators: Keith Decker.

Roadmap
(Diagram: start, services, data.)

Tenet I
High-level middleware services for data-intensive resource interoperation for bioinformatics:
Information Grid, not computational Grid
Exploratory, ad hoc
For individuals
In silico experiment as workflow
Distributed query processing
Information management

Tenet II
High-level services for e-Science experimental management; scientific discovery is personal and global.
Federated third-party registries for workflows and services
Workflow and service discovery for reuse and repurposing
(Diagram: annotate, register and find via the registry; sharing knowledge and sharing components; provenance; event notification; personalisation.)

Tenet III
Open source and open services:
No control or influence over service providers
Open to third-party metadata and services
Open, extensible architecture
Assemble your own components, designed to work together
(Toolkit diagram: Semantic Discovery, Pedro, View, UDDI registry, Taverna, Event Notification, Freefluo WfEE, Info. Model, Soaplab, Haystack Provenance Browser, Gateway & Portal, LSID, mIR.)

Tenet IV
(Web) service architecture:
Publication, discovery, interoperation, composition and decommissioning of myGrid services
WS-I -> OGSA / WSRF
Metadata driven: ontologies, a common information model, Semantic Web technologies (RDF, OWL)

Tenet V
Middleware for tool developers, bioinformaticians and service providers.
Biologists are supported indirectly, through the portals and applications these groups develop.

Roadmap
(Diagram: discover services, run workflows, data management; services, workflows, data.)

Data-intensive bioinformatics
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE (EC 2.5.1.7)
DE   (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE ENOLPYRUVYL
DE   TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI

Use Scenarios
Graves' disease: autoimmune disease of the thyroid
Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle
Discover all you can about a gene
Annotation pipelines and gene expression analysis
Services from Japan, Hong Kong and various sites in the UK

Williams-Beuren syndrome: microdeletion of 1.55 Mbases on chromosome 7
Hannah Tipney, May Tassabehji and Andy Brass, St Mary's Hospital, Manchester, UK
Characterise an unknown gene
Annotation pipelines and gene expression analysis
Services from the USA, Japan and various sites in the UK
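The Swiss-Prot entry on the data-intensive bioinformatics slide is a line-coded flat file. The first two columns hold a code (ID, DE, OS, OC, KW, FT, SQ), and sequence lines start with blanks. Below is a minimal sketch of grouping such an entry by line code. It assumes the classic layout, with data starting at column 6 and "//" ending the entry, and the record is an abbreviated copy of the one above. It is not a complete parser; Biopython's `Bio.SwissProt` module is the usual choice for real work.

```python
from collections import defaultdict

# Abbreviated copy of the MURA_BACSU entry (sequence cut to two blocks).
RECORD = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG
//
"""


def parse_entry(text):
    """Group one entry's lines by two-letter code; join the sequence lines."""
    fields = defaultdict(list)
    sequence = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-entry marker
            break
        code, value = line[:2], line[5:].rstrip()
        if code == "SQ":                   # sequence header; residues follow
            in_sequence = True
            fields[code].append(value)
        elif in_sequence and code == "  ":
            sequence.append(value.replace(" ", ""))
        else:
            fields[code].append(value)
    return dict(fields), "".join(sequence)


def split_list(fields, code):
    """Split a ';'-separated field (e.g. KW, OC) that may span several lines."""
    text = " ".join(fields.get(code, []))
    return [item.strip().rstrip(".") for item in text.split(";") if item.strip()]


fields, seq = parse_entry(RECORD)
```

Fields that continue over several lines, such as OC, are joined before splitting. For this reason the lineage comes out as one list no matter where the line breaks fall.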
Manually filling a genomic gap
Two major steps:
1. Extend into the gap: similarity searches (RepeatMasker, BLAST)
2. Characterise the new sequence: NIX, InterPro, etc.
Numerous web-based services (e.g. BLAST, RepeatMasker)
Cutting and pasting between screens
Large number of steps
Frequently repeated, because information is now rapidly added to public databases
Don't always get results
Time consuming
A huge amount of interrelated data is produced, handled in the lab book and in files saved to the local hard drive
Mundane
Much knowledge remains undocumented
The bioinformatician does the analysis

WBS workflows
(Workflow diagram. Key: pink, outputs/inputs of a service; purple, tailor-made services; green, EMBOSS Soaplab services; yellow, Manchester Soaplab services; grey, unknowns.)
Labels: query nucleotide sequence; RepeatMasker; ncbiBlastWrapper; GenBank accession no.; URL including GB identifier; translation/sequence file (good for records and publications); prettyseq; GenBank entry; amino acid translation; epestfind (identifies PEST sequences); pscan (identifies PRINTS fingerprints); pepstats (MW, length, charge, pI, etc.); pepcoil (predicts coiled-coil regions); tblastn vs nr, est, est_mouse, est_human databases; blastp vs nr; predicts cellular location; sort for appropriate sequences only; sixpack (6 ORFs); seqret; nucleotide sequence (FASTA); ORFs; transeq; Octanol?; restriction enzyme map; CpG island locations and %; repetitive elements; blastn vs nr, est databases.

Graves' Disease Bioinformatics Annotation Pipeline
Peter Li(1), Claire Jennings(2), Simon Pearce(2) and Anil Wipat(1) (2003). (1) School of Computing Science and (2) Institute of Human Genetics, University of Newcastle-upon-Tyne.
(Pipeline diagram, starting from a candidate gene pool.) Questions addressed:
What is known about my candidate gene?
Is this SNP present in my samples?
What is the structure of the protein product encoded by my candidate gene?
Labels: Genotype Assay Design System; 3D protein structure; Gene ID; query PDB and display protein structure; Medline; EMBL; GO; InterPro; Swiss-Prot; AMBIT; BLAST; DQP; OMIM; PDB; SNP query; Talisman; primer design (EMBOSS Eprimer application in Soaplab); use primers designed by myGrid to amplify the region flanking the SNP on the gene; restriction fragment length polymorphism experiment; selection of restriction enzyme (EMBOSS Restrict in Soaplab); obtain information about the protein and extract information about the active site; determine whether a coding SNP affects the active site of the protein.

Experiment life cycle
Forming experiments
Discovering and reusing experiments and resources
Personalisation
Executing and monitoring experiments
Sharing services and experiments
Managing the lifecycle, provenance and results of experiments

(e-)Scientists
Experiment: Can a workflow be used as an experimental method? How many times has this experiment been run?
Analyse: How do we manage the results to draw conclusions from them? How reliable are these results?
Collaborate: Can we share workflows, results, metadata, etc.?
Publish: Can we link to these workflows and results from our papers?
Review: Can I find, comprehend and review your work? How was that result derived?

Collections of Tasks
(Diagram relating tasks to roles. Tasks: domain tasks; building workflows; workflow enactment; storage; data management; provenance; querying; description; service finding; discovery; annotation. Roles: service providers, bioinformaticians, scientists, annotation providers.)

(Diagram of myGrid components and their users. Labels: bioinformaticians; registry (querying, sharing, federating, registering); annotation/description; query and retrieve; discovery; view; invoking; annotation providers; Pedro annotation tool; Taverna workflow builder; workflow execution; FreeFluo enactor; interface description; store data/knowledge; mIR; others; vocabulary; ontology store; service providers (WSDL, Soaplab); Haystack provenance browser; data descriptions; scientists.)

(Architecture diagram. Labels: tool providers; Taverna; Talisman; Web portal; Gateway; registries; service and workflow discovery; ontologies; ontology management; views; metadata management; FreeFluo workflow enactment engine; personalisation; provenance; event notification; core services; myGrid Information Repository; OGSA-DQP distributed query processor; Web Service (Grid Service) communication fabric; Soaplab; GowLab; legacy apps; native Web Services; AMBIT text extraction service; external services; service providers; workbench; Grid service stack; applications; bioinformaticians.)

Two+ Paths
Innovative work: semantic discovery; provenance management; data integration (OGSA-DQP); text mining; information model and management.
Core functionality: services (Soaplab and Gowlab); service and workflow registration; workflow enactment engine (Freefluo); workflow workbench (Taverna).
In between: event notification; Gateway.
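A core idea in Tenet I is the "in silico experiment as workflow". An enactor such as Freefluo runs a workflow built in Taverna, and intermediate results can be viewed along the way. The sketch below is an illustration of that idea only. It is not Taverna's workflow model or Freefluo's API. The processor names echo the WBS workflow, but the functions are hypothetical local stand-ins for remote services. The workflow is a dependency graph, run in topological order, and every intermediate result is kept together with a simple record of which inputs fed each step.

```python
from graphlib import TopologicalSorter  # Python 3.9+


# Hypothetical stand-ins for remote services; a real enactor would invoke
# Web Service operations here instead of local functions.
def repeat_mask(inputs):
    return inputs["query"].lower()           # pretend masking: normalise case


def blast(inputs):
    return f"hits({inputs['repeat_mask']})"


def transeq(inputs):
    return f"protein({inputs['repeat_mask']})"


def pepstats(inputs):
    return f"stats({inputs['transeq']})"


# Each processor lists the processors (or workflow inputs) it consumes.
WORKFLOW = {
    "repeat_mask": {"query"},
    "blast": {"repeat_mask"},
    "transeq": {"repeat_mask"},
    "pepstats": {"transeq"},
}
SERVICES = {"repeat_mask": repeat_mask, "blast": blast,
            "transeq": transeq, "pepstats": pepstats}


def enact(workflow, services, workflow_inputs):
    """Run processors in dependency order, keeping every intermediate result."""
    results = dict(workflow_inputs)
    provenance = []                          # (processor, inputs it consumed)
    for name in TopologicalSorter(workflow).static_order():
        if name in results:                  # a workflow input, not a processor
            continue
        args = {dep: results[dep] for dep in workflow[name]}
        results[name] = services[name](args)
        provenance.append((name, sorted(workflow[name])))
    return results, provenance


results, provenance = enact(WORKFLOW, SERVICES, {"query": "ACGT"})
```

Keeping every intermediate value in `results` mirrors the "viewing intermediate results" step, and the `provenance` list is a toy version of answering "how was that result derived?".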
Run the Workflow
(Screenshot: viewing intermediate results.)

Drilling Down: myGrid and Semantics
Workflow and service discovery, prior to and during enactment
Semantic registration
Workflow assembly
Semantic service typing of inputs and outputs
Provenance of workflows and other entities
Experimental metadata glue
Use of RDF, RDFS, DAML+OIL/OWL
Instance store, ontology server, reasoner
Materialised vs. point-of-delivery reasoning
myGrid Information Model

Semantic Discovery
(Screenshot labels: Pedro data capture tool; view annotations on workflow.)
Drag a workflow entry into the explorer pane and the workflow loads.
Drag a service or workflow to the scavenger window for inclusion in the workflow.

Tutorial focus
(The Two+ Paths diagram again: innovative work, core functionality, and the pieces in between.)
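"Semantic service typing of inputs and outputs" means a registry can answer questions like "which services accept this kind of data?" by reasoning over an ontology. It does not have to rely on matching type names exactly. myGrid used RDF/OWL with an ontology server and reasoner for this. The sketch below swaps that machinery for a hand-written is-a table, so the concept names and registry entries are hypothetical. It shows only the subsumption check at the core of the idea: a service declared to take a general concept also accepts any more specific one.

```python
# Hypothetical is-a hierarchy standing in for an OWL ontology and reasoner.
PARENT = {
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
    "genbank_record": "sequence_record",
    "biological_sequence": "bioinformatics_data",
    "sequence_record": "bioinformatics_data",
}


def is_a(concept, ancestor):
    """True if concept equals ancestor or lies beneath it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)        # climb one level; None at the root
    return False


# Hypothetical registry: service name -> (input concept, output concept).
REGISTRY = {
    "blastn": ("nucleotide_sequence", "blast_report"),
    "blastp": ("protein_sequence", "blast_report"),
    "transeq": ("nucleotide_sequence", "protein_sequence"),
    "seqret": ("biological_sequence", "biological_sequence"),
}


def services_accepting(data_type):
    """Services whose declared input is data_type or a more general concept."""
    return sorted(name for name, (inp, _) in REGISTRY.items()
                  if is_a(data_type, inp))


def next_steps(service):
    """Services that could consume the given service's output."""
    return services_accepting(REGISTRY[service][1])
```

`next_steps` hints at how typed outputs support workflow assembly. After adding `transeq`, the registry can suggest protein-consuming services to drag into the workflow next.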
Roadmap
1. Describe services (registry)
2. Discover services
3. Write and run workflows (Taverna workbench)
4. Provenance and data (LSID authorities)

Sessions on Details
Workflows: hands-on with Taverna
Semantics
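The fourth roadmap step relies on LSID authorities to name data and provenance records. A Life Science Identifier is a URN of the form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. An authority then resolves it to data and metadata (resolution is not shown here). The following minimal sketch splits and rebuilds LSIDs of that shape. It assumes no component contains a colon, and the `example.org` identifiers are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None           # the revision part is optional


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    # Accept any letter case in the 'urn:lsid' prefix.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])


def format_lsid(lsid):
    """Rebuild the URN string, appending the revision only when present."""
    core = f"urn:lsid:{lsid.authority}:{lsid.namespace}:{lsid.object_id}"
    return core if lsid.revision is None else f"{core}:{lsid.revision}"


wf = parse_lsid("urn:lsid:example.org:workflows:wbs-annotation:2")
```

Splitting out the revision lets a provenance record point at the exact version of a workflow or data item that produced a result.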
Timetable: split sessions
Session 1: Group 1 hands-on (Swanson); Group 2 semantics (Newhaven)
Tea break (short)
Session 2: Group 1 semantics (Newhaven); Group 2 hands-on (Swanson)
Discussions and conclusions

Questions?
http://www.mygrid.org.uk
http://taverna.sf.net
http://freefluo.sf.net/