Cloud Data mining and FutureGrid

FutureGrid Overview for VSCSE Summer School on Science Clouds
Science Cloud Summer School [email protected] University, July 30 2012
Geoffrey Fox, [email protected]
Informatics, Computing and Physics, Pervasive Technology Institute, Indiana University Bloomington

FutureGrid Key Concepts I
- FutureGrid is an international testbed modeled on Grid5000
  - July 15 2012: 223 projects, ~968 users
- Supporting international Computer Science and Computational Science research in cloud, grid and parallel computing (HPC)
- The FutureGrid testbed provides to its users:
  - A flexible development and testing platform for middleware and application users looking at interoperability, functionality, performance or evaluation
  - FutureGrid is user-customizable, accessed interactively and supports Grid, Cloud and HPC software with and without VMs
  - A rich education and teaching platform for classes
- See G. Fox, G. von Laszewski, J. Diaz, K. Keahey, J. Fortes, R. Figueiredo, S. Smallen, W. Smith, A. Grimshaw, "FutureGrid - a reconfigurable testbed for Cloud, HPC and Grid Computing", book chapter draft

FutureGrid Key Concepts II
- Rather than loading images onto VMs, FutureGrid supports Cloud, Grid and Parallel computing environments by provisioning software as needed onto bare metal using Moab/xCAT (need to generalize)
  - Image library for MPI, OpenMP, MapReduce (Hadoop, (Dryad), Twister), gLite, Unicore, Globus, Xen, ScaleMP (distributed shared memory), Nimbus, Eucalyptus, OpenNebula, KVM, Windows, ...
  - Either statically or dynamically

- Growth comes from users depositing novel images in library
- FutureGrid has ~4400 distributed cores with a dedicated network and a Spirent XGEM network fault and delay generator
[Figure: image library (Image1, Image2, ..., ImageN) with the steps Choose, Load, Run]

FutureGrid Partners
- Indiana University (Architecture, core software, Support)
- San Diego Supercomputer Center at University of California San Diego (INCA, Monitoring)
- University of Chicago/Argonne National Labs (Nimbus)
- University of Florida (ViNE, Education and Outreach)
- University of Southern California Information Sciences (Pegasus to manage experiments)
- University of Tennessee Knoxville (Benchmarking)
- University of Texas at Austin/Texas Advanced Computing Center (Portal)
- University of Virginia (OGF, XSEDE Software stack)
- Center for Information Services and GWT-TUD from Technische Universität Dresden (VAMPIR)
Red institutions have FutureGrid hardware

FutureGrid: a Grid/Cloud/HPC Testbed
[Figure: FutureGrid network map. Labels: 12TF Disk rich + GPU, 512 cores; NID: Network Impairment Device; Private FG Network; Public]

FutureGrid Distributed Testbed-aaS
[Figure: site map. Bravo, Delta (IU); India (IBM) and Xray (Cray) (IU); Hotel (Chicago); Foxtrot (UF); Sierra (SDSC); Alamo (TACC)]

Compute Hardware
Name | System type | # CPUs | # Cores | TFLOPS | Total RAM (GB) | Secondary Storage (TB) | Site | Status
india | IBM iDataPlex | 256 | 1024 | 11 | 3072 | 180 | IU | Operational
alamo | Dell PowerEdge | 192 | 768 | 8 | 1152 | 30 | TACC | Operational
hotel | IBM iDataPlex | 168 | 672 | 7 | 2016 | 120 | UC | Operational
sierra | IBM iDataPlex | 168 | 672 | 7 | 2688 | 96 | SDSC | Operational
xray | Cray XT5m | 168 | 672 | 6 | 1344 | 180 | IU | Operational
foxtrot | IBM iDataPlex | 64 | 256 | 2 | 768 | 24 | UF | Operational
Bravo | Large Disk & memory | 32 | 128 | 1.5 | 3072 (192GB per node) | 192 (12 TB per Server) | IU | Operational
Delta | Large Disk & memory With Tesla GPUs | 32 CPU, 32 GPUs | 192+ 14336 GPU | ?9 | 1536 (192GB per node) | 192 (12 TB per Server) | IU | Operational
Echo (ScaleMP) | Large Disk & Memory | 32 CPU | 192 | 2 | 6144 | 192 | IU | On Order
TOTAL Cores: 4384

Storage Hardware
System Type | Capacity (TB) | File System | Site | Status
Xanadu 360 | 180 | NFS | IU | New System
DDN 6620 | 120 | GPFS | UC | New System
SunFire x4170 | 96 | ZFS | SDSC | New System
Dell MD3000 | 30 | NFS | TACC | New System
IBM | 24 | NFS | UF | New System
Substantial backup storage at IU: Data Capacitor and HPSS

Support
- Traditional Drupal Portal with usual functions
- Traditional Ticket System
- System Admin and User facing support (small)
- Outreach group (small)
- Strong Systems Admin Collaboration with Software group

4 Use Types for FutureGrid TestbedaaS
- 223 approved projects (968 users) July 14 2012
  - USA, China, India, Pakistan, lots of European countries
  - Industry, Government, Academia
- Training Education and Outreach (10%)
  - Semester and short events; interesting outreach to small universities
- Computer science and Middleware (59%)
  - Core CS and Cyberinfrastructure; Interoperability (2%) for Grids and Clouds; Open Grid Forum OGF Standards
- Computer Systems Evaluation (29%)
  - XSEDE (TIS, TAS), OSG, EGI; Campuses
- New Domain Science applications (26%)
  - Life science highlighted (14%), Non Life Science (12%)
- Generalize to building Research Computing-aaS
Fractions are as of July 15 2012 and add to > 100%

Recent Projects Have Competitions
- Last one just finished
- Grand Prize Trip to SC12
- Next Competition starts beginning of August for this Science Cloud Summer School

FutureGrid Supports Education and Training
- Jerome Mitchell HBCU Cloud View of Computing workshop June 2011
- Cloud Summer School July 30-August 3 2012 with 10 HBCU attendees
- Mitchell and Younge building Cloud Computing Handbook loosely based on my book with Hwang and Dongarra
- Several classes around the world each semester
- Possible Interaction with (200 team) Student Competition in China organized by Beihang Univ.
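
The core total in the Compute Hardware table and the note that the use-type fractions add to more than 100% can both be cross-checked with simple arithmetic. The sketch below is only an illustration: the dictionary and variable names are made up here, the numbers are copied from the slides, Delta is counted by its 192 CPU cores only (GPU cores excluded), and Interoperability (2%) is assumed to be a sub-part of the 59% Computer Science and Middleware category.

```python
# Core counts per system as listed in the Compute Hardware table.
# Assumption: Delta contributes its 192 CPU cores only (GPU cores excluded).
operational_cores = {
    "india": 1024,
    "alamo": 768,
    "hotel": 672,
    "sierra": 672,
    "xray": 672,
    "foxtrot": 256,
    "bravo": 128,
    "delta": 192,
}
on_order_cores = {"echo": 192}  # Echo (ScaleMP) is listed as "On Order"

total_operational = sum(operational_cores.values())
total_with_echo = total_operational + sum(on_order_cores.values())

# Top-level use-type percentages from the "4 Use Types" slide.
use_type_percent = {
    "Training, Education and Outreach": 10,
    "Computer Science and Middleware": 59,
    "Computer Systems Evaluation": 29,
    "New Domain Science Applications": 26,
}
total_percent = sum(use_type_percent.values())

print("Operational cores:", total_operational)
print("Cores including Echo:", total_with_echo)
print("Sum of use-type percentages:", total_percent)
```

Under these assumptions the operational systems alone reproduce the quoted total of 4384 cores, which suggests the total does not include Echo, and the use-type percentages exceed 100%, consistent with projects being counted in more than one category.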

First FutureGrid Challenge Competition
- Core Computer Science FG-172 Cloud-TM from Portugal: on distributed concurrency control (software transactional memory): "When Scalability Meets Consistency: Genuine Multiversion Update Serializable Partial Data Replication", 32nd International Conference on Distributed Computing Systems (ICDCS'12) (top conference); used 40 nodes of FutureGrid
- Core Cyberinfrastructure FG-42,45 LSU/Rutgers: SAGA Pilot Job P* abstraction and applications. SAGA/BigJob use on clouds
- Core Cyberinfrastructure FG-130 USC: Optimizing Scientific Workflows on Clouds. Scheduling Pegasus on distributed systems with overhead measured and reduced. Used Eucalyptus on FG.
- Interesting application FG-133 from Univ. Arkansas: Supply Chain Network Simulator Using Cloud Computing with dynamic virtual machines supporting Monte Carlo simulation with Grid Appliance and Nimbus

FutureGrid Tutorials

Cloud Provisioning Platforms
- Using Nimbus on FutureGrid [novice]
- Nimbus One-click Cluster Guide
- Using OpenStack Nova on FutureGrid
- Using Eucalyptus on FutureGrid [novice]
- Connecting private network VMs across Nimbus clusters using ViNe [novice]
- Using the Grid Appliance to run FutureGrid Cloud Clients [novice]

Cloud Run-time Platforms
- Running Hadoop as a batch job using MyHadoop [novice]
- Running SalsaHadoop (one-click Hadoop) on HPC environment [beginner]
- Running Twister on HPC environment
- Running SalsaHadoop on Eucalyptus
- Running FG-Twister on Eucalyptus
- Running One-click Hadoop WordCount on Eucalyptus [beginner]
- Running One-click Twister K-means on Eucalyptus

Image Management and Rain
- Using Image Management and Rain [novice]

Storage
- Using HPSS from FutureGrid [novice]

Educational Grid Virtual Appliances
- Running a Grid Appliance on your desktop
- Running a Grid Appliance on FutureGrid
- Running an OpenStack virtual appliance on FutureGrid
- Running Condor tasks on the Grid Appliance
- Running MPI tasks on the Grid Appliance
- Running Hadoop tasks on the Grid Appliance
- Deploying virtual private Grid Appliance clusters using Nimbus
- Building an educational appliance from Ubuntu 10.04
- Customizing and registering Grid Appliance images using Eucalyptus

High Performance Computing
- Basic High Performance Computing
- Running Hadoop as a batch job using MyHadoop
- Performance Analysis with Vampir
- Instrumentation and tracing with VampirTrace

Experiment Management
- Running interactive experiments [novice]
- Running workflow experiments using Pegasus
- Pegasus 4.0 on FutureGrid Walkthrough [novice]
- Pegasus 4.0 on FutureGrid Tutorial [intermediary]
- Pegasus 4.0 on FutureGrid Virtual Cluster [advanced]

Selected List of Services Offered (FutureGrid)
- Cloud PaaS: Hadoop, Twister, HDFS, Swift Object Store
- IaaS: Nimbus, Eucalyptus, OpenStack, ViNE
- GridaaS: Genesis II, Unicore, SAGA, Globus
- HPCaaS: MPI, OpenMP, CUDA
- TestbedaaS: FG RAIN, Portal, Inca, Ganglia, Devops, Exper. Manag./Pegasus

Services Offered
[Figure: matrix of services (myHadoop, Nimbus, Eucalyptus, OpenStack, ViNe (1), Genesis II, Unicore, MPI, OpenMP, ScaleMP, Ganglia, Pegasus (3), Inca, Portal (2), PAPI, Globus) against resources (India, Hotel, Sierra, Foxtrot, Alamo, Xray, Bravo, Delta, Echo)]
1. ViNe can be installed on the other resources via Nimbus
2. Access to the resource is requested through the portal
3. Pegasus available via Nimbus and Eucalyptus images

FutureGrid Technology and Project Requests
[Figure: Total Projects and Categories]

Software Components
- Portals including "Support", "use FutureGrid", "Outreach"
- Monitoring: INCA, Power (GreenIT)
- Experiment Manager: specify/workflow
- Image Generation and Repository
- Intercloud Networking ViNE
- Virtual Clusters built with virtual networks
- Performance library
- Rain or Runtime Adaptable InsertioN Service for images
- Security: Authentication, Authorization
- Note Software integrated across institutions and between middleware and systems Management (Google docs, Jira, Mediawiki)
- Note many software groups are also FG users
Research: above and below Nimbus, OpenStack, Eucalyptus

FutureGrid offers Computing Testbed as a Service
[Figure: service layers]
- Research Computing aaS: Custom Images, Courses, Consulting, Portals, Archival Storage
- SaaS: System (e.g. SQL, GlobusOnline), Applications (e.g. Amber, Blast)
- PaaS: Cloud (e.g. MapReduce), HPC (e.g. PETSc, SAGA), Computer Science (e.g. Languages, Sensor nets)
- IaaS: Hypervisor, Bare Metal, Operating System, Virtual Clusters, Networks
- FutureGrid Uses Testbed-aaS Tools: Provisioning, Image Management, IaaS Interoperability, IaaS tools, Expt management, Dynamic Network, Devops
- FutureGrid Usages: Computer Science; Applications and understanding Science Clouds; Technology Evaluation including XSEDE testing; Education and Training

Research Computing as a Service
- Traditional Computer Center has a variety of capabilities supporting (scientific computing/scholarly research) users
  - Could also call this Computational Science as a Service
- IaaS, PaaS and SaaS are lower level parts of these capabilities but commercial clouds do not include
  1) Developing roles/appliances for particular users
  2) Supplying custom SaaS aimed at user communities
  3) Community Portals
  4) Integration across disparate resources for data and compute (i.e. grids)
  5) Data transfer and network link services
  6) Archival storage, preservation, visualization
  7) Consulting on use of particular appliances and SaaS i.e. on particular software components
  8) Debugging and other problem solving
  9) Administrative issues such as (local) accounting
- This allows us to develop a new model of a computer center where commercial companies operate base hardware/software
- A combination of XSEDE, Internet2 and computer center supply 1) to 9)?

FG Challenge 2: A Competition for You
- 6 prizes of up to $500 awarded to best projects submitted in next 2 months
  - Up to 3 prizes awarded for projects submitted by September 1
  - Remaining prizes for projects submitted by October 1
- Criteria include:
  - Innovation, Scaling, Utility
  - Quality of associated publications acknowledging FutureGrid
  - Contributions to Education and Outreach
  - International and/or Interdisciplinary Collaboration
- If you are working in a global project like FG241, submit a request for your own project
- You must email [email protected] when you want to submit project; indicate if this is a student project
- Aim: at least 4 out of 6 prizes go to students

Web Resources
- Science Cloud Summer School 2012 website:
- Science Cloud Summer School schedule:
- FG-241 Science Cloud Summer School 2012 project page:
- Instructions for obtaining FutureGrid accounts for Science Cloud Summer School 2012:
- Science Cloud Summer School 2012 Forum: https:// er-school-2012
- Twitter hashtag: #ScienceCloudSummer

Many Thanks to
- Funding Organizations: NSF, Lilly Foundation
- VSCSE: Sharon Glotzer, Eric Hofer, Scott Lathrop, Meagan Lefebvre
- Video Infrastructure: Mike Miller (NCSA), Chris Eller, Jeff Rogers
- Organizers and AIs at 10 sites
- Speakers acknowledged as they are announced
- IU Hospitality: Mary Nell Shiflet
- Staff at FutureGrid: John Bresnahan, Ti Leggett, David Gignac, Gary Miksik, Barbara Ann O'Leary, Javier Diaz Montes, Sharif Islam, Koji Tanaka, Fugang Wang, Gregor von Laszewski
- Many dedicated students
