LFRic: A new model for the Met Office

Steve Mullerworth
Bath DG Workshop, 1st June 2016
Crown copyright Met Office

Met Office's Unified Model

The Unified Model (UM) supports:
- Operational forecasts at mesoscale (resolution approx. 12 km, 4 km, 1 km)
- Operational forecasts at global scale (resolution approx. 17 km)
- Global and regional climate predictions (global resolution around 100 km, run for 10-100 years)
- Seasonal predictions
- Research mode (1 km - 10 m) and single column model

26 years old this year.

Operational NWP Models: Jun 2016
- Global: operational 17 km; climate ~100 km, 10-100 year simulations; seasonal 50 km
- Euro 4: operational 4 km, 70 levels

- UKV: operational 1.5 km, 70 levels

Scalability (March 2010)
[Figure: scalability plot; horizontal axis 0 to 2000, vertical axis 0 to 14; series: N512L70 - no I/O, N512L70 - full I/O, UKV, HadGEM3-AO]

The finger of blame
- At 25 km resolution, grid spacing near poles = 75 m
- At 10 km it reduces to 12 m!
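The drop from 75 m to 12 m is quadratic in the grid length, because on a regular latitude-longitude grid both the longitude increment and the distance of the last row from the pole shrink with resolution. A minimal sketch of that scaling follows. The function name, the Earth radius value and the choice of placing the last row half a grid length from the pole are assumptions made for illustration, so the absolute spacings it returns are not expected to reproduce the 75 m figure; only the ratio is.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (assumed value)


def polar_spacing_m(grid_length_m, rows_from_pole=0.5, radius_m=EARTH_RADIUS_M):
    """East-west spacing of the row nearest the pole on a regular lat-lon grid.

    grid_length_m  : nominal grid length at the equator, used in both directions
    rows_from_pole : distance of that row from the pole, in grid lengths (assumption)
    """
    d_angle = grid_length_m / radius_m     # angular increment in radians
    colatitude = rows_from_pole * d_angle  # angle between that row and the pole
    return radius_m * math.sin(colatitude) * d_angle


coarse = polar_spacing_m(25_000.0)
fine = polar_spacing_m(10_000.0)
ratio = fine / coarse                      # close to (10 / 25) ** 2

print(f"spacing ratio, 10 km vs 25 km: {ratio:.3f}")
print(f"75 m scaled by that ratio: {75.0 * ratio:.1f} m")
```

Under these assumptions the ratio is (10/25)^2 = 0.16, which takes 75 m to 12 m as quoted on the slide.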

GungHo 2011-2016

Met Office, NERC, STFC partnership involving Imperial College, STFC, and the Universities of Bath, Exeter, Manchester, Leeds, Reading and Warwick.

Develop a dynamical core that:
- Is scientifically as good as current ENDGame dynamics
- Scales on future architectures
- Is used for future dynamics research

Formulation:
- Uses finite element/volume methods (FEM/FV)
- Supports higher order schemes
- Supports quads (cubed- or diamond-sphere) or triangles

LFRic: Replacement of the UM
- Develop a replacement for the UM by around 2020
- Project name LFRic, after Lewis Fry Richardson
- Replacement to go operational mid-2020s
- Based on GungHo dynamics and GungHo computational science recommendations

GungHo Single Model Architecture
[Diagram: four layers (Driver, Algorithm, PSy, Kernel) with the annotations below, listed in slide order]
- Set-up
- Distributed memory
- Time-step control
- IO
- Modular
- Coupling
- Science (internal components and external)
- Field operations (local partition)
- Auto-generated
- Concurrent running?
- Break field into chunks for kernels
- Distributed memory updates
- Shared memory concurrency
- Small chunk of field-data
- Contiguous memory data
- Currently, vertical column

Looping up the column
- Unstructured mesh requires indirect mapping with a dof-map
- Dof-map addresses the bottom layer of the field
- Data in a vertical column of dofs is contiguous in memory
- Therefore, to get data for a cell on level K, add (K-1) to the values in the dof-map of the cell at the base of the column
- By looping up the column we hope to balance out the cost of the dof-map lookup
- Vectorisation
- Cache re-use
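The dof-map addressing above can be sketched in a few lines. This is an illustration only, not LFRic code: the names `column_kernel` and `psy_layer`, the toy mesh and the doubling operation are all hypothetical, and the indices are 0-based (the Fortran original would be 1-based). It assumes one dof per cell so that no dof is shared between columns.

```python
def column_kernel(data, base_dofs, nlayers):
    """Toy kernel: double every dof in one vertical column, in place."""
    for k in range(1, nlayers + 1):      # levels numbered 1..nlayers, as in the text
        for base in base_dofs:           # dofs of the cell at the base of the column
            data[base + (k - 1)] *= 2.0  # level-K dof = bottom-layer dof + (K-1)


def psy_layer(data, dofmap, nlayers):
    """Toy PSy-layer loop: call the kernel once per column of the horizontal mesh."""
    for base_dofs in dofmap:
        column_kernel(data, base_dofs, nlayers)


nlayers = 4
ncells = 3
# Hypothetical dof-map: index of the bottom-layer dof of each column.
# Each column occupies nlayers contiguous entries of the data array.
dofmap = [[0], [4], [8]]
data = [float(i) for i in range(ncells * nlayers)]

psy_layer(data, dofmap, nlayers)
print(data)
```

The indirect lookup through `dofmap` happens once per column, while the inner loop over levels walks contiguous memory, which is the cost balance the slide refers to.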

Current status of implementation

Infrastructure provides a basic driver layer and classes for:
- Algorithms and kernels
- GungHo fields on different function spaces
- Flexible choice of (quad-based) mesh, and FEM order
- Support for distributed memory parallel and shared-memory colouring

PSyclone:
- Auto-generates PSy-layer parallel code

Being used for development of GungHo science.
Support being added for finite difference physics.
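Shared-memory colouring, mentioned above, groups cells so that no two cells of the same colour write to a common dof; each colour can then be processed by several threads without write conflicts. The sketch below uses a generic greedy colouring on a hypothetical 1D strip of cells. It is not the scheme used by the LFRic infrastructure or PSyclone, which the slides do not describe.

```python
def colour_cells(dofmap):
    """Greedy colouring: cells that share any dof never get the same colour."""
    dof_colours = {}   # dof index -> colours already used by cells touching it
    cell_colour = []
    for dofs in dofmap:
        used = set()
        for d in dofs:
            used |= dof_colours.get(d, set())
        colour = 0
        while colour in used:          # pick the lowest colour not yet in use
            colour += 1
        cell_colour.append(colour)
        for d in dofs:
            dof_colours.setdefault(d, set()).add(colour)
    return cell_colour


# Hypothetical strip of 5 cells; each cell shares one dof with its neighbour.
dofmap = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]]
colours = colour_cells(dofmap)

# Group cells by colour: cells within one group could be updated concurrently.
cells_by_colour = {}
for cell, colour in enumerate(colours):
    cells_by_colour.setdefault(colour, []).append(cell)

print(colours)          # [0, 1, 0, 1, 0]
print(cells_by_colour)  # {0: [0, 2, 4], 1: [1, 3]}
```

A parallel loop would then run serially over colours and in parallel over the cells of each colour.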

Mixed FEM function spaces
[Figure: function spaces W0, W2 and W3, shown at lowest order and at higher order]

LFRic Object Stack (parallel)
- Init: UGRID file, Global Mesh, Partition
- Run: Mesh, Function Space, Field
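The function spaces named above differ in how many dofs each cell carries, and the count grows quickly with element order. The sketch below tabulates dofs per cell for W0, W2 and W3. The formulas are an assumption, not taken from the slides: they are the standard counts for tensor-product compatible finite element spaces on hexahedral cells, with order k = 0 as lowest order. The function name is hypothetical.

```python
def ndofs_per_cell(space, k):
    """Dofs per hexahedral cell for element order k (k = 0 is lowest order)."""
    if space == "W0":   # continuous scalar space: (k+2) nodes in each direction
        return (k + 2) ** 3
    if space == "W2":   # vector space with continuous normal component, 3 directions
        return 3 * (k + 2) * (k + 1) ** 2
    if space == "W3":   # discontinuous scalar space
        return (k + 1) ** 3
    raise ValueError(f"unknown function space: {space}")


for k in (0, 1):
    counts = {space: ndofs_per_cell(space, k) for space in ("W0", "W2", "W3")}
    print(f"order {k}: {counts}")
```

At lowest order this gives 8 vertex dofs for W0, 6 face dofs for W2 and 1 cell dof for W3.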

Cell ordering for concurrent comms & computation
1. Inner n+1 cells
2. Inner n cells
...
3. Inner 1 cells
4. Edge cells
5. Halo 1 cells
...
6. Halo n-1 cells
7. Halo n cells
8. Halo n+1 cells

Dynamo and PSyclone
- Prototype GungHo implementation developed at Met Office
- Comprises both LFRic infrastructure and GungHo science
- Originally hand-written PSy-layer: PSyKAl-lite
- Named Dynamo
- PSyclone currently developed at STFC Daresbury
- Written to clone the hand-written PSy-layer code

Dynamo kernel example
... From matrix_vector_kernel_mod.F90

Dynamo algorithm post PSyclone

Going MPI and OpenMP Parallel
- Dynamo infrastructure and PSyclone support
- PSyclone script

> python generator.py -oalg alg.f90 -opsy psy.f90 file.f90 -s script.py

- No scientists were harmed

PSyclone generated code
- MPI halo swap
- Colouring and OMP

- Halo is not up to date

Spot the difference: first parallel run

Dynamo Initial MPI results
[Figure: elapsed wallclock time in seconds and total speedup; horizontal axis ticks 1, 2, 4, 8]

Dynamo Initial OpenMP results
[Figure: serial time and parallel time (wallclock time in seconds) and parallel speedup against number of OpenMP threads, 1 to 4]

Xeon and Xeon Phi
[Figure: timestepping wallclock time in seconds and speedup, for Xeon and for Xeon Phi]

Current Software

Dynamo prototype comprises:
- LFRic infrastructure and GungHo dynamics implementation (F2003)
- PSyclone PSy-layer code generation (Python)
- BSD-style licence

Immediate plans
- Support for multigrid solver development
- Support for coupling to physics

- Separate software infrastructure and science
- Extend PSyclone optimisations
- Resolve preliminary scalability issues

High level schedule
- Support scientific evaluation
- Early 2017: Dynamical core trials with simple physics
- Early 2018: Aquaplanet (no land)
- 2020: Operational and coupled configurations
- 2020-2X: Trials and deployment

Credits & Questions

PSyKAl structure and earliest developments based on GungHo collaboration computational science recommendations:
- NERC, STFC and Met Office collaboration
- Imperial College, Bath, Reading, Leeds, Manchester, Warwick

Current development:
- Met Office
- STFC
- University of Manchester
- Mike Rezny at Monash University

