Quantum Chromodynamics (QCD) is the quantum field theory of the strong nuclear interaction. It explains how quarks and gluons are bound together to make more familiar objects such as the proton and neutron, which form the nuclei of atoms. Besides describing these known phenomena, an understanding of QCD is required to interpret results coming from the Large Hadron Collider (LHC) at CERN, Geneva, which has recently started running. However, as QCD is a non-linear theory, it is analytically solvable only in a specific region of parameter space. A more general solution requires a numerical approach.
Four-dimensional space-time is discretised onto a finite lattice (of spacing a). The partial differential equations are replaced by finite difference equations, giving rise to sparse linear algebra problems. In quantum field theory, measurable quantities are determined from the path integral, an infinite-dimensional integral. To calculate this numerically, Monte Carlo integration is used. Thus, in Phillip Colella's "seven dwarfs" classification of algorithms, Lattice QCD uses three: Structured Grids, Sparse Linear Algebra, and Monte Carlo. The local nature of the interaction lends itself naturally to a parallel data decomposition in which each process works on a sub-volume, with a well-defined communication pattern between nearest-neighbour processors.
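As a rough illustration of the structured-grid and sparse-linear-algebra patterns, the sketch below applies a nearest-neighbour finite-difference Laplacian to a real scalar field on a small periodic four-dimensional lattice. It is a toy under stated assumptions, not lattice QCD itself: there are no gauge links or fermion fields, and the names `site_index`, `neighbours` and `apply_laplacian` are hypothetical and do not come from qdp++/Chroma or any other library. It does show why the operator is sparse: each site couples only to itself and its eight nearest neighbours.

```python
import itertools
import math


def site_index(coords, dims):
    """Map periodic lattice coordinates to a linear (lexicographic) index."""
    idx = 0
    for x, n in zip(coords, dims):
        idx = idx * n + (x % n)
    return idx


def neighbours(coords, dims):
    """Yield the nearest neighbours: one step forward and back in each direction."""
    for mu in range(len(dims)):
        for step in (+1, -1):
            shifted = list(coords)
            shifted[mu] = (shifted[mu] + step) % dims[mu]  # periodic wrap-around
            yield tuple(shifted)


def apply_laplacian(field, dims, a=1.0):
    """Matrix-free finite-difference Laplacian with lattice spacing a."""
    out = [0.0] * len(field)
    for coords in itertools.product(*(range(n) for n in dims)):
        i = site_index(coords, dims)
        total = sum(field[site_index(nb, dims)] for nb in neighbours(coords, dims))
        out[i] = (total - 2 * len(dims) * field[i]) / a**2
    return out


# Toy 4^4 lattice (an assumed size, chosen only to keep the example fast).
dims = (4, 4, 4, 4)
volume = math.prod(dims)

# A constant field has no variation, so its Laplacian should vanish.
lap_constant = apply_laplacian([1.0] * volume, dims)

# A plane wave along direction 0 is an eigenvector of the lattice Laplacian
# with eigenvalue -4 sin^2(p/2) for a = 1, where p = 2*pi*k/N.
k = 1
p = 2 * math.pi * k / dims[0]
wave = [0.0] * volume
for coords in itertools.product(*(range(n) for n in dims)):
    wave[site_index(coords, dims)] = math.cos(p * coords[0])
lap_wave = apply_laplacian(wave, dims)
eigenvalue = -4 * math.sin(p / 2) ** 2
```

Because the stencil only ever reads nearest neighbours, splitting the lattice into sub-volumes would require each process to exchange just the boundary sites of its sub-volume with adjacent processes, which is the communication pattern described above.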
Each of these computational patterns has specific performance requirements, which have been targeted in both hardware and software. I describe the QCDOC machine, a bespoke QCD engine, and how, by using and developing the SciDAC lattice QCD software modules qdp++/Chroma, UKQCD has obtained high-performance codes which run on QCDOC and its more commercial descendants, Blue Gene/L and Blue Gene/P. In particular, by using BAGEL, an assembler generator for QCD, and UKQCD's own API to qdp++/Chroma, UKQCD has been able to focus on developing both the physics applications and the necessary performance kernels. Moreover, I describe the development of threading within the data-parallel code, which enables 'mixed-mode' programming and thus future-proofs these codes as we move into the 'many-core' era. Finally, I detail a preliminary attempt at porting these complex codes onto the Cell processor.