Under the microscope: Molecular Simulations using OpenMM

Folding@home uses software developed as part of OpenMM, code that allows molecular dynamics to be simulated. This tutorial gives you an introduction to: a) simulating a simple case of 'protein folding' using OpenMM, and b) containerising your software application using Docker.

Background

'Folding@home' [1] uses computational-science software developed as part of OpenMM [2]. The open-source OpenMM code lets you run molecular simulations from Python, which makes it easy to use (and easy to understand). But why are molecular simulations of interest?

Why is 'protein folding' important?

Molecular simulations allow scientists (and the more curious types) to understand protein folding, a key biological process that is fundamental to every living thing.

Being able to accurately predict the shape of proteins could accelerate research in every field of biology. That could lead to important breakthroughs like finding new medicines, or finding proteins that break down industrial and plastic waste or efficiently capture carbon from the atmosphere [3].

Little is known about protein folding

But we know the exact 3-dimensional structure of only a tiny fraction of the 200 million proteins known to science. Molecular dynamics software such as OpenMM allows us to investigate how protein folding evolves over time (a mechanistic approach), while other software such as AlphaFold [4] can predict final states (a machine learning approach).
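
To make 'mechanistic' a little more concrete, here is a toy sketch of the core idea behind molecular dynamics: compute the forces, then step positions and velocities forward in time. It is not OpenMM's implementation. It models a single particle on a spring in one dimension, in arbitrary units, and the names velocity_verlet, spring_force and total_energy are made up for this illustration.

```python
# Toy illustration only: one particle on a harmonic "bond" in 1D, arbitrary units.
# Real engines such as OpenMM handle many atoms, realistic force fields and thermostats.

K = 1.0     # spring constant (assumed value)
MASS = 1.0  # particle mass (assumed value)


def spring_force(x):
    """Force pulling the particle back towards x = 0."""
    return -K * x


def total_energy(x, v):
    """Kinetic plus potential energy of the particle."""
    return 0.5 * MASS * v**2 + 0.5 * K * x**2


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps of size dt."""
    trajectory = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
        trajectory.append(x)
    return trajectory, v


trajectory, v_final = velocity_verlet(
    x=1.0, v=0.0, force=spring_force, mass=MASS, dt=0.01, n_steps=1000
)
e_start = total_energy(1.0, 0.0)
e_end = total_energy(trajectory[-1], v_final)
print(f"positions recorded: {len(trajectory)}")
print(f"energy drift: {abs(e_end - e_start):.2e}")
```

The list of positions is the 'time-dependent' part: a molecular dynamics run produces a whole trajectory rather than a single final structure.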


Introducing OpenMM

OpenMM is computational software that allows the user to simulate protein folding. It takes a mechanistic approach that lets users see how the structure evolves over time during folding. A simple simulation configuration script is shown below, with comments marked by a leading '#'.

# make available OpenMM software to use in this 'script'
from openmm.app import *
from openmm import *
from openmm.unit import *
from sys import stdout

# choose protein to be modelled
pdb = PDBFile('input.pdb')

# choose modelling parameters that approximate the protein's dynamics
# & interaction
forcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')
system = forcefield.createSystem(pdb.topology, nonbondedMethod=PME,
        nonbondedCutoff=1*nanometer, constraints=HBonds)

# choose numerical 'integration' method
integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond,
        0.004*picoseconds)
simulation = Simulation(pdb.topology, system, integrator)

# setup initial conditions ready for simulation
simulation.context.setPositions(pdb.positions)
simulation.minimizeEnergy()

# setup monitoring then simulate!
simulation.reporters.append(PDBReporter('output.pdb', 500))
simulation.reporters.append(StateDataReporter(stdout, 500, step=True,
        potentialEnergy=True, temperature=True))
simulation.step(5000)
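
As a quick sanity check on the numbers in the script above, the sketch below works out how much physical time the run covers and how often the reporters should fire. The values are copied from the script; the variable names are only for illustration.

```python
# Values copied from the simulation script above
timestep_ps = 0.004      # integrator step size, in picoseconds
total_steps = 5000       # argument to simulation.step(...)
report_interval = 500    # interval given to both reporters

# Total physical time covered by the run
simulated_time_ps = timestep_ps * total_steps

# Number of times each reporter is triggered during the run
n_reports = total_steps // report_interval

print(f"Simulated time: {simulated_time_ps:g} ps")
print(f"Reports per reporter: {n_reports}")
```

So this run covers only about 20 picoseconds, which is enough to check that everything works but far shorter than the timescales on which real proteins fold.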

So how do we run that simulation script? We could either install OpenMM following the guide [5] maintained by the developers of the code, or we could use another piece of software, Docker, which makes it easy to create a reproducible environment.


Introducing Docker

Docker gives us an easily reproducible environment that we can tinker with - and break - without fear of messing up our computer. More technically, it provides isolation between our application (including all its dependencies) and our host operating system.

Once Docker has been installed [6], you can then create a 'sandboxed' environment that has OpenMM installed by following these steps:

Create a Dockerfile via the command line:

mkdir openmm
cd ./openmm
touch Dockerfile

Then define the Dockerfile to use Miniconda as the base image and install OpenMM and other associated software via Conda:

# syntax=docker/dockerfile:1

FROM continuumio/miniconda3
WORKDIR /
RUN conda install -c conda-forge openmm -y
RUN conda install ipython -y

Then build a Docker image that can serve as a base computational platform for OpenMM simulations:

docker build -t openmm .

Launch this image in a container and run a simulation:

sudo docker run -i -t -v "/home/nchowlett/openmm/src:/home/" openmm
cd /home
ipython
%run ./code.py

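Note that we are using Docker's bind mounts feature to share code from our operating system with the container, so any code changes made on the host are visible inside the container. Replace the host path /home/nchowlett/openmm/src with the directory on your own machine that holds the simulation script (assumed here to be saved as code.py) and input.pdb.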
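
Before running the full script inside the container, a quick check like the one below may help confirm that OpenMM is importable and that the files shared through the bind mount are visible. This is a minimal sketch: the preflight function is hypothetical (not part of OpenMM or Docker), and it assumes the script is saved as code.py next to input.pdb in /home, as in the commands above.

```python
import importlib.util
from pathlib import Path


def preflight(workdir="/home", required=("code.py", "input.pdb")):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # Is the openmm package visible to this Python interpreter?
    if importlib.util.find_spec("openmm") is None:
        problems.append("openmm is not importable in this Python environment")

    # Are the files expected from the bind mount actually present?
    for name in required:
        path = Path(workdir) / name
        if not path.is_file():
            problems.append(f"missing file: {path}")

    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "Ready to run the simulation.")
```

If a file is reported missing, the host path given to the -v option is the first thing to check.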

To turn off the container, first get the container's ID:

sudo docker ps

Then use this ID to select which container to turn off:

sudo docker stop <ID>

If you haven't had enough Docker, check out a more general introduction on Docker's website.


Conclusion

After following along with this tutorial, you have hopefully conducted your first molecular simulation! Some people in the world would say thank you (at least I would). Specifically, you have:

  1. Built an easily reusable, shareable molecular-simulation app
  2. Simulated a simple protein using this app

If you're interested in learning more about OpenMM, perhaps check out their presentations available online [7].


References

[1] Folding@home seeks to unravel the mysteries of protein dynamics, including the folding process, and their roles in health and disease: https://www.youtube.com/watch?v=RGGzMQ2oFrA

[2] OpenMM 7: Rapid development of high performance algorithms for molecular dynamics: https://doi.org/10.1371/journal.pcbi.1005659

[3] Protein folding explained: https://www.youtube.com/watch?v=KpedmJdrTpY

[4] Highly accurate protein structure prediction with AlphaFold: https://www.nature.com/articles/s41586-021-03819-2

[5] OpenMM User Guide - Getting Started: http://docs.openmm.org/latest/userguide/application/01_getting_started.html

[6] Docker - Get Docker (guide): https://docs.docker.com/get-docker/

[7] Introduction to Molecular Dynamics: https://www.youtube.com/watch?v=_TiQYNWJwYg