mdcore


Compiling mdcore

Download and unpack the sources. This should create a directory named "mdcore". Change to this directory and execute the configuration script as follows

cd mdcore
./configure

The "configure" script will try to guess the best compiler options and the location of the required libraries. To see the available options, type

./configure --help

The most interesting options are

  • --enable-mpi: Compile with MPI to enable basic distributed-memory parallelism.
  • --with-gcc-arch=arch: Optimize for the given architecture. If compiling on the same architecture on which the simulations will be run, use "--with-gcc-arch=native".
  • --with-cuda=path: Specifies the prefix where the NVidia CUDA SDK is installed and enables parallelism on GPUs.
  • --with-cell=path: Specifies the prefix where the IBM Cell-SDK and toolchain are installed and enables parallelism on the Cell's SPUs.
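
For example, to build for the local machine with MPI and GPU support enabled, the options above can be combined in a single invocation (a sketch; the CUDA prefix is illustrative and system-dependent):

```shell
# Illustrative configure invocation; adjust the CUDA prefix to your system.
./configure --enable-mpi --with-gcc-arch=native --with-cuda=/usr/local/cuda
```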

Once the configuration process has completed successfully, compile the library by typing

make

This will generate the mdcore libraries in the sub-folder "src/.libs".

Some notes on OpenMP on Linux

mdcore relies on both pthreads and OpenMP for parallelization, using the former to parallelize the force computation and the latter for more simple tasks such as updating the particle positions or computing the system temperature. Depending on your OpenMP-implementation, the following environment variables may be useful:

  • OMP_NUM_THREADS=n: Use only n parallel threads for the OpenMP-parallel regions. This has no effect on the number of threads specified when initializing mdcore with "engine_start( &e , nr_runners )". If you are running on a CPU or CPUs with Hyper-Threading, it can be useful to specify the number of physical cores as opposed to the number of virtual threads.
  • OMP_WAIT_POLICY=PASSIVE: On some Linux systems, the default behavior for inactive OpenMP threads is to cycle idly without relinquishing the CPU. This can obstruct other threads created with pthreads, which is why this variable should be set to PASSIVE.

For convenience, these variables can be set on the command line:

OMP_WAIT_POLICY=PASSIVE OMP_NUM_THREADS=8 ./simulation ...

Using mdcore

Writing programs using mdcore is relatively simple, and consists of four basic steps:

  • Initializing an mdcore "engine" structure,
  • Adding particles to the engine,
  • Adding potentials to the engine,
  • Running the engine.

Any program that uses mdcore needs to include the mdcore header file:

#include <mdcore.h>

The program must also be linked against one of the following library objects generated when compiling mdcore:
  • libmdcore_single: Single-precision version of the mdcore library. This is the default version you should use.
  • libmdcore: Double-precision version of the library. It should be used only for testing purposes, as it is not necessarily more precise than the single-precision version. To use double precision, your own code must also be compiled with the matching double-precision settings.
  • libmdcore_single_mpi: If configured with the "--enable-mpi" option, this library contains the single-precision version of mdcore with all the MPI-capable features enabled.
  • libmdcore_mpi: Likewise, this library contains the double-precision, MPI-enabled mdcore.
  • libmdcore_single_cuda: If configured with the "--with-cuda" option, this library contains the single-precision version of mdcore with the CUDA functionality enabled.
  • libmdcore_cell: If configured with the "--with-cell" option, this library contains the single-precision version of mdcore with the Cell/BE functionality enabled.
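
Assuming a build in "src/.libs" as described in the compilation section, a program can be compiled and linked along these lines (a sketch, not a definitive command; the paths are placeholders for your own checkout, and -fopenmp/-pthread cover mdcore's OpenMP and pthreads dependencies under gcc):

```shell
# Illustrative build command; adjust the include and library paths.
gcc -O2 -fopenmp -pthread -o simulation simulation.c \
    -I/path/to/mdcore/src -L/path/to/mdcore/src/.libs \
    -lmdcore_single -lm
```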

The main object with which programs interact with mdcore is the "engine":

struct engine e;

which is initialized with the function "engine_init", e.g.

if ( engine_init( &e , origin , dim , L , cutoff ,
     space_periodic_full , max_types , engine_flags ) != 0 ) {
    errs_dump(stdout);
    abort();
    }
where the parameters have the following types and meaning:
  • e: The engine struct to initialize.
  • origin: A pointer to a vector of three doubles indicating the least coordinate in each dimension.
  • dim: A pointer to a vector of three doubles indicating the extent of the simulation space in each dimension.
  • L: A pointer to a vector of three doubles containing the minimum cell size in each spatial dimension. The actual cell size used will be computed from "dim[k]/floor(dim[k]/L[k])" and may be less than the interaction cutoff used.
  • cutoff: A double-precision value indicating the interaction cutoff distance to use throughout the simulation.
  • space_periodic_full: The periodicity of the simulation box. Use any logical combination of the pre-defined constants
    • space_periodic_x
    • space_periodic_y
    • space_periodic_z
    • space_periodic_none
    • space_periodic_full
    e.g. "space_periodic_x | space_periodic_y" for periodicity in the x and y dimensions, but not in z.
  • max_types: The maximum number of particle types that will be added to the engine.
  • engine_flags: Flags that can be used to control the behaviour of the engine. Several flags are specified in engine.h, the most important of which are:
    • engine_flag_cuda: Compute non-bonded interactions on a CUDA device. When using this option, you can set the CUDA device number to use with the function "engine_cuda_setdevice". Note that this option will cause an error if the program is not linked with a CUDA-enabled version of mdcore.
    • engine_flag_verlet, engine_flag_verlet_pairwise, engine_flag_verlet_pseudo: Use Verlet lists, pairwise Verlet lists, or pseudo-Verlet lists, respectively. Note that this option only makes sense if the effective cell size is larger than the cutoff distance. You can control the cell size with the parameter "L".
    • engine_flag_affinity: Force each "runner" to run on a different core. Note that this may cause poor performance on multi-core systems with hyperthreaded CPUs, since the kth runner will be assigned to the kth core.
    • engine_flag_unsorted: Use unsorted cell pair interactions. This may be faster for sparse systems or small cell sizes relative to the cutoff distance.
    • engine_flag_parbonded: Compute the bonded interactions in parallel.
    • engine_flag_async: Use asynchronous communication in MPI, i.e. interleave the communication and computation.
    Several flags can be joined with the logical "or" operator.
The function engine_init, like all interface functions in mdcore, returns zero on success and a negative number on failure. If an error occurs, an error trace can be printed with the function "errs_dump".

Once the engine has been initialized, particles can be added to it, either individually or all at once. Each particle has a particle type associated with it. Particle types can be specified with the function "engine_addtype", e.g.:

if ( ( tid = engine_addtype( &e , mass , charge ,
     name , name2 ) ) < 0 ) {
    errs_dump(stdout);
    abort();
    }
with the parameters
  • &e: A pointer to an initialized engine.
  • mass: A double-precision value with the constant mass of the particle type.
  • charge: A double-precision value with the constant charge of the particle type.
  • name: A string with the particle name, e.g. "OH".
  • name2: A secondary name string, which may also be NULL.
This function returns the type ID of the newly created particle type, or <0 on error.

Individual particles can be added with the function "space_addpart", e.g.:

if ( space_addpart( &(e.s) , &p , x ) != 0 ) {
    errs_dump(stdout);
    abort();
    }
with the parameters
  • &(e.s): A pointer to the "struct space" within an initialized engine.
  • &p: A pointer to a "struct part" containing the particle data. Note that this data will be copied into the space, e.g. the original "struct part" does not need to be preserved.
  • x: A pointer to a vector of three double-precision values with the spatial location of the particle.

Alternatively, several particles can be added at once with the function "engine_load", e.g.

if ( engine_load( &e , x , v , type , 
     pid , vid , q , flags , N ) != 0 ) {
    errs_dump(stdout);
    abort();
    }
with the parameters
  • &e: A pointer to an initialized engine.
  • x: A pointer to an N times 3 array of doubles containing the particle positions.
  • v: A pointer to an N times 3 array of doubles containing the particle velocities.
  • type: A pointer to an array of N integers containing the particle type IDs.
  • pid: A pointer to an array of N integers containing the particle IDs. This should be, for each particle, a unique number between 0 and N-1.
  • vid: A pointer to an array of N integers containing the virtual particle IDs.
  • q: A pointer to an array of N doubles containing the individual particle electrostatic charges.
  • flags: A pointer to an array of N integers containing the particle flags.
  • N: The total number of particles to load.
Note that if any of the parameters v, flags, vid, or q is NULL, the corresponding particle values will be initialized to zero. As with "space_addpart", the particle data is copied into the engine and can be retrieved similarly via the function "engine_unload".

Note that the particle positions and velocities are specified in nanometers and nanometers per picosecond respectively.

Once the particle types have been specified, interaction potentials can be created and associated with pairs of types. The interaction potentials themselves are stored as least-squares piecewise polynomials and can be constructed from one of the pre-defined potential function types.

All potential_create functions require, as their first two parameters, the interval [a,b] over which the potential should be created. Beyond b, the potential is zero, and between 0 and a, the potential is linearly extended from the derivative at a. The last parameter is the tolerance relative to the maximum value of the potential to which the polynomial should be fit. For example, a Lennard-Jones potential for the interaction between neon atoms can be constructed using:
struct potential *pot_NeNe;
if ( ( pot_NeNe = potential_create_LJ126( 0.2 , cutoff , 
     2.6513e-06 , 5.7190e-03 , 1.0e-3 ) ) == NULL ) {
    errs_dump(stdout);
    abort();
    }

Non-bonded interaction potentials between particles are added to the engine using the "engine_addpot" function, e.g.

if ( engine_addpot( &e , pot , tid1 , tid2 ) < 0 ) {
    errs_dump(stdout);
    abort();
    }
where tid1 and tid2 are the type IDs of the interacting particles as returned by "engine_addtype", as described earlier.

Similarly, bonded interactions are added with the function "engine_bond_addpot", e.g.

if ( engine_bond_addpot( &e , pot , tid1 , tid2 ) < 0 ) {
    errs_dump(stdout);
    abort();
    }
Which particles are bonded to each other is then specified with

if ( engine_bond_add( &e , pid1 , pid2 ) < 0 ) {
    errs_dump(stdout);
    abort();
    }
where pid1 and pid2 are particle IDs as specified when adding particles to the system. Similarly, angular and dihedral potentials and bonds are added with engine_angle_addpot and engine_dihedral_addpot, and engine_angle_add and engine_dihedral_add, respectively.

If the non-bonded interaction between two bonded particles is to be excluded, this has to be specified explicitly via the "engine_exclusion_add" function, e.g.

if ( engine_exclusion_add( &e , pid1 , pid2 ) < 0 ) {
    errs_dump(stdout);
    abort();
    }
If any such exclusions have been added to the engine, redundancies between them can be removed by calling "engine_exclusion_shrink".

Finally, holonomic constraints are added using the "engine_rigid_add" function, e.g.

if ( engine_rigid_add( &e , pid1 , pid2 , d ) < 0 ) {
    errs_dump(stdout);
    abort();
    }
where d is the prescribed distance between the particles pid1 and pid2.

After all the particles, potentials, bonded interactions, and holonomic constraints have been added to an engine, it has to be started with "engine_start", e.g.

if ( engine_start( &e , nr_runners , nr_queues ) < 0 ) {
    errs_dump(stdout);
    abort();
    }
where nr_runners and nr_queues are the number of threads and task queues to use, respectively. As of this point, the engine is ready to integrate the equations of motion for the particles.

Once the engine has been started, the computations for each time step, i.e. computing the bonded and non-bonded interactions, resolving the holonomic constraints and updating the particle positions and velocities, are computed with the function "engine_step", e.g.:

e.time = 0;
e.dt = 0.005;
for ( i = 0 ; i < nrsteps ; i++ )
    if ( engine_step( &e ) != 0 ) {
        errs_dump(stdout);
        abort();
        }
where the time step e.dt is specified in picoseconds. Between time steps, the particle data can be unloaded and re-loaded as described earlier with the functions engine_unload and engine_load, e.g. to adjust the velocities or plot output.

Example programs

The directory "mdcore/examples" contains a number of small simulations which offer a useful starting point for experimenting with mdcore and implementing your own simulation.

  • jac: The Joint Amber-CHARMM (JAC) benchmark, consisting of a DHFR protein solvated in TIP3P rigid water molecules. The simulation is executed with the following parameters
    ./jac 5dhfr_cube.psf 5dhfr_cube.pdb n steps
    where n is the number of processors to use and steps is the number of time steps to simulate. The files 5dhfr_cube.psf and 5dhfr_cube.pdb contain the simulation structure. The interaction parameters are read from the file par_all22_prot.inp, which is in the same directory.

    The source file jac.c is set up to deal with all simulation options, e.g. verlet lists, pairwise verlet lists, MPI parallelisation, GPU parallelisation. The different options can be set by changing the engine flags in the call to "engine_init" at the start of the program.

  • argon: A simple simulation consisting of 10x10x10 cells of bulk argon at 100 K (the number of particles is adjusted depending on the cell edge length). The simulation is executed with the following parameters
    ./argon n steps dt L
    where n is the number of processors to use, steps is the number of time steps of length dt picoseconds to take, and L is the cell edge length to use.

    Two other executables, "argon_verlet" and "argon_pwverlet" are provided, which use Verlet and pairwise Verlet lists respectively. The "skin" width used in the lists is the supplied edge length L minus a fixed cutoff of 1.0 nm.

    Velocity scaling with a coupling constant of 0.1 is used during the first 10000 steps to maintain a temperature of 100 K.

  • bulk: A bulk water simulation consisting of 8x8x8 cells of rigid SPC/E water molecules at 300 K (the number of molecules is adjusted depending on the cell edge length). The water molecules are kept rigid using the SHAKE algorithm to half the digits of machine precision. The simulation is executed with the same set of parameters as "argon" above.

    The executable "test" is linked with the double-precision version of mdcore whereas "test_single" is linked with the single-precision version.

    If mdcore was configured with the "--with-cell" option, then "test_cell" executes the simulation using as many SPUs as are available.

  • hybrid: Similar to the "bulk" simulation above, yet with 16x16x16 cells of edge length 1.0 nm. The "hybrid" simulation requires mdcore to have been compiled with the "--enable-mpi" option and is executed as follows:
    mpirun -x OMP_WAIT_POLICY -x OMP_NUM_THREADS -np m ./hybrid n steps
    where m is the number of MPI nodes to use, n is the number of threads to use on each node, and steps is the number of time steps to simulate. Since bisection is used to split the domain, it is recommended, for proper load balancing, that m be a power of 2.