Solvated Systems

From CHARMM tutorial
Jump to navigationJump to search

Description of Solvation

Solvation is surrounding the solute, generally a protein or DNA strand, with a solvent. This is very useful in biomolecular simulations since biochemical reactions generally take place in a viscous environment. Though CHARMM supports methods of estimating solvent effects implicitly, in many simulations it is useful to be able to explicitly place the system being studied into a solvated environment. There is no specific command for solvation, but it can be done through a series of commands.

Basic Solvation Input Script

50%

Preparation
This tutorial will surround a protein in water. First, the user must have a coordinate file containing enough waters to cover the entire system. One with roughly 140,000 atoms can be downloaded from the tutorial website http://www.charmmtutorial.org along with the other files used in this section.

The protein file will be read in before the water file. After the proteins structure and coordinates have been read in, the user must align the protein at the geometric origin with the "COOR ORIEnt" command.

At this point the waters that will used for solvation should be read in. Since we need to generate a new segment for the waters, we must read them in as a sequence and then execute the GENErate command. Once the file containing the waters is opened the following command should be issued:


READ SEQUence TIPS 46656

The "SEQUence" command will read in the residues. Since this PDB file is purely waters, the RESId "TIPS" can be specified followed by the number of residues in the PDB (46656). Once the waters are read in the sequence can be generated:

GENErate WATErs NOANgle NODIhedral

The water coordinates should now be read and appended to the PSF using the append option to "READ COOR".

 * Solvate Protein
 *
 bomlev -2
 
 ! Read in Topology and  Parameter files
 
 open unit 1 card read name top_all27_prot_na.rtf
 read RTF card unit 1
 close unit 1
 
 open unit 1 card read name par_all27_prot_na.prm
 read PARA card unit 1
 close unit 1
 
 ! Read protein structure from PSF file
 
 open read formatted unit 27 name 1yjp-full-min.psf
 read psf card unit 27
 close unit 27
 
 open read unit 10 card name 1yjp-full-min.crd
 read coor card unit 10
 close unit 10
 
 ! orient about origin so the alignment is correct with the water crystal
 coordinate orient

Since the protein has been centered about the origin, the waters must also be centered about the origin with the "COOR ORIEnt" command. To make sure that the waters do not rotate, use the "NOROtation" command. Using...

COORdinate ORIEnt NOROtation SELEct SEGId WATErs END

will ensure that only the waters will be centered about the origin.

There may be overlapping atoms between the protein and the newly added waters, which will cause dynamics simulations to fail. The overlapping water molecules will have to be removed with the "DELEte" command. "DELEte ATOM SORT" informs CHARMM that the user will be deleting atoms and after that is done, the PSF will be re-sorted with the selected atoms removed.

A select command will specify which atoms are to be removed.

! Delete waters which overlap with protein
DELEte ATOM SORT SELEct .BYREs. (SEGId WATErs .AND. TYPE OH2 .AND. -
((.NOT. (SEGId WATErs .OR. HYDRogen)) .AROUnd. 2.5)) END

Since the residues are known and were read in with the "SEQU" command, it would be a good idea to select the atoms for deletion by their residues. A segid of "WATErs" was given to the water PDB so to select waters, the user can input "SEGId WATErs". Just to be sure all waters are selected, "TYPE OH2" will select all atoms with the atom name, "OH2". Now that the command has ensured that only waters will be chosen, it must be guaranteed that the waters will only be removed if they are too close to the protein. Select offers an "AROUnd" feature that will select atoms within a given proximity. Select should be followed by a number in angstroms and 2.5 is a reasonable value.

This tutorial positioned both the waters and the protein at the origin, the sphere will be centered at the origin.

! Delete extraneous waters
DELEte ATOM SORT SELEct .BYREs. -
(.NOT.(POINt 0. 0. 0. CUT @SAFESPHERE) .AND. ( SEGId WATErs )) END
! where @SAFESPHERE is the size of the protein (25 Å) plus some buffer
! region (e.g. 20 Å) for a total diameter of 45 Å.

If the user wants to minimize and/or apply molecular dynamics to the solvated PDB then a sphere will result in errors when using images; therefore, the user may want to solvate in a regular geometric structure. First, the user must determine the X,Y, and Z length of the box. This tutorial will remove anything outside the 45x45x45 Å box.

! Takes statistics on coordinates to figure out the minimum, maximum, and
! average values of the X, Y, and Z coordinates of the protein. These are
! stored in XMIN,XMAX,XAVE,YMIN,YMAX,YAVE,ZMIN,ZMAX,ZAVE.
COORdinate STATistics SELEct protein END
! If you want to solvate 10 Å from the edge of the protein then
! the distance is multipled by two and then added to the diameter of
! the structure.
CALC XDIM = ( ABS ( ?XMAX - ?XMIN ) + 20
CALC YDIM = ( ABS ( ?YMAX - ?YMIN ) + 20
CALC ZDIM = ( ABS ( ?ZMAX - ?ZMIN ) + 20

Be advised that the "SELE protein END" atom selection will need to be replaced with a real atom selection that gets all of the protein atoms. For a regular PDB without bulk water, "SELE all END" (selecting all atoms) is a reasonable choice.

Not all 46,656 waters may be necessary in the solvation of a protein. Removing extraneous waters can result in faster data gathering and reduce clutter. CHARMM has a "CUT" command which can cut off all atoms in a radius specified in angstroms. The user will have to scan their PDB and decide a good cutoff based on the Cartesian coordinates of their protein. Alternatively, it is possible to figure out a good radius for the sphere based on the size of the largest axis of the structure.

This script shows one additional useful feature of CHARMM, namely this use of variables (in this case greatervalue and gvsq). Variables within scripts can be assigned via the set command or calculated via the calc command. When a variable is referenced, it must be preceded by the "@" character, as shown below. Variables can also be set on the command line when the CHARMM executable is invoked, e.g. ./charmm variable1=value1 variable2=value2 < charmm.inp, In this script sample, the @greatervalue variable contains the length of the longest axis of the protein plus the buffer value.

! OK, now we set a safe spherical diameter (based on the minimum  
! sphere that will circumscribe a cube of this dimension.
calc gvsq = @greatervalue * @greatervalue
calc safesphere = sqrt( @gvsq  + ( 2 * @gvsq ) )
calc safesphere = @safesphere/2

The input script should now look like:

 * Solvate Protein
 *
 bomlev -2
 
 ! Read in Topology and  Parameter files

open unit 1 card read name top_all27_prot_na.rtf
read RTF card unit 1
close unit 1

open unit 1 card read name par_all27_prot_na.prm
read PARA card unit 1
close unit 1

! Read protein structure from PSF file
open read formatted unit 27 name 1yjp-full-min.psf
read psf card unit 27
close unit 27

open read unit 10 card name 1yjp-full-min.crd
read coor card unit 10
close unit 10

! orient about origin so the alignment is correct with the water crystal
coordinate orient

! Takes statistics on coordinates to figure out the minimum, maximum, and
 ! average values of the X, Y, and Z coordinates of the protein. These are
 ! stored in XMIN,XMAX,XAVE,YMIN,YMAX,YAVE,ZMIN,ZMAX,ZAVE.
COORdinate STATistics SELEct protein END

! Read in water sequence
READ SEQUence TIPS 46656

! Generate new segment for the water
GENErate WATers NOANgle NODIhedral

! Read the water PDB coordinates and append them to the protein
OPEN READ FORMATTED UNIT 1 NAME water.crd
READ COOR CARD UNIT 1 APPEND
CLOSE UNIT 1

COORdinate ORIEnt NOROtations SELEct SEGId WATErs END

! Delete extraneous waters 
DELEte ATOM SORT SELEct .BYREs. -
(.NOT.(POINt 0. 0. 0. CUT @SAFESPHERE) .AND. ( SEGId WATErs )) END

! xdim, ydim, and zdim are variables that store the dimensions of
 ! the protein along the x, y, and z axis respectively
 ! Because the user requested 10 distance from the edge,
 ! the distance is multipled by two and then added to the diameter of
 ! the structure.
CALC XDIM = ( ABS ( ?XMAX - ?XMIN ) + 20  )
CALC YDIM = ( ABS ( ?YMAX - ?YMIN ) + 20  )
CALC ZDIM = ( ABS ( ?ZMAX - ?ZMIN ) + 20  )

Now that we have the dimensions of the system defined it is time to build the crystal and setup the images that will be employed in molecular dynamics simulations that use periodic boundary conditions.

To start things off, use the "CRYStal" command followed by the "DEFIne CUBIc" so CHARMM knows we are simulating a protein in a water cube. Following this command, the Cartesian dimensions of the cubic box (the total length of each of the three sides) must be given. As shown above the structure can be used to determine the dimensions of the box. Using the values of XDIM, YDIM and ZDIM the lattice parameters (a, b and c) can be defined. Additionally, the angles of the shape need to be defined and since a cubic box is requested alpha, beta and gamma should be set to "90. 90. 90." These commands should be followed by:

CRYStal BUILd NOPER 0

NOPEr is the number of space group operations. For a cube, "NOPEr" is zero since there are no operations needed for a cubic crystal. All together you should have something that resembles the following:

Once the crystal structure is defined, it is necessary to create the actual image atoms, which is accomplished via the IMAGe command. The image set-up determines how atoms moved outside the periodic box are shifted. The two most commonly used shifting options are to re-center by residue (BYREs) and by segment (BYSEg). Generally, connected polymers such as protein segments should be set to shift by segment (so they are not "broken up" by shifting) and disconnected portions of the structure (e.g. waters and ions in solvation) should be recentered by residue. The final crystal and image commands should look like:

CRYSTAL DEFINE CUBIc @XDIM @XDIM @XDIM 90.0 90.0 90.0
CRYSTal BUILd NOPEr 0
IMAGe BYREs SELEct .NOT. SEGId PROTein END
IMAGe BYSEg SELEct SEGId PROTein END

Note that we assume the protein (or other connected segments) are in SEGId PROTein. The select statement will need to be modified for each individual structure, depending on which parts need to have re-centering done by residue and which by segment.

Next the image list must be built and atoms that are located outside of the box must be deleted. This is done by forcing an image update (using the IMAGe command), which will cause all atoms that are not inside the periodic box to be move inside it. By saving the original coordinates in the comparison set (via COOR COPY COMP), we can compare them with the modified coordinates, allowing us to see exactly which atoms have changed position. The COOR DIFF command accomplishes this and saves the distances into the main coordinate set (overwriting the set that was there previously). The SEL1 atom selection chooses those atoms that have moved (i.e. have nonzero x, y, and z coordinates from COOR DIFF). These atoms, which were originally outside the unit cell, are then deleted.

 ! COPY MAIN COORDINATES TO COMPARISON SET
 COOR COPY COMP
 
 ! SET WRNLEV DOWN TO REDUCE CLUTTER...
 WRNLev 1
 
 ! UPDATE THE IMAGE LISTS. IF ANY ATOMS MOVE DURING THIS PROCESS, IT MEANS
 ! THAT THEY ARE EXTRANEOUS TO THE CRYSTAL STRUCTURE THAT WE WANT TO BUILD
 ! AND SHOULD BE DELETED. THEREFORE, COOR DIFF IS USED TO DETECT MOVING
 ! ATOMS, WHICH ARE THEN SELECTED FOR DELETION.
 UPDAte INBFrq -1
 
 COOR DIFF
 DEFIne SEL1 SELEct .BYREs. (PROPerty X .NE. 0.0 .OR. -
 PROPerty Y .NE. 0.0 .OR. PROPerty Z .NE. 0.0) END
 
 ! COPY COMPARISON COORDINATES BACK TO MAIN
 COOR SWAP
 
 ! DELECT ATOMS THAT ARE OUTSIDE THE BOX
 DELEte ATOM SELEct SEL1 END
 
 ! WRITE OUT A CRYSTAL TRANSFORM FILE
 OPEN UNIT 1 CARD WRITe NAME 1yjp_cube.xtl
 CRYStal WRITe CARD UNIT 1
 * SOLVATED STRUCTURE
 * SOLVATION: CUBE WITH A CRYSTAL DIMENSION OF @XDIM
 *

It is now advisable to minimize a newly solvated structure before applying molecular dynamics, otherwise, the initial motion of the protein may be very jerky as the system tries to reach a stable conformation. Therefore, the recommended method is to first minimize the system to stabilize the energy and then gradually add energy back to it by heating during dynamics (will be discussed later).

Now the non-bond values must be set. Start off the command with "NBONd". In order to save time and memory, CHARMM allows the user to create a cutoff so the van der Waals energy (which is uses the common Lennard-Jones potential) is only calculated if two atoms are within a given distance of one another. When two atoms approach the cutoff distance, a switching function removes the contributing energies from the atoms without creating a discontinuity. CHARMM will keep track of the coordinates via "INBFRq"to check if they are within the cutoff value. The "CTOFnb" and "CTONnb" parameters define the range within which the switching function is used. The numbers associated with cutnb, ctofnb, and ctonnb are in angstroms and a description of these parameters follows.

Graphical view of cutoffs

Taking the above figure as an example, a pair of atoms that are separated by 8 (CTONnb) to 12 Å (CTOFnb) will initiate the switching function. CHARMM will keep track of atom pairs between 12 and 16 Å until the distance between the atoms in the pairs is greater than CUTNb (16 Å). A graph of the potential energy between two would look like:

Graphical view of cutoffs

As soon as pairwise distance between atoms is between 8 and 12 Å, CHARMM will activate the switching function bringing the contribution value to zero as shown by the graph. This is done using the "VSWItch" option to the NBONd command.

This tutorials values for CUTNb, CTOFnb, and CTONnb will be specified with the options:

CUTNb 16.0 CTOFnb 12. CDIE EPS 1. CTONnb 8.

The molecules near the edge of a box act as if they are interacting with images of other molecules that are also near the edge of the box in the system. The user controls this by defining the ends of the boundaries in the crystal command discussed above. The diagram below shows the original system (in red) with CHARMM-created images surrounding the original system. The circles represent the radius in which the original system atoms interact with the image atoms.

Graphical view of images

Setting the radius size can be done with the "CUTIm" option, which creates a cutoff for atoms interacting with other images atoms. This tutorial will use the value 16.0. In general, CUTIm should always equal CUTNb! CHARMM will further need to know how often to update the image list and this can be done with the "IMGFrq" command. Setting it to -1 will let CHARMM know to only update it when necessary.

If two atoms become too close, CHARMM has a warning system implemented to inform the user. To set the warning distance (in Å) use the command "WMIn" followed by a number. 1.5 is the default.

Generally, the most costly part of the calculation is the long-range electrostatic interactions between atoms. CHARMM supports the Particle-Mesh Ewald (PME) summation method which is a particularly efficient way of calculating these. More information can be found by consulting ewald.doc in the CHARMM documentation.This can be set up by the command "EWALd PMEW". The grid points used by PME have to be set with the "FFTX" "FFTY" and "FFTZ" options. The values for these grid points should be near the value of the simulation box size. These must be integers and cannot have any other prime factors besides 2,3 and 5. Therefore, a number like 32 would work whereas 49 would cause an error. In general, powers of 2 are suggested. KAPPa is another value that has to be given for an Ewald summation and it is the real number that governs the width of the Gaussian distribution central to the summation. A general value is .34. The "SPLIne ORDEr 6" command can help speed up the process of the Ewald summation because it uses approximations to determine the polynomials in the function of cubic spline interpolation. Shake, which is described above, may also be used.

The input script should resemble the following:

 * Solvate Protein
 *
 bomlev -2
 
 ! Read in Topology and  Parameter files
open unit 1 card read name top_all27_prot_na.rtf
read RTF card unit 1
close unit 1

open unit 1 card read name par_all27_prot_na.prm
read PARA card unit 1
close unit 1

! Read protein structure from PSF file
open read formatted unit 27 name 1yjp-full-min.psf
read psf card unit 27
close unit 27

open read unit 10 card name 1yjp-full-min.crd
read coor card unit 10
close unit 10

! orient about origin so the alignment is correct with the water crystal
coordinate orient

! Takes statistics on coordinates to figure out the minimum, maximum, and
 ! average values of the X, Y, and Z coordinates of the protein. These are
 ! stored in XMIN,XMAX,XAVE,YMIN,YMAX,YAVE,ZMIN,ZMAX,ZAVE.
COORdinate STATistics SELEct protein END

! Read in water sequence
READ SEQUence TIPS 46656

! Generate new segment for the water
GENErate WATers NOANgle NODIhedral

! Read the water PDB coordinates and append them to the protein
OPEN READ FORMATTED UNIT 1 NAME water.crd
READ COOR CARD UNIT 1 APPEND
CLOSE UNIT 1

COORdinate ORIEnt NOROtations SELEct SEGId WATErs END

! Delete extraneous waters 
DELEte ATOM SORT SELEct .BYREs. -
(.NOT.(POINt 0. 0. 0. CUT @SAFESPHERE) .AND. ( SEGId WATErs )) END

! xdim, ydim, and zdim are variables that store the dimensions of
 ! the protein along the x, y, and z axis respectively
 ! Because the user requested 10 distance from the edge,
 ! the distance is multipled by two and then added to the diameter of
 ! the structure.
CALC XDIM = ( ABS ( ?XMAX - ?XMIN ) + 20  )
CALC YDIM = ( ABS ( ?YMAX - ?YMIN ) + 20  )
CALC ZDIM = ( ABS ( ?ZMAX - ?ZMIN ) + 20  )

CRYSTAL DEFINE CUBIc @XDIM @XDIM @XDIM 90.0 90.0 90.0

CRYSTal BUILd NOPEr 0
IMAGe BYREs SELEct .NOT. SEGId PROTein END
IMAGe BYSEg SELEct SEGId PROTein END

! COPY MAIN COORDINATES TO COMPARISON SET
COOR COPY COMP

! SET WRNLEV DOWN TO REDUCE CLUTTER...
WRNLev 1

! UPDATE THE IMAGE LISTS. IF ANY ATOMS MOVE DURING THIS PROCESS, IT MEANS
 ! THAT THEY ARE EXTRANEOUS TO THE CRYSTAL STRUCTURE THAT WE WANT TO BUILD
 ! AND SHOULD BE DELETED. THEREFORE, COOR DIFF IS USED TO DETECT MOVING
 ! ATOMS, WHICH HAD SELECTED FOR DELETION.
UPDAte INBFrq 0

COOR DIFF
DEFIne SEL1 SELEct .BYREs. (PROPerty X .NE. 0.0 .OR. -
PROPerty Y .NE. 0.0 .OR. PROPerty Z .NE. 0.0) END

! COPY COMPARISON COORDINATES BACK TO MAIN
COOR SWAP

! DELECT ATOMS THAT ARE OUTSIDE THE BOX
DELEte ATOM SELEct SEL1 END

! WRITE OUT A CRYSTAL TRANSFORM FILE
 OPEN UNIT 1 CARD WRITe NAME 1yjp_cube.xtl
 CRYStal WRITe CARD UNIT 1
 * SOLVATED STRUCTURE
 * SOLVATION: CUBE WITH A CRYSTAL DIMENSION OF @XDIM
 *
 
 ! FIX HYDROGEN BONDS TO THEIR VALUES IN THE PARAMETER FILE
SHAKe BONH PARAmeters

! NON-BOND SETUP (SAME VALUES AS WILL BE USED IN DYNAMICS) 
 NBONd INBFrq -1 ELEC FSWItch VDW VSWItch CUTNb 16. CTOFnb 12. CTONnb 8. -
 CUTIm 16.0 IMGFRQ -1 WMIN 1.0 EWALd PMEW FFTX 48 FFTY 48 FFTZ 48 KAPPa -
 .34 SPLIne ORDEr 6
 
 ENERgy
 
 MINI SD NSTEp 100
 MINI ABNR NSTEp 1000 NPRInt 100 TOLG 0.05
 
 OPEN WRITE UNIT 1 CARD NAME 1yjp-full-solv-min.pdb
 WRITE COOR PDB UNIT 1
 * new_1yjp-11213-min.pdb
 *
 
 OPEN WRITE UNIT 1 CARD NAME 1yjp-full-solv-min.crd
 WRITE COOR CARD UNIT 1
 * new_1yjp-11213-min.crd
 *
 
 STOP