Sb intro1

From CHARMM tutorial
Jump to navigationJump to search

A first Intro to CHARMM scripts

Welcome to a first tour of CHARMM. If you have not done so, this may be a good idea to look at the prerequisites that we assume for this introduction and throughout the rest of this tutorial.

Starting up CHARMM

While you can run CHARMM interactively, one usually tells the program what to do by means of a script. Under Unix (at least for non-parallel versions of the program), this means that in order to execute a (short) CHARMM calculation, one runs from the command line (Unix Shell prompt)

charmm_executable < charmm_input_script.inp 

exploiting input redirection available under all Unix shells. (TODO: what about other OS (Windows, OSX ?), explain non-parallel remark. Possibly should state somewhere that tutorial assumes Unix/Linux??). Since as we shall see shortly CHARMM output tends to be verbose, one normally also redirects the output to a file, thus ending up with

charmm_executable < charmm_input_script.inp > charmm_output_file.out 

Of course, instead of charmm_executable use the filename starting up CHARMM at your location and replace charmm_input_script.inp and charmm_output_file.out by the names of the actual script/output file which you want to run (to which you want to save your output).

Some generic elements of CHARMM scripts / scripting

Titles

First, let's do something really silly and start up charmm reading from an empty file; which can be easily accomplished by executing

charmm_executable < /dev/null

CHARMM prints a header telling you copyright info, version and some more stuff, followed by a warning

RDTITL> No title read.

     ***** LEVEL  1 WARNING FROM <RDTITL> *****
     ***** Title expected.
     ******************************************
     BOMLEV (  0) IS NOT REACHED. WRNLEV IS  5

The job finishes by printing some status info. The interesting part is the warning from which we learn that CHARMM expected a "title". Indeed, each CHARMM script should start with a title, and if the main script tells CHARMM to read from another file, the program also expects to find a title at the beginning of that file.

A title should not be confused with comments. E.g., it can only occur at the beginning of a file (we'll explain the apparent exceptions when we encounter them). Title lines start with a star or asterisk (*); to indicate the end of the title give a line containing only a star. (A title can consist of up to 32 consecutive lines) Thus,

* This would be a short title
* 

If you start CHARMM with a short file containing the above snippet (=title), you get the title echoed in uppercase letters

RDTITL> * THIS WOULD BE A SHORT TITLE
RDTITL> *


instead of the warning when using the empty file.

Comments

Having blabbered so much about titles, what are comments: A comment in a CHARMM script is everything following an exclamation mark ! I.e.,

! this is a comment on a line by itself

and this would be a line containing a CHARMM command, followed by a comment

ENERgy ! as you might expect, this command calculates an energy

Ending a CHARMM script

So far, CHARMM finished when it reached the end of the script file (as the line

                   NORMAL TERMINATION BY END OF FILE

in the output informs you. It's OK to end a CHARMM script in this manner, but the preferred way of stopping CHARMM is by issuing the

stop

command. We, thus, can create a first, completely useless CHARMM script, looking like

* A CHARMM script doing nothing
*
 
! we really should be doing something, but in the meantime all we know is
 
stop

In addition to the title, the comment is echoed as well. Note that CHARMM now prints upon finishing

                   NORMAL TERMINATION BY NORMAL STOP

A first full example

One of the major hurdles in getting started with CHARMM is the initial setting up of one's simulation system (usually starting from a PDB file of the system (protein) of interest). This is where sites such as http://www.charmming.org prove so helpful, and we shall take a detailed look at what is involved in the initial setup shortly. For the time being, however, let's assume that we have accomplished this first step and that we want to carry out a short calculation with our system, e.g., a minimization. The model system is crambin (PDB entry: 1CRN), and you can download the necessary files from the University of Vienna or individually from the list below. Create a directory and unpack the tar.gz archive there. You end up with five files

  1. top_all27_prot_na.rtf
  2. par_all27_prot_na.prm
  3. 1crn.psf
  4. 1crn_init.crd
  5. 1crn_1.inp

The first four are data files needed to set up the system, the fifth file is the CHARMM script itself. Before we can dig in, we need to say a few words about CHARMM data structures

Some comments about CHARMM data structures

Incidentally, the 1-4 you just downloaded correspond to the four most important data structures required in almost any CHARMM run (you also may want to take a look at the overview of CHARMM data structures):

RTF (residue topology file)
The so-called residue topology file contains the information how the building blocks of larger molecules (e.g., amino acids in the case of proteins) look like. In the context of CHARMM the term residue is used not only for amino acids, but also, e.g., for nucleic acids or sugars, as well as small molecules, e.g., water. For each residue, the RTF provides information about the atoms it contains, the atom type, and the atomic charges. Further, it specifies the connectivity of the atoms (which atom is (chemically) bonded to what other atoms). From these data, CHARMM can usually derive information about angles and dihedral angles itself, although one can/could also specify them by hand. CHARMM has RTFs for nucleic acids, lipids, proteins and carbohydrates. E.g., the file top_all27_prot_na.rtf we use here is part of the CHARMM distribution and contains residue information for amino acids and nucleic acids parametrized with the CHARMM27 force field (TODO: should probably give some instructive refs). Without going into details, a short look at the file should give you an idea of the type of information it contains.
Parameters
The force field parameters in (more or less) human readable form; e.g., force constants of the bond stretching terms, Lennard-Jones radii and well-depths for all atom types etc. (The notable exception are partial atomic charges, which are listed in the RTF --- can you think of the reason why?!) The file par_all27_prot_na.prm used in our example is the companion of top_all27_prot_na.rtf and contains parameters for all compounds described by this RTF file (amino acids and nucleic acids). Again, a short glimpse at the file should give you an idea of the info it contains.
PSF (protein structure file)
The RTF and parameter files (and once processed by CHARMM the corresponding data structures) contain information that applies to, e.g., all proteins by specifying how amino acids look like. Obviously, you want to simulate a specific protein, characterized by its unique primary sequence. The PSF datastructure contains exactly this system specific information, such as the exact list of all bond stretching terms in your protein, all angle bending terms etc. and much more. Before you can do anything useful with CHARMM, this information must have been created internally (we'll see later how this is done). Once created, the PSF data structure can be saved to a file for later reuse; this is what we shall be doing in our crambin example.
Note that the PSF does not contain the force field parameters. As just mentioned, all parameters are read from a parameter file and stored in (an) appropriate data structure(s). The parameters that are needed for the specific system of interest (e.g. crambin) are internally stored in something called the CODES data structure. Normally, you don't have to be concerned with this; however, if you get a CODES error, this normally indicates that your parameter file was incomplete or inconsistent.
Coordinates
Finally, you need coordinates for each atom in the system. The most frequent source of coordinates in real work (with proteins) are PDB (protein database) files, but they need some "massaging" before they can be used by CHARMM. Therefore, we use preprepared coordinates for our example.

Normally, one RTF and parameter file suffices for a complete class of systems (e.g., proteins), and these files are provided together with CHARMM. (More formally, together with the form of the potential energy function, the RTF and parameter file constitute what is referred to as "the CHARMM force field". PSF and coordinates, on the other hand, are specific to the system being studied. At some point, they have to be created by the user of the program (again, we'll see shortly how this is done!)

Back to the crambin minimization example

Provided one has a PSF and coordinates for the system of interest, many typical applications of CHARMM follow a rather simple pattern:

  • (A) Read RTF and parameter file (i.e., load force field information into CHARMM)
  • (B) Read PSF and coordinate file (i.e., load information about specific system)
  • (C) If necessary, do some more preparatory stuff
  • (D) Do some work
  • (E) Possibly do "on the fly" analysis of what was done in (D)
  • (F) Frequently, (D) changed the coordinates; so save the coordinates and whatever else may have changed.
Reading files

Please do take a look at the downloaded example CHARMM script 1crn_1.inp now. As you may expect, it contains a title; followed by a lengthy comment paraphrasing what was just said. And, then it starts off by indeed reading the RTF file, which is accomplished by the following three commands

open unit 1 form read name top_all27_prot_na.rtf
read rtf card unit 1
clos unit 1

An analogous triplet of commands is carried out for reading the parameter file, the PSF and the coordinate file, so it is worthwhile to spend a few words on what's going on here: I/O (Input/Output) in CHARMM follows the model of most programming languages; since CHARMM is programmed in FORTRAN, there are significant similarities to the FORTRAN I/O model. First, a file has to be opened ("connected" to a "unit"); the remaining arguments to the open command specify the mode of operation. Note: units 5 and 6 are reserved; on most systems you can have unit numbers ranging from 1 to 99. Obviously, you have to specify the file name (name <name>). The keyword read indicates that the file is opened for read access (meaning that it is an error if the file does not exist!). Further, we specify that the file is "formatted" (keyword form (alternatively one could use card)), i.e., it is an ASCII text file as opposed to a binary file (in which case the keyword would be file; file is actually the default, so form must be specified!).

The next command (read rtf card unit 1) does the actual reading. The first keyword after read specifies what type of file is going to be read; not surprisingly, the keyword is rtf for RTFs, para for parameter files etc. (see 1crn_1.inp). The second keyword after the read command has to do with the fact that many types of files can be read by CHARMM in more than one format (we'll see examples later!). In all read statements in 1crn_1.inp we specify the respective native CHARMM format by the card keyword. Finally, we have to tell CHARMM from which unit we want to read; typically (as is the case here) this is the unit that was opened just before the read command ( read ... unit 1).

Finally, files opened for reading, should be explicitly closed when one is done. (This is not necessary, but good practice).

A brief comment on the minimization

The coordinates we read are basically the coordinates from the 1CRN pdb file of crambin, with missing atoms (e.g., hydrogens added). The simple command

energy

computes the energy of the system, using a multitude of default settings. This is one reason why this example is somewhat artificial! Many default settings in CHARMM are, unfortunately, not optimal, and normally one would carefully tune options here (more on this, in particular the necessary background, later). Further, we have crambin without surrounding solvent, so the subsequent minimization of the protein in "vacuum" is not very realistic. For the moment, we can ignore these caveats and take a look at the output of the energy commmand instead.

CHARMM>    energy

NONBOND OPTION FLAGS: 
    ELEC     VDW      ATOMs    CDIElec  SHIFt    VATOm    VSWItch 
    BYGRoup  NOEXtnd  NOEWald 
CUTNB  = 14.000 CTEXNB =999.000 CTONNB = 10.000 CTOFNB = 12.000
WMIN   =  1.500 WRNMXD =  0.500 E14FAC =  1.000 EPS    =  1.000
NBXMOD =      5
There are        0 atom  pairs and        0 atom  exclusions.
There are        0 group pairs and        0 group exclusions.
<MAKINB> with mode   5 found   1835 exclusions and   1713 interactions(1-4)
<MAKGRP> found    578 group exclusions.
Generating nonbond list with Exclusion mode = 5
== PRIMARY == SPACE FOR   184479 ATOM PAIRS AND        0 GROUP PAIRS

General atom nonbond list generation found:
  119614 ATOM PAIRS WERE FOUND FOR ATOM LIST
    4229 GROUP PAIRS REQUIRED ATOM SEARCHES

ENER ENR:  Eval#     ENERgy      Delta-E         GRMS
ENER INTERN:          BONDs       ANGLes       UREY-b    DIHEdrals     IMPRopers
ENER CROSS:           CMAPs     
ENER EXTERN:        VDWaals         ELEC       HBONds          ASP          USER
----------       ---------    ---------    ---------    ---------    ---------
ENER>        0   -296.51905      0.00000     15.23577
ENER INTERN>      117.51821    131.42640      9.84634    215.59668       2.29425
ENER CROSS>       -64.40694
ENER EXTERN>      -91.86576   -616.92824      0.00000      0.00000       0.00000
----------       ---------    ---------    ---------    ---------    ---------

The lines immediately following the energy command report (some of) the settings used to compute the energy (e.g., the use of a 12 A cut-off radius (CTOFNB), of a shifting function for the electrostatic (SHIFt) and a switching function (VSWItch) for Lennard Jones interactions) . The following lines contain some information about the generation of the nonbonded lists. (TODO: this should point towards my material (possibly updated) about energy calculation) The actual output of the energy calculation can be found in the lines starting with ENER. The header lines should be mostly self-explanatory, in particular the total energy can be found as entry ENERgy. UREY-b denotes the Urey-Bradley like bonded terms in the 1-3 distance, CMAP is the new dihedral cross term (TODO: some ref?), and VDWaals in CHARMM jargon refers to the Lennard-Jones interactions.

The next two commands do the real work (minimization). Obviously, mini initiates the minimization. The keyword following mini indicates the minimizer to be used; here we start with steepest descent (SD), followed by adopted basis set Newton Raphson (ABNR) [give reference]. Details of these minimizers will be discussed later. The option nstep 100 tells the program to carry out 100 minimization steps. The behavior of the minimizers could be finetuned by method specific additional options, but this is beyond the scope of this introduction. The output from the minimization resembles that of the energy calculation

MINI MIN: Cycle      ENERgy      Delta-E         GRMS    Step-size
MINI INTERN:          BONDs       ANGLes       UREY-b    DIHEdrals     IMPRopers
MINI CROSS:           CMAPs     
MINI EXTERN:        VDWaals         ELEC       HBONds          ASP          USER
----------       ---------    ---------    ---------    ---------    ---------

[... snip ...]

MINI>       10   -492.83381    196.31476      1.39253      0.00156
MINI INTERN>       31.06297     99.50559      7.06564    212.92448       4.12544
MINI CROSS>       -68.59167
MINI EXTERN>     -129.29653   -649.62974      0.00000      0.00000       0.00000

More about minimization later on. At present, we just want to point your attention to the fields Delta-E and GRMS. The first reports the change in energy to the previous printout, i.e., if you add the 196.31 kcal/mole of the MINI> 10 line to the energy at this step (-492.83 kcal/mole), you obtain the initial energy of the system (-296.52 kcal/mole, see the MINI> 0 line, or the output of the energy command). The gradient root mean square (GRMS) field reports the accompanying root mean square deviation in the forces. If you skim through the output, you can see how the energy goes down; if you set nstep of the ABNR minimization to something very high (e.g., 10000), you would most likely see that both Delta-E and GRMS eventually become zero; possibly, the minimization would even stop if some internal convergence criteria are fulfilled.

It should be pointed out that the ener command is redundant; we could have just carried out the minimization. However, you'll often find an energy command after everything is set up, and we maintain this tradition. If, for some reason something went wrong up to this point, the energy calculation will most likely fail and the CHARMM run will abort at this (early) point. Also, one would usually add to the ener command all the various options pertaining to how energy and forces are calculated exactly; these are remembered throughout the remainder of the CHARMM script and commands like mini or dyna (molecular dynamics) need to have only options controlling details of minimization or MD, rather than energy options as well. We shall see this in later examples, where attention is paid to these details.

Writing files

At the end of the minimization, our system does not only have a much lower energy, but its coordinates have changed as well. Most likely, one wants to save these coordinates for later use, and so we have to learn how to write files. Writing files in CHARMM is very similar to reading files. First, the file is opened (with option writ

open unit 11 form writ name 1crn_somemini.crd

instead of read; in this case, the file need not exist), followed by the

writ coor card unit 11

statement, which writes to the just opened unit 11. As in the read command we have to tell CHARMM what type of file (coor for coordinates) we plan to write, and in what format (card for native CHARMM format in the above snippet). If you look at 1crn_1.inp, you'll see that we actually write the coordinates twice, once in CHARMM format (keyword card, but also in PDB format (keyword pdb). The native CHARMM format is intended for reuse by CHARMM, whereas the PDB file can be used for data exchange with other programs, in particular (but not limited to) graphics programs. The novel part (compared to reading) is that the writ command is followed by a title

writ coor pdb unit 11
* CRAMBIN, from pdb entry 1crn,
* with CHARMM27 parameter set
* after 100 steps SD + 100 steps ABNR minimization
*

This provides the possibility to pass some information into the respective output file (take a look at 1crn_somemini.crd, the title is there, augmented by some "accounting information"). The title can be ommitted, but providing an informative(!) title is good practice. After most type of writes, files are closed automatically; so there is usually no need for an explicit clos command.

A closing comment

Congratulations, you have just worked your way through a complete (and almost meaningful) CHARMM run/script. Lets take one more look at 1crn_1.inp from a different point of view. Strip the file of all comments and titles, and then count the number of commands. Given that the ener command is actually superfluous, the real work is initiated by just two lines (the two mini commands). All other commands (I count 18!) do routine stuff like reading in the force field (RTF and parameters) or write out the final coordinates. This ratio is not too untypical; while the "hard work" commands of CHARMM do very much controlled by a single command line, they need to be surrounded by multiple "lesser" commands taking care of the small details. There are two other characteristics of (most) CHARMM scripts that you'll notice soon. (i) Many of them are very similar. Had we decided to run a short MD simulation as a first example, our script would have looked very similar. (ii) If you want to apply our minimization script to a different protein (provided you have a PSF and coordinates), all you would have to do is to change the filenames. In the following, we want to show you capabilities of CHARMM scripting that help you make your scripts more general (e.g., to write one script that could be used to minimize any system for which you have a PSF and coordinate file), as well as modular (i.e., to split the script into recurring parts, such as reading of RTF and parameters, as opposed to the work part, which changes depending whether you want to do MD or minimization or normal coordinate analysis or ...

Refinement of CHARMM scripts / scripting

Command line parameters, @variables

So far, the CHARMM scripting language seems to be a concatenation of individual commands. It does contain, however, most (if not all) elements of a (imperative) programming language (even though it is not an extremely comfortable one). One key element of a programming language is the ability to set and manipulate variables. In the context of running CHARMM, it is extremely useful to pass certain variables into the program from the outside. Run the following miniscript (we name it title2.inp)

* Example of passing a variable from the command line  
* The variable myvalue was initialized to @myvalue  
*  
stop 

by executing

charmm_executable myvalue=500 < title2.inp 

and study the output (you could also state myvalue:500): Before the title is echoed, you see that an argument was passed

Processing passed argument "myvalue:500"  
Parameter: MYVALUE <- "500" 

and in the echoing of the title, @myvalue is replaced by the value assigned to the variable myvalue, i.e., 500 in our case. The little example highlights something to keep in mind. The first thing that is done when a line in a script (containing a title or a command) is parsed, is to scan for @variables, which are then replaced by their value (no variable replacement is done in comments!).

Thus, we can make an easy generalization to our minimization script 1crn_1.inp. Rather than hardcoding the number of minimization steps, we could pass the number from the command line. Replace, e.g., the line

 mini sd nstep 100 

by

 mini sd nstep @nstep 

and execute the script as

 charmm_executable nstep:10 < 1crn_1.inp > my_output_name 

and you will carry out 10 steps of SD minimization. If, however, you specified nstep:1000, you would carry out 1000 SD minimization steps. (Obviously, you could do the same for the ABNR minimization command).

If you follow some naming conventions, you can generalize you script even further. Note that all files pertaining to our model protein (crambin) that are read or written by 1crn_1.inp start with "1crn" (the PDB code of the crambin structure used). If you replace "1crn" by "@system" and pass system:1crn on the command line, you have exactly the functionality of the original script. However, suppose that you have a psf and coordinate file for some other protein, then you can use the very same script for this protein as well. Let's assume that your modified 1crn_1.inp script is named simplemini_1.inp to distinguish it from the original downloaded version, then executing

charmm_executable nstep:100 system:1crn < simplemini_1.inp > some_output_name

does exactly what the original 1crn_1.inp script did, where everything was hardcoded. If you specify

charmm_executable nstep:500 system:1ccn < simplemini_1.inp > some_other_output_name 

then you carry out 500 steps of (SD) minimization for a protein to which you have given the label 1ccn (1CCN would be the PDB code for an NMR structure of crambin).

Our new script is not free from problems. First, it contains hardcoded titles/comments referring to crambin. Second, if you forget to pass one or more variables on the command line, you CHARMM job will "bomb" (we'll show how to do some error handling for this a bit later). Finally, if you use the @system trick, you have the variable as part of a longer string. E.g, you want that @system_init.crd gets replaced by 1crn_init.crd. In this case this works fine, but suppose you wanted to substitute @system in @systemproperty. This will fail, since CHARMM searches for a variable "systemproperty". You can avoid this by writing @{system}property, which (given that @system expands to 1crn) expands to 1crnproperty. Whenever in doubt, use @{varname} rather than just @varname!

Needless to say, one can also set variables in a script itself, manipulate them, and calculate with them. Before we show some examples, let us show you two ways how one can see/check the result of the calculations/manipulations.

The echo command:
does more or less the same as its Unix counterpart; everything following echo is written to standard output (one can also force it to write to a different file unit, see miscom.doc). Variables are expanded, so the output will contain the value of the variables rather than the @variable string. Suppose @test contains the value 625. Then

echo value of test is @test

will result in the output line

VALUE OF TEST IS 625

in the CHARMM output

The write title command:
We have seen that @variables get expanded in titles. Just as one can write coordinates (writ coor) or PSFs etc., one can also write an arbitrary title. This is the most flexible way to get "generic" output through CHARMM (i.e, to write an almost arbitrary piece of text). The trick here consists in creating a title on the fly and writing it to standard output (or, alternatively, a separate file). In Unix, standard output is associated with (file) unit 6, and so we are going to write our title to unit 6. Look at the following code snippet

writ title unit 6  
* some title on the fly 
* which may contain @variable values which will be substituded
* 

The writ title command (as any writ command) expects that a title follows, on which normal command line substitution is carried out. This title will be written to the specified unit, in the above example standard output, i.e., it gets merged into the normal CHARMM output. Consider the following example (which we name title3.inp)

* Example of passing a variable from the command line
* The variable myvalue was initialized to @myvalue 
*
! we now set a second variable ourselves 
set 2value 100   
! and check that we can use it 
write title unit 6 
* checking values of variables 
* myvalue : @myvalue 
* 2value  : @2value    
stop 

Run it as charmm_executable myvalue:500 < title3.inp and observe the output. You see that the title lines following the write title appear twice; once, as in the beginning of the script following the RDTITL> * string, and a second time all by themselves, without leading *. (BTW: try replacing unit 6 by, e.g., unit 51 and precede the command by

open writ form name some_file_name unit 51 

If you rerun this modified title3.inp script, you will notice that the lines

CHECKING VALUES OF VARIABLES 
MYVALUE : 500  
2VALUE  : 100 

are now missing from the output, but you will find them in the new file named some_file_name instead.

Manipulating variables

Based on the framework of title3.inp we can now illustrate how variables can be manipulated. Please try the following examples and make sure that you understand what's going on:

varmanip1.inp:

* First example of manipulating variables
*

! set a variable
set value 100

! and check its value
write title unit 6
*checking values of variables
*value : @value 

! now increment it by one
incr value by 1

! and check its value
write title unit 6
*checking values of variables
*value : @value 

stop

Of particular interest is the calc facility of CHARMM, which permits fairly complex calculations. Please take a look at miscom.doc, which contains information about calc and other "miscellaneous" commands; the following example just skims the surface

varmanip2.inp:

* Example of calculating with calc
*

! set a variable
set value 100

! and check its value
write title unit 6
*checking values of variables
*value : @value 

! now increment it by one
calc value = 2 * @value
calc sqvalue = sqrt ( @value ) 

! and check the results
write title unit 6
*checking values of variables
*value   : @value 
*sqvalue : @sqvalue 

stop

The example should be self-explanatory. Some care is required when and where to use @ (typically on the right hand side of an expression), and to leave spaces between ( ) and mathematical operators and variable names, e.g.

calc sqvalue = sqrt ( @value )

in the example (see the documentation for further details!)

A note on ?variables

For completeness sake, a preliminary comment on a second type of variable. Many CHARMM commands, aside from producing more or less direct output, place some key values in "internal" variables. E.g., the energy of a system calculated with the most recent ENER command is put into a variable named ener. To avoid name clashes with variables set by users, these value of these variables can be accessed by preceding the variable name by a question mark ?. I.e., if CHARMM encounters (in a title or in a command) the string ?ener, it will attept to replace it with the energy value from the most recent ENERgy command. This is quite handy, and we'll see many examples later on, once we have reached the stage where we can use CHARMM to work with biomolecular systems.

A first revised version of our full example

As an example of the things we have learned, let's study a revised version of our first full example (1crn_1.inp). We call the file simplemini.inp, and you can download it here. It incorporates the ideas outlined earlier and uses ?variables to report the energy change upon minimization in the coordinate files saved.

Some shortcomings remain: (1) If you forget to pass the necessary command line variables, your job dies somewhere in the middle of the script. It would be nice if some error checking were done at the beginning of the script and / or default values were provided were possible. We'll show shortly how to address this. (2) Regardless of the number of minimization steps requested, coordinates are saved to the same files, potentially overwriting earlier outputs. It really depends on you (what you want to accomplish) whether you consider this a problem or not, so there is no generic solution to offer.

Branching

An essential element of any programming language is branching. In CHARMM scripting you can do things along the lines

IF comparison THEN
   ! do something
ELSE
   ! do something else
ENDIF

There is no explicit ELSE IF, but IFs can be nested.

Example 1: In our work, we occasionally run MD simulations of a system in the gas phase and with an implicit solvent model. Since in this case the same PSF and coordinate file can be used, it is possible to use one and the same script for both environments (gas phase and "solution"). Depending on the environment, the options for energy calculations have to be set differently; also, for the calculations with implicit solvent some additional commands are required. We pass from the command line a variable env, which should be set to "gasp" or "solv" (with obvious meaning). Our script(s) then contain(s) code like the following

if @env eq gasp then
   energy inbfrq 1 atom vswi swit cutnb 1000. ctofnb 998. ctonnb 996.
   ! => energy options for gas phase (infinite cut-off)
endif
if @env eq eef1 then
   ! set up EEF1 implicit solvent model (Proteins, 35:133-152 (1999))
   eef1 setup temp 298.15 unit 93 name solvpar.inp
   update ctonnb 7. ctofnb 9. cutnb 10. group rdie
   energy inbfrq 1
endif

Example 2: We have seen that CHARMM scripts can be made more generic by passing command line parameters. The downside is that one really has to pass all variables. Thus, it may be useful to check whether the user has indeed provided a value on the command line so that a variable is initialized. If this is not the case, a default value can be provided; thus, crashing of the script is prevented.

Suppose you use a variable varname. Provided it has been initialized (set command, calc varname = ..., value passed on command line), any instance of @varname will be replaced by its value. In addition, @?varname returns 1. If varname is not initialized, @?varname returns 0 instead. Now remember our "full example" where we passed nstep (the number of (SD) minimizations to be carried out) from the command line. The following code snipped provides a default value to nstep, preventing the script from crashing if the command line parameter was forgotten:

if @?nstep eq 0 set nstep 100 

Note the use of the short form of if. Since only a single command is executed, the then command endif brace can be ommitted. See miscom.doc for the full syntax. Of course, whether you want defaults is up to you and will depend strongly on the planned usage of the script.

Labels and goto

You can place a label

label someplace

basically anywhere in a CHARMM script. If (before or after) that label CHARMM encounters a

goto someplace

command, the script continues from label someplace; i.e., control is transferred to that line of the script. The label/goto tandem, combined with ifs, make possible to construct almost arbitrary control structures, in particular loops (see below). First, however, two simpler examples.

Example 1: We just showed how to provide a default value to a variable that is expected to be passed from the command line. Obviously, this is not possible or sensible in all cases. The script simplemini.inp expects two parameters, nstep and system. While it makes sense to set a default for the former (if the user forgets to specify a value), the second variable has to be set to a sensible value. Thus, we may want to check whether system is initialized, and if not, write a meaningful error message and exit the program. The following fragment shows how to accomplish this

* The title
* ...
*
if @?nstep eq 0 set nstep 100 ! provide default for nstep
if @?system eq 0 goto systemerror

... ! continue with normal script up to normal

stop

label systemerror

echo You need to pass a value for variable system
echo
echo Aborting execution

stop

Example 2: It was pointed out that many CHARMM scripts do identical things (read rtf, params, psf, coords, some more stuff) whereas the "genuine" work part (minimization, molecular dynamics) consists of just a few lines. Thus, the first twenty to forty lines of many CHARMM scripts contain essentially the same commands. Using label/goto one can reorganize such scripts, so that the boring stuff is placed after a label at the end of the scripts. Thus, the more unique parts of the script can be seen earlier in the file (note that stream files provide a better way of accomplishing the same goal, but the technique may prove useful in other cases). Take a look at simple_mini2.inp. After checking whether nstep and system were passed from the command line, command is transferred to label setupsystem. The corresponding code is at the end of the script; from there, control is transferred back to the beginning of the script (label fromsetupsystem). The first interesting line (mini sd) moves up about 15 lines; one can understand a bit more quickly what the script does.

Note that the "argument" of a goto does not need to be a fixed string; it can also be a @variable or constructed out of @variables. E.g.,

goto @{reference}_point

would jump to first_point if @reference resolved to "first", but to second_point if @reference resolved to "second". This can be very powerful; most of the time, however, constructs like this will result in undecipherable scripts!

Loops

The various commands to manipulate variables, if (then else endif), and the label/goto tandem are the necessary ingredients to build loops etc. Take a look at the following examples, which should by now be self-explanatory

Example 1: carry out a command a fixed number of times

* example of a simple loop (for loop in C // do loop in FORTRAN)
*

if @?nstep eq 0 set nstep 10

set step 1

label loop

echo executing step @step ! real work would be done here ...

incr step by 1

if @step le @nstep goto loop

stop

This little example carries out a command nstep times, counting from 1 to nstep. This resembles the FORTRAN way of doing things. If you come from the C camp, start with set step 0 and change the if statement to

if @step lt @nstep goto loop

In this case, step counts from 0 to 9.

Example 2: compute the factorial of a number

* factorial
*

if @?number eq 0 goto nonumber
if @number gt 50 goto toobig
if @number lt 0 goto negative

prnlev -5

set factorial 1
set input @number

label factorial

if @number le 1 then 
   prnlev 5
   goto done
endif

calc factorial = @factorial * @number

decr number by 1

goto factorial


label done

echo The factorial of @input is @factorial

stop

label nonumber

echo Please initialize the value of number on the command line, e.g.
echo by passing 
echo    number:5 
echo on the commandline 
echo
echo Aborting run ...

stop

label toobig

echo Can calculate factorial only for numbers <= 50
echo
echo Aborting run ...

stop

label negative

echo I do not know how to compute the factorial of a negative number  (@number)
echo
echo Aborting run ...

stop

The otherwise "useless" example provides the possibility to show some new elements: (i) We do some checking on the input (provide a default, trap negative or too large arguments). Second, we tell CHARMM to "shut up" --- prnlev <value> controls the "loquaciousness" of CHARMM; -5 tells the program to be absolutetly quiet; +2 is the default. We have to restore the default, otherwise we would never learn about the result. Also, if you try the above for arguments > 9, you'll see that CHARMM doesn't give the exact integer value, instead it reports an (approximate) floating point number. This automatic conversion of precision, unfortunately, is difficult to to prevent, and can be the source of (severe) rounding errors, if you try to do complicated calculations with CHARMM variables.

Note: Looping in CHARMM is sloooow compared to real scripting languages! This doesn't matter if each loop does a heavy calculation (10000 steps of MD or so), but it becomes very noticeable if each loop carries out just a minor calculation. In this case the real work may take milliseconds per step, but the loop processing may require ten times as long! This is particularly an issue for analysis of trajectories. One can always process a trajectory frame by frame with traj read in a CHARMM loop; very often, however, specialized commands do the looping automatically in a fraction of the time.

Stream files

If CHARMM encounters the command

stream some_filename

it stops reading from the current file and starts reading from some_filename. This provides a (in my opinion) more elegant solution (compared to the label/goto tandem) to cope with repetitive commands that are necessary in almost all CHARMM runs.

Consider the following (stream) file

* Set up the system
*

! make sure that @system is initialized
if @?system eq 0 then

   echo You need to pass a value for variable system
   echo
   echo Aborting execution

   stop

endif

! Read RTF and params

open unit 1 form read name top_all27_prot_na.rtf
read rtf card unit 1
clos unit 1

open unit 1 form read name par_all27_prot_na.prm
read para card unit 1
clos unit 1

! Read PSF and coordinate file

open unit 1 form read name @{system}.psf
read psf card unit 1
clos unit 1

open unit 1 form read name @{system}_init.crd
read coor card unit 1
clos unit 1

return

which is intended to be called from a variant of simple_mini2.inp something like (assume that the above file is called system_setup.inp).

* title of main script 
*

stream setup_system.inp

! Do the work

! save starting energy of system
energy
...

etc. Thus, my final version of 1crn_init.inp / simplemini.inp consists of the script pair simple_mini3.inp plus system_setup.inp.

The system command

On most platforms, CHARMM provides a facility to pass commands to the operating system. This permits the execution of other programs from within a CHARMM script. We just consider a simple example. Suppose you have code that traps an error, like

if @?varname eq 0 goto error
...
label error
echo You forgot to initialize a variable
echo Look at the output for further details
echo Aborting ...

stop

as we have seen several times by now. This stops execution of CHARMM and puts information in the output file. However, you still have to look at the output to find out what's wrong; also, if the actual CHARMM job you wanted to carry out is fast, then you might not even notice that it aborted. Thus, it would be nice if things were written to the console (the screen). The basic form of the system command is

system "some command" 

and everything between the quotation marks is passed to the shell. The quotation marks, on the one hand, mark the command to be passed, but they have a second purpose. They prevent CHARMM from transforming the command to upper case letters, which is important for case sensitive operating systems! (This, however, has some problematic consequences as well in more complicated cases, see below.) So let's try to improve on our error handling by invoking the Unix "echo" command rather than CHARMM's echo.

if @?varname eq 0 goto error
...
label error
system "echo You forgot to initialize a variable > /dev/stderr"
system "echo Look at the output for further details > /dev/stderr"
system "echo Aborting ... > /dev/stderr"

stop

E.g., the first system command passes

echo You forgot to initialize a variable > /dev/stderr

to the shell, the shell echo command sends "You forgot to initialize a variable" to /dev/stderr, the generic standard error device under Unix. The reason for redirecting to standard error is that if the normal output of CHARMM is redirected to a file, e.g.,

charmm_executable < charmm_input_script.inp > charmm_output_file.out

the output of the system "echo .." command would also be buried in that file. Because of the "> /dev/stderr" trick, one would see on the screen

You forgot to initialize a variable
Look at the output for further details
Aborting ...

if an error occurred.

Very quickly a refinement that illustrates some of the pitfalls one may encounter with the system command. Suppose you want to trap whether a variable is not negative (e.g., the factorial example above) and you want to echo the value of the offending variable, then you can use the following code snippet

label negative

if @number lt 0 goto negative

system "echo Negative number \(@{NUMBER}\) cannot be handled > /dev/stderr"
system "echo > /dev/stderr"
system "echo Aborting run ... > /dev/stderr"

Note the we need to write @{NUMBER} within the quotation marks. If we had written @{number}, CHARMM would search for a variable "number", whereas because of CHARMM's standard policy of converting all commands to upper case letters, the "real" name of the variable is "NUMBER" (remember that the quotation marks "..." prevent the default conversion to upper case letters. Second, we had to "escape" the parentheses. This has nothing to do with CHARMM, but with the fact that ( ) have a special meaning for the shell, which also interprets the string passed by CHARMM. Writing "\(" tells the shell that we have a simple "(" in mind, instead of a symbol with special meaning.

Note: The system command is one of the few capabilities that require machine / operating system dependent code (i.e., which cannot be programmed by standard FORTRAN). Thus, it may not work (or work "flakily") on some platforms. The system command can be extremely useful, but if you foresee that your CHARMM scripts will be run on a variety of platforms (e.g., Linux, Windows, Mac OSX) things may break in unexpected manners. Also, as the last example shows, details of system usage may even depend on the shell you use etc. and you have also to understand the workings of the shell. You have been warned!

Another closing comment

Hopefully we could convince you that you can do some nifty things with CHARMM scripts. We use many of these "tricks" to keep our own scripts managable and (reasonably) generic. On the other hand, there are limits what CHARMM scripting can do, and we discourage you "to stretch the limits". Fancy CHARMM scripts can rapidly become incomprehensible (no matter how many comments you write!); hence, if things go wrong (and we mean the type of wrong which you notice after days of simulations), it's often very tedious to spot the error. Despite all its features, the programming capabilities of CHARMM scripts are vastly inferior to those of Perl, Python or even the bash shell. Thus, if you find yourself constructing complex loops within CHARMM scripts or issueing tons of system commands, you should ask yourself whether it would not be better to construct the loops in Perl (Python, bash, ruby ...) and call simpler CHARMM jobs from this external script (passing job specific info via the command line). Yet another approach is taken, e.g., by the charmming.org website. Here the backend generates CHARMM scripts specific for your task and protein on the fly.

As you journey into the depths of CHARMM progresses, keep these options in mind and do not forget to weigh elegance and generality versus simplicity.