Sb pdb

From CHARMM tutorial
Jump to navigationJump to search

SB's thoughts on getting PDB coordinates into CHARMM

Preliminary remarks

Note: For the time being, this is not intended as tutorial material; instead, I want to compare various approaches that I have found / explored.

One glaring omission from the following compilation is the use of the MMTSB tools in this context.

Aside from "doing it myself", I have taken a reasonably detailed look at how others "do it", in particular:

I am focusing on the protein part; waters, ions, ligands (i.e., HETATMs) are yet another story.

The following is tested with c35 and c36, so it takes into account the changes that were introduced starting with c35 concerning the reading of PDB files. Also, this assumes PDB files in a reasonably current format and further expects a version of CHARMM that understands the RESId option to READ COOR (read coor pdb RESId).

I consider the official documentation in io.doc relatively poor; there is at least one factual error: According to io.doc, READ SEQU PDB understands the RESId option; this is, however, not the case!

General considerations

It goes without saying that the PDB file needs to be edited before it can be used; in most cases, it will also be necessary to split the the PDB file. The following summary of what needs to be done is from Rick's posting aready mentioned:

  • change HIS residue name to e.g. HSE
  • create a new file for each subunit, or ANY gap in residue numbers. Considering problems encountered by students / colleagues using CHARMM that run into problems at this stage, this is the single most important point of failure. (Note: for getting the sequence information, one can use the CHAIn X option to extract just chain X from a multi-domain pdb file. However, no equivalent option exists when reading coordinates from PDB files!)
  • for multi-model NMR files, create new files for each model
  • examine subunit terminii for residue numbering, blocking groups

Optional:

  • change atom name CD1 to CD for residue ILE
  • for standard COO- termini, change atom names for O, OXT to OT1, OT2
  • other edits may be needed for e.g. acetyl on N-terminus, amide C-terminus etc.

Possibly the most important editing decision concerns the question whether one includes segment identifiers into the PDB files that are being fed to CHARMM (and reads them as read coor pdb RESId) or not. The former approach is, e.g., used by http://www.charmm-gui.org, the latter by http://www.charmming.org. Depending on the approach taken, one has to watch out for different pitfalls.

Last not least, if you are "doing it yourself", search in the original PDB files for info about disulfide bridges (SSBOND) since you will have to add these manually to the PSF.

An example: 5TIM

To illustrate (some of) the issues/problems, we consider the steps necessary to prepare the pdb file of a triosephosphate isomerase pdb entry 5TIM for work with CHARMM. Below is an excerpt from the ATOM section of the original pdb file:

ATOM      1  N   SER A   2      29.700  57.568  22.037  1.00 42.07           N
ATOM      2  CA  SER A   2      29.413  56.872  20.802  1.00 44.61           C
ATOM      3  C   SER A   2      30.278  55.645  20.373  1.00 34.14           C
...
ATOM   1882  NE2 GLN A 250      17.318  46.262  23.955  1.00 50.11           N
ATOM   1883  OXT GLN A 250      21.621  49.945  21.757  1.00 26.66           O
TER    1884      GLN A 250
ATOM   1885  N   SER B   2      42.653 -11.070 -13.991  1.00 57.21           N
ATOM   1886  CA  SER B   2      41.911 -10.010 -14.605  1.00 57.09           C
ATOM   1887  C   SER B   2      41.928  -8.733 -13.761  1.00 51.98           C
...
ATOM   3766  NE2 GLN B 250      36.162   3.112 -24.388  1.00 71.04           N
ATOM   3767  OXT GLN B 250      40.401  -0.920 -21.933  1.00 49.09           O
TER    3768      GLN B 250
HETATM 3769  S   SO4 A 256      43.311  33.943  -8.945  1.00 67.44           S
...

As one sees, the pdb file contains HETATM entries, but we are going to ignore those in our discussion. In the following we assume that all renaming has been taken care of (e.g., OXT should be renamed to OT2!); also, there are no disulfide bridges in the structure. Triosephosphate isomerase is a dimer, so there are two chains, A and B, in the pdb file. What makes the system a bit more interesting is the fact that there are no coordinates for the respective first residue in the two chains, i.e., coordinates are available staring with residue 2 of chains A and B, respectively.

We are going to explore several ways of getting the 5TIM pdb file "into CHARMM", followed by an analysis how http://www.charmm-gui.org and http://www.charmming.org handle such a case. The common first step is to divide the ATOM section of the pdb file into two pdb files, one for each chain. (Very) recent CHARMM versions can extract sequence information per chain from a single pdb file, but the split files are (still) required for reading the coordinates, so we are going to use separate pdb files both for reading the sequence information, as well as for reading the coordinates.

Appending SEGId labels to the pdb files --- the RESId option

In this case we tag each line of the pdb files with the segment identifier we use. Here we stick to the chain labels, A and B, but we could use arbitrary labels to identify the chains/segments. A brief excerpt of the modified pdb file for chain A:

ATOM      1  N   SER A   2      29.700  57.568  22.037  1.00 42.07      A
ATOM      2  CA  SER A   2      29.413  56.872  20.802  1.00 44.61      A
ATOM      3  C   SER A   2      30.278  55.645  20.373  1.00 34.14      A
...
ATOM   1882  NE2 GLN A 250      17.318  46.262  23.955  1.00 50.11      A
ATOM   1883  OT2 GLN A 250      21.621  49.945  21.757  1.00 26.66      A
TER    1884      GLN A 250

Note the segment identifier placed starting in column 73; also, OXT has been renamed to OT2. The pdb files modified in this manner were processed by the following script (note the RESId option in the read coor pbd resi line!)

* Read 5TIM sequ and coords from two pdb files, one for each chain,
* that have SEGId at the end of the atom entries
* 

! Read topology and parameter files
open read card unit 10 name top_all27_prot_na.rtf
read  rtf card unit 10
clos unit 10

open read card unit 20 name par_all27_prot_na.prm
read para card unit 20
clos unit 20

open read card unit 10 name 5tim_a_segi.pdb
read sequence pdb unit 10
clos unit 10
generate A setup warn first NTER last CTER

open read card unit 10 name 5tim_b_segi.pdb
read sequence pdb unit 10
clos unit 10
generate B setup warn first NTER last CTER

open read card unit 10 name 5tim_a_segi.pdb
read coor pdb  unit 10 resi
clos unit 10
 
open read card unit 10 name 5tim_b_segi.pdb
read coor pdb  unit 10 resi
clos unit 10

! Print heavy atoms with unknown coordinates
coor print sele ( .not. INIT ) .and. ( .not. hydrogen ) end

! Add any missing coordinates
ic param
ic build
prnlev 0
hbuild sele hydrogen end
prnlev 5

energy

open writ card unit 10 name 5tim_segi.psf
writ psf card unit 10
* psf of 5tim, chains a and b
* read from pdb files with segi info added
* resi option for read card pdb
*

open writ card unit 10 name 5tim_segi.crd
writ coor card unit 10
* coor of 5tim, chains a and b
* read from pdb files with segi info added
* resi option for read card pdb
*

stop

No further editing of pdb files --- no use of APPEnd option

In this and the following variant, the pdb files are split into one per chain, but not modified further (aside from renaming of atom/residue names). The following is an excerpt for the chain A pdb file:

ATOM      1  N   SER A   2      29.700  57.568  22.037  1.00 42.07           N  
ATOM      2  CA  SER A   2      29.413  56.872  20.802  1.00 44.61           C  
ATOM      3  C   SER A   2      30.278  55.645  20.373  1.00 34.14           C
...
ATOM   1882  NE2 GLN A 250      17.318  46.262  23.955  1.00 50.11           N  
ATOM   1883  OT2 GLN A 250      21.621  49.945  21.757  1.00 26.66           O  
TER    1884      GLN A 250

The following CHARMM script is used:

* Read 5TIM sequ and coords from two pdb files, one for each chain
* aside from renaming of atom/residue names
*    HIS -> HSE/HSD
*    CD1 ILE -> CD  ILE
*    C-terminal O -> OT1
*    OXT -> OT2
* the ATOM entries are unmodified from the pdb file
* Variant 1: no use of APPEnd option
*

! Read topology and parameter files
open read card unit 10 name top_all27_prot_na.rtf
read  rtf card unit 10
clos unit 10

open read card unit 20 name par_all27_prot_na.prm
read para card unit 20
clos unit 20

open read card unit 10 name 5tim_a_nosegi.pdb
read sequence pdb unit 10
clos unit 10
generate A setup warn first NTER last CTER

open read card unit 10 name 5tim_b_nosegi.pdb
read sequence pdb unit 10
clos unit 10
generate B setup warn first NTER last CTER

 
open read card unit 10 name 5tim_a_nosegi.pdb
read coor pdb  unit 10 offs -1
clos unit 10


open read card unit 10 name 5tim_b_nosegi.pdb
read coor pdb  unit 10 offs 248
clos unit 10 

! Print heavy atoms with unknown coordinates
coor print sele ( .not. INIT ) .and. ( .not. hydrogen ) end 

! Add any missing coordinates
ic param
ic build
prnlev 0
hbuild sele hydrogen end
prnlev 5 

energy

open writ card unit 10 name 5tim_nosegi_noappe.psf
writ psf card unit 10
* psf of 5tim, chains a and b
* read from pdb files without segi info added
* only offs option for read card pdb
*

open writ card unit 10 name 5tim_nosegi_noappe.crd
writ coor card unit 10
* coor of 5tim, chains a and b
* read from pdb files without segi info added
* only offs option for read card pdb
*

stop

Explanations: If no segment identifiers were added to the pdb file, the RESId option to read coor pdb must not be used. However, in this case care is required concerning the atom/residue numbers. The residue numbers in the pdb file (resSeq field in columns 23-26) correspond to what is referred to as RESID in CHARMM (not to be confused with the RESId option!). The RESID is treated as a label that is used to match coordinates with the corresponding atom in the PSF. Internally, residue numbering in CHARMM always starts with 1; these CHARMM internal residue numbers are referred to as RESNO. So, when CHARMM starts reading the above pdb file for chain A, it looks for RESID 1 to match its counting that starts with RESNO 1. Given that the PDB file contains only coordinates for residues 2 and higher, there is a problem. The dilemma can be solved by specifying the OFFSet (the "mismatch) between RESID (pdb resSeq field) and absolute RESNO according to RESID + OFFSET = RESNO. Thus, as is done above, we have to specify OFFSet -1 for chain A. Things get more complicated for chain B. In the pdb file, the resSeq field (CHARMM RESID) starts again at 2, but CHARMM's RESNO counter expects 250 (249 residues were read from the chain A pdb file). Thus, for the chain B pdb file we have to set the OFFSet to 248, since RESID + OFFSET = 2 + 248 = 250 = RESNO, as was done above.

Comment: The RESId option to read coor pdb used before avoids the need for having to specify / worry about OFFSets. However, its use requires that the segment identifier was tagged onto all ATOM lines in the pdb file, starting at column 73. So, conclusion up to this point: Either do more editing of the pdb file(s), or figure out your OFFSets. As usual, there is no free lunch :-(

No further editing of pdb files --- using the APPEnd option

As we have seen in the above example, the need for OFFSets can arise for two reasons. First, residue numbers in the pdb file do not start with 1, and one has to compensate for the RESID/RESNO mismatch. Second, residue numbers in the pdb file are reset to 1 (or some low number) because a new chain starts. To avoid ever more complicated OFFSet calculations for the second, third etc. chain, one can use the APPEnd option in combination with OFFSet. Given the same pdb files as in the previous section, the following alternative script can be used to read coordinates:

* Read 5TIM sequ and coords from two pdb files, one for each chain
* aside from renaming of atom/residue names
*    HIS -> HSE/HSD
*    CD1 ILE -> CD  ILE
*    C-terminal O -> OT1
*    OXT -> OT2
* the ATOM entries are unmodified from the pdb file
* Variant 2: use the APPEnd option to simplify reading of second chain
*

! Read topology and parameter files
open read card unit 10 name top_all27_prot_na.rtf
read  rtf card unit 10
clos unit 10

open read card unit 20 name par_all27_prot_na.prm
read para card unit 20
clos unit 20

open read card unit 10 name 5tim_a_nosegi.pdb
read sequence pdb unit 10
clos unit 10
generate A setup warn first NTER last CTER

open read card unit 10 name 5tim_b_nosegi.pdb
read sequence pdb unit 10
clos unit 10
generate B setup warn first NTER last CTER

 
open read card unit 10 name 5tim_a_nosegi.pdb
read coor pdb  unit 10 offs -1
clos unit 10


open read card unit 10 name 5tim_b_nosegi.pdb
read coor pdb  unit 10 offs -1 appe
clos unit 10 

! Print heavy atoms with unknown coordinates
coor print sele ( .not. INIT ) .and. ( .not. hydrogen ) end 

! Add any missing coordinates
ic param
ic build
prnlev 0
hbuild sele hydrogen end
prnlev 5 

energy

open writ card unit 10 name 5tim_nosegi_appe.psf
writ psf card unit 10
* psf of 5tim, chains a and b
* read from pdb files without segi info added
* offs and appe option for read card pdb
*

open writ card unit 10 name 5tim_nosegi_appe.crd
writ coor card unit 10
* coor of 5tim, chains a and b
* read from pdb files without segi info added
* offs and appe option for read card pdb
*

stop

The only difference is the read coor line for reading the coordinates of chain B,

read coor pdb  unit 10 offs -1 appe

We still have to give an OFFSet, since residue numbering starts at 2 rather than at 1 as expected by CHARMM. But the APPEnd options avoids the need take the total number of residues read so far into account.

Observation: All three methods lead to identical psf and crd files, and the energy of the final ENERgy command is the same in all three cases.

The charmm-gui approach

The website basically uses the approach described in Appending SEGId labels to the pdb files --- the RESId option. The server splits the original pdb file into one file per peptide chain and does most of the recommended renaming. Instead of using the chain labels A, B, C etc. as segment identifiers, charmm-gui labels the chains as PROA, PROB etc.

At present, there are two minor shortcomings:

  • The C-terminal O, OXT are not renamed to OT1, OT2
  • The read sequ command generated by the website in the step1_pdbreader.inp script does not work with c35 and higher. The solution here is to change the line
read sequence coor pdb unit 10 ! incorrect command

to the correct form

read sequence pdb unit 10
  • The HBUIld command without atom selection (as issued in the charmm-gui generated input script) fails in recent versions of CHARMM (it used to be that HBUIld defaulted to all hydrogens if no atom selection was given, in recent versions of CHARMM an empty atom selection results in no hydrogen coordinates being rebuilt, which is probably not what is intended ...)

If one takes these things into account, the input script generated by charmm-gui combined with the processed pdb files are equivalent to what was described in Appending SEGId labels to the pdb files --- the RESId option.

The charmming approach

http://www.charmming.org does not tag ATOM lines with segment identifiers. Nevertheless, it does heavy editing of the pdb files to avoid the need for OFFSets or APPEnd options. This leads to psf and crd files that in some aspects are substantially different from those created by all other methods.

charmming.org splits the pdb file into individual files for each chain, doing some renaming, as all other approaches. However, rather than just splitting the files (with minor edits), these individual pdb files are newly generated files. In particular, residue numbering (resSeq field) starts at 1.

Note: Presently, the pdb files generated/synthesized by charmming.org do not work with c35 and higher since the resSeq/RESID field is shifted by one to the right. Due to the changes to the pdb reader introduced in c35, CHARMM c35 and higher fails on these files. The following discussion is based on tests with pdb files, originally generated by charmming.org, where this shift-by-one error was corrected by hand.

Further, the chains / segments are first processed indiviually, i.e., one script generates a psf for chain A and reads the coordinates from the respective pdb file. A second script generates a psf for chain B, reads the respective coordinates etc. A final script merges these individual PSFs (read psf card append) and CRD files (read coor card unit <no> resid sele segid <name> end).

Pro: This stepwise approach avoids the need to calculate OFFSets.

Con: In the final psf and crd file, the information about the original residue numbers (as given in the original pdb file) is lost. Below is an excerpt of the final crd for 5TIM as generated by charmming.org.


* COORDS
*  DATE:     5/ 6/ 9      9:16:35      CREATED BY USER: schedd
*
      7606  EXT
         1         1  SER       N              29.7000000000       57.5680000000       22.0370000000  A         1              42.0700000000
         2         1  SER       HT1            30.2836703731       56.9627427877       22.6490397067  A         1               0.0000000000
         3         1  SER       HT2            28.8085909209       57.7948188333       22.5223274221  A         1               0.0000000000
         4         1  SER       HT3            30.2138344340       58.4476737588       21.8278400405  A         1               0.0000000000
         5         1  SER       CA             29.4130000000       56.8720000000       20.8020000000  A         1              44.6100000000
...
      3802       249  GLN       OT1            22.7680000000       48.6500000000       23.1050000000  A         249            24.5100000000
      3803       249  GLN       OT2            21.2645700000       49.8809700000       22.1098400000  A         249             0.0000000000
      3804       250  SER       N              42.6530000000      -11.0700000000      -13.9910000000  B         1              57.2100000000
      3805       250  SER       HT1            43.6593965349      -10.8109568604      -13.9502241934  B         1               0.0000000000
      3806       250  SER       HT2            42.5413548392      -11.9401105398      -14.5496077718  B         1               0.0000000000
      3807       250  SER       HT3            42.2967287738      -11.2293951382      -13.0270167847  B         1               0.0000000000
...

Contrast this with the final crd file from any other approach discussed (ignore the extended format used by charmming.org compared to the default (old) CHARMM format, which only affects spacing)

* COOR OF 5TIM, CHAINS A AND B
* READ FROM PDB FILES WITH SEGI INFO ADDED
* RESI OPTION FOR READ CARD PDB
*  DATE:     5/19/ 9     14:23:22      CREATED BY USER: stefan
*
 7606
    1    1 SER  N     29.70000  57.56800  22.03700 A    2     42.07000
    2    1 SER  HT1   30.28367  56.96274  22.64904 A    2      0.00000
...
 3802  249 GLN  OT1   22.76800  48.65000  23.10500 A    250   24.51000
 3803  249 GLN  OT2   21.62100  49.94500  21.75700 A    250   26.66000
 3804  250 SER  N     42.65300 -11.07000 -13.99100 B    2     57.21000
 3805  250 SER  HT1   43.65940 -10.81096 -13.95022 B    2      0.00000
 3806  250 SER  HT2   42.54135 -11.94011 -14.54961 B    2      0.00000
 3807  250 SER  HT3   42.29673 -11.22940 -13.02702 B    2      0.00000
...

Note how the second column from the right in the second excerpt still retains the original residue numbering, which is lost in the charmming.org approach. Thus, one always has to keep the OFFSet of -1 in mind when comparing to papers / work based on the pdb numbering. An OFFSet of -1 is relatively easy, but this may become obnoxious for larger OFFSets.

Further things to note about charmming.org: There are a number of other things to keep in mind that may lead to different initial coordinates from charmming.org compared to charmm-gui.org or hand made approaches.

  • scripts start with BOMLEV -2, which may hide (serious) problems
  • Some pdb files generated by the website (new-*-a.pdb, new-*-b.pdb etc.) do not contain a TER/END statement. This may result in the last ATOM line not being parsed correctly!
  • The line HBUIld SELE ALL END does not work as expected in recent versions of CHARMM; no hydrogens are rebuilt. Use HBUIld SELE HYDRogen END instead.
  • As pointed out, setting up a multidomain protein such as 5TIM, is handled by multiple CHARMM scripts in the charmming.org framework. This involves writing coordinates to files and re-reading these in later job, and hydrogen coordinates are rebuilt multiple times (initially, separately for each domain, then again for the full system). Apparently because of round-off error, this leads to slightly different final coordinates compared to the various one-step approaches described above, resulting in slightly different energies. I admit being too lazy to find out where the differences arise exactly.

To sum up: When using charmming.org to set up 5TIM, one ends up with a slightly different total energy of the final system (before any minimizations), and the atom numbering is different. To illustrate the latter point, see the output of a

print coor sele atom a 5 * end

command. Here the output from the charmm-5tim-*-append.inp, after all coordinates have been rebuilt but before the switch to extended I/O format

CHARMM>    print coor sele atom a 5 * end
SELRPN>     14 atoms have been selected out of   7606
NOTE: A SELECTED SUBSET OF ATOMS WILL BE USED


         COORDINATE FILE MODULE
TITLE>  * APPEND THE PDBS
TITLE>  *
  14
  67    5 PRO  N     27.31200  47.92600  15.64400 A    5     15.26000
  68    5 PRO  CD    26.11000  48.43600  16.43300 A    5     12.74000
  69    5 PRO  HD1   25.29615  48.72034  15.73302 A    5      0.00000
  70    5 PRO  HD2   26.40839  49.31426  17.04357 A    5      0.00000
  71    5 PRO  CA    27.82100  46.70200  16.24000 A    5     13.54000
  72    5 PRO  HA    28.86587  46.82241  16.47268 A    5      0.00000
  73    5 PRO  CB    26.93400  46.41800  17.47100 A    5     12.90000
  74    5 PRO  HB1   27.47257  46.68967  18.40398 A    5      0.00000
  75    5 PRO  HB2   26.66662  45.34044  17.51197 A    5      0.00000
  76    5 PRO  CG    25.69100  47.25900  17.32200 A    5     13.65000
  77    5 PRO  HG1   24.87637  46.67505  16.84295 A    5      0.00000
  78    5 PRO  HG2   25.33950  47.61986  18.31213 A    5      0.00000
  79    5 PRO  C     27.74200  45.51900  15.31300 A    5     11.93000
  80    5 PRO  O     27.08100  45.57900  14.32700 A    5     15.67000

And below the output of the same command using the "RESId" approach described above (identical results are obtained with all other methods!)

CHARMM>    print coor sele atom a 5 * end
SELRPN>     17 atoms have been selected out of   7606
NOTE: A SELECTED SUBSET OF ATOMS WILL BE USED


         COORDINATE FILE MODULE
TITLE>  * COOR OF 5TIM, CHAINS A AND B
TITLE>  * READ FROM PDB FILES WITH SEGI INFO ADDED
TITLE>  * RESI OPTION FOR READ CARD PDB
TITLE>  *  DATE:     5/25/ 9     13:27: 1      CREATED BY USER: stefan
TITLE>  *
  17
  50    4 GLN  N     28.57200  50.91400  14.28200 A    5      9.35000
  51    4 GLN  HN    28.89098  51.30031  13.42046 A    5      0.00000
  52    4 GLN  CA    27.57300  49.87900  14.16400 A    5      9.56000
  53    4 GLN  HA    26.70410  50.20714  14.71452 A    5      0.00000
  54    4 GLN  CB    27.18200  49.57600  12.72500 A    5      9.46000
  55    4 GLN  HB1   28.10053  49.28293  12.17493 A    5      0.00000
  56    4 GLN  HB2   26.77476  50.50663  12.27750 A    5      0.00000
  57    4 GLN  CG    26.12500  48.44700  12.53200 A    5      9.96000
  58    4 GLN  HG1   25.11633  48.82477  12.80432 A    5      0.00000
  59    4 GLN  HG2   26.37670  47.58052  13.18019 A    5      0.00000
  60    4 GLN  CD    26.12200  48.00200  11.06900 A    5     17.65000
  61    4 GLN  OE1   27.18700  47.75800  10.48000 A    5     15.75000
  62    4 GLN  NE2   24.97600  48.09800  10.40400 A    5     13.42000
  63    4 GLN  HE21  24.15998  48.45028  10.86228 A    5      0.00000
  64    4 GLN  HE22  24.92935  47.81799   9.44514 A    5      0.00000
  65    4 GLN  C     28.09700  48.57800  14.77000 A    5     14.24000
  66    4 GLN  O     29.20500  48.17000  14.45500 A    5     12.25000

As you see, different residues are selected.

A comment on Rick's READ UNIV approach

Much of the above analysis was inspired by Rick's posting showing his method to "get" pdb coordinates "into CHARMM". His approach differs from those discussed so far by not using the PDB option to read coor, but using the UNIV option to read coor instead. Applying his example to 5TIM leads to the following script (which reads pdb files without segment identifier added to the ATOM lines!):

* Read 5TIM sequ and coords from two pdb files, one for each chain
* aside from renaming of atom/residue names
*    HIS -> HSE/HSD
*    CD1 ILE -> CD  ILE
*    C-terminal O -> OT1
*    OXT -> OT2
* the ATOM entries are unmodified from the pdb file
* Variant 3: READ COOR UNIV instead of READ COOR PDB
* adapted from Rick Venable's post at
* http://www.charmm.org/ubbthreads/ubbthreads.php?ubb=showflat&Number=3748&site_id=1#import
* 

! Read topology and parameter files
open read card unit 10 name top_all27_prot_na.rtf
read  rtf card unit 10
clos unit 10

open read card unit 20 name par_all27_prot_na.prm
read para card unit 20
clos unit 20

! define file format
read univ
* modified pdb
*
pdb
segid 22 1
ires 23 4
w 61 6
end

! protein chain a, segid a; get sequ and coords from file
open unit 3 read card name 5tim_a_nosegi.pdb
read sequ pdb unit 3
rewind unit 3
gener A setup warn 
read coor univ unit 3 offset -1
close unit 3

! protein chain b, segid b; get sequ and coords from file
open unit 3 read card name 5tim_b_nosegi.pdb
read sequ pdb unit 3
rewind unit 3
gener B setup warn
read coor univ unit 3 offset 248
close unit 3

! Print heavy atoms with unknown coordinates
coor print sele ( .not. INIT ) .and. ( .not. hydrogen ) end 

! Add any missing coordinates
ic param
ic build
prnlev 0
hbuild sele hydrogen end
prnlev 5

energy 

open writ card unit 10 name 5tim_rmv.psf
writ psf card unit 10
* psf of 5tim, chains a and b
* read from pdb files without segi info added
* RMV's approach (read coor univ)
*

open writ card unit 10 name 5tim_rmv.crd
writ coor card unit 10
* coor of 5tim, chains a and b
* read from pdb files without segi info added
* RMV's approach (read coor univ)
*

stop


The energy of the system is identical to our approaches, and identical psf and crd files are obtained. Note that while we always read the full sequence information first, generated both segments and then read coordinates for the two segments, the script, following Rick's example, sets up segment (chain) A, reads its coordinates, and only then sets up segment (chain) B and reads the latter coordinates. This is a cosmetic difference; the above script could be changed to our order of doing things with identical results.

Some conclusions / questions

  • charmming.org really is different from all other methods.
  • The rationale of Rick's READ COOR UNIV approach is unclear to me, it seems to me that the same can be achieved with READ COOR PDB. Hint's/explanation appreciated. At present, I only see the advantage that the approach could be easily adapted to different file formats as well, e.g., PDB like file formats where some columns are shifted/swapped etc.

Further problems

  • What about alternate/multiple coordinates ???
  • Suppose I have a chain with gaps (coordinates missing for some residues) in the middle. Is there a semi-automated process that could handle this ...? (would READ SEQU PDB even notice ...?