Sb pdb
SB's thoughts on getting PDB coordinates into CHARMM
Preliminary remarks
Note: For the time being, this is not intended as tutorial material; instead, I want to compare various approaches that I have found / explored.
One glaring omission from the following compilation is the use of the MMTSB tools in this context.
Aside from "doing it myself", I have taken a reasonably detailed look at how others "do it", in particular:
- http://www.charmm-gui.org
- http://www.charmming.org
- Rick Venable's approach documented in a posting by Rick Venable in the CHARMM forum.
I am focusing on the protein part; waters, ions, ligands (i.e., HETATMs) are yet another story.
The following is tested with c35 and c36, so it takes into account the changes that were introduced starting with c35 concerning the reading of PDB files. Also, this assumes PDB files in a reasonably current format and further expects a version of CHARMM that understands the RESId option to READ COOR (read coor pdb RESId).
I consider the official documentation in io.doc relatively poor; there is at least one factual error: According to io.doc, READ SEQU PDB understands the RESId option; this is, however, not the case!
General considerations
It goes without saying that the PDB file needs to be edited before it can be used; in most cases, it will also be necessary to split the the PDB file. The following summary of what needs to be done is from Rick's posting aready mentioned:
- change HIS residue name to e.g. HSE
- create a new file for each subunit, or ANY gap in residue numbers. Considering problems encountered by students / colleagues using CHARMM that run into problems at this stage, this is the single most important point of failure. (Note: for getting the sequence information, one can use the CHAIn X option to extract just chain X from a multi-domain pdb file. However, no equivalent option exists when reading coordinates from PDB files!)
- for multi-model NMR files, create new files for each model
- examine subunit terminii for residue numbering, blocking groups
Optional:
- change atom name CD1 to CD for residue ILE
- for standard COO- termini, change atom names for O, OXT to OT1, OT2
- other edits may be needed for e.g. acetyl on N-terminus, amide C-terminus etc.
Possibly the most important editing decision concerns the question whether one includes segment identifiers into the PDB files that are being fed to CHARMM (and reads them as read coor pdb RESId) or not. The former approach is, e.g., used by http://www.charmm-gui.org, the latter by http://www.charmming.org. Depending on the approach taken, one has to watch out for different pitfalls.
Last not least, if you are "doing it yourself", search in the original PDB files for info about disulfide bridges (SSBOND) since you will have to add these manually to the PSF.
An example: 5TIM
To illustrate (some of) the issues/problems, we consider the steps necessary to prepare the pdb file of a triosephosphate isomerase pdb entry 5TIM for work with CHARMM. Below is an excerpt from the ATOM section of the original pdb file:
ATOM 1 N SER A 2 29.700 57.568 22.037 1.00 42.07 N ATOM 2 CA SER A 2 29.413 56.872 20.802 1.00 44.61 C ATOM 3 C SER A 2 30.278 55.645 20.373 1.00 34.14 C ... ATOM 1882 NE2 GLN A 250 17.318 46.262 23.955 1.00 50.11 N ATOM 1883 OXT GLN A 250 21.621 49.945 21.757 1.00 26.66 O TER 1884 GLN A 250 ATOM 1885 N SER B 2 42.653 -11.070 -13.991 1.00 57.21 N ATOM 1886 CA SER B 2 41.911 -10.010 -14.605 1.00 57.09 C ATOM 1887 C SER B 2 41.928 -8.733 -13.761 1.00 51.98 C ... ATOM 3766 NE2 GLN B 250 36.162 3.112 -24.388 1.00 71.04 N ATOM 3767 OXT GLN B 250 40.401 -0.920 -21.933 1.00 49.09 O TER 3768 GLN B 250 HETATM 3769 S SO4 A 256 43.311 33.943 -8.945 1.00 67.44 S ...
As one sees, the pdb file contains HETATM entries, but we are going to ignore those in our discussion. In the following we assume that all renaming has been taken care of (e.g., OXT should be renamed to OT2!); also, there are no disulfide bridges in the structure. Triosephosphate isomerase is a dimer, so there are two chains, A and B, in the pdb file. What makes the system a bit more interesting is the fact that there are no coordinates for the respective first residue in the two chains, i.e., coordinates are available staring with residue 2 of chains A and B, respectively.
We are going to explore several ways of getting the 5TIM pdb file "into CHARMM", followed by an analysis how http://www.charmm-gui.org and http://www.charmming.org handle such a case. The common first step is to divide the ATOM section of the pdb file into two pdb files, one for each chain. (Very) recent CHARMM versions can extract sequence information per chain from a single pdb file, but the split files are (still) required for reading the coordinates, so we are going to use separate pdb files both for reading the sequence information, as well as for reading the coordinates.
Appending SEGId labels to the pdb files --- the RESId option
In this case we tag each line of the pdb files with the segment identifier we use. Here we stick to the chain labels, A and B, but we could use arbitrary labels to identify the chains/segments. A brief excerpt of the modified pdb file for chain A:
ATOM 1 N SER A 2 29.700 57.568 22.037 1.00 42.07 A ATOM 2 CA SER A 2 29.413 56.872 20.802 1.00 44.61 A ATOM 3 C SER A 2 30.278 55.645 20.373 1.00 34.14 A ... ATOM 1882 NE2 GLN A 250 17.318 46.262 23.955 1.00 50.11 A ATOM 1883 OT2 GLN A 250 21.621 49.945 21.757 1.00 26.66 A TER 1884 GLN A 250
Note the segment identifier placed starting in column 73; also, OXT has been renamed to OT2. The pdb files modified in this manner were processed by the following script (note the RESId option in the read coor pbd resi line!)
* Read 5TIM sequ and coords from two pdb files, one for each chain, * that have SEGId at the end of the atom entries * ! Read topology and parameter files open read card unit 10 name top_all27_prot_na.rtf read rtf card unit 10 clos unit 10 open read card unit 20 name par_all27_prot_na.prm read para card unit 20 clos unit 20 open read card unit 10 name 5tim_a_segi.pdb read sequence pdb unit 10 clos unit 10 generate A setup warn first NTER last CTER open read card unit 10 name 5tim_b_segi.pdb read sequence pdb unit 10 clos unit 10 generate B setup warn first NTER last CTER open read card unit 10 name 5tim_a_segi.pdb read coor pdb unit 10 resi clos unit 10 open read card unit 10 name 5tim_b_segi.pdb read coor pdb unit 10 resi clos unit 10 ! Print heavy atoms with unknown coordinates coor print sele ( .not. INIT ) .and. ( .not. hydrogen ) end ! Add any missing coordinates ic param ic build prnlev 0 hbuild sele hydrogen end prnlev 5 energy open writ card unit 10 name 5tim_segi.psf writ psf card unit 10 * psf of 5tim, chains a and b * read from pdb files with segi info added * resi option for read card pdb * open writ card unit 10 name 5tim_segi.crd writ coor card unit 10 * coor of 5tim, chains a and b * read from pdb files with segi info added * resi option for read card pdb * stop
No further editing of pdb files --- no use of APPEnd option
In this and the following variant, the pdb files are split into one per chain, but not modified further (aside from renaming of atom/residue names). The following is an excerpt for the chain A pdb file:
ATOM 1 N SER A 2 29.700 57.568 22.037 1.00 42.07 N ATOM 2 CA SER A 2 29.413 56.872 20.802 1.00 44.61 C ATOM 3 C SER A 2 30.278 55.645 20.373 1.00 34.14 C ... ATOM 1882 NE2 GLN A 250 17.318 46.262 23.955 1.00 50.11 N ATOM 1883 OT2 GLN A 250 21.621 49.945 21.757 1.00 26.66 O TER 1884 GLN A 250
The following CHARMM script is used:
* Read 5TIM sequ and coords from two pdb files, one for each chain * aside from renaming of atom/residue names * HIS -> HSE/HSD * CD1 ILE -> CD ILE * C-terminal O -> OT1 * OXT -> OT2 * the ATOM entries are unmodified from the pdb file * Variant 1: no use of APPEnd option * ! Read topology and parameter files open read card unit 10 name top_all27_prot_na.rtf read rtf card unit 10 clos unit 10 open read card unit 20 name par_all27_prot_na.prm read para card unit 20 clos unit 20 open read card unit 10 name 5tim_a_nosegi.pdb read sequence pdb unit 10 clos unit 10 generate A setup warn first NTER last CTER open read card unit 10 name 5tim_b_nosegi.pdb read sequence pdb unit 10 clos unit 10 generate B setup warn first NTER last CTER open read card unit 10 name 5tim_a_nosegi.pdb read coor pdb unit 10 offs -1 clos unit 10 open read card unit 10 name 5tim_b_nosegi.pdb read coor pdb unit 10 offs 248 clos unit 10 ! Print heavy atoms with unknown coordinates coor print sele ( .not. INIT ) .and. ( .not. hydrogen ) end ! Add any missing coordinates ic param ic build prnlev 0 hbuild sele hydrogen end prnlev 5 energy open writ card unit 10 name 5tim_nosegi_noappe.psf writ psf card unit 10 * psf of 5tim, chains a and b * read from pdb files without segi info added * only offs option for read card pdb * open writ card unit 10 name 5tim_nosegi_noappe.crd writ coor card unit 10 * coor of 5tim, chains a and b * read from pdb files without segi info added * only offs option for read card pdb * stop
Explanations: If no segment identifiers were added to the pdb file, the RESId option to read coor pdb must not be used. However, in this case care is required concerning the atom/residue numbers. The residue numbers in the pdb file (resSeq field in columns 23-26) correspond to what is referred to as RESID in CHARMM (not to be confused with the RESId option!). The RESID is treated as a label that is used to match coordinates with the corresponding atom in the PSF. Internally, residue numbering in CHARMM always starts with 1; these CHARMM internal residue numbers are referred to as RESNO. So, when CHARMM starts reading the above pdb file for chain A, it looks for RESID 1 to match its counting that starts with RESNO 1. Given that the PDB file contains only coordinates for residues 2 and higher, there is a problem. The dilemma can be solved by specifying the OFFSet (the "mismatch) between RESID (pdb resSeq field) and absolute RESNO according to RESID + OFFSET = RESNO. Thus, as is done above, we have to specify OFFSet -1 for chain A. Things get more complicated for chain B. In the pdb file, the resSeq field (CHARMM RESID) starts again at 2, but CHARMM's RESNO counter expects 250 (249 residues were read from the chain A pdb file). Thus, for the chain B pdb file we have to set the OFFSet to 248, since RESID + OFFSET = 2 + 248 = 250 = RESNO, as was done above.
Comment: The RESId option to read coor pdb used before avoids the need for having to specify / worry about OFFSets. However, its use requires that the segment identifier was tagged onto all ATOM lines in the pdb file, starting at column 73. So, conclusion up to this point: Either do more editing of the pdb file(s), or figure out your OFFSets. As usual, there is no free lunch :-(
No further editing of pdb files --- using the APPEnd option
As we have seen in the above example, the need for OFFSets can arise for two reasons. First, residue numbers in the pdb file do not start with 1, and one has to compensate for the RESID/RESNO mismatch. Second, residue numbers in the pdb file are reset to 1 (or some low number) because a new chain starts. To avoid ever more complicated OFFSet calculations for the second, third etc. chain, one can use the APPEnd option in combination with OFFSet. Given the same pdb files as in the previous section, the following alternative script can be used to read coordinates:
* Read 5TIM sequ and coords from two pdb files, one for each chain * aside from renaming of atom/residue names * HIS -> HSE/HSD * CD1 ILE -> CD ILE * C-terminal O -> OT1 * OXT -> OT2 * the ATOM entries are unmodified from the pdb file * Variant 2: use the APPEnd option to simplify reading of second chain * ! Read topology and parameter files open read card unit 10 name top_all27_prot_na.rtf read rtf card unit 10 clos unit 10 open read card unit 20 name par_all27_prot_na.prm read para card unit 20 clos unit 20 open read card unit 10 name 5tim_a_nosegi.pdb read sequence pdb unit 10 clos unit 10 generate A setup warn first NTER last CTER open read card unit 10 name 5tim_b_nosegi.pdb read sequence pdb unit 10 clos unit 10 generate B setup warn first NTER last CTER open read card unit 10 name 5tim_a_nosegi.pdb read coor pdb unit 10 offs -1 clos unit 10 open read card unit 10 name 5tim_b_nosegi.pdb read coor pdb unit 10 offs -1 appe clos unit 10 ! Print heavy atoms with unknown coordinates coor print sele ( .not. INIT ) .and. ( .not. hydrogen ) end ! Add any missing coordinates ic param ic build prnlev 0 hbuild sele hydrogen end prnlev 5 energy open writ card unit 10 name 5tim_nosegi_appe.psf writ psf card unit 10 * psf of 5tim, chains a and b * read from pdb files without segi info added * offs and appe option for read card pdb * open writ card unit 10 name 5tim_nosegi_appe.crd writ coor card unit 10 * coor of 5tim, chains a and b * read from pdb files without segi info added * offs and appe option for read card pdb * stop
The only difference is the read coor line for reading the coordinates of chain B,
read coor pdb unit 10 offs -1 appe
We still have to give an OFFSet, since residue numbering starts at 2 rather than at 1 as expected by CHARMM. But the APPEnd options avoids the need take the total number of residues read so far into account.
Observation: All three methods lead to identical psf and crd files, and the energy of the final ENERgy command is the same in all three cases.
The charmm-gui approach
The website basically uses the approach described in Appending SEGId labels to the pdb files --- the RESId option. The server splits the original pdb file into one file per peptide chain and does most of the recommended renaming. Instead of using the chain labels A, B, C etc. as segment identifiers, charmm-gui labels the chains as PROA, PROB etc.
At present, there are two minor shortcomings:
- The C-terminal O, OXT are not renamed to OT1, OT2
- The read sequ command generated by the website in the step1_pdbreader.inp script does not work with c35 and higher. The solution here is to change the line
read sequence coor pdb unit 10 ! incorrect command
to the correct form
read sequence pdb unit 10
- The HBUIld command without atom selection (as issued in the charmm-gui generated input script) fails in recent versions of CHARMM (it used to be that HBUIld defaulted to all hydrogens if no atom selection was given, in recent versions of CHARMM an empty atom selection results in no hydrogen coordinates being rebuilt, which is probably not what is intended ...)
If one takes these things into account, the input script generated by charmm-gui combined with the processed pdb files are equivalent to what was described in Appending SEGId labels to the pdb files --- the RESId option.
The charmming approach
http://www.charmming.org does not tag ATOM lines with segment identifiers. Nevertheless, it does heavy editing of the pdb files to avoid the need for OFFSets or APPEnd options. This leads to psf and crd files that in some aspects are substantially different from those created by all other methods.
charmming.org splits the pdb file into individual files for each chain, doing some renaming, as all other approaches. However, rather than just splitting the files (with minor edits), these individual pdb files are newly generated files. In particular, residue numbering (resSeq field) starts at 1.
Note: Presently, the pdb files generated/synthesized by charmming.org do not work with c35 and higher since the resSeq/RESID field is shifted by one to the right. Due to the changes to the pdb reader introduced in c35, CHARMM c35 and higher fails on these files. The following discussion is based on tests with pdb files, originally generated by charmming.org, where this shift-by-one error was corrected by hand.
Further, the chains / segments are first processed indiviually, i.e., one script generates a psf for chain A and reads the coordinates from the respective pdb file. A second script generates a psf for chain B, reads the respective coordinates etc. A final script merges these individual PSFs (read psf card append) and CRD files (read coor card unit <no> resid sele segid <name> end).
Pro: This stepwise approach avoids the need to calculate OFFSets.
Con: In the final psf and crd file, the information about the original residue numbers (as given in the original pdb file) is lost. Below is an excerpt of the final crd for 5TIM as generated by charmming.org.
* COORDS * DATE: 5/ 6/ 9 9:16:35 CREATED BY USER: schedd * 7606 EXT 1 1 SER N 29.7000000000 57.5680000000 22.0370000000 A 1 42.0700000000 2 1 SER HT1 30.2836703731 56.9627427877 22.6490397067 A 1 0.0000000000 3 1 SER HT2 28.8085909209 57.7948188333 22.5223274221 A 1 0.0000000000 4 1 SER HT3 30.2138344340 58.4476737588 21.8278400405 A 1 0.0000000000 5 1 SER CA 29.4130000000 56.8720000000 20.8020000000 A 1 44.6100000000 ... 3802 249 GLN OT1 22.7680000000 48.6500000000 23.1050000000 A 249 24.5100000000 3803 249 GLN OT2 21.2645700000 49.8809700000 22.1098400000 A 249 0.0000000000 3804 250 SER N 42.6530000000 -11.0700000000 -13.9910000000 B 1 57.2100000000 3805 250 SER HT1 43.6593965349 -10.8109568604 -13.9502241934 B 1 0.0000000000 3806 250 SER HT2 42.5413548392 -11.9401105398 -14.5496077718 B 1 0.0000000000 3807 250 SER HT3 42.2967287738 -11.2293951382 -13.0270167847 B 1 0.0000000000 ...
Contrast this with the final crd file from any other approach discussed (ignore the extended format used by charmming.org compared to the default (old) CHARMM format, which only affects spacing)
* COOR OF 5TIM, CHAINS A AND B * READ FROM PDB FILES WITH SEGI INFO ADDED * RESI OPTION FOR READ CARD PDB * DATE: 5/19/ 9 14:23:22 CREATED BY USER: stefan * 7606 1 1 SER N 29.70000 57.56800 22.03700 A 2 42.07000 2 1 SER HT1 30.28367 56.96274 22.64904 A 2 0.00000 ... 3802 249 GLN OT1 22.76800 48.65000 23.10500 A 250 24.51000 3803 249 GLN OT2 21.62100 49.94500 21.75700 A 250 26.66000 3804 250 SER N 42.65300 -11.07000 -13.99100 B 2 57.21000 3805 250 SER HT1 43.65940 -10.81096 -13.95022 B 2 0.00000 3806 250 SER HT2 42.54135 -11.94011 -14.54961 B 2 0.00000 3807 250 SER HT3 42.29673 -11.22940 -13.02702 B 2 0.00000 ...
Note how the second column from the right in the second excerpt still retains the original residue numbering, which is lost in the charmming.org approach. Thus, one always has to keep the OFFSet of -1 in mind when comparing to papers / work based on the pdb numbering. An OFFSet of -1 is relatively easy, but this may become obnoxious for larger OFFSets.
Further things to note about charmming.org: There are a number of other things to keep in mind that may lead to different initial coordinates from charmming.org compared to charmm-gui.org or hand made approaches.
- scripts start with BOMLEV -2, which may hide (serious) problems
- Some pdb files generated by the website (new-*-a.pdb, new-*-b.pdb etc.) do not contain a TER/END statement. This may result in the last ATOM line not being parsed correctly!
- The line HBUIld SELE ALL END does not work as expected in recent versions of CHARMM; no hydrogens are rebuilt. Use HBUIld SELE HYDRogen END instead.
- As pointed out, setting up a multidomain protein such as 5TIM, is handled by multiple CHARMM scripts in the charmming.org framework. This involves writing coordinates to files and re-reading these in later job, and hydrogen coordinates are rebuilt multiple times (initially, separately for each domain, then again for the full system). Apparently because of round-off error, this leads to slightly different final coordinates compared to the various one-step approaches described above, resulting in slightly different energies. I admit being too lazy to find out where the differences arise exactly.
To sum up: When using charmming.org to set up 5TIM, one ends up with a slightly different total energy of the final system (before any minimizations), and the atom numbering is different. To illustrate the latter point, see the output of a
print coor sele atom a 5 * end
command. Here the output from the charmm-5tim-*-append.inp, after all coordinates have been rebuilt but before the switch to extended I/O format
CHARMM> print coor sele atom a 5 * end SELRPN> 14 atoms have been selected out of 7606 NOTE: A SELECTED SUBSET OF ATOMS WILL BE USED COORDINATE FILE MODULE TITLE> * APPEND THE PDBS TITLE> * 14 67 5 PRO N 27.31200 47.92600 15.64400 A 5 15.26000 68 5 PRO CD 26.11000 48.43600 16.43300 A 5 12.74000 69 5 PRO HD1 25.29615 48.72034 15.73302 A 5 0.00000 70 5 PRO HD2 26.40839 49.31426 17.04357 A 5 0.00000 71 5 PRO CA 27.82100 46.70200 16.24000 A 5 13.54000 72 5 PRO HA 28.86587 46.82241 16.47268 A 5 0.00000 73 5 PRO CB 26.93400 46.41800 17.47100 A 5 12.90000 74 5 PRO HB1 27.47257 46.68967 18.40398 A 5 0.00000 75 5 PRO HB2 26.66662 45.34044 17.51197 A 5 0.00000 76 5 PRO CG 25.69100 47.25900 17.32200 A 5 13.65000 77 5 PRO HG1 24.87637 46.67505 16.84295 A 5 0.00000 78 5 PRO HG2 25.33950 47.61986 18.31213 A 5 0.00000 79 5 PRO C 27.74200 45.51900 15.31300 A 5 11.93000 80 5 PRO O 27.08100 45.57900 14.32700 A 5 15.67000
And below the output of the same command using the "RESId" approach described above (identical results are obtained with all other methods!)
CHARMM> print coor sele atom a 5 * end SELRPN> 17 atoms have been selected out of 7606 NOTE: A SELECTED SUBSET OF ATOMS WILL BE USED COORDINATE FILE MODULE TITLE> * COOR OF 5TIM, CHAINS A AND B TITLE> * READ FROM PDB FILES WITH SEGI INFO ADDED TITLE> * RESI OPTION FOR READ CARD PDB TITLE> * DATE: 5/25/ 9 13:27: 1 CREATED BY USER: stefan TITLE> * 17 50 4 GLN N 28.57200 50.91400 14.28200 A 5 9.35000 51 4 GLN HN 28.89098 51.30031 13.42046 A 5 0.00000 52 4 GLN CA 27.57300 49.87900 14.16400 A 5 9.56000 53 4 GLN HA 26.70410 50.20714 14.71452 A 5 0.00000 54 4 GLN CB 27.18200 49.57600 12.72500 A 5 9.46000 55 4 GLN HB1 28.10053 49.28293 12.17493 A 5 0.00000 56 4 GLN HB2 26.77476 50.50663 12.27750 A 5 0.00000 57 4 GLN CG 26.12500 48.44700 12.53200 A 5 9.96000 58 4 GLN HG1 25.11633 48.82477 12.80432 A 5 0.00000 59 4 GLN HG2 26.37670 47.58052 13.18019 A 5 0.00000 60 4 GLN CD 26.12200 48.00200 11.06900 A 5 17.65000 61 4 GLN OE1 27.18700 47.75800 10.48000 A 5 15.75000 62 4 GLN NE2 24.97600 48.09800 10.40400 A 5 13.42000 63 4 GLN HE21 24.15998 48.45028 10.86228 A 5 0.00000 64 4 GLN HE22 24.92935 47.81799 9.44514 A 5 0.00000 65 4 GLN C 28.09700 48.57800 14.77000 A 5 14.24000 66 4 GLN O 29.20500 48.17000 14.45500 A 5 12.25000
As you see, different residues are selected.
A comment on Rick's READ UNIV approach
Much of the above analysis was inspired by Rick's posting showing his method to "get" pdb coordinates "into CHARMM". His approach differs from those discussed so far by not using the PDB option to read coor, but using the UNIV option to read coor instead. Applying his example to 5TIM leads to the following script (which reads pdb files without segment identifier added to the ATOM lines!):
* Read 5TIM sequ and coords from two pdb files, one for each chain * aside from renaming of atom/residue names * HIS -> HSE/HSD * CD1 ILE -> CD ILE * C-terminal O -> OT1 * OXT -> OT2 * the ATOM entries are unmodified from the pdb file * Variant 3: READ COOR UNIV instead of READ COOR PDB * adapted from Rick Venable's post at * http://www.charmm.org/ubbthreads/ubbthreads.php?ubb=showflat&Number=3748&site_id=1#import * ! Read topology and parameter files open read card unit 10 name top_all27_prot_na.rtf read rtf card unit 10 clos unit 10 open read card unit 20 name par_all27_prot_na.prm read para card unit 20 clos unit 20 ! define file format read univ * modified pdb * pdb segid 22 1 ires 23 4 w 61 6 end ! protein chain a, segid a; get sequ and coords from file open unit 3 read card name 5tim_a_nosegi.pdb read sequ pdb unit 3 rewind unit 3 gener A setup warn read coor univ unit 3 offset -1 close unit 3 ! protein chain b, segid b; get sequ and coords from file open unit 3 read card name 5tim_b_nosegi.pdb read sequ pdb unit 3 rewind unit 3 gener B setup warn read coor univ unit 3 offset 248 close unit 3 ! Print heavy atoms with unknown coordinates coor print sele ( .not. INIT ) .and. ( .not. hydrogen ) end ! Add any missing coordinates ic param ic build prnlev 0 hbuild sele hydrogen end prnlev 5 energy open writ card unit 10 name 5tim_rmv.psf writ psf card unit 10 * psf of 5tim, chains a and b * read from pdb files without segi info added * RMV's approach (read coor univ) * open writ card unit 10 name 5tim_rmv.crd writ coor card unit 10 * coor of 5tim, chains a and b * read from pdb files without segi info added * RMV's approach (read coor univ) * stop
The energy of the system is identical to our approaches, and identical psf and crd files are obtained. Note that while we always read the full sequence information first, generated both segments and then read coordinates for the two segments, the script, following Rick's example, sets up segment (chain) A, reads its coordinates, and only then sets up segment (chain) B and reads the latter coordinates. This is a cosmetic difference; the above script could be changed to our order of doing things with identical results.
Some conclusions / questions
- charmming.org really is different from all other methods.
- The rationale of Rick's READ COOR UNIV approach is unclear to me, it seems to me that the same can be achieved with READ COOR PDB. Hint's/explanation appreciated. At present, I only see the advantage that the approach could be easily adapted to different file formats as well, e.g., PDB like file formats where some columns are shifted/swapped etc.
Further problems
- What about alternate/multiple coordinates ???
- Suppose I have a chain with gaps (coordinates missing for some residues) in the middle. Is there a semi-automated process that could handle this ...? (would READ SEQU PDB even notice ...?