!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NOTE: ATTRACT 2.0 is still UNDER DEVELOPMENT
This manual is still in the DRAFT stage and
will undergo a serious overhaul before the final release
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
TODO: --cdie has distances capped at 50 A!
rr1 = 1.0d0/sqrt(r2)-1.0/50.0
r2a = rr2 - (1.0/50.0)*(1.0/50.0)
--only-flex
TODO: grid restraints (sym.cpp)
TODO: 1/(rmsd+1) weighting of MC ensemble switching
NEW: gravity 4, ghost, --fast option for randsearch
NEW: cdie, potshape, all-atom, alphabet, proxlim=0
NEW: --grid <index 1> <index 2>
NEW: --ensemble option: --ensemble <ligand> <PDB list file>
NEW: 2 extra floats in the parameter file: start and end of the switch
function; if they are zero, then no switching
NEW --shm switch for make-grid
NEW: make-grid omp version:
timed on 1AVXA
normal: 199 seconds
omp: 58 seconds (4 threads)
omp-torque: 104 seconds (4 threads)
omp-torque + shm: 94 seconds (4 threads)
NEW: --sym
Basic usage:
attract.inp is gone. It's all command line options now. Starting structure generation is de-coupled from the docking. To perform a standard docking protocol, first run the systsearch (or randsearch) tool to write a DOF file, run ATTRACT on that DOF file, run ATTRACT again on the output of the first ATTRACT, etc. You can even connect them with Unix pipes if you want.
Just type "attract" without any arguments to get a basic idea of its command line usage.
Keep in mind that the receptor is not kept fixed by default!
The output structures are printed on stdout. Diagnostic messages are (should be) printed on stderr. Pipe them as needed.
Coding issue: In a terminal there is no problem, but piped messages are not always printed in the order you expect. This is because C and Fortran buffer their output independently of each other.
********************************************************************************
A short word on docking mechanics
********************************************************************************
The new ATTRACT version can be used in "classical" mode, i.e. without grids or anything.
The code concerning these calculations (nonbon8.f, minfor.f) has been largely untouched, which has two advantages. First, everyone currently has their private ATTRACT versions. It should be easy to port the changes to the new version.
Second, the new ATTRACT will give exactly the same docking results as the old version, but within certain limits.
Docking is inherently chaotic: minimal perturbations in coordinates will give rise to macroscopic differences in docking results in a fraction of the cases. So, keep the following in mind:
- A single invocation of ATTRACT will give exactly the same results as a one-stage attract.inp, if the CPU architecture is the same, and if you fix the receptor.
- Multiple ATTRACT invocations will not give the same results as a multi-stage attract.inp, because all DOFs are written to disk in between docking stages, causing all DOFs and coordinates to be rounded with .3 - .8 digits precision.
- The old multi-body ATTRACT version will differ in the same way. There is an additional source of rounding differences here, caused by the fact that the new ATTRACT calculates the energies and forces as the sum of pairwise receptor-ligand interactions, with the ligand always rotated into the coordinate frame of the receptor.
********************************************************************************
The DOF file format (structures.dat)
********************************************************************************
This file format is both the input and the output format of ATTRACT. The file format is as follows:
- The first section contains zero or more comment lines, marked by ##. Currently, in the ATTRACT output, this contains the command line parameters that were received by ATTRACT.
- The next section starts with "#pivot auto" or a manual pivot definition. This specifies the rotation pivot of each molecule. In case of "#pivot auto", it is automatically determined as the average of all atoms. Manual specification goes as follows:
#pivot 1 12.345 6.789 10.123
#pivot 2 22.345 -6.789 -10.123
and so on for each molecule
- Then comes the section with the centering convention. It contains two lines:
#centered receptor: false
#centered ligands: false
or
#centered receptor: true
#centered ligands: true
or a combination of those.
If a molecule is centered, all translation coordinates are of the pivot center relative to the world origin. If it is not centered, all translation coordinates are relative to the original PDB coordinates.
Example: (0,0,0) for both receptor and ligand. When they are not centered, this represents the original position in the PDBs. When they are centered, it means that the pivot point of both receptor and ligand are on top of each other in the global origin.
- The final section contains the actual DOF values of each structure. For each structure, there is first a line #X, where X is the number of the structure (i.e. #1, #2, #3). Then comes a section of zero or more per-structure comment lines, starting with ##. Finally, there is one DOF line for each molecule (i.e. 2 DOF lines for a standard two-body docking, N lines for an N-body docking, 1<=N<=100).
A line contains at least 6 numbers: three for the rotation (phi-ssi-rot, in radians), three for the translation (see above, in A) and then the values for the normal modes. The number of normal modes can be different for each molecule (0-20 modes per molecule).
Note: per-structure comments starting with ### in the input file are treated as special by ATTRACT in the sense that they are printed out again unchanged in the output file. All other comments are discarded.
DOF files generated by systsearch or randsearch conform to the format above.
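The layout described above can be read with a few lines of Python. This is only a minimal sketch, not the parser used by ATTRACT or its tools: the function name parse_dof and the returned dictionary layout are made up for illustration, and it assumes that every line that is neither a comment, a #pivot line, a #centered line nor a #X line is a DOF line.

```python
def parse_dof(text):
    """Parse DOF file text into (header, structures).

    header: 'comments', 'pivots' ("auto" or {molecule: (x, y, z)}) and
            'centered' (e.g. {"receptor": False, "ligands": True}).
    structures: one dict per #X block, with 'number', 'comments' and
            'dofs' (one list of floats per molecule).
    """
    header = {"comments": [], "pivots": {}, "centered": {}}
    structures = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#pivot"):
            fields = line.split()
            if fields[1] == "auto":
                header["pivots"] = "auto"
            else:
                header["pivots"][int(fields[1])] = tuple(float(v) for v in fields[2:5])
        elif line.startswith("#centered"):
            key, value = line[len("#centered"):].split(":")
            header["centered"][key.strip()] = value.strip() == "true"
        elif line.startswith("##"):
            # Comments before the first #X line belong to the file header.
            target = header if current is None else current
            target["comments"].append(line)
        elif line.startswith("#"):
            current = {"number": int(line[1:]), "comments": [], "dofs": []}
            structures.append(current)
        else:
            current["dofs"].append([float(v) for v in line.split()])
    return header, structures
```

The first six values of each DOF line are the rotation and translation; any further values are the mode DOFs of that molecule.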
********************************************************************************
The PDB file format
********************************************************************************
You supply either two PDB files or one PDB file to ATTRACT. If it is one PDB, the first line must contain the number of molecules in the file, and then the molecules separated by TER (like the old multi-body version).
The PDB itself is in standard ATTRACT reduced format. They are generated with reduce as usual. The atom type field can be 0-99.
- Residue type translation
Residue type 0 is the dummy atom type
By default, residue type 99 is translated to 0. Residue type 32 is also translated to 0 but only if fewer than 32 atom types have been specified in the parameter file.
You can specify a custom translation table for residue types. All forcefield calculations on atoms will be after this translation table has been applied, except for the EM module, which receives the untranslated atom types.
A translation table consists of a text file containing two integer columns per line (1-99).
TODO: specifying a custom translation table is not yet implemented
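The default translation rule can be written down as follows. This is an illustrative sketch only (the function name is hypothetical, and custom translation tables are not covered since they are not implemented yet):

```python
def translate_atom_type(atom_type, n_param_types):
    """Apply the default translation described above.

    n_param_types is the number of atom types declared in the parameter file.
    """
    if not 0 <= atom_type <= 99:
        raise ValueError("atom type must be in the range 0-99")
    if atom_type == 99:
        return 0  # 99 is always translated to the dummy type
    if atom_type == 32 and n_param_types < 32:
        return 0  # 32 is a dummy only if the parameter file has fewer than 32 types
    return atom_type
```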
- Residue type address space
I propose the following: residue type 1-31 are ATTRACT protein reduced atom types, 32-74 are reserved for non-protein reduced atom types, residue 75-99 are reserved for full-atom/heavy atom types.
NEW: generic reduce (allatom) that can also deal with RNA, full atom, ...
********************************************************************************
Atom type parameter file format
********************************************************************************
All parameter files must have one additional line with two integers P and X at the beginning of the file.
P is the potential shape, which must be 8 or 12 (8 is standard ATTRACT, 12 is standard Lennard-Jones). X is the number of atomtypes.
The rest of the file must contain parameters for X atomtypes in standard ATTRACT format, as usual.
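A small check of that header line could look like this. It is a sketch under assumptions: the function name is made up, and only the first two integers are read; anything else on the line (for instance the switch function floats mentioned in the notes at the top, whose position is not documented here) is ignored.

```python
def read_parameter_header(text):
    """Return (potential_shape, n_atomtypes) from the first line of a parameter file."""
    fields = text.splitlines()[0].split()
    if len(fields) < 2:
        raise ValueError("first line must start with two integers P and X")
    potshape, n_types = int(fields[0]), int(fields[1])
    if potshape not in (8, 12):
        raise ValueError("potential shape must be 8 or 12")
    if n_types < 1:
        raise ValueError("number of atom types must be positive")
    return potshape, n_types
```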
Testing issue: Piotr and I got a preliminary version working for RNA, but the new one still needs to be tested
NEW: full atom force field
********************************************************************************
Command line options
********************************************************************************
All command line options start with --
All options must be specified after structures.dat, the parameter file and the PDBs.
Coding issue: see parse_options.cpp for complete list of options, and ministate.cpp for the defaults.
At some point, we may want to auto-generate this from some configuration data model.
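When scripting ATTRACT runs, the argument order above can be enforced by a small helper. This is a hypothetical convenience function, not part of ATTRACT; the file names in the usage note are placeholders.

```python
def attract_command(dof_file, parameter_file, pdb_files, options=()):
    """Build an argument list: positional files first, then the -- options."""
    if not 1 <= len(pdb_files) <= 2:
        raise ValueError("supply either one or two PDB files")
    options = [str(o) for o in options]
    if options and not options[0].startswith("--"):
        raise ValueError("options must start with --")
    return ["attract", dof_file, parameter_file, *pdb_files, *options]
```

For example, attract_command("structures.dat", "attract.par", ["receptor.pdb", "ligand.pdb"], ["--fix-receptor", "--vmax", 100]) gives a list that can be handed to subprocess.run, with stdout redirected to the next DOF file.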
********************************************************************************
Sampling options
********************************************************************************
The following list contains the command line options for sampling
--fix-receptor (default: off!): keep the receptor protein's translation and orientation fixed
--only-rot : use only rotational degrees of freedom
--only-trans
--only-flex: use only imodes as degrees of freedom
--vmax <value> (default: 100)
The maximum number of minimization steps, from attract.inp
--mc Use Monte Carlo (MC) sampling instead of energy minimization
--mcmax <value> : used in pure MC mode to indicate the total number of MC steps.
--rcut <value> (default: 1500)
The distance squared pairlist cutoff from attract.inp
Orthogonality issue: Grids do not use an rcut, it should give similar results to an infinite rcut
TODO/Orthogonality issue: The new version is not yet binary compatible with Piotr's version, because it uses rcut1 in pairgen.f while Piotr uses rcut.
********************************************************************************
Tools
********************************************************************************
Some of them are in bin/ but most of them are in tools/, depending on whether they use the ATTRACT Fortran/C++ codebase or not.
For the exact command line syntax, invoke the program without any arguments; this will usually print it.
- collect
Collect now takes as input a DOF file + one or more PDBs.
There is a difference in the result whether you supply one or two PDBs.
If you supply one PDB, it must be in reduced format, with the first line containing the number of structures (see the PDB section above). The structures are printed one after another.
If you supply two or more PDBs, they can be in any format you like (reduced, full-atom, ...). The structures will be separated with MODEL/ENDMDL lines.
You can also provide a --modes <mode file> option, to read in and apply normal
modes in the same way as ATTRACT does. See the Normal modes section for more details.
Collect has some sanity checks that you provide the correct number of structures and normal modes.
- center
A simple program that reads in a PDB, computes its center of mass and subtracts it from all coordinates. The coordinates are printed on stdout and the center of mass on stderr; pipe them as needed.
- fix_receptor
This will read in a DOF file and will fix the receptor (i.e. setting all its non-mode DOFs to zero). This is done by rotating all ligands (all molecules beyond the first) into the coordinate frame of the receptor.
- deredundant
This will read in a DOF file and remove all duplicate structures.
Since it does not read in actual PDB files, this is a bit empirical: the structure is considered as a sphere with a radius of gyration of 30 A.
Structures with a (ligand) RMSD of more than 0.05 A are considered different.
It will add a comment to the written structures to indicate where it came from.
Coding issue: the code is a bit outdated, but it should still work
Coding issue: there is also a Python version in tools/
NOTE: It assumes that the receptor is fixed
NOTE: For best results, sort the structures first by energy
NOTE: Normal modes are not taken into account
NOTE: This is not a clustering algorithm, use a different program for that
- sort.py
This will sort the structures by energy
It will add a comment to the written structures to indicate where it came from.
- split.py
Takes as input a DOF file, a pattern and a number X. It will split the DOF file into X files pattern-1, pattern-2, ..., pattern-X. This is highly useful for parallelization.
It will add a comment to the written structures to indicate where it came from.
- join.py
Reverse of split.py. Takes as input a pattern. It will look for all files starting with pattern and join them back into a single DOF file, written to stdout.
It utilizes the comment line written by split.py to deduce where every structure should go. No other assumptions are made, you are free to scramble the structures and files in any way you want. If there are duplicate or missing structures, it will complain.
- top
top <DOF file> <number of structures X>. Prints out the first X structures of a DOF file.
- fill-energies.py
This will take a DOF file and another file that contains "Energy: " lines. It will add the energy lines as comments into the DOF file and print it out, replacing any existing energy comments.
The other file can be another DOF file but it can also be the output of attract --score. In this manner, you can re-score your docking results without re-minimization.
- fill.py
This will take a DOF file and another DOF file and add all comments (e.g. energy, seed) of the second file to the first. This is useful in combination with tools that don't preserve comments such as fix_receptor.
- monte.py / metro.py
Monte.py takes as input a DOF file and applies a random perturbation to the DOFs: rotation, translation and normal modes.
Metro.py takes as input two DOF files. They must have the same number of structures. The second DOF file is considered "new". For every structure, it selects the old or the new structure according to a Metropolis criterion. The selected structures are printed out.
Together, you can use them to do a Monte-Carlo + minimization protocol. Taking initial docking solutions, you run monte, then ATTRACT, then metro on the results + the initial docking solutions. You can repeat this as many times as you wish.
NEW: the monte.py DOF steps, command line
TODO: metro.py currently accepts only new structures that have a lower energy, i.e. the temperature is 0 K. I will make it parameterizable from command line.
Coding issue/TODO: monte.py uses the system time as seed. Perhaps it would be better to use the structure's seed, so that the result becomes deterministic (on the same CPU architecture at least).
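The selection step of metro.py can be sketched as below. This is not the code of metro.py: the function name and the kt parameter (thermal energy in the same units as the ATTRACT energies) are assumptions for illustration. With kt = 0 it reduces to the current behaviour, where only structures with a lower energy are accepted.

```python
import math
import random


def metropolis_select(e_old, e_new, kt=0.0, rng=random):
    """Return True if the new structure should replace the old one."""
    if e_new < e_old:
        return True  # downhill moves are always accepted
    if kt <= 0.0:
        return False  # temperature 0: never accept an uphill move
    # Uphill moves are accepted with probability exp(-dE / kT).
    return rng.random() < math.exp(-(e_new - e_old) / kt)
```

Passing a seeded random.Random instance as rng makes the selection reproducible.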
- randsearch.py
Specify the number of bodies, the number of desired structures, and optionally a seed number. It will generate starting structures of N bodies with random orientation and placement. The pivot points are guaranteed to be equidistant on a sphere of 75 A radius (i.e. 150 A from each other for 2 bodies): for more than two bodies, a steepest-descent energy minimization is performed to achieve this.
NOTE: This is the default starting structure sampling protocol in HADDOCK
NOTE: This is currently the only generic way to get starting structures for multi-body docking.
Coding issue: For more than two bodies, the energy minimization can be slow. You are recommended to install the psyco module or use pypy instead of vanilla Python. I will port it to C one of these days.
TODO: The randomization process is not completely random for rotation, I will implement a procedure that better samples the unit sphere.
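For the two-body case, the placement of the pivots can be illustrated as follows. This is a sketch, not the code of randsearch.py: the function name is made up, random orientations are left out, and the steepest-descent step for more than two bodies is not shown.

```python
import math
import random


def random_pivots_two_body(radius=75.0, rng=random):
    """Place two pivots at opposite ends of a random diameter of the sphere."""
    # Uniform random direction: z uniform in [-1, 1], azimuth uniform in [0, 2*pi).
    z = rng.uniform(-1.0, 1.0)
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    direction = (s * math.cos(azimuth), s * math.sin(azimuth), z)
    first = tuple(radius * c for c in direction)
    second = tuple(-c for c in first)
    return first, second
```

With the default radius, the two pivots are always 150 A apart.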
TODO: organize the tools properly, adapt the Makefile.
********************************************************************************
Old tools
********************************************************************************
Those I didn't touch, ask Martin how they work :-)
translate #Find starting points around receptor, to place the center of ligand
rotam
modes
modesca
compare
viewe
rmsca
********************************************************************************
Normal modes
********************************************************************************
--modes <modes file>
There is a single modes file that describes all normal modes for all structures
Any molecule can have any number of modes (maximum number has to be set in bin/max.h)
See bin/read_hm.cpp for a description of the file format
As before, normal modes experience a force towards zero, a fourth order function. Force constants are given in the file as before.
The DOF file format however enables a starting structure with non-zero starting values for the mode DOF.
Note: normal modes are for small scale deformations, up to an angstrom or two. For larger deformations, I suggest generating multiple coordinate models with wide-apart mode coordinates, assigning each DOF structure to the closest coordinate model, and subtracting the DOF coordinate of that model. After doing the docking separately with each model and its associated structures, you can add the model DOF coordinate again.
Coding issue/TODO: Perhaps it would be better to switch to per-molecule normal mode files, specified in the same way as grids (e.g. --modes 1 hm1.dat --modes 2 hm2.dat ...)
*******************************************************************************
iATTRACT
*******************************************************************************
iATTRACT works with multi-body docking and ensemble docking of proteins and nucleic acids.
It employs the atomistic OPLS force field (allatom/allatom.par). Structures must be converted
to OPLS format with tool allatom/aareduce.py using the --dumppatch option.
The protocols/iattract.py script requires the following parameters in addition to a normal ATTRACT docking
--infinite : includes all residues up to a distance of 50 A in the pairlist calculation of the nonbonded forces
--name <namingscheme>
Naming scheme for the iattract output files. Caution: if iattract files with this naming scheme are already present,
these will be used instead of generating them on-the-fly. Can also be used as an option
in the bin/collect, bin/irmsd.py and bin/lrmsd.py tools.
--icut <value>
Cutoff for on-the-fly detection of flexible interface residues. Default: 3.0
Use 5.0 for peptide-protein refinement.
Internal option for attract binary, index modes:
--imodes <modes file>
The index modes contain a list of atom numbers for the atoms that will be treated as fully flexible.
Different proteins are separated by -1.
In general, every structure has its own index file, the naming scheme is usually flexm-iattract1.dat etc.
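Reading such an index file could look like the sketch below. The exact file layout is not specified here, so this assumes whitespace-separated integers with -1 used purely as a separator between molecules; the function name is hypothetical.

```python
def parse_index_modes(text):
    """Split atom numbers into one list per molecule, using -1 as separator."""
    molecules = [[]]
    for token in text.split():
        value = int(token)
        if value == -1:
            molecules.append([])  # start the atom list of the next molecule
        else:
            molecules[-1].append(value)
    return molecules
```

Under this assumption, a file that ends with -1 would yield an empty list for the last molecule.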
Orthogonality issue: Index modes can be used together with normal modes to include flexibility on different scales.
TODO: test using global modes and index modes together
********************************************************************************
Restraints
********************************************************************************
--rest <restraints file>
Restraints can be intermolecular but also intramolecular (preserving secondary structure). It can do most of the things that HADDOCK can do.
You can repeat the "--rest" option in case of multiple restraint files, the restraints will be taken together.
See restraints.txt for more information on the file format.
- The air.py tool
This is a convenience tool to generate restraints files from HADDOCK-style active and passive residues. It takes eight files, run air.py without any argument to get its precise command line syntax. The file formats are as follows:
- residue list: a list of residues, one integer per line
- mapping file: a list of mappings, two columns per line. The first column is the residue number (integer). The second column is the residue identifier (usually integer but not necessarily) in the PDB. Active and passive residues in the residue list are first mapped and then converted to a selection of atoms.
NOTE: a mapping file is generated and printed out by reduce.
Example:
active residue list: 20
mapping file: 20 19B
PDB: atom 321-326 are Ser 19B
=> active residue 20 is translated into the selection 321-326
You can give a ninth parameter to indicate the chance that a restraint is not used. Default is 0.5 (HADDOCK default). Unlike HADDOCK, you can adjust this parameter for every restraint separately.
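The mapping step in the example above can be sketched in Python. This is only an illustration of the idea, not the code of air.py; the function name and the in-memory data layout are assumptions.

```python
def select_atoms(residues, mapping, pdb_atoms):
    """Translate residue numbers into lists of atom serial numbers.

    mapping: {residue number: PDB residue identifier}
    pdb_atoms: list of (atom serial number, PDB residue identifier)
    """
    selection = {}
    for residue in residues:
        identifier = mapping[residue]  # e.g. 20 -> "19B"
        selection[residue] = [serial for serial, resid in pdb_atoms
                              if resid == identifier]
    return selection
```

With mapping {20: "19B"} and atoms 321-326 belonging to residue 19B, active residue 20 is translated into [321, 322, 323, 324, 325, 326].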
- The tbl2attract.py tool
Convenience tool for converting tbl restraints files to ATTRACT restraint format. Run without options in command line to get help.
********************************************************************************
Gravity
********************************************************************************
--gravity <gravity mode>
Defines attractive forces on pivot points. It depends on the gravity mode:
1: defines an attractive force on every pivot point towards the global origin
2: defines an attractive force between the receptor pivot and each ligand pivot
3: defines an attractive force between all pivots
--rstk <force constant> (default 0.0015)
The gravity potential is a harmonic restraint function. With this parameter, you can set the force constant.
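The three gravity modes differ only in which pairs of points attract each other. The sketch below illustrates this; it assumes the energy is rstk * distance**2 per pair (the exact prefactor used by ATTRACT is not stated here), and the function name is made up.

```python
def gravity_energy(pivots, mode, rstk=0.0015):
    """Harmonic gravity energy for a list of (x, y, z) pivots; pivots[0] is the receptor."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    origin = (0.0, 0.0, 0.0)
    if mode == 1:    # every pivot towards the global origin
        pairs = [(p, origin) for p in pivots]
    elif mode == 2:  # receptor pivot with each ligand pivot
        pairs = [(pivots[0], p) for p in pivots[1:]]
    elif mode == 3:  # all pairs of pivots
        pairs = [(pivots[i], pivots[j])
                 for i in range(len(pivots))
                 for j in range(i + 1, len(pivots))]
    else:
        raise ValueError("only gravity modes 1-3 are sketched here")
    # Assumption: E = rstk * d**2 for each attracting pair.
    return rstk * sum(dist2(a, b) for a, b in pairs)
```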
********************************************************************************
Multi-copy
********************************************************************************
Multi-copy sidechains/loops should work like before.
Testing issue: Multi-copy sidechains/loops have not been tested by me.
Coding issue: Multi-copy sidechains/loops have their private energy evaluation function (in select.f)
Orthogonality issue: Multi copy sidechains + multi-body docking. There will be a different copy for each interaction with another molecule.
Orthogonality issue: Multi copy sidechains + restraints will give some inaccuracies (as far as the restraints are concerned, the sidechains are independent).
Orthogonality issue: Multi copy sidechains + normal modes may give some weird results.
********************************************************************************
Grids
********************************************************************************
The new ATTRACT can use a combination of a neighbour and a potential grid to get a significant speedup.
The potential grid energy is stored at the voxels, and the actual potential grid energy of an atom is computed by linear interpolation between eight voxels.
The neighbour energy of an atom is computed at run time, using the neighbour list stored at the nearest voxel.
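The linear interpolation between eight voxels is ordinary trilinear interpolation. A minimal sketch is given below; it is not the ATTRACT implementation, and the nested-list grid layout, the function name and the arguments are assumptions. Points on or beyond the last voxel are not handled.

```python
import math


def interpolate(grid, origin, spacing, point):
    """Trilinear interpolation of grid[ix][iy][iz] at a Cartesian point."""
    # Fractional voxel coordinates of the point.
    frac = [(p - o) / spacing for p, o in zip(point, origin)]
    base = [int(math.floor(f)) for f in frac]
    w = [f - b for f, b in zip(frac, base)]  # weights towards the upper voxel
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = ((w[0] if dx else 1.0 - w[0])
                          * (w[1] if dy else 1.0 - w[1])
                          * (w[2] if dz else 1.0 - w[2]))
                value += weight * grid[base[0] + dx][base[1] + dy][base[2] + dz]
    return value
```

Gradients stored at the voxels can be interpolated in the same way, one component at a time.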
The following things should be kept in mind:
- The neighbour grid and potential grid are in one file
- Every molecule has its own grid file
- Grids, especially torque grids, can take a lot of memory!
- For two body docking, only the receptor needs to have a grid if it is fixed. If you don't fix the receptor, you need grids for both molecules, or a torque grid for one of the molecules.
- ATTRACT will always fall back to non-grid docking if grids are not available. For two-body docking, if you forget to fix the receptor, this means slow docking!
- For multi-body docking, everything is viewed as a sum of pairwise interactions. Every pairwise interaction will be grid-accelerated if two grids or one torque grid are available for that pair, or if one of them is a fixed receptor for which a grid is available. Else, it will fall back to "classical" mode for that pair.
NOTE: The following section will contain NOTE sections like this, containing instructions to change advanced grid settings. These advanced settings are hard-coded in the grid-generating programs. You can change the settings and recompile. However, the *ATTRACT* program does not expect particular grid settings: you don't need to recompile ATTRACT or anything to deal with a grid with a different voxel size or other settings (ATTRACT *will* recompile by itself, but only because the code to compute a grid and to read in a grid are in the same file, and ATTRACT uses that file).
1. Calculation of grids
A. First, you need to calculate an interior map. Voxels marked as interior will have their energy set to infinite and their gradients to zero.
Use the tool calc_interior (in bin/):
calc_interior file.pdb file-interior.vol
For best results, you should use a NON-reduced PDB as input.
The interior map will be in Situs EM format (.vol). You can visualize it as follows:
bin/vol2ccp4 file-interior.vol file-interior.ccp4
pymol file-interior.ccp4
NOTE: At this point, your voxel size (default: 0.9 A) will have been determined: subsequent tools will check that the voxel size is the same as in the .vol file. To change the voxel size, adapt "gridspacing" in makegrid.h and recompile everything.
calc_interior creates a 10.8 A box around your protein (the edge is at least 10.8 A from any atom). It then sets all voxels within 7 A of any atom as "interior" and then shrinks the interior inward by 23 voxels. All of these settings can also be adapted in makegrid.h
B. The next step is the actual calculation of the grid. Use make-grid for this.
Invoke make-grid without arguments to get the command line syntax.
You must supply a reduced PDB file and the interior map computed above. The last argument must be the name for the grid file. Two additional arguments describe the distance thresholds:
- The plateau distance is the border between the potential grid energy and the neighbour energy. The neighbour energy at and beyond the plateau distance is zero. The neighbour energy under the plateau distance is the standard energy minus the energy at the plateau distance. The potential grid energy at and beyond the plateau distance is equal to the standard energy. The potential grid energy under the plateau distance is equal to the energy at the plateau distance.
The same goes for the gradients.
- The neighbour distance is the distance threshold for which neighbour atoms are stored on the grid. At runtime, if an atom is nearest to this voxel, the distance of each stored atom is computed. If it is above the plateau distance, it is ignored, else the energy is computed.
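The split at the plateau distance can be written as a small function. This is a sketch of the definition above, not code from make-grid; the function name is made up and the energy function in the usage note is a toy example, not the ATTRACT potential.

```python
def split_energy(energy, r, plateau):
    """Split a pair energy at distance r into (potential grid part, neighbour part)."""
    if r >= plateau:
        # At and beyond the plateau: all of the energy is in the potential grid.
        return energy(r), 0.0
    # Under the plateau: the grid holds the plateau value, the neighbour
    # list contributes the remainder.
    e_plateau = energy(plateau)
    return e_plateau, energy(r) - e_plateau
```

For any distance the two parts add up to the standard energy, e.g. with toy = lambda r: r ** -8 - r ** -6 and a plateau of 10 A, sum(split_energy(toy, 5.0, 10.0)) equals toy(5.0).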
NOTE: There is actually a hidden option "rigid" in pairenergy.f, which is off by default. If it is on, it will ignore all neighbour atoms that were between the plateau distance and the neighbour distance when the grid was built. It will not compute their distance and just assume that they are beyond the plateau distance. When there are no normal modes and the voxel size would be very small, this would be a reasonable assumption, leading to some speed gain. However, with 0.9 A I found that these atoms often are within plateau distance and that their contribution is non-negligible.
NOTE: There is an additional run-time approximation to gain speed by pre-tabulation. See the pre-tabulation section of the grid for more details.
NOTE: To calculate the potential grid, an extremely large cutoff of 50 A is used. This setting is called "distcutoff" in makegrid.h.
NOTE: make-grid uses the voxel box generated by calc_interior as the voxels for which to compute the potentials and neighbours. It will extend this box with 32 voxels of *double* size (1.8 A) for which *only* a potential grid is computed. This value of 32 is hard-coded in makegrid.h.
Coding issue: Grids have their private energy evaluation function (in nonbon.h). This function is used both at grid generation (for the potential grid energy) and at runtime (for the neighbour energy). In other words, you cannot just modify nonbon8.f and expect grids to work properly with your new energy: you have to modify nonbon.h as well.
Orthogonality issue: Piotr says that the very long cutoff used by grids doesn't work well with RNA. Perhaps a vdW switching function can be implemented.
2. Torque grids
There are torque versions for several binaries: attract-torque and make-grid-torque (and shm-grid-torque; see memory management below). These versions generate/use torque grids instead of normal grids. Torque grids have exactly the same neighbour grid, but their potential grid differs: in addition to the energy and gradient they have a 3x3 torque matrix stored, used to calculate the torques on the receptor. This makes torque grids three times as large, but also ensures that a single evaluation computes both receptor and ligand forces.
No special options need to be specified for torque grids.
Orthogonality issue: Torque grids will give (small) inaccuracies if the receptor has normal modes. This is because normal mode forces will be computed using the neighbour forces only, and potential grid forces will be neglected. I expect this to be a generally safe approximation, but testing is needed to be sure!
3. Grid usage options
Grids are specified for a molecule as follows:
--grid <molecule> <grid file>
molecule 1 is the receptor, 2 is the first ligand, 3 the second ligand, etc.
--gridmode <1 or 2>
gridmode is 1 by default for attract, meaning that two grids are necessary to grid-accelerate an interaction pair, and that two evaluations are made: one to calculate the energy and the ligand forces, and a second evaluation, with ligand and receptor reversed, to calculate the receptor forces. If the receptor is fixed, only one grid is necessary and only one evaluation is made.
gridmode is 2 by default for attract-torque, meaning that only one grid is necessary to grid-accelerate an interaction pair, and that only one evaluation is made to compute both receptor and ligand forces.
For attract, if you change the gridmode to 2, the speed will double but there will be considerable inaccuracies in the receptor forces (only the neighbour part will be there).
3. Memory management
There is the possibility to load a large portion of a grid into shared memory (shm; /dev/shm on Ubuntu). This has great memory advantages with parallelization, when four or more attract processes can load the same grid(s) and still use not much more memory than a single attract process.
The shm-grid tool takes a grid and creates two shared memory segments (one for the potentials and one for the neighbours), generating a grid header file that holds a reference to these segments. A grid header file can be loaded with --grid as if it were a full grid file, but it is much smaller. The header file will be valid only as long as the shared memory segments exist: these segments are deleted at reboot or when the shm-clean tool is run, making the header file invalid. Loading an invalid header file will result in an error message.
4. Pre-tabulation (prox)
There is a final speed optimization to compute energies at medium distances. A small (2000 elements per A**2) table converts distance**2 into the nearest distance**-2. Then, for every distance**-2 and every atom type-atom type pair, a large prox table contains the neighbour energy (and gradient), i.e. the standard energy minus the standard energy at plateau distance.
This pre-tabulation is only used for distances shorter than the plateau distance and longer than some distance limit proxlim. Distances shorter than proxlim are quite rare (< 6 A: less than 10 % in fully docked structures) and require too much memory for accurate tabulation.
The prox limits can be changed with the following command line options:
--proxlim <distance> (default 36, i.e. 6 A). The distance**2 above which prox pre-tabulation is used.
--proxmax <distance> (default 200, i.e. about 14.1 A). The maximum distance**2 for prox pre-tabulation. Must be at least as large as the largest plateau-distance**2 of any grid (this is validated).
--proxmaxtype <number of types> (default 31). The number of atom types for which a prox table must be generated (atom type 1 up to proxmaxtype).
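The two-table lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the ATTRACT implementation: toy_energy is a placeholder potential (not the ATTRACT force field), the size of the large table (n_prox) is an arbitrary choice, the large table is assumed to be evenly spaced in distance**-2, and only a single atom type pair is tabulated. The values 2000, 36 and 200 are taken from the text.

```python
PER_A2 = 2000  # elements per A**2 in the small table (from the text)


def toy_energy(dsq):
    """Placeholder pair energy as a function of distance**2 (assumption)."""
    inv3 = 1.0 / dsq ** 3
    return inv3 * inv3 - inv3


def build_tables(proxlim=36.0, proxmax=200.0, plateau_dsq=200.0, n_prox=4000):
    """Return (small, prox) for one atom type pair.

    prox[i]  : energy minus the energy at plateau distance, tabulated on
               an even grid in distance**-2 (assumed layout).
    small[k] : index into prox of the nearest distance**-2 for the k-th
               distance**2 bin above proxlim.
    """
    lo, hi = 1.0 / proxmax, 1.0 / proxlim
    step = (hi - lo) / (n_prox - 1)
    e_plateau = toy_energy(plateau_dsq)
    prox = [toy_energy(1.0 / (lo + i * step)) - e_plateau for i in range(n_prox)]
    n_small = int((proxmax - proxlim) * PER_A2) + 1
    small = []
    for k in range(n_small):
        dsq = proxlim + k / PER_A2
        index = round((1.0 / dsq - lo) / step)
        small.append(min(max(index, 0), n_prox - 1))
    return small, prox


def prox_energy(dsq, small, prox, proxlim=36.0, proxmax=200.0):
    """Look up the tabulated energy for a squared distance in [proxlim, proxmax]."""
    if not proxlim <= dsq <= proxmax:
        raise ValueError("distance**2 outside the pre-tabulated range")
    k = round((dsq - proxlim) * PER_A2)
    return prox[small[k]]


small, prox = build_tables()
print(prox_energy(64.0, small, prox), toy_energy(64.0) - toy_energy(200.0))
```

The lookup costs two array reads and no square root or division, which is the point of tabulating in distance**2. The small table grows with (proxmax - proxlim), and the large table would additionally grow with the square of the number of atom types.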
Prox tables can be quite big. They are also loaded in shared memory; they are automatically deleted when all processes that use them are finished. The prox settings are coded into the segment name. Starting an additional attract process with the same prox settings will re-use the prox table segment of the other process. However, if you use different prox settings, a new prox table segment will be created.
In case of a crash, you can run shm-clean and it will remove the prox files as well; however, if your prox settings are non-standard, this will not happen and you will have to delete the file in /dev/shm yourself.
Testing issue: it should be extensively tested to what extent prox tables give a speed increase, and whether they are really worth it or whether the memory is better used e.g. for torque grids.
5. Grid alphabets
When building a grid, you can supply an alphabet of atom types. The grid will only be built for those atom types. When you use the grid for docking, it will be checked that the partners only contain those atom types.
NEW: you cannot supply a custom alphabet yet
TODO: checking against the alphabets is not yet implemented
********************************************************************************
Pure MC mode
********************************************************************************
In this mode, no energy minimization is performed. Instead, moves are made and accepted/rejected according to a Metropolis Monte Carlo sampling algorithm.
--mc
Enables pure MC.
--mctemp <temperature in RT> (default: 3.5)
Adjusts the Metropolis temperature.
--mcscalerot <scaling in radians> (default: 0.05)
--mcscalecenter <scaling in A> (default: 0.1)
--mcscalemode <scaling in mode A> (default: 3)
Adjusts the step size of the rotation, translation and normal modes, respectively.
--mcensprob <probability> (default: 0.05)
Adjusts the probability of switching between ensemble copies in an MC step.
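The accept/reject step can be sketched in Python with the standard Metropolis criterion. This is a toy illustration, not ATTRACT code: it assumes that energies are expressed in the same RT units as --mctemp, it draws trial moves uniformly (how ATTRACT draws its moves is not stated here), and it samples a single 1-D coordinate on a made-up harmonic energy.

```python
import math
import random


def metropolis_accept(e_old, e_new, mctemp=3.5, rng=random):
    """Standard Metropolis criterion; energies and mctemp share units (RT)."""
    delta = e_new - e_old
    if delta <= 0.0:
        return True  # downhill or equal energy: always accept
    return rng.random() < math.exp(-delta / mctemp)


def toy_mc(energy, x0, nsteps, step=0.1, mctemp=3.5, seed=0):
    """Sample a 1-D toy coordinate; step plays the role of --mcscalecenter."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    accepted = 0
    for _ in range(nsteps):
        x_new = x + rng.uniform(-step, step)  # assumption: uniform trial move
        e_new = energy(x_new)
        if metropolis_accept(e, e_new, mctemp, rng):
            x, e = x_new, e_new
            accepted += 1
    return x, e, accepted


x, e, accepted = toy_mc(lambda x: x * x, x0=2.0, nsteps=1000)
print(accepted, "of 1000 moves accepted")
```

A higher mctemp accepts more uphill moves, and a larger step size typically lowers the acceptance rate, so the two are usually tuned together.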
Orthogonality issue: this is fully compatible with grids, normal modes, restraints and all other features.
Orthogonality issue: mode coordinates are not in A; they are scaled such that the sum of squares of all atomic displacements is 1 A**2 at a mode coordinate of 1. Therefore, you probably want large steps for modes, and you should increase them for larger proteins.
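The normalization noted above explains why larger proteins need larger mode steps. If the squared displacements of all atoms sum to q**2 for a mode coordinate q, the root-mean-square displacement per atom is |q| / sqrt(N). A small Python sketch of this arithmetic (the function name is hypothetical, and the atom counts are arbitrary examples):

```python
import math


def rms_atom_displacement(n_atoms, mode_coordinate=1.0):
    """RMS displacement per atom (in A) for a unit-normalized mode.

    Follows from: sum of squared displacements = mode_coordinate**2.
    """
    return abs(mode_coordinate) / math.sqrt(n_atoms)


# Default MC mode step of 3, for two arbitrary protein sizes.
for n_atoms in (100, 2500):
    print(n_atoms, rms_atom_displacement(n_atoms, mode_coordinate=3.0))
```

To keep the per-atom displacement constant, the mode step would have to grow with the square root of the number of atoms.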
Orthogonality issue: also in pure MC you can use vmax to control the maximum number of steps.
Testing issue: computation looks fine but proper parameters need to be investigated.
*******************************************************************************
Scoring and trajectory modes
********************************************************************************
--score
In this mode, no minimization is performed, but the energies and gradients of every structure are printed.
Orthogonality issue: this mode is not compatible with pure MC (it would be meaningless anyway)
--traj
In this mode, all input structures beyond the first are ignored. The first structure is printed in DOF format after each minimization step. Use this together with collect to generate a trajectory movie of your minimization.
Orthogonality issue: this mode works also with pure MC, in which case the structure is printed after each accepted move.
*******************************************************************************
EM grids
*******************************************************************************
--em <em.inp file>
TODO: I will document this further once it is completely ready :-)
*******************************************************************************
SAXS
*******************************************************************************
Most SAXS-related files can be found in tools/saxs.
Check out the examples/attractsaxs folder to find out how ATTRACT-SAXS works.
********************************************************************************
Benchmarking tools
********************************************************************************
NEW: lrmsd program
NEW: irmsd program
NEW: fnat program
********************************************************************************
Modified amino acids
********************************************************************************
ATTRACT now supports docking with a limited number of modified amino acids during
coarse-grained docking and atomistic refinement.
Note: for docking with modified amino acids, all polar hydrogen atoms must be present
in the input PDB (PDB2PQR cannot deal with modified amino acids). Use AddH in Chimera or another tool like CNS
to add hydrogens and select the option "PDB contains all polar hydrogens" in the GUI.
Also note that modified amino acids have to be labeled as ATOM, so make sure that HETATM
entries are changed to ATOM in the input PDB file. Otherwise, these residues will be ignored
or treated as non-covalently bound cofactors.
The following modified amino acids are supported:
HYP hydroxyproline
SEP phosphoserine
TPO phosphothreonine
TYP phosphotyrosine
TYS sulfotyrosine
NEP phosphonohistidine
CSP phosphocysteine
ALY acetyllysine
MLZ monomethyllysine
MLY dimethyllysine
M3L trimethyllysine
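The HETATM to ATOM relabeling required above can be scripted. The following Python sketch is not an ATTRACT tool; the function name is hypothetical. It assumes standard fixed-column PDB records (record name in columns 1-6, residue name in columns 18-20) and only touches residues from the list above. It does not add hydrogens.

```python
SUPPORTED = {
    "HYP", "SEP", "TPO", "TYP", "TYS", "NEP",
    "CSP", "ALY", "MLZ", "MLY", "M3L",
}


def hetatm_to_atom(lines):
    """Relabel HETATM records of supported modified residues as ATOM.

    lines: iterable of PDB lines; other records are returned unchanged.
    """
    result = []
    for line in lines:
        if line.startswith("HETATM") and line[17:20] in SUPPORTED:
            # "ATOM" padded to the 6-column record name field.
            line = "ATOM  " + line[6:]
        result.append(line)
    return result


# Example record built field by field: record name, serial, atom name,
# alternate location, residue name, then the remainder of the line.
example = "HETATM" + "  100" + " " + " P  " + " " + "SEP" + " A  10"
print(hetatm_to_atom([example])[0])
```

Other HETATM records, such as waters or cofactors, keep their label, so they are still treated as non-covalently bound groups.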
TODO: add naming convention to the web-interface
TODO: add hydrogen building by CNS
TODO: make a phosphate atom type in the future?