MXFP (Macromolecule eXtended FingerPrint) is a 217-dimensional fuzzy fingerprint that encodes atom pairs of seven pharmacophore groups, making it suitable for the comparison of large molecules and scaffold hopping. Each atom is assigned to one or more of the following pharmacophore categories:
- Heavy Atoms (HAC)
- Hydrophobic Atoms (HYD)
- Aromatic Atoms (ARO)
- H-bond Acceptors (HBA)
- H-bond Donors (HBD)
- Positively Charged Atoms (POS)
- Negatively Charged Atoms (NEG)
For each pharmacophore category, all possible atom pairs are determined, and each pair is converted to a Gaussian with an 18% width centered at the atom-pair topological (2D) or Euclidean (3D) distance. This Gaussian is sampled at 31 distance bins ($d_{i}$) spanning from $d_{0} = 0$ to $d_{30} = 317.8$ bonds at exponentially increasing intervals. For an atom pair with distance $d_{jk}$, the Gaussian value $g_{jk}(d_{i})$ is evaluated at each bin $d_{i}$.
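Written as an equation, and assuming the 18% width corresponds to a standard deviation of $0.09\,d_{jk}$ (9% of the pair distance on either side of the center, which the description above does not state explicitly), the Gaussian value would take the form:

$$g_{jk}(d_{i}) = \exp\left(-\frac{1}{2}\left(\frac{d_{i} - d_{jk}}{0.09\,d_{jk}}\right)^{2}\right)$$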
Each of the 31 Gaussian values is then divided by their sum, $s_{jk}$, so that every atom pair contributes equally to the fingerprint.
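The per-pair normalization can be sketched in plain Python. This is a minimal illustration, not the library's code: `normalized_gaussian` and `rel_sigma` are hypothetical names, the value 0.09 assumes that the 18% width means a standard deviation of 9% of the pair distance, and the short bin list is a placeholder rather than the 31 MXFP bins.

```python
import math


def normalized_gaussian(d_jk, bins, rel_sigma=0.09):
    """Sample a Gaussian centered at d_jk at every bin, then divide by the sum s_jk.

    rel_sigma=0.09 is an assumption about what "18% width" means.
    """
    if d_jk <= 0:
        raise ValueError("pair distance must be positive")
    sigma = rel_sigma * d_jk
    values = [math.exp(-0.5 * ((d_i - d_jk) / sigma) ** 2) for d_i in bins]
    s_jk = sum(values)
    if s_jk == 0.0:
        # The pair is too far from every bin to register at all.
        return [0.0] * len(bins)
    return [value / s_jk for value in values]


# Placeholder bins for illustration only (NOT the 31 bins used by MXFP).
example_bins = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.1, 8.4, 9.9]

# An atom pair four bonds apart: most of its weight lands in the bin at 4.0.
contribution = normalized_gaussian(4.0, example_bins)
```

After normalization the values for one pair add up to 1, which is what lets short-range and long-range pairs carry the same total weight.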
The normalized Gaussian contributions from all atom pairs of one pharmacophore category are summed at each distance bin $d_{i}$, and the sum is divided by $N_{c}^{1.5}$, where $N_{c}$ is the total number of atoms in that category, to reduce the sensitivity of the fingerprint to molecule size. This value is multiplied by 100 and rounded to yield the final fingerprint bit value $v_{ci}$.
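Combining the two normalization steps, the bit value can be written compactly. The double sum is assumed to run over the atom pairs $(j, k)$ of category $c$, and $\operatorname{round}$ denotes rounding to the nearest integer; neither convention is spelled out in the text.

$$s_{jk} = \sum_{i=0}^{30} g_{jk}(d_{i}), \qquad v_{ci} = \operatorname{round}\left(\frac{100}{N_{c}^{1.5}} \sum_{j}\sum_{k} \frac{g_{jk}(d_{i})}{s_{jk}}\right)$$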
The 31 fingerprint bit values from each of the 7 atom categories are concatenated, yielding the 217-dimensional fingerprint vector.
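The scaling and concatenation steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the MXFP implementation: the function names are hypothetical, each category is given as a list of already normalized per-pair contribution vectors together with its atom count, and the one-hot vectors in the example are made-up stand-ins for real Gaussian contributions.

```python
CATEGORIES = ("HAC", "HYD", "ARO", "HBA", "HBD", "POS", "NEG")
N_BINS = 31


def category_values(contributions, n_atoms):
    """Sum the pair contributions per bin, scale by 100 / N_c**1.5, and round."""
    if n_atoms == 0 or not contributions:
        return [0] * N_BINS
    totals = [sum(column) for column in zip(*contributions)]
    scale = 100.0 / n_atoms ** 1.5
    return [round(scale * total) for total in totals]


def assemble_fingerprint(per_category):
    """Concatenate the 31 values of each of the 7 categories in a fixed order."""
    fingerprint = []
    for name in CATEGORIES:
        contributions, n_atoms = per_category.get(name, ([], 0))
        fingerprint.extend(category_values(contributions, n_atoms))
    return fingerprint


def one_hot(index):
    """Made-up contribution vector with all weight in a single bin."""
    vector = [0.0] * N_BINS
    vector[index] = 1.0
    return vector


# Hypothetical input: three heavy atoms in a chain, with two pairs landing
# in bin 1 and one pair in bin 2; all other categories are empty.
example = {"HAC": ([one_hot(1), one_hot(1), one_hot(2)], 3)}
fingerprint = assemble_fingerprint(example)
```

The fixed category order matters: two fingerprints are only comparable position by position if both were concatenated in the same order.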
You will need the following prerequisites:
There are several ways to get started using MXFP:
To obtain a local copy of the project, clone the GitHub repository:
git clone https://github.com/reymond-group/mxfp_python.git
To create a ready-to-use Conda environment, download the mxfp.yml file from the repository and run the following commands:
conda env create -f mxfp.yml
conda activate mxfp
To install MXFP in an existing Conda environment, activate the environment and install MXFP via pip:
conda activate my_environment
pip install mxfp
In your Python script or Jupyter notebook:
- Import the required libraries (RDKit, MXFP).
- Convert SMILES to an rdchem.Mol object with RDKit (optional).
- Initialize the MXFPCalculator class.
- Calculate the MXFP of your molecule either from the rdchem.Mol object or directly from SMILES.
# Import the required libraries (RDKit, MXFP)
from rdkit import Chem
from mxfp import MXFPCalculator
# Convert SMILES to rdchem.Mol object with RDKit (optional)
polymyxin_b2_smiles = 'C[C@H]([C@H]1C(=O)NCC[C@@H](C(=O)N[C@H](C(=O)N[C@@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N1)CCN)CCN)CC(C)C)CC2=CC=CC=C2)CCN)NC(=O)[C@H](CCN)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CCN)NC(=O)CCCCC(C)C)O'
polymyxin_b2_mol = Chem.MolFromSmiles(polymyxin_b2_smiles)
# Initialize the MXFPCalculator class
calculator = MXFPCalculator()
# Calculate MXFP of your molecule from rdchem.Mol object or directly from SMILES
polymyxin_b2_mxfp = calculator.mxfp_from_mol(polymyxin_b2_mol) # from rdchem.Mol object
polymyxin_b2_mxfp = calculator.mxfp_from_smiles(polymyxin_b2_smiles) # from SMILES
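Once two fingerprints have been calculated, they can be compared with an ordinary vector distance. The sketch below uses the city-block (Manhattan) distance as one possible choice; this README does not prescribe a metric, `manhattan_distance` is a hypothetical helper, and the two short vectors are made-up placeholders rather than real MXFP output.

```python
def manhattan_distance(fp_a, fp_b):
    """City-block distance between two equal-length fingerprint vectors."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # int() guards against wrap-around if the values are unsigned array elements.
    return sum(abs(int(a) - int(b)) for a, b in zip(fp_a, fp_b))


# Made-up three-value vectors for illustration; real MXFP vectors have 217 values.
distance = manhattan_distance([0, 38, 19], [1, 30, 19])
```

A smaller distance indicates a more similar distribution of pharmacophore atom pairs.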
If you are working with 3D coordinates and wish to use Euclidean atom-pair distances instead of topological distances, initialize the MXFPCalculator class with the dimensionality='3D' parameter. Note that MXFP cannot be calculated from a SMILES string with the '3D' option, so you need to provide an rdchem.Mol object.
# Initialize the MXFPCalculator class for 3D calculations
calculator_3d = MXFPCalculator(dimensionality='3D')
This project is licensed under the MIT License.