Question about BRICS decomposition #3

vincrichard · 2024-04-30T16:44:06Z

Hello, thank you for revising the code in the previous issue.
I would like to ask another question about the way the BRICS decomposition is implemented.
I tried on some molecules and I notice that you are separating the rings as independent fragments, and keeping a lot of single atoms as fragments. This can lead to the masking being present only on some various atoms not connected with each other. Was this intended ? Is there any reason for not using the BRICS decomposition as it is ?

I would be happy if you could share your though process for the brics_decomp method implementation.

Best regards

smiles used: COc1cc(C)c(-c2nc3ccccc3c(=O)n2N=Cc2ccccc2OCC(=O)OC(C)C)cc1C(C)C

MoAMa brics decomposition:

atom_groups = brics_decomp(mol)[0]
MolsToGridImage(mols=[mol] * len(atom_groups), subImgSize=(400, 400), highlightAtomLists=atom_groups)

def remove_dummy_atom(piece: Chem.Mol):
    piece = Chem.rdchem.RWMol(piece)
    try:
        while True:
            dummy_atom_id = [atom.GetSymbol() for atom in piece.GetAtoms()].index("*")
            piece.RemoveAtom(dummy_atom_id)
    except ValueError:
        pass
    return Chem.Mol(piece)

fragmented=BRICS.BreakBRICSBonds(mol)
pieces=Chem.GetMolFrags(fragmented, asMols=True)
atom_groups = []
for piece in pieces:
    atom_groups.append(mol.GetSubstructMatch(remove_dummy_atom(piece)))
MolsToGridImage(mols=[mol] * len(pieces), subImgSize=(400, 400), highlightAtomLists=atom_groups)

The text was updated successfully, but these errors were encountered:

einae-nd · 2024-04-30T17:50:32Z

Thank you for the question! The reason for our implementation of the BRICS decomposition is due to the limitations of GNN message passing. Because we utilize a 5-layer model, we must therefore make sure that all masked atoms are within a 5-hop neighborhood of the nearest unmasked node. Otherwise, the masked node will not receive useful feature knowledge from neighboring nodes/motifs. The original BRICS leaves large motifs, which causes some masked atoms to be outside of the 5-hop radius. We therefore modify BRICS to further decompose the molecule to guarantee that the majority of motifs allow for meaningful message passing when masked.

The original atom masking strategy proposed by SNAP leaves all single atoms for masking, so the presence of single atoms to mask with our masking strategy is still acceptable. The goal of our motif-aware masking strategy is to increase the message passing between inter-motif nodes, so the presence of masked pairs or masked rings increases the difficulty of the training task enough to encourage inter-motif knowledge transfer.

We are currently developing methods to perform mask autoencoding on large motifs that utilize the original BRICS method. Any updates will be reflected in our paper and this code repo as well.

I hope this answers your question!

vincrichard · 2024-05-01T15:45:19Z

Thank you for your answer, I am still unclear on some aspect, so please bear with me with this follow-up question.

While I understand the limitation of GNN, I think this check happens after the brics_decomp function present in get_motif, the check seems to be line 280, but we compute the motif line 270.

MoAMa-dev/datautils.py

Lines 270 to 282 in 77c7857

    
           motifs = get_motifs(mol) 
        
           num_atoms = data.x.size()[0] 
        
           sample_size = int(num_atoms * self.mask_rate + 1) 
        
           valid_motifs = [] 
        
           if len(motifs) != 1: 
        
               for motif in motifs: 
        
                   for atom in mol.GetAtoms(): 
        
                       if atom.GetIdx() in motif: 
        
                           if (inter_motif_proximity(motif, [atom], []) > 5): 
        
                               break 
        
                   valid_motifs.append(motif)

To me, this part of the brics_decomp method is unclear:

MoAMa-dev/chemutils.py

Lines 284 to 295 in 77c7857

    
           # select atoms at intersections as motif 
        
           for atom in mol.GetAtoms(): 
        
               if len(atom.GetNeighbors()) > 2 and not atom.IsInRing(): 
        
                   cliques.append([atom.GetIdx()]) 
        
                   for nei in atom.GetNeighbors(): 
        
                       if [nei.GetIdx(), atom.GetIdx()] in cliques: 
        
                           cliques.remove([nei.GetIdx(), atom.GetIdx()]) 
        
                           breaks.append([nei.GetIdx(), atom.GetIdx()]) 
        
                       elif [atom.GetIdx(), nei.GetIdx()] in cliques: 
        
                           cliques.remove([atom.GetIdx(), nei.GetIdx()]) 
        
                           breaks.append([atom.GetIdx(), nei.GetIdx()]) 
        
                       cliques.append([nei.GetIdx()])

This is the reason of multiple single atom motif, in some case I believe it should not happen. For example, I found the following smiles: CC(C)C(Cl)=NOC(=O)Nc1ccccn1

This is the BRICS decomposition:

But the brics_decomp method remove the first motif which is not a ring, and in this case the motif was small enough to be in the k-hop (k=5) of the non mask carbon atom.

Could you provide more insight on this ?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about BRICS decomposition #3

Question about BRICS decomposition #3

vincrichard commented Apr 30, 2024

einae-nd commented Apr 30, 2024

vincrichard commented May 1, 2024

Question about BRICS decomposition #3

Question about BRICS decomposition #3

Comments

vincrichard commented Apr 30, 2024

einae-nd commented Apr 30, 2024

vincrichard commented May 1, 2024