Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about BRICS decomposition #3

Open
vincrichard opened this issue Apr 30, 2024 · 2 comments
Open

Question about BRICS decomposition #3

vincrichard opened this issue Apr 30, 2024 · 2 comments

Comments

@vincrichard
Copy link

Hello, thank you for revising the code in the previous issue.
I would like to ask another question about the way the BRICS decomposition is implemented.
I tried on some molecules and I notice that you are separating the rings as independent fragments, and keeping a lot of single atoms as fragments. This can lead to the masking being present only on some various atoms not connected with each other. Was this intended ? Is there any reason for not using the BRICS decomposition as it is ?

I would be happy if you could share your though process for the brics_decomp method implementation.

Best regards

smiles used: COc1cc(C)c(-c2nc3ccccc3c(=O)n2N=Cc2ccccc2OCC(=O)OC(C)C)cc1C(C)C

MoAMa brics decomposition:

atom_groups = brics_decomp(mol)[0]
MolsToGridImage(mols=[mol] * len(atom_groups), subImgSize=(400, 400), highlightAtomLists=atom_groups)

image

def remove_dummy_atom(piece: Chem.Mol):
    piece = Chem.rdchem.RWMol(piece)
    try:
        while True:
            dummy_atom_id = [atom.GetSymbol() for atom in piece.GetAtoms()].index("*")
            piece.RemoveAtom(dummy_atom_id)
    except ValueError:
        pass
    return Chem.Mol(piece)

fragmented=BRICS.BreakBRICSBonds(mol)
pieces=Chem.GetMolFrags(fragmented, asMols=True)
atom_groups = []
for piece in pieces:
    atom_groups.append(mol.GetSubstructMatch(remove_dummy_atom(piece)))
MolsToGridImage(mols=[mol] * len(pieces), subImgSize=(400, 400), highlightAtomLists=atom_groups)

image

@einae-nd
Copy link
Owner

Thank you for the question! The reason for our implementation of the BRICS decomposition is due to the limitations of GNN message passing. Because we utilize a 5-layer model, we must therefore make sure that all masked atoms are within a 5-hop neighborhood of the nearest unmasked node. Otherwise, the masked node will not receive useful feature knowledge from neighboring nodes/motifs. The original BRICS leaves large motifs, which causes some masked atoms to be outside of the 5-hop radius. We therefore modify BRICS to further decompose the molecule to guarantee that the majority of motifs allow for meaningful message passing when masked.

The original atom masking strategy proposed by SNAP leaves all single atoms for masking, so the presence of single atoms to mask with our masking strategy is still acceptable. The goal of our motif-aware masking strategy is to increase the message passing between inter-motif nodes, so the presence of masked pairs or masked rings increases the difficulty of the training task enough to encourage inter-motif knowledge transfer.

We are currently developing methods to perform mask autoencoding on large motifs that utilize the original BRICS method. Any updates will be reflected in our paper and this code repo as well.

I hope this answers your question!

@vincrichard
Copy link
Author

Thank you for your answer, I am still unclear on some aspect, so please bear with me with this follow-up question.

While I understand the limitation of GNN, I think this check happens after the brics_decomp function present in get_motif, the check seems to be line 280, but we compute the motif line 270.

MoAMa-dev/datautils.py

Lines 270 to 282 in 77c7857

motifs = get_motifs(mol)
num_atoms = data.x.size()[0]
sample_size = int(num_atoms * self.mask_rate + 1)
valid_motifs = []
if len(motifs) != 1:
for motif in motifs:
for atom in mol.GetAtoms():
if atom.GetIdx() in motif:
if (inter_motif_proximity(motif, [atom], []) > 5):
break
valid_motifs.append(motif)

To me, this part of the brics_decomp method is unclear:

MoAMa-dev/chemutils.py

Lines 284 to 295 in 77c7857

# select atoms at intersections as motif
for atom in mol.GetAtoms():
if len(atom.GetNeighbors()) > 2 and not atom.IsInRing():
cliques.append([atom.GetIdx()])
for nei in atom.GetNeighbors():
if [nei.GetIdx(), atom.GetIdx()] in cliques:
cliques.remove([nei.GetIdx(), atom.GetIdx()])
breaks.append([nei.GetIdx(), atom.GetIdx()])
elif [atom.GetIdx(), nei.GetIdx()] in cliques:
cliques.remove([atom.GetIdx(), nei.GetIdx()])
breaks.append([atom.GetIdx(), nei.GetIdx()])
cliques.append([nei.GetIdx()])

This is the reason of multiple single atom motif, in some case I believe it should not happen. For example, I found the following smiles: CC(C)C(Cl)=NOC(=O)Nc1ccccn1

This is the BRICS decomposition:
image

But the brics_decomp method remove the first motif which is not a ring, and in this case the motif was small enough to be in the k-hop (k=5) of the non mask carbon atom.
image

Could you provide more insight on this ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants