
Files for split_train, split_valid, and split_test arguments #36

Open
TangYiChing opened this issue Oct 20, 2024 · 2 comments

Comments

@TangYiChing

Hi,

I am running train.py, and this line of code fails because the input is a directory: complex_names_all = read_strings_from_txt(self.split_path)

The original function, read_strings_from_txt, expects a file:

def read_strings_from_txt(path):
    # every line will be one element of the returned list
    with open(path) as file:
        lines = file.readlines()
        return [line.rstrip() for line in lines]
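For illustration, this is the kind of input that function expects: a plain text file with one complex name per line. A minimal sketch, with an invented file name and made-up PDB IDs (not taken from the repo):

```python
from pathlib import Path

def read_strings_from_txt(path):
    # every line will be one element of the returned list
    with open(path) as file:
        lines = file.readlines()
        return [line.rstrip() for line in lines]

# hypothetical split file; the IDs below are examples only
split = Path("timesplit_example.txt")
split.write_text("6agt\n6qqw\n6hld\n")

print(read_strings_from_txt(split))  # -> ['6agt', '6qqw', '6hld']
```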

However, the default split_train, split_val, and split_test arguments point to directories, not files; see utils/parsing.py:

  • split_train='./data/splits/timesplit_no_lig_overlap_train',
  • split_val='./data/splits/timesplit_no_lig_overlap_val',
  • split_test='./data/splits/timesplit_test'

Can you tell me how to generate those files?

Below are the errors from train.py:

$ CUDA_VISABLE_DEVICES=6 python train.py
Available GPU count: 8
Namespace(config=None, log_dir='workdir', restart_dir=None, cache_path='./data/cache', data_dir='./data/mnt/nas/research-data/luwei/dynamicbind_data/pdbbind_v11//pocket_aligned_fill_missing/', info_path='./data/d3_with_clash_info_small.csv', finetune_data_path='./results/CACHE4/finetune_data.pkl', split_train='./data/splits/timesplit_no_lig_overlap_train', split_val='./data/splits/timesplit_no_lig_overlap_val', split_test='./data/splits/timesplit_test', test_sigma_intervals=False, val_inference_freq=5, finetune_freq=None, num_finetune_complexes=500, inference_steps=20, num_inference_complexes=100, inference_earlystop_metric='valinf_rmsds_lt2', inference_earlystop_goal='max', wandb=False, project='difdock_train', run_name='', cudnn_benchmark=False, num_dataloader_workers=0, pin_memory=False, n_epochs=400, batch_size=32, sample_batch_size=16, scheduler=None, scheduler_patience=20, lr=0.001, restart_lr=None, w_decay=0.0, num_workers=1, use_ema=False, ema_rate=0.999, only_test=False, limit_complexes=0, all_atoms=False, receptor_radius=30, c_alpha_max_neighbors=10, atom_radius=5, atom_max_neighbors=8, matching_popsize=20, matching_maxiter=20, max_lig_size=None, remove_hs=False, num_conformers=1, esm_embeddings_path=None, lddt_weight=0.99, affinity_weight=0.01, tr_weight=0.33, rot_weight=0.33, tor_weight=0.33, res_tr_weight=0.33, res_rot_weight=0.33, res_chi_weight=0.33, rot_sigma_min=0.03, rot_sigma_max=1.65, tr_sigma_min=0.1, tr_sigma_max=20, tor_sigma_min=0.0314, tor_sigma_max=3.14, res_rot_sigma_min=0.01, res_rot_sigma_max=1, res_tr_sigma_min=0.01, res_tr_sigma_max=1, res_chi_sigma_min=0.01, res_chi_sigma_max=1, no_torsion=False, num_conv_layers=2, max_radius=5.0, scale_by_sigma=False, ns=16, nv=4, distance_embed_dim=32, cross_distance_embed_dim=32, no_batch_norm=False, use_second_order_repr=False, cross_max_distance=80, dynamic_max_cross=False, dropout=0.0, embedding_type='sinusoidal', sigma_embed_dim=32, embedding_scale=1000)
Processing complexes from [./data/splits/timesplit_no_lig_overlap_train] and saving it to [./data/cache_torsion/limit0_INDEXtimesplit_no_lig_overlap_train_maxLigSizeNone_H1_recRad30_recMax10]
Traceback (most recent call last):
  File "/tools/DynamicBind/train.py", line 221, in <module>
    main_function()
  File "/tools/DynamicBind/train.py", line 165, in main_function
    train_loader, val_loader = construct_loader(args, t_to_sigma)
  File "/tools/DynamicBind/datasets/pdbbind.py", line 797, in construct_loader
    train_dataset = PDBBind(info=info,cache_path=args.cache_path, split_path=args.split_train, keep_original=True,
  File "/tools/DynamicBind/datasets/pdbbind.py", line 144, in __init__
    self.preprocessing()
  File "/tools/DynamicBind/datasets/pdbbind.py", line 203, in preprocessing
    complex_names_all = read_strings_from_txt(self.split_path)
  File "/tools/DynamicBind/utils/utils.py", line 61, in read_strings_from_txt
    with open(path) as file:
IsADirectoryError: [Errno 21] Is a directory: './data/splits/timesplit_no_lig_overlap_train'
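One way to get a clearer failure than the IsADirectoryError above is to check the split paths before launching training. This is only a sketch, not part of DynamicBind; check_split_paths is a hypothetical helper:

```python
import os

def check_split_paths(paths):
    """Return a human-readable problem for each path that is not a regular file."""
    problems = []
    for name, path in paths.items():
        if os.path.isdir(path):
            # exactly the situation that later triggers IsADirectoryError in open()
            problems.append(f"{name}: {path} is a directory, expected a text file of complex names")
        elif not os.path.isfile(path):
            problems.append(f"{name}: {path} does not exist")
    return problems

# the default split arguments from utils/parsing.py
splits = {
    "split_train": "./data/splits/timesplit_no_lig_overlap_train",
    "split_val": "./data/splits/timesplit_no_lig_overlap_val",
    "split_test": "./data/splits/timesplit_test",
}
for problem in check_split_paths(splits):
    print(problem)
```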
@patjiang

patjiang commented Oct 26, 2024

Make sure you clone the workdir!
e.g.
(click this link): https://github.com/user-attachments/assets/9258e083-04a2-4e64-93cb-893012a4309b
or look at this screenshot: [image attachment]

Hope this helps!

@aTMRz

aTMRz commented Dec 6, 2024

After downloading this, I still can't find the split files?
