pycisTopic find_diff_features function stopping abruptly at Starting a local Ray instance #195
Comments
Hi! I actually managed to figure this one out. It turns out there was a discrepancy between barcode names that was causing this issue!
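A mismatch like this can be spotted before calling find_diff_features by comparing the two sets of barcodes directly. The sketch below is a minimal illustration in plain Python. The helper name `compare_barcodes` and the example barcodes are hypothetical; in a real session the two lists would come from wherever the cell names are stored in your objects, which is an assumption about your setup.

```python
def compare_barcodes(names_a, names_b):
    """Return (shared, only_in_a, only_in_b) as sorted lists."""
    set_a, set_b = set(names_a), set(names_b)
    return (
        sorted(set_a & set_b),
        sorted(set_a - set_b),
        sorted(set_b - set_a),
    )


# Hypothetical barcodes: one source carries a sample suffix, the other does not.
object_barcodes = ["AAACGAAT-1___sampleA", "AAACTGCA-1___sampleA"]
metadata_barcodes = ["AAACGAAT-1", "AAACTGCA-1"]

shared, only_object, only_metadata = compare_barcodes(object_barcodes, metadata_barcodes)
print(
    f"shared: {len(shared)}, "
    f"only in object: {len(only_object)}, "
    f"only in metadata: {len(only_metadata)}"
)
```

If the shared set is empty or much smaller than expected, the naming of the barcodes is worth fixing before looking at anything else.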
Hi @ptk1601,
Here's an example of finding variable features with the updated code from @ghuls. This code is faster, uses less memory, and handles the imputed accessibility differently. The code for identifying differentially accessible regions (DARs) between groups is not yet compatible with this new implementation, but will be updated soon. For now, this means that DAR calculation still relies on the original code (second example below).
EXAMPLE CODE
Requirements: https://github.com/aertslab/pycisTopic/tree/polars_1xx
## load object and initialise variables
import pickle
import numpy as np  # needed for np.float32 below
from pycisTopic.cistopic_class import *
from pycisTopic.diff_features import *
# Load the pickled cisTopic object.
with open('cistopic_object.pkl', 'rb') as infile:
    cistopic_obj = pickle.load(infile)
# Topic-region and cell-topic matrices as float32 arrays, plus the region names.
topic_region_ = cistopic_obj.selected_model.topic_region.to_numpy(dtype=np.float32)
cell_topic_ = cistopic_obj.selected_model.cell_topic.to_numpy(dtype=np.float32)
regions = cistopic_obj.selected_model.topic_region.index.tolist()
## Get mean and dispersion of normalized imputed accessibility per region.
(
    region_names_to_keep,
    per_region_means_on_normalized_imputed_acc,
    per_region_dispersions_on_normalized_imputed_acc,
) = calculate_per_region_mean_and_dispersion_on_normalized_imputed_acc(
    region_topic=topic_region_,
    cell_topic=cell_topic_,
    region_names=regions,
    scale_factor1=10**6,
    scale_factor2=10**4,
    regions_chunk_size=20000,
)
## Optional: save output
import numpy as np
import pandas as pd
# Save the arrays returned by the call above.
np.savez_compressed("impute_acc_per_region_mean.npz", arr=per_region_means_on_normalized_imputed_acc)
np.savez_compressed("impute_acc_per_region_dispersion.npz", arr=per_region_dispersions_on_normalized_imputed_acc)
# region_idx_to_keep is not returned by the call above, so it is not saved here; the kept regions are saved by name below.
pd.DataFrame(region_names_to_keep, columns=["regions"]).to_csv("region_names_to_keep.csv", index=False)
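Arrays written with np.savez_compressed can be read back later with np.load. The following is a minimal round-trip sketch, assuming the array was stored under the key `arr` as in the save calls above. The array values and the temporary directory are placeholders for illustration only.

```python
import os
import tempfile

import numpy as np

# Placeholder values standing in for the per-region means computed earlier.
per_region_means = np.array([0.02, 0.5, 1.7], dtype=np.float32)

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "impute_acc_per_region_mean.npz")

# Save under the key "arr", matching the save calls in the example.
np.savez_compressed(path, arr=per_region_means)

# np.load returns an NpzFile; index it with the key used when saving.
with np.load(path) as data:
    reloaded = data["arr"]

print(reloaded.dtype, reloaded.shape)
```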
## Find highly variable features.
var_features = find_highly_variable_features(
    features=region_names_to_keep,
    per_region_means_on_normalized_imputed_acc=per_region_means_on_normalized_imputed_acc,
    per_region_dispersions_on_normalized_imputed_acc=per_region_dispersions_on_normalized_imputed_acc,
    min_disp=0.05,
    min_mean=0.0125,
    max_disp=float("inf"),
    max_mean=3,
    n_bins=20,
    n_top_features=None,
    plot=True,
)
## Optional: save regions
pd.DataFrame(var_features, columns=["regions"]).to_csv("var_features.csv", index=False)
Calculate DARs (old code still, WIP):
## read data
with open('cistopic_object.pkl', 'rb') as infile:
    cistopic_obj = pickle.load(infile)
var_features_path = "var_features.csv"
var_features_df = pd.read_csv(var_features_path)
var_features = var_features_df['regions'].tolist()
## use old impute_accessibility function with variable features only
## note that this step may still require quite some resources
## this is only an intermediate solution and will be updated soon
imputed_acc_obj = impute_accessibility(
    cistopic_obj,
    selected_cells=None,
    selected_regions=var_features,
    scale_factor=10**6,
)
## calculate DARs
markers_dict = find_diff_features(
    cistopic_obj,
    imputed_acc_obj,
    variable='cell_type',
    var_features=var_features,
    contrasts=None,
    adjpval_thr=0.05,
    log2fc_thr=np.log2(1.5),
    n_cpu=1,
    _temp_dir='tmp_dir',
)
## Optional: save DARs
with open('DARs.pkl', 'wb') as f:
    pickle.dump(markers_dict, f)
Hi, I understand it's a branch of pycisTopic, but how do I access this branch when I have the latest main branch of SCENIC+ installed?
Hello!
I am having issues with the find_diff_features function in the pycisTopic library as a part of the SCENIC+ workflow. I have not had any issues generating the previous, required dataframes in the workflow according to the live lecture aside from perhaps a few NaNs in the Seurat_cell_type column of my cistopic_obj. But for some reason when I run find_diff_features, Jupyter Notebook tells me that a local Ray instance has started but then cuts off right there without any additional processes, and I am left with an empty markers_dict. Is there something I can check to fix this? Here is the code I am running.
import ray
markers_dict = find_diff_features(
    cistopic_obj,
    imputed_acc_obj,
    variable='Seurat_cell_type',
    var_features=variable_regions,
    contrasts=None,
    adjpval_thr=0.05,
    log2fc_thr=np.log2(1.5),
    n_cpu=5,
    _temp_dir='/tmp',
    split_pattern=None,
)
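Since the question mentions a few NaNs in the Seurat_cell_type column, one thing that can be checked before running find_diff_features is how many cells lack a label. The sketch below uses a small stand-in DataFrame with hypothetical barcodes and labels. In pycisTopic the per-cell metadata is typically held in `cistopic_obj.cell_data`, but treat that attribute name, and whether NaNs actually contribute to the problem here, as assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in for the per-cell metadata table (hypothetical barcodes and labels).
cell_data = pd.DataFrame(
    {"Seurat_cell_type": ["B_cell", np.nan, "T_cell", "B_cell", np.nan]},
    index=["bc1", "bc2", "bc3", "bc4", "bc5"],
)

column = "Seurat_cell_type"

# Count cells without a label and list their barcodes.
n_missing = int(cell_data[column].isna().sum())
missing_barcodes = cell_data.index[cell_data[column].isna()].tolist()

# Cells per label, including the unlabeled ones.
counts = cell_data[column].value_counts(dropna=False)

# Barcodes that do have a label, e.g. to subset before the DAR step.
labeled_barcodes = cell_data.index[cell_data[column].notna()].tolist()

print(f"{n_missing} cells without a {column} label: {missing_barcodes}")
print(counts)
```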