Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InvalidIndexError: (slice(None, None, None), 30751) while using coverage plot #493

Open
yojetsharma opened this issue Oct 25, 2024 · 6 comments

Comments

@yojetsharma
Copy link

genes_violin_plot = 'MAFIP'
region='chr9:39287963-39288463'
# Create a coverage plot
fig = coverage_plot(
    SCENICPLUS_obj=scplus_obj,  # Your SCENICPLUS object
    bw_dict=bw_dict,
    region=region,
    genes_violin_plot=genes_violin_plot,
    gene_height=1,
    exon_height=4,
    meta_data_key='broader_annotation',  # Adjust based on your metadata structure
    figsize=(8, 10)  # Adjust figure size as needed
)
scplus_obj.gene_names:
Index(['MIR1302-2HG', 'AL627309.1', 'AL627309.5', 'AL627309.4', 'AP006222.2',
       'AL669831.2', 'LINC01409', 'FAM87B', 'LINC01128', 'LINC00115',
       ...
       'MAFIP', 'AC011043.1', 'AC011043.2', 'AC011841.1', 'AL354822.1',
       'AL592183.1', 'AC240274.1', 'AC004556.3', 'AC007325.4', 'AC007325.2'],
      dtype='object', length=30761)
@SeppeDeWinter
Copy link
Collaborator

Hi @yojetsharma

Can you provide the full error output please?

All the best,

Seppe

@yojetsharma
Copy link
Author

yojetsharma commented Nov 11, 2024

The code in question was run using a different pr_annotation file (i will get back to you on that complete shortly). But the code I tried recently gave the following error (pr_annotation file was the same that is used for pycistopic analysis for this):

from scenicplus.plotting.coverageplot import coverage_plot
scenicplus.plotting.coverageplot.coverage_plot(
    scplus_obj,
    bw_dict=bw_dict,
    region='chr14:100725892-100739224',
    meta_data_key='broader_annotation',
    pr_consensus_bed=pr_consensus_bed,
    pr_gtf=pr_annotation,
    genes_violin_plot='DLK1'
)

/home/praghu/yojetsharma/.conda/envs/scenicplus/lib/python3.11/site-packages/pyranges/methods/intersection.py:52: FutureWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)
scdf.loc[:, "Start"] = new_starts
/home/praghu/yojetsharma/.conda/envs/scenicplus/lib/python3.11/site-packages/pyranges/methods/intersection.py:53: FutureWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)
scdf.loc[:, "End"] = new_ends

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[75], line 2
      1 from scenicplus.plotting.coverageplot import coverage_plot
----> 2 scenicplus.plotting.coverageplot.coverage_plot(
      3     scplus_obj,
      4     bw_dict=bw_dict,
      5     region='chr14:100725892-100739224',
      6     meta_data_key='broader_annotation',
      7     pr_consensus_bed=pr_consensus_bed,
      8     pr_gtf=pr_annotation,
      9     genes_violin_plot='DLK1'
     10 )

File ~/.conda/envs/scenicplus/lib/python3.11/site-packages/scenicplus/plotting/coverageplot.py:288, in coverage_plot(SCENICPLUS_obj, bw_dict, region, genes_violin_plot, genes_arcs, gene_height, exon_height, meta_data_key, pr_consensus_bed, region_bed_height, pr_gtf, pr_interact, bw_ymax, color_dict, cmap, plot_order, figsize, fontsize_dict, gene_label_offset, arc_rad, arc_lw, cmap_violinplots, violinplots_means_color, violinplots_edge_color, violoinplots_alpha, width_ratios_dict, height_ratios_dict, sort_vln_plots, add_custom_ax)
    283 gtf_region_intersect = pr_gtf.intersect(pr_region)
    284 # only keep exon and gene info
    285 gtf_region_intersect = gtf_region_intersect[np.logical_and(
    286     np.logical_or(gtf_region_intersect.Feature == 'gene',
    287                   gtf_region_intersect.Feature == 'exon'),
--> 288     gtf_region_intersect.gene_type == 'protein_coding')]
    289 # iterate over all genes in intersect
    290 ax = axs_bw[subplot_idx]

File ~/.conda/envs/scenicplus/lib/python3.11/site-packages/pyranges/pyranges.py:269, in PyRanges.__getattr__(self, name)
    244 """Return column.
    245 
    246 Parameters
   (...)
    264 Name: Start, dtype: int32
    265 """
    267 from pyranges.methods.attr import _getattr
--> 269 return _getattr(self, name)

File ~/.conda/envs/scenicplus/lib/python3.11/site-packages/pyranges/methods/attr.py:67, in _getattr(self, name)
     65     return pd.concat([df[name] for df in self.values()])
     66 else:
---> 67     raise AttributeError("PyRanges object has no attribute", name)

AttributeError: ('PyRanges object has no attribute', 'gene_type')

Screenshot from 2024-11-11 21-56-32

@yojetsharma
Copy link
Author

Hi @SeppeDeWinter,
Any suggestions on how to prevent the above error?

@SeppeDeWinter
Copy link
Collaborator

Hi @yojetsharma

Can you show the output of

It looks like it does not have a gene_type column.

pr_annotation

Best,

Seppe

@yojetsharma
Copy link
Author

yojetsharma commented Dec 31, 2024

Hello @SeppeDeWinter
Hope you're doing well and Happy New Year.
This is how the pr_annotation looks like now after renaming the column name:

	Chromosome	Start	End	gene_type	Score	Strand	Transcript_type	Transcription_Start_Site
0	GL000009.2	58375	58376	NaN	.	-	protein_coding	58375
1	GL000194.1	115017	115018	NaN	.	-	protein_coding	115017
2	GL000194.1	115054	115055	MAFIP	.	-	protein_coding	115054
3	GL000195.1	49163	49164	NaN	.	-	protein_coding	49163
4	GL000213.1	139654	139655	NaN	.	-	protein_coding	139654
...	...	...	...	...	...	...	...	...
87556	chrY	6911751	6911752	AMELY	.	-	protein_coding	6911751
87557	chrY	6872607	6872608	AMELY	.	-	protein_coding	6872607
87558	chrY	21918031	21918032	RBMY1E	.	-	protein_coding	21918031
87559	chrY	24047968	24047969	CDY1B	.	-	protein_coding	24047968
87560	chrY	24048018	24048019	CDY1B	.	-	protein_coding	

And now after running the following:

import pyranges as pr

pr_annotation = pd.read_table(
        os.path.join("/home/praghu/yojetsharma/131024/", "qc", "tss.bed") ##from pycisTopic workflow
    ).rename(
        {"Name": "Gene", "# Chromosome": "Chromosome"}, axis = 1)
pr_annotation["Transcription_Start_Site"] = pr_annotation["Start"]
pr_annotation = pr.PyRanges(pr_annotation)
pr_annotation
pr_annotation.to_csv('genome_annotation.tsv', sep='\t')
from scenicplus.plotting.coverageplot import coverage_plot
bigwig_dir = '/home/praghu/yojetsharma/pseudobulk_bw_files/'
bw_dict = {x.replace('.bw', ''): os.path.join(bigwig_dir, x) for x in os.listdir(bigwig_dir) if '.bw' in x}
pr_consensus_bed = pr.read_bed('/home/praghu/yojetsharma/consensus_regions.bed')
scenicplus.plotting.coverageplot.coverage_plot(
    scplus_obj,
    bw_dict=bw_dict,
    region='chr14:100725892-100739224',
    meta_data_key='broader_annotation',
    pr_consensus_bed=pr_consensus_bed,
    pr_gtf=pr_annotation,
    genes_violin_plot='DLK1'
)

Gives the following error:

File ~/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/plotting/coverageplot.py:286, in coverage_plot(SCENICPLUS_obj, bw_dict, region, genes_violin_plot, genes_arcs, gene_height, exon_height, meta_data_key, pr_consensus_bed, region_bed_height, pr_gtf, pr_interact, bw_ymax, color_dict, cmap, plot_order, figsize, fontsize_dict, gene_label_offset, arc_rad, arc_lw, cmap_violinplots, violinplots_means_color, violinplots_edge_color, violoinplots_alpha, width_ratios_dict, height_ratios_dict, sort_vln_plots, add_custom_ax)
    283 gtf_region_intersect = pr_gtf.intersect(pr_region)
    284 # only keep exon and gene info
    285 gtf_region_intersect = gtf_region_intersect[np.logical_and(
--> 286     np.logical_or(gtf_region_intersect.Feature == 'gene',
    287                   gtf_region_intersect.Feature == 'exon'),
    288     gtf_region_intersect.gene_type == 'protein_coding')]
    289 # iterate over all genes in intersect
    290 ax = axs_bw[subplot_idx]

File ~/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pyranges/pyranges.py:269, in PyRanges.__getattr__(self, name)
    244 """Return column.
    245 
    246 Parameters
   (...)
    264 Name: Start, dtype: int32
    265 """
    267 from pyranges.methods.attr import _getattr
--> 269 return _getattr(self, name)

File ~/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pyranges/methods/attr.py:67, in _getattr(self, name)
     65     return pd.concat([df[name] for df in self.values()])
     66 else:
---> 67     raise AttributeError("PyRanges object has no attribute", name)

AttributeError: ('PyRanges object has no attribute', 'Feature')

Am I going wrong with my pr_annotation file?

@yojetsharma
Copy link
Author

yojetsharma commented Dec 31, 2024

Since the previous error showed that "Feature" issue, I tried to solve that by downloading the gtf from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.annotation.gtf.gz . Then created using that created pr_annotation as follows:

# Read the GTF content into a DataFrame
pr_annotation = pd.read_csv(io.StringIO(content), sep='\t', comment='#', header=None,
                            names=['Chromosome', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Attributes'])

# Process the Attributes column to extract gene_id and gene_name
pr_annotation['gene_id'] = pr_annotation['Attributes'].str.extract('gene_id "(.*?)"')
pr_annotation['gene_name'] = pr_annotation['Attributes'].str.extract('gene_name "(.*?)"')
pr_annotation['gene_type'] = pr_annotation['Attributes'].str.extract('gene_type "(.*?)"')
pr_annotation['transcript_type'] = pr_annotation['Attributes'].str.extract('transcript_type "(.*?)"')
pr_annotation['transcript_namne'] = pr_annotation['Attributes'].str.extract('transcript_name "(.*?)"')

# Add the Transcription_Start_Site column
pr_annotation["Transcription_Start_Site"] = pr_annotation["Start"]

# Convert to PyRanges object
pr_annotation = pr.PyRanges(pr_annotation)
pr_annotation

And this is how it looks like now:


Chromosome | Source | Feature | Start | End | Score | Strand | Frame | gene_id | gene_name | gene_type | transcript_type | transcript_namne | Transcription_Start_Site
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
chr1 | HAVANA | gene | 11869 | 14409 | . | + | . | ENSG00000223972.5 | DDX11L1 | transcribed_unprocessed_pseudogene | NaN | NaN | 11869
chr1 | HAVANA | transcript | 11869 | 14409 | . | + | . | ENSG00000223972.5 | DDX11L1 | transcribed_unprocessed_pseudogene | lncRNA | DDX11L1-202 | 11869
chr1 | HAVANA | exon | 11869 | 12227 | . | + | . | ENSG00000223972.5 | DDX11L1 | transcribed_unprocessed_pseudogene | lncRNA | DDX11L1-202 | 11869
chr1 | HAVANA | exon | 12613 | 12721 | . | + | . | ENSG00000223972.5 | DDX11L1 | transcribed_unprocessed_pseudogene | lncRNA | DDX11L1-202 | 12613
chr1 | HAVANA | exon | 13221 | 14409 | . | + | . | ENSG00000223972.5 | DDX11L1 | transcribed_unprocessed_pseudogene | lncRNA | DDX11L1-202 | 13221
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
chrY | HAVANA | exon | 57214350 | 57214397 | . | - | . | ENSG00000227159.8_PAR_Y | DDX11L16 | unprocessed_pseudogene | unprocessed_pseudogene | DDX11L16-201 | 57214350
chrY | HAVANA | exon | 57213880 | 57213964 | . | - | . | ENSG00000227159.8_PAR_Y | DDX11L16 | unprocessed_pseudogene | unprocessed_pseudogene | DDX11L16-201 | 57213880
chrY | HAVANA | exon | 57213526 | 57213602 | . | - | . | ENSG00000227159.8_PAR_Y | DDX11L16 | unprocessed_pseudogene | unprocessed_pseudogene | DDX11L16-201 | 57213526
chrY | HAVANA | exon | 57213204 | 57213357 | . | - | . | ENSG00000227159.8_PAR_Y | DDX11L16 | unprocessed_pseudogene | unprocessed_pseudogene | DDX11L16-201 | 57213204
chrY | HAVANA | exon | 57212184 | 57213125 | . | - | . | ENSG00000227159.8_PAR_Y | DDX11L16 | unprocessed_pseudogene | unprocessed_pseudogene | DDX11L16-201 | 57212184

Then followed the same code as above to generate the coverage_plot:

from scenicplus.plotting.coverageplot import coverage_plot
bigwig_dir = '/home/praghu/yojetsharma/pseudobulk_bw_files/'
bw_dict = {x.replace('.bw', ''): os.path.join(bigwig_dir, x) for x in os.listdir(bigwig_dir) if '.bw' in x}
pr_consensus_bed = pr.read_bed('/home/praghu/yojetsharma/consensus_regions.bed')
scenicplus.plotting.coverageplot.coverage_plot(
    scplus_obj,
    bw_dict=bw_dict,
    region='chr14:100725892-100739224',
    meta_data_key='broader_annotation',
    pr_consensus_bed=pr_consensus_bed,
    pr_gtf=pr_annotation
)
Screenshot 2025-01-01 at 1 25 13 AM

I removed the genes_violin_plot parameter as that was giving SliceError.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants