Most Popular
1500 questions
18
votes
2 answers
How can we distinguish between true zero and dropout-zero counts in single-cell RNA-seq?
In single-cell RNA-seq data we have an inflated number of 0 (or near-zero) counts due to low mRNA capture rate and other inefficiencies.
How can we decide which genes are 0 due to gene dropout (lack of measurement sensitivity), and which are…
Peter
- 2,594
- 14
- 33
18
votes
4 answers
How to compute RPKM in R?
I have the following data of fragment counts for each gene in 16 samples:
> str(expression)
'data.frame': 42412 obs. of 16 variables:
$ sample1 : int 4555 49 122 351 53 27 1 0 0 2513 ...
$ sample2 : int 2991 51 55 94 49 10 55 0 0 978 ...
$…
Iakov Davydov
- 2,705
- 12
- 32
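For the RPKM question above, the usual definition can be sketched in a few lines. This is a minimal illustration in Python rather than R, not an answer taken from the thread. It assumes a per-gene length vector in base pairs, which the excerpt does not show; `rpkm` and `gene_lengths_bp` are hypothetical names. Applied to fragment counts, as in the excerpt, the same formula is usually called FPKM.

```python
def rpkm(counts, gene_lengths_bp):
    """Return RPKM values for one sample.

    counts          -- read (or fragment) counts per gene for a single sample
    gene_lengths_bp -- gene lengths in base pairs (assumed input, not in the excerpt)

    RPKM = count * 1e9 / (gene_length_bp * total_counts_in_sample)
    """
    if len(counts) != len(gene_lengths_bp):
        raise ValueError("counts and gene_lengths_bp must have the same length")
    total = sum(counts)
    if total == 0:
        # No reads in the sample: define every value as 0 rather than divide by zero.
        return [0.0 for _ in counts]
    return [c * 1e9 / (length * total) for c, length in zip(counts, gene_lengths_bp)]


# Toy example with made-up numbers (not the data from the question).
example = rpkm([10, 20, 70], [1000, 2000, 1000])
```

Each sample (column) would be normalized separately, since the total count in the denominator is per sample.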
18
votes
3 answers
How can I downsample a BAM file while keeping both reads in pairs?
I know how to downsample a BAM file to lower coverage. I know I can randomly select lines in SAM, but this procedure can't guarantee that the two reads in a pair are always sampled at the same time. Is there a way to downsample BAM while keeping pairing…
medbe
- 787
- 1
- 6
- 9
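One common way to keep mates together when downsampling, sketched here as an illustration rather than as the thread's accepted answer, is to base the keep/drop decision on a hash of the read name instead of on a per-line random draw. Both reads of a pair share a name, so they always get the same decision. The function name `keep_read` and the `seed` parameter are hypothetical; reading and writing the BAM itself is left out.

```python
import zlib


def keep_read(read_name, fraction, seed=0):
    """Decide deterministically whether to keep a read, based on its name.

    Mates share the same read name, so both reads of a pair receive
    the same decision. `fraction` is the approximate share of pairs to keep.
    """
    # Hash the seed together with the name; changing the seed gives a different subsample.
    digest = zlib.crc32(f"{seed}:{read_name}".encode("utf-8")) & 0xFFFFFFFF
    # Map the 32-bit hash to [0, 1) and compare against the target fraction.
    return digest / 2**32 < fraction


# Both mates of a pair carry the same name, so they are kept or dropped together.
mate1_kept = keep_read("read_0001", 0.5)
mate2_kept = keep_read("read_0001", 0.5)
```

The retained fraction is only approximate, because it depends on how the hash values of the actual read names are distributed.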
18
votes
3 answers
Convert a BAM file from one reference to another?
I have a set of BAM files that are aligned using the NCBI GRCh37 human genome reference (with the chromosome names as NC_000001.10) but I want to analyze it using a BED file that has the UCSC hg19 chromosome names (e.g. chr1). I want to use bedtools…
morgantaschuk
- 540
- 4
- 9
18
votes
1 answer
How can I improve a long-read assembly with a repetitive genome?
I'm currently trying to assemble a genome from a rodent parasite, Nippostrongylus brasiliensis. This genome does have an existing reference genome, but it is highly fragmented. Here are some continuity statistics for the scaffolds of the current…
gringer
- 12,758
- 5
- 21
- 75
17
votes
3 answers
BAM to BigWig without intermediary BedGraph
I have a pipeline for generating a BigWig file from a BAM file:
BAM -> BedGraph -> BigWig
Which uses bedtools genomecov for the BAM -> BedGraph part and bedGraphToBigWig for the BedGraph -> BigWig part.
The use of bedGraphToBigWig to create the…
Nathan S. Watson-Haigh
- 407
- 3
- 10
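The two-step pipeline described in the question above (BAM -> BedGraph -> BigWig) can be written down as a pair of shell commands. The sketch below only builds the command strings and does not run anything; the file names and the helper `bam_to_bigwig_commands` are hypothetical, and the chromosome sizes file is an input that `bedGraphToBigWig` needs but the excerpt does not mention.

```python
import shlex


def bam_to_bigwig_commands(bam, bedgraph, chrom_sizes, bigwig):
    """Build the shell commands for BAM -> BedGraph -> BigWig.

    Returns a list of two command strings; nothing is executed here.
    """
    q = shlex.quote  # quote paths so spaces or special characters are safe in a shell
    return [
        # Step 1: per-base coverage from the BAM, written in BedGraph format (-bg).
        f"bedtools genomecov -ibam {q(bam)} -bg > {q(bedgraph)}",
        # Step 2: convert the BedGraph to BigWig using a chromosome sizes file.
        # Note: the BedGraph may need sorting first, depending on the BAM's sort order.
        f"bedGraphToBigWig {q(bedgraph)} {q(chrom_sizes)} {q(bigwig)}",
    ]


commands = bam_to_bigwig_commands("sample.bam", "sample.bedGraph", "chrom.sizes", "sample.bw")
```

The strings could be passed to a shell or a workflow manager; the intermediate BedGraph file is exactly the step the question is trying to avoid.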
17
votes
5 answers
What's the best way to download data from the SRA? Is it really this slow?
I'm trying to download three WGS datasets from the SRA that are each between 60 and 100GB in size. So far I've tried:
Fetching the .sra files directly from NCBI's ftp site
Fetching the .sra files directly using the aspera command line (ascp)
Using…
tfenne
- 171
- 1
- 4
17
votes
5 answers
Is there an easy way to create a summary of a VCF file (v4.1) with structural variations?
I have a bunch of VCF files (v4.1) with structural variations from a bunch of non-model organisms (i.e. there are no known variants). I found there are quite a few tools to manipulate VCF files, like VCFtools, the R package vcfR or the Python library PyVCF.…
Kamil S Jaron
- 5,437
- 1
- 22
- 57
17
votes
2 answers
How can I extract normalized read count values from DESeq2 results?
The results obtained by running the results command from DESeq2 contain a "baseMean" column, which I assume is the mean across samples of the normalized counts for a given gene.
How can I access the normalized counts proper?
I tried the following…
bli
- 3,040
- 12
- 35
16
votes
4 answers
Why Bioconductor?
What are the advantages of Bioconductor for the bioinformatics community?
I've read the 'About' section and skimmed the paper, but still cannot really answer this.
I understand Bioconductor is released twice a year (unlike R), but if I want…
Peter
- 2,594
- 14
- 33
16
votes
2 answers
Alignment based vs reference-free (transcriptome analysis)?
I want to focus on transcriptome analysis. We know it's possible to analyze an RNA-Seq experiment based on alignment or on k-mers.
Possible alignment workflow:
Align sequence reads with TopHat2
Quantify the gene expression with Cufflinks
Possible…
SmallChess
- 2,689
- 2
- 15
- 33
16
votes
3 answers
R package development: How does one automatically install Bioconductor packages upon package installation?
I have an R package on GitHub, 'myPackage', which uses multiple Bioconductor dependencies.
If I include CRAN packages in the DESCRIPTION via Depends:, the packages will automatically install upon installation via devtools, i.e.…
ShanZhengYang
- 1,651
- 13
- 19
16
votes
3 answers
Designing a lab NGS file database schema
I am the resident Bioinfo Geek in a hospital academic lab that routinely employs NGS as well as CyTOF and other large volume data producing technologies. I am sick of our current "protocol" for metadata collection and association with the final…
Gus
- 346
- 1
- 7
16
votes
2 answers
How can I call structural variants (SVs) from paired-end short read resequencing data?
I have a reference genome and now I would like to call structural variants from Illumina paired-end whole genome resequencing data (insert size 700bp).
There are many tools for SV calls (I made an incomplete list of tools below). There is also a…
Kamil S Jaron
- 5,437
- 1
- 22
- 57
15
votes
10 answers
How to simulate NGS reads, controlling sequence coverage?
I have a FASTA file with 100+ sequences like this:
>Sequence1
GTGCCTATTGCTACTAAAA ...
>Sequence2
GCAATGCAAGGAAGTGATGGCGGAAATAGCGTTA
......
I also have a text file like this:
Sequence1 40
Sequence2 30
......
I would like to simulate next-generation…
SmallChess
- 2,689
- 2
- 15
- 33