Most Popular

1500 questions
18
votes
2 answers

How can we distinguish between true zero and dropout-zero counts in single-cell RNA-seq?

In single-cell RNA-seq data we have an inflated number of 0 (or near-zero) counts due to low mRNA capture rate and other inefficiencies. How can we decide which genes are 0 due to gene dropout (lack of measurement sensitivity), and which are…
Peter
  • 2,594
  • 14
  • 33
18
votes
4 answers

How to compute RPKM in R?

I have the following data of fragment counts for each gene in 16 samples: > str(expression) 'data.frame': 42412 obs. of 16 variables: $ sample1 : int 4555 49 122 351 53 27 1 0 0 2513 ... $ sample2 : int 2991 51 55 94 49 10 55 0 0 978 ... $…
Iakov Davydov
  • 2,705
  • 12
  • 32
18
votes
3 answers

How can I downsample a BAM file while keeping both reads in pairs?

I know how to downsample a BAM file to lower coverage. I know I can randomly select lines in SAM, but this procedure can't guarantee that the two reads in a pair are always sampled at the same time. Is there a way to downsample BAM while keeping pairing…
medbe
  • 787
  • 1
  • 6
  • 9
18
votes
3 answers

Convert a BAM file from one reference to another?

I have a set of BAM files that are aligned using the NCBI GRCh37 human genome reference (with the chromosome names as NC_000001.10) but I want to analyze it using a BED file that has the UCSC hg19 chromosome names (e.g. chr1). I want to use bedtools…
18
votes
1 answer

How can I improve a long-read assembly with a repetitive genome?

I'm currently trying to assemble a genome from a rodent parasite, Nippostrongylus brasiliensis. This genome does have an existing reference genome, but it is highly fragmented. Here are some continuity statistics for the scaffolds of the current…
gringer
  • 12,758
  • 5
  • 21
  • 75
17
votes
3 answers

BAM to BigWig without intermediary BedGraph

I have a pipeline for generating a BigWig file from a BAM file (BAM -> BedGraph -> BigWig), which uses bedtools genomecov for the BAM -> BedGraph part and bedGraphToBigWig for the BedGraph -> BigWig part. The use of bedGraphToBigWig to create the…
17
votes
5 answers

What's the best way to download data from the SRA? Is it really this slow?

I'm trying to download three WGS datasets from the SRA that are each between 60 and 100GB in size. So far I've tried: fetching the .sra files directly from NCBI's ftp site; fetching the .sra files directly using the aspera command line (ascp); using…
tfenne
  • 171
  • 1
  • 4
17
votes
5 answers

Is there an easy way to create a summary of a VCF file (v4.1) with structural variations?

I got a bunch of VCF files (v4.1) with structural variations of a bunch of non-model organisms (i.e. there are no known variants). I found there are quite a few tools to manipulate VCF files, like VCFtools, the R package vcfR or the Python library PyVCF.…
Kamil S Jaron
  • 5,437
  • 1
  • 22
  • 57
17
votes
2 answers

How can I extract normalized read count values from DESeq2 results?

The results obtained by running the results command from DESeq2 contain a "baseMean" column, which I assume is the mean across samples of the normalized counts for a given gene. How can I access the normalized counts proper? I tried the following…
bli
  • 3,040
  • 12
  • 35
16
votes
4 answers

Why Bioconductor?

What are the advantages of having Bioconductor, for the bioinformatics community? I've read the 'About' section and skimmed the paper, but still cannot really answer this. I understand Bioconductor is released twice a year (unlike R), but if I want…
Peter
  • 2,594
  • 14
  • 33
16
votes
2 answers

Alignment based vs reference-free (transcriptome analysis)?

I want to focus on transcriptome analysis. We know it's possible to analyze an RNA-Seq experiment based on alignment or k-mers. Possible alignment workflow: align sequence reads with TopHat2, then quantify the gene expression with Cufflinks. Possible…
SmallChess
  • 2,689
  • 2
  • 15
  • 33
16
votes
3 answers

R package development: How does one automatically install Bioconductor packages upon package installation?

I have an R package on GitHub, 'myPackage', which uses multiple Bioconductor dependencies. If I include CRAN packages in the DESCRIPTION via Depends:, the packages will automatically install upon installation via devtools, i.e.…
ShanZhengYang
  • 1,651
  • 13
  • 19
16
votes
3 answers

Designing a lab NGS file database schema

I am the resident Bioinfo Geek in a hospital academic lab that routinely employs NGS as well as CyTOF and other large volume data producing technologies. I am sick of our current "protocol" for metadata collection and association with the final…
Gus
  • 346
  • 1
  • 7
16
votes
2 answers

How can I call structural variants (SVs) from paired-end short read resequencing data?

I have a reference genome and now I would like to call structural variants from Illumina paired-end whole genome resequencing data (insert size 700bp). There are many tools for SV calls (I made an incomplete list of tools below). There is also a…
Kamil S Jaron
  • 5,437
  • 1
  • 22
  • 57
15
votes
10 answers

How to simulate NGS reads, controlling sequence coverage?

I have a FASTA file with 100+ sequences like this: >Sequence1 GTGCCTATTGCTACTAAAA ... >Sequence2 GCAATGCAAGGAAGTGATGGCGGAAATAGCGTTA ...... I also have a text file like this: Sequence1 40 Sequence2 30 ...... I would like to simulate next-generation…
SmallChess
  • 2,689
  • 2
  • 15
  • 33