Most Popular
1500 questions
15 votes, 3 answers
Publicly available genome sequence database for viruses?
As a small introductory project, I want to compare genome sequences of different strains of influenza virus.
What are the publicly available databases of influenza virus gene/genome sequences?
AlwaysTrying44
15 votes, 5 answers
How do I carry out an ancestry/admixture test on a single VCF file?
This is a question from /u/beneficii9 on reddit. The original post can be found here.
Through the Personal Genome Project, I have had my whole genome sequenced by Veritas, and have it in the form of a single VCF file for the whole genome and one…
gringer
15 votes, 1 answer
Why is bwa-mem the standard algorithm when using bwa?
The industry standard for aligning short reads seems to be bwa-mem. However, in my tests I have seen that using bwa backtrack (bwa-aln + bwa-sampe + bwa-samse) performs better. It is slightly slower, but gives significantly better results in terms…
terdon
15 votes, 1 answer
What is the difference between samtools, bamtools, picard, sambamba and biobambam?
After some Google searches, I found multiple tools with overlapping functionality for viewing, merging, generating pileups, etc. I don't have time to try these tools, so I will just see if anyone already knows the answer: what is the difference between them?…
medbe
15 votes, 2 answers
Downloading a reference Genome for Bowtie2
How do I download a reference genome that I can use with bowtie2? Specifically HG19. On UCSC there are a lot of file options.
EMiller
15 votes, 2 answers
Meaning of BWA-MEM MAPQ scores
Does anyone know what the MAPQ values produced by BWA-MEM mean?
I'm looking for something similar to what Keith Bradnam discovered for Tophat v 1.4.1, where he realized that:
0 = maps to 5 or more locations
1 = maps to 3-4 locations
3 = maps to
…
ijoseph
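For reference, the SAM specification defines MAPQ as a Phred-scaled probability that the reported mapping position is wrong, MAPQ = -10 log10 P(wrong). BWA-MEM estimates that probability heuristically (it caps its scores at 60), so its values do not correspond to location counts the way the TopHat 1.4.1 values above do. A minimal sketch that reads a MAPQ under the SAM definition; the helper names are hypothetical, and 255 is rejected because the specification reserves it for "not available":

```python
import math


def mapq_to_error_prob(mapq: int) -> float:
    """Convert a Phred-scaled MAPQ to P(mapping position is wrong), per the SAM spec definition."""
    if mapq == 255:
        # The SAM spec reserves 255 to mean the mapping quality is unavailable.
        raise ValueError("MAPQ 255 means the mapping quality is unavailable")
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p: float) -> int:
    """Inverse conversion: round -10 * log10(p) to the nearest integer."""
    return round(-10 * math.log10(p))


for q in (0, 3, 20, 60):
    print(q, mapq_to_error_prob(q))
```

Under this reading, MAPQ 0 means the aligner treats the placement as essentially arbitrary (for example, a read with an equally good hit elsewhere), while BWA-MEM's ceiling of 60 corresponds to an estimated error probability of one in a million.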
15 votes, 3 answers
What are the advantages and disadvantages between using KEGG or Reactome?
In enrichment analysis, a usual step is to infer the pathways enriched in a list of genes. However, I can't find a discussion about which database is better. Two of the most popular (in my particular environment) are Reactome and KEGG (maybe because…
llrs
15 votes, 2 answers
Difference between CPM and TPM and which one for downstream analysis?
What is the difference between TPM and CPM when dealing with RNA-seq data?
What metrics would you use if you had to perform some downstream analysis other than differential expression, e.g. clustering analysis using the hclust function and then…
novicebioinforesearcher
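As a reference for the two definitions: CPM scales each count c_i by the library total, CPM_i = c_i / Σ_j c_j × 10^6, while TPM first divides by feature length l_i (in kb) and then rescales those rates, TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6. A minimal sketch on toy numbers; the counts and lengths are made up, and in practice the length would usually be an effective length reported by the quantifier:

```python
from typing import List, Sequence


def cpm(counts: Sequence[float]) -> List[float]:
    """Counts per million: scale each count by the library's total count."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Transcripts per million: normalise by feature length first, then by the sum of those rates."""
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]  # reads per kilobase
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


counts = [100, 100, 800]     # toy read counts for three genes
lengths = [1000, 2000, 4000]  # toy gene lengths in bp
print(cpm(counts))
print(tpm(counts, lengths))
```

The first two genes get identical CPM values, but the first has twice the TPM of the second because it is half as long. TPM values always sum to one million within a sample, which is what makes them comparable as within-sample proportions.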
15 votes, 4 answers
How to convert BED to GFF3
I would like to convert a BED file to GFF3.
The only useful tool that I could find via a Google search seems to be Galaxy, and I do not feel very comfortable with online tools; plus, the web server is currently under maintenance.
Does anyone know…
aechchiki
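The core of a BED to GFF3 conversion is a coordinate shift: BED is 0-based and half-open, GFF3 is 1-based and closed, so the start gains 1 and the end stays the same. A minimal offline sketch, assuming tab-separated BED lines with up to six columns (chrom, start, end, name, score, strand); the source "bed2gff" and feature type "region" are hypothetical placeholders you would replace with something meaningful:

```python
from typing import Iterable, Iterator

# Characters with reserved meaning in GFF3 column 9 must be percent-escaped.
_RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C", "\t": "%09", "\n": "%0A"}


def _escape(value: str) -> str:
    return "".join(_RESERVED.get(ch, ch) for ch in value)


def bed_to_gff3(bed_lines: Iterable[str], source: str = "bed2gff",
                feature_type: str = "region") -> Iterator[str]:
    """Convert BED (BED3 to BED6) lines to GFF3 lines, header first."""
    yield "##gff-version 3"
    for n, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and UCSC track/browser lines
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"feature{n}"
        score = fields[4] if len(fields) > 4 else "."
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-", ".") else "."
        # ID is derived from the line number so it stays unique even if BED names repeat.
        attributes = f"ID=feature{n};Name={_escape(name)}"
        yield "\t".join([chrom, source, feature_type, str(start + 1), str(end),
                         score, strand, ".", attributes])


for out in bed_to_gff3(["chr1\t0\t100\tgeneA\t0\t+\n"]):
    print(out)
```

Blocks (BED12 exon columns) are ignored here; handling them would mean emitting one child feature per block with a Parent attribute.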
15 votes, 2 answers
Merge hundreds of small BAM files into a single BAM file
I am working with over a million (long) reads, and aligning them to a large genome. I am considering running my alignment jobs in parallel, distributing horizontally across hundreds of nodes rather than trying to run a single job with dozens of…
Scott Gigante
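If each node sorts its own output by coordinate, combining the pieces is a k-way merge that needs only one streaming pass over all inputs; `samtools merge` performs this kind of merge for sorted BAM files. A minimal pure-Python sketch of the idea, using hypothetical (reference index, 0-based position, read name) tuples as stand-ins for alignment records rather than reading real BAM files:

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for an alignment record: (reference index, 0-based position, read name).
Record = Tuple[int, int, str]


def merge_sorted_chunks(chunks: List[Iterable[Record]]) -> Iterator[Record]:
    """Lazily k-way merge record streams that are each already coordinate-sorted."""
    return heapq.merge(*chunks, key=lambda rec: (rec[0], rec[1]))


node_a = [(0, 100, "r1"), (0, 500, "r4"), (1, 20, "r7")]
node_b = [(0, 250, "r2"), (1, 5, "r6")]
node_c = [(0, 300, "r3"), (0, 900, "r5")]
merged = list(merge_sorted_chunks([node_a, node_b, node_c]))
print([name for _, _, name in merged])  # ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7']
```

The comparison on reference index only makes sense if every chunk was aligned against the same reference with the same sequence dictionary, which is also what a real merge of BAM headers assumes.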
14 votes, 7 answers
Is there a public RESTful API for gnomAD?
I currently find Harvard's RESTful API for ExAC extremely useful, and I was hoping that a similar resource is available for gnomAD.
Does anyone know of a public access API for gnomAD, or of any plans to integrate gnomAD into the Harvard API?
Pasted
14 votes, 2 answers
Mapping drug names to ATC codes
I'm interested in working with the medication information provided by the UK Biobank. In order to get these into a usable form, I would like to map them to ATC codes. Since many of the drugs listed in the data showcase include dosage information,…
Greg
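One common first step is to strip strength information from each name before looking it up. A minimal sketch, assuming names look like "simvastatin 40mg" (the exact UK Biobank formatting is not shown here) and using a tiny hand-written, hypothetical lookup table; real codes should come from the WHO ATC/DDD index and be verified before use:

```python
import re
from typing import Dict, Optional

# Matches numeric strengths such as "40mg", "500 mg", "0.5%", "100 micrograms".
DOSE_RE = re.compile(r"\d+(?:\.\d+)?\s*(?:mg|mcg|micrograms?|ml|g|%)(?![a-z])", re.IGNORECASE)

# Hypothetical toy lookup; a real mapping would be built from the WHO ATC/DDD index.
ATC_LOOKUP: Dict[str, str] = {
    "simvastatin": "C10AA01",
    "metformin": "A10BA02",
}


def normalise_drug_name(raw: str) -> str:
    """Lower-case the name, drop dosage strengths, and collapse whitespace."""
    without_dose = DOSE_RE.sub(" ", raw.lower())
    return " ".join(without_dose.split())


def to_atc(raw: str) -> Optional[str]:
    """Return the ATC code for a raw medication string, or None if unmatched."""
    return ATC_LOOKUP.get(normalise_drug_name(raw))


for entry in ["Simvastatin 40mg", "metformin 500 mg", "unknown drug 5 mg"]:
    print(entry, "->", to_atc(entry))
```

Exact-match lookup after normalisation will miss brand names and combination products, so unmatched entries are returned as None for manual review rather than guessed.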
14 votes, 3 answers
How to obtain .bed file with coordinates of all genes
I want to get a .bed file with the genes' names and canonical coordinates, and I would also like to have the coordinates of exons. I can get the list from UCSC; however, if I choose UCSC Genes - knownCanonical, I cannot extract coordinates of exons.…
German Demidov
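An alternative route to exon coordinates is a gene annotation GTF, which lists every exon together with its gene. A minimal sketch, assuming an Ensembl/GENCODE-style GTF whose attributes include `gene_name`; picking only canonical transcripts is not handled here, and the input line below is a toy example. GTF is 1-based and inclusive while BED is 0-based and half-open, so only the start shifts:

```python
import re
from typing import Iterable, Iterator

# Matches GTF attribute pairs of the form: key "value"
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_exons_to_bed(gtf_lines: Iterable[str], name_key: str = "gene_name") -> Iterator[str]:
    """Yield BED6 lines (chrom, start, end, name, score, strand) for every exon feature."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        attrs = dict(_ATTR_RE.findall(fields[8]))
        name = attrs.get(name_key, attrs.get("gene_id", "."))
        # 1-based inclusive start -> 0-based start; the end coordinate is unchanged.
        yield "\t".join([chrom, str(start - 1), str(end), name, "0", strand])


toy = 'chr1\ttoy\texon\t101\t200\t.\t+\t.\tgene_id "G1"; gene_name "GENE1"; exon_number "1";'
print(list(gtf_exons_to_bed([toy])))
```

Filtering on `fields[2] == "gene"` instead of `"exon"` gives whole-gene intervals with the same conversion.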
14 votes, 3 answers
How do you write a .gz fastq file with Biopython?
How do you write a .gz (or .bgz) fastq file using Biopython?
I'd rather avoid a separate system call.
The typical way to write an ASCII .fastq is as follows:
for record in SeqIO.parse(fasta, "fasta"):
    SeqIO.write(record, fastq,…
Mark Ebbert
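Without a system call, one option is to open the output with Python's `gzip.open` in text mode and pass that handle to `SeqIO.write`. Records parsed from FASTA carry no quality scores, so writing them as FASTQ fails; this sketch attaches a placeholder Phred score of 40 to every base, which is purely an illustrative assumption. The helper name `write_fastq_gz` is hypothetical, and for BGZF rather than plain gzip output, Biopython's `Bio.bgzf` module would be the place to look (not shown here):

```python
import gzip
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def write_fastq_gz(records, path):
    """Write SeqRecords as gzip-compressed FASTQ; returns the number of records written."""
    # "wt" gives SeqIO.write an ordinary text handle that is compressed on the fly.
    with gzip.open(path, "wt") as handle:
        return SeqIO.write(records, handle, "fastq")


def with_placeholder_quality(records, phred=40):
    """Attach a constant (hypothetical) quality to FASTA-derived records so FASTQ output is possible."""
    for record in records:
        record.letter_annotations["phred_quality"] = [phred] * len(record)
        yield record


# Toy record standing in for something parsed with SeqIO.parse(fasta, "fasta").
records = [SeqRecord(Seq("ACGTACGT"), id="read1", description="")]
path = os.path.join(tempfile.mkdtemp(), "example.fastq.gz")
n = write_fastq_gz(with_placeholder_quality(records), path)

# Reading back uses the same idea: gzip.open in "rt" mode.
with gzip.open(path, "rt") as handle:
    back = list(SeqIO.parse(handle, "fastq"))
```

Passing an iterator of records to a single `SeqIO.write` call, instead of calling it once per record, also avoids reopening or overwriting the output for each read.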
14 votes, 1 answer
What is the difference between a Bioinformatics pipeline and workflow?
I want to understand the difference between pipeline systems and workflow engines.
After reading A Review of Scalable Bioinformatics Pipelines, I had a good overview of current bioinformatics pipelines. After some further research, I found that there…
A.Dumas