Highest Voted Questions - Bioinformatics Stack Exchange

3

votes

1 answer

How to remove the unpaired reads in sam/bam files?

I have SAM and BAM files for chimeric reads, which come from two different parts of the genome (for example, the first half of the read from part of chromosome 1 and the second half from part of chromosome 3). I have removed the low…

sequence-alignment samtools phylogenetics

asked Jun 19 '22 at 08:01

Wang Ming

101
4

3

votes

2 answers

How to align integer (non-DNA/protein) sequences?

I am looking for an algorithm to find the "best" alignment between two sequences of integers similar to how one aligns nucleic acids or amino acids for homology comparisons. For example, the best alignment for the two sequences below is: …

phylogenetics sequence-alignment phylogeny

asked Jun 17 '22 at 14:49

turtle

131
3

3

votes

0 answers

How to remove redundancy from a gtf file?

I have an annotation file. I would like to remove redundancy, as shown in the example (in the real file, I have a lot of these redundant cases). I would like to consider only one of the following genes (the longest could be a good choice). In the…

sequence-annotation bedtools gtf merge

asked Jun 17 '22 at 12:44

Marco

141
4

3

votes

1 answer

Different results of Spearman correlation between TPM and FPKM

TPM and FPKM values of RNA-Seq data from GDC TCGA, calculated based on STAR, were retrieved, respectively. The correlation between a specific gene, e.g. HIF1A, and other genes was calculated based on TPM and FPKM, respectively. And the significant genes were…

rna-seq correlation fpkm tpm

asked Jun 16 '22 at 03:27

Yang Shi

33
4

3

votes

2 answers

How can I download from NCBI all the ITS genes and the related taxonomy?

I would like to download all the ITS1 and ITS2 genes from NCBI in a FASTA file. I'd also like to download the related taxonomy of each sequence. Thanks, Marco

phylogenetics taxonomy

asked Jun 08 '22 at 15:12

Marco

141
4

3

votes

3 answers

Compare my VCF to gnomAD variants

I have a VCF with small variant calls against HG38 and I would like to determine which of those calls are present in the gnomAD database. Is there an existing tool that can do this? Should I be looking at variant annotation tools? or is this…

vcf

asked Jun 06 '22 at 18:40

ScottMastro

133
4

3

votes

1 answer

How to make a pan-core genome curve through the command line on Linux?

I'm working with a dataset of 566 genomes to analyze a pangenome. I was working with PANWEB to create this pan-core genome curve; however, there are too many sequences to work with this webserver. Specifically, I'm looking for this kind of…

genome metagenome

asked Jun 01 '22 at 14:33

Mauri1313

185
5

3

votes

3 answers

How to get strain names/ids contained in a multi FASTA file using seqkit?

FASTA files can be very big and unwieldy. Especially if lines are at most 80 characters, one can't speed up browsing them by using less with -S to have one sequence every two lines. How can I extract just the strain names (or sequence names, i.e.…

fasta identifiers seqkit multi-fasta

asked May 30 '22 at 09:01

Cornelius Roemer

367
1
13

3

votes

1 answer

Perform protein structure-based sequence alignment in Python

I am looking for a Python package that performs pairwise structural alignment of protein structures (i.e., PDB files) and returns a sequence alignment. PyMOL is able to do this through the GUI, for example: For two protein PDBs, one can be aligned…

sequence-alignment proteins protein-structure

asked May 23 '22 at 14:02

Francho Nerín Fonz

33
3

3

votes

1 answer

What does this accession NCBI code mean: 6MWN_B?

According to this article, accession codes should consist of a combination of uppercase letters followed by a combination of digits. If this is a RefSeq, it can have a prefix made of uppercase letters with an underscore. But this accession…

phylogenetics genbank

asked May 06 '22 at 09:07

Vovin

355
10

3

votes

1 answer

How to manage memory constraints when analyzing a large number of gene count matrices? I keep running out of RAM with my current pipeline

I have several hundred scRNA-seq count matrices, each from a different sample. For my other dataset containing a few dozen samples, I simply merged everything together into one Seurat object, but that won’t work here as far as I can tell. When I try…

r scrnaseq seurat

asked May 02 '22 at 21:54

Johnny Rocketfingers

49
2

3

votes

1 answer

Phylogenetic tree rooting in shotgun metagenomics

I have spent some weeks fighting with this issue of building a phylogenetic tree to use in a phyloseq object in order to calculate beta-diversity metrics that take tree branch distances into account. I have one tree for Archaea and another…

phylogenetics phylogeny metagenome scaffold

asked May 01 '22 at 09:05

MagíBC

41
3

3

votes

1 answer

Imputing small region of the genome

If I'm looking for a specific SNP in my SNP-Chip data and it isn't there, are there any tools that let me quickly impute that SNP from surrounding SNPs rather than running a lengthy 'whole chromosome' imputation job? If so, roughly how many upstream…

snp imputation haplotypes tools

asked Apr 27 '22 at 09:34

Dan Bolser

440
2
9

3

votes

1 answer

What is & how to solve File error: my.xml.state (Remote I/O error)?

I caught the following exception during my phylogeographical analysis in BEAST 2 with GEO_SPHERE. What could be the reason, and how can I avoid this in the future? ... 856000000 -3662.2647 5969.5577 -9631.8225 42m16s/Msamples …

phylogenetics beast

asked Apr 20 '22 at 11:30

Vovin

355
10

3

votes

1 answer

Parsing pre-2007 SMILES string

How would one parse the SMILES string BrC[2]:C[3]:C(:CH:CH:CH:@2):CH:CH:CH:CH:@3? I rely on tools like rdkit and OpenBabel to parse SMILES, but neither tool is able to parse this string. More specifically, this SMILES string comes from the…

perl computational-biochemistry rdkit

asked Apr 19 '22 at 00:35

Ryan Park

41
5
