Most Popular

1500 questions
3
votes
1 answer

How to remove the unpaired reads in sam/bam files?

I have SAM and BAM files for chimeric reads, which come from two different parts of the genome (for example, the first half of the read from part of Chromosome 1 and the second half of the read from part of Chromosome 3). I have removed the low…
Wang Ming
  • 101
  • 4
3
votes
2 answers

How to align integer (non-DNA/protein) sequences?

I am looking for an algorithm to find the "best" alignment between two sequences of integers similar to how one aligns nucleic acids or amino acids for homology comparisons. For example, the best alignment for the two sequences below is: …
turtle
  • 131
  • 3
3
votes
0 answers

How to remove redundancy from a gtf file?

I have an annotation file. I would like to remove redundancy, as shown in the example (in the real file, I have a lot of these redundant cases). I would like to consider only one of the following genes (the longest could be a good choice). In the…
Marco
  • 141
  • 4
3
votes
1 answer

Different results of Spearman correlation between TPM and FPKM

TPM and FPKM values of RNA-Seq data from GDC TCGA, calculated based on STAR, were retrieved. The correlation between a specific gene, e.g. HIF1A, and other genes was calculated based on TPM and FPKM, respectively. And the significant genes were…
Yang Shi
  • 33
  • 4
3
votes
2 answers

How can I download from NCBI all the ITS genes and the related taxonomy?

I would like to download all the ITS1 and ITS2 genes from NCBI in a FASTA file. I'd also like to download the related taxonomy of each sequence. Thanks, Marco
Marco
  • 141
  • 4
3
votes
3 answers

Compare my VCF to gnomAD variants

I have a VCF with small variant calls against HG38 and I would like to determine which of those calls are present in the gnomAD database. Is there an existing tool that can do this? Should I be looking at variant annotation tools? or is this…
ScottMastro
  • 133
  • 4
3
votes
1 answer

How to make a pan-core genome curve from the command line on Linux?

I'm working with a dataset of 566 genomes to analyze a pangenome. I was using PANWEB to create this pan-core genome curve; however, there are too many sequences for this webserver to handle. Specifically, I'm looking for this kind of…
Mauri1313
  • 185
  • 5
3
votes
3 answers

How to get strain names/ids contained in a multi FASTA file using seqkit?

FASTA files can be very big and unwieldy, especially if lines are at most 80 characters: one can't speed up browsing them by using less with -S to have one sequence every two lines. How can I extract just the strain names (or sequence names, i.e.…
Cornelius Roemer
  • 367
  • 1
  • 13
3
votes
1 answer

Perform protein structure-based sequence alignment in Python

I am looking for a Python package that performs pairwise structural alignment of protein structures (i.e., PDB files) and returns a sequence alignment. PyMOL is able to do this through the GUI, for example: For two protein PDBs, one can be aligned…
3
votes
1 answer

What does this NCBI accession code mean: 6MWN_B?

According to this article, accession codes should consist of a combination of uppercase letters followed by a combination of digits. If this is a RefSeq, it can have a prefix as a combination of uppercase letters with underscore. But this accession…
Vovin
  • 355
  • 10
3
votes
1 answer

How to manage memory constraints when analyzing a large number of gene count matrices? I keep running out of RAM with my current pipeline

I have several hundred scRNA-seq count matrices, each from a different sample. For my other dataset containing a few dozen samples, I simply merged everything together into one Seurat object, but that won't work here as far as I can tell. When I try…
3
votes
1 answer

Phylogenetic tree rooting in shotgun metagenomics

I have spent some weeks fighting with the issue of building a phylogenetic tree to use in a phyloseq object in order to calculate beta-diversity metrics that take tree branch distances into account. I have one tree for Archaea and another…
MagíBC
  • 41
  • 3
3
votes
1 answer

Imputing small region of the genome

If I'm looking for a specific SNP in my SNP-Chip data and it isn't there, are there any tools that let me quickly impute that SNP from surrounding SNPs rather than running a lengthy 'whole chromosome' imputation job? If so, roughly how many upstream…
Dan Bolser
  • 440
  • 2
  • 9
3
votes
1 answer

What is File error: my.xml.state (Remote I/O error) and how to solve it?

I caught the following exception during my phylogeographical analysis in BEAST 2 with GEO_SPHERE. What could be the reason, and how can I avoid it in the future? ... 856000000 -3662.2647 5969.5577 -9631.8225 42m16s/Msamples …
Vovin
  • 355
  • 10
3
votes
1 answer

Parsing pre-2007 SMILES string

How would one parse the SMILES string BrC[2]:C[3]:C(:CH:CH:CH:@2):CH:CH:CH:CH:@3 I rely on tools like rdkit and OpenBabel to parse SMILES, but both tools aren't able to parse this string. More specifically, this SMILES string comes from the…
Ryan Park
  • 41
  • 5