
I have the following protein structural variance list:

mutations = ['F594_D600dup', 'D586_E596dup', 'D593_F594insSPEDNEYFYVD', 'E598_F612dup', 'E598_Y599insSGSSDNEYFYVDFREYE', 'E598_Y599insVAYVDFREYE', 'E604_F605insSPRGGNEYFYVDFREYEYDLKWE', 'F594_R595insGTGSSDNEYFYVDF', 'F612_G613insQGFYVDFREYEYDLKWEFPRENLEF', 'L601_K602insNVDFREYEYDL', 'M578_E598dup', 'N587_D600dup', 'R595_E596insDYVDFR', 'R595_L601dup', 'S584_D600dup', 'S584_F605dup', 'S585_F594dup', 'Y597_E598insAGSSDNEYFYVDFREY', 'Y597_E598insDEYFYVDFREY', 'Y597_K602dup', 'Y599_D600insEYEYEYEY', 'Y599_D600insPAPQIMSTSTLISENMNIA', 'Y599_E604dup']

and I have this protein sequence:

seq="MPALARDGGQLPLLVVFSAMIFGTITNQDLPVIKCVLINHKNNDSSVGKSSSYPMVSESPEDLGCALRPQSSGTVYEAAAVEVDVSASITLQVLVDAPGNISCLWVFKHSSLNCQPHFDLQNRGVVSMVILKMTETQAGEYLLFIQSEATNYTILFTVSIRNTLLYTLRRPYFRKMENQDALVCISESVPEPIVEWVLCDSQGESCKEESPAVVKKEEKVLHELFGTDIRCCARNELGRECTRLFTIDLNQTPQTTLPQLFLKVGEPLWIRCKAVHVNHGFGLTWELENKALEEGNYFEMSTYSTNRTMIRILFAFVSSVARNDTGYYTCSSSKHPSQSALVTIVEKGFINATNSSEDYEIDQYEEFCFSVRFKAYPQIRCTWTFSRKSFPCEQKGLDNGYSISKFCNHKHQPGEYIFHAENDDAQFTKMFTLNIRRKPQVLAEASASQASCFSDGYPLPSWTWKKCSDKSPNCTEEITEGVWNRKANRKVFGQWVSSSTLNMSEAIKGFLVKCCAYNSLGTSCETILLNSPGPFPFIQDNISFYATIGVCLLFIVVLTLLICHKYKKQFRYESQLQMVQVTGSSDNEYFYVDFREYEYDLKWEFPRENLEFGKVLGSGAFGKVMNATAYGISKTGVSIQVAVKMLKEKADSSEREALMSELKMMTQLGSHENIVNLLGACTLSGPIYLIFEYCCYGDLLNYLRSKREKFHRTWTEIFKEHNFSFYPTFQSHPNSSMPGSREVQIHPDSDQISGLHGNSFHSEDEIEYENQKRLEEEEDLNVLTFEDLLCFAYQVAKGMEFLEFKSCVHRDLAARNVLVTHGKVVKICDFGLARDIMSDSNYVVRGNARLPVKWMAPESLFEGIYTIKSDVWSYGILLWEIFSLGVNPYPGIPVDANFYKLIQNGFKMDQPFYATEEIYIIMQSCWAFDSRKRPSFPNLTSFLGCQLADAEEAMYQNVDGRVSECPHTYQNRRPFSREMDLGLLSPQAQVEDS"

I would like to use the hgvs package to project these mutations onto the given sequence. However, I am not sure how to give hgvs an inline sequence (rather than a reference sequence).

First, the one-letter amino acid codes need to be converted to three-letter codes. For example, E27del becomes Glu27del:

import hgvs.parser

variant_str = "Test:p.Glu27del"  # renamed to avoid shadowing the built-in str
hp = hgvs.parser.Parser()
var_p = hp.parse_hgvs_variant(variant_str)
print(var_p)
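The one-letter to three-letter conversion described above could be sketched roughly as follows. The names `AA3` and `to_three_letter` are made up for illustration and are not part of hgvs. The sketch assumes that every uppercase letter in a mutation string is one of the 20 standard residues and that the keywords (`dup`, `ins`, `del`) are lowercase, as in the list above.

```python
import re

# Standard one-letter to three-letter amino acid codes (20 residues only).
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(mutation):
    """Replace each uppercase one-letter residue code with its three-letter code.

    Lowercase keywords (dup, ins, del), digits and underscores are left as-is.
    A letter outside the 20 standard residues raises KeyError.
    """
    return re.sub(r"[A-Z]", lambda m: AA3[m.group(0)], mutation)


print(to_three_letter("E27del"))
print(to_three_letter("F594_D600dup"))
```

The result can then be prefixed with an accession and `p.` (as in `Test:p.Glu27del`) before it is handed to the parser.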

But I am not sure how to project the mutation if the sequence is simply a string. In the example from their paper they use hg19 as a reference and I want to inject a short sequence instead:

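import hgvs.assemblymapper
import hgvs.dataproviders.uta

hdp = hgvs.dataproviders.uta.connect()  # UTA data provider, as in the hgvs examples
vm = hgvs.assemblymapper.AssemblyMapper(
    hdp, assembly_name='GRCh37', alt_aln_method='splign')
var_g = vm.c_to_g(var_c1)  # var_c1 is a parsed c. variant from the paper's example
var_g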
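For comparison, here is what "projecting" a mutation onto a plain string might look like without hgvs at all. This is only a sketch under stated assumptions, not how hgvs works: `apply_mutation` and `PATTERN` are hypothetical names, only the `X123_Y456dup` and `X123_Y124ins...` forms from the list above are handled, and positions are taken to be 1-based on `seq`. The residue check fails loudly if the numbering of the string does not match the mutation.

```python
import re

# Matches e.g. 'F594_D600dup' or 'R595_E596insDYVDFR' (one-letter codes).
PATTERN = re.compile(r"^([A-Z])(\d+)_([A-Z])(\d+)(dup|ins)([A-Z]*)$")


def apply_mutation(seq, mutation):
    """Return a copy of seq with a range duplication or an insertion applied."""
    m = PATTERN.match(mutation)
    if m is None:
        raise ValueError(f"unsupported mutation format: {mutation}")
    aa1, pos1, aa2, pos2, kind, inserted = m.groups()
    start, end = int(pos1), int(pos2)  # 1-based, inclusive

    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"{mutation}: positions outside the sequence")
    # The residues named in the mutation must match the sequence.
    if seq[start - 1] != aa1 or seq[end - 1] != aa2:
        raise ValueError(f"{mutation}: flanking residues do not match seq")

    if kind == "dup":
        if inserted:
            raise ValueError(f"{mutation}: dup takes no inserted residues")
        # Repeat residues start..end directly after position end.
        return seq[:end] + seq[start - 1:end] + seq[end:]

    # Insertion: the two flanking positions must be adjacent.
    if end != start + 1 or not inserted:
        raise ValueError(f"{mutation}: malformed insertion")
    return seq[:start] + inserted + seq[start:]


toy = "MKTAYIAK"  # toy sequence, not the protein from the question
print(apply_mutation(toy, "K2_A4dup"))
print(apply_mutation(toy, "T3_A4insGG"))
```

With the real data this would be called as `apply_mutation(seq, mutations[0])` for each entry. Whether hgvs itself can do the same from an inline sequence is exactly the open question.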