Summary 1

PF00399 - Yeast PIR protein repeat

PF00399 Protein family information

The family shows several domain architectures. In some of them we found regions that are not annotated but are well predicted, and new annotations have been created at the N-term and C-term.

https://www.ebi.ac.uk/interpro/protein/UniProt/A0A0L8VNF4/alphafold/ (C-term)

https://www.ebi.ac.uk/interpro/protein/UniProt/A0A0B4HCL9/alphafold/ (N-term)

We found a group of proteins in which the repeat region spans most of the protein length.

A0A072PHP4

A0A072PHP4 Interpro sequence information

Sequence:

>tr|A0A072PHP4|A0A072PHP4_9EURO Uncharacterized protein OS=Exophiala aquamarina CBS 119918 OX=1182545 GN=A1O9_04500 PE=4 SV=1
MASLRYVPQCKGLPACQLGSNIAPTGAPVSQISDGQPQAPTGAPVSQISDGQPQAPTGAPVSQISDGQPQAPTGAPV
SQISDGQPQAPTGAPVTQISDGQPQAPTGVPVTQISDGQPQAPTGAPVSQISDGQVQAATGTSAAPAAYTGAAHRNG
ISGGLAIAGAIAGAVILI

MRF results:

Region 1: 21-132, 16 aa length, 7 units

NIAPTGAPVSQISDGQ
PQAPTGAPVSQISDGQ
PQAPTGAPVSQISDGQ
PQAPTGAPVSQISDGQ
PQAPTGAPVTQISDGQ
PQAPTGVPVTQISDGQ
PQAPTGAPVSQISDGQ
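
The statement that the repeat region spans most of the protein can be checked directly from the data above. The following is a minimal sketch, not part of the original analysis: the function names `region_matches_units` and `repeat_coverage` are hypothetical, and it assumes that region coordinates are 1-based and inclusive.

```python
# Sequence of A0A072PHP4, copied from the FASTA record above.
SEQUENCE = (
    "MASLRYVPQCKGLPACQLGSNIAPTGAPVSQISDGQPQAPTGAPVSQISDGQPQAPTGAPVSQISDGQPQAPTGAPV"
    "SQISDGQPQAPTGAPVTQISDGQPQAPTGVPVTQISDGQPQAPTGAPVSQISDGQVQAATGTSAAPAAYTGAAHRNG"
    "ISGGLAIAGAIAGAVILI"
)

# Repeat units of Region 1, as listed in the MRF results.
UNITS = [
    "NIAPTGAPVSQISDGQ",
    "PQAPTGAPVSQISDGQ",
    "PQAPTGAPVSQISDGQ",
    "PQAPTGAPVSQISDGQ",
    "PQAPTGAPVTQISDGQ",
    "PQAPTGVPVTQISDGQ",
    "PQAPTGAPVSQISDGQ",
]


def region_matches_units(sequence, start, end, units):
    """True if the 1-based, inclusive region equals the concatenated units."""
    return sequence[start - 1:end] == "".join(units)


def repeat_coverage(sequence, start, end):
    """Fraction of the sequence covered by the 1-based, inclusive region."""
    return (end - start + 1) / len(sequence)


# Assumption: coordinates 21-132 are 1-based and inclusive.
consistent = region_matches_units(SEQUENCE, 21, 132, UNITS)
coverage = repeat_coverage(SEQUENCE, 21, 132)
print(f"units match region: {consistent}, coverage: {coverage:.2f}")
```

The same two helpers can be applied to any other entry in this summary whose units are listed without gaps.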

TAPAS results:

_images/PF0399_A0A072PHP4.png

PF00402 - Calponin family repeat

PF00402 Protein family information

The family shows several domain architectures; we evaluated those in which the repeat region covers the largest share of the protein sequence. We found that there is an amyloid region.

A0A183PNT2

A0A183PNT2 Interpro sequence information

A0A183PNT2 AlphafoldDB sequence information

Sequence:

>tr|A0A183PNT2|A0A183PNT2_9TREM Calponin OS=Schistosoma mattheei OX=31246 GN=SMTD_LOCUS16018 PE=3 SV=1
CQRNGFQGPVCGPKPTYSQPRQWTEEKLRAGEGIIGLQAGTNKLASQKGMSFGAQRHIAD
IRCAAVLSLQMGTNKFASQKGMSFSNQRHIADIKCDDLSQEGKSVINLQMGTNQFASQKG
MRIGSSRHIADIRCDDISKEGQNVISLQYGTNKLASQKGMRMGGQRHIADIRCDNLSADG
ASVIGLQMGLPQNQVASQAGMSFGAHRHINDSH

MRF results:

Region 1: 37-185, 39 aa length, 4 units
LQAGTNKLASQKGMSFGAQRHIADIRC-------AAVLS
LQMGTNKFASQKGMSFSNQRHIADIKCDDLSQEGKSVIN
LQMGTNQFASQKGMRIGSSRHIADIRCDDISKEGQNVIS
LQYGTNKLASQKGMRMGGQRHIADIRCDNLSADGASVIG
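
To summarise how well the aligned units agree, a per-column consensus can be derived from the rows above. This is a minimal sketch under stated assumptions: `column_consensus` and `conserved_columns` are hypothetical helper names, `-` is treated as the gap character, and ties are resolved in favour of the residue seen first.

```python
from collections import Counter

# Aligned repeat units of Region 1 for A0A183PNT2, copied from the MRF results.
ALIGNED_UNITS = [
    "LQAGTNKLASQKGMSFGAQRHIADIRC-------AAVLS",
    "LQMGTNKFASQKGMSFSNQRHIADIKCDDLSQEGKSVIN",
    "LQMGTNQFASQKGMRIGSSRHIADIRCDDISKEGQNVIS",
    "LQYGTNKLASQKGMRMGGQRHIADIRCDNLSADGASVIG",
]


def column_consensus(rows, gap="-"):
    """Most frequent non-gap residue per column (first seen wins on ties)."""
    consensus = []
    for column in zip(*rows):
        residues = [r for r in column if r != gap]
        if residues:
            consensus.append(Counter(residues).most_common(1)[0][0])
        else:
            consensus.append(gap)
    return "".join(consensus)


def conserved_columns(rows, gap="-"):
    """0-based indices of columns where all rows share one non-gap residue."""
    return [
        i
        for i, column in enumerate(zip(*rows))
        if gap not in column and len(set(column)) == 1
    ]


consensus = column_consensus(ALIGNED_UNITS)
identical = conserved_columns(ALIGNED_UNITS)
print(consensus)
print(f"{len(identical)} of {len(consensus)} columns are identical in all units")
```

A majority-vote consensus is only one possible summary; a profile or sequence logo would keep more information about the variable positions.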

TAPAS results:

_images/PF00402_A0A183PNT2.png

A0A077YXR0

A0A077YXR0 Interpro sequence information

A0A077YXR0 AlphafoldDB sequence information

Sequence:

>tr|A0A077YXR0|A0A077YXR0_TRITR Calponin domain containing protein OS=Trichuris trichiura OX=36087 GN=TTRE_0000115401 PE=3 SV=1
MSDAENEQTDQDGQDQEDMEELDQEAIEEAHKRREHARAEREEVANLAAFGKPSALPKEK
LMRSEGIIPIQSGTNKYASQKGMTGFGRPRDVIDKVKCENLKPIEDESKIQSLRDVLPLQ
SGTNKFASQKGMTGFGCPRDVINKTKGTGGTGEIEEEKAKATDGVIPLQAGTNKLASQAG
MTGFGMPRSVLHRFNPDQDRQSQGFVHLQAGTNKLATQQGMTSFGSPRTNVTKYKDSQRG
EMANDESVIPRQTTGYKEGANQAGMTGFGMPRNTTIMQLSRQEQKSQGLIPYQMGINWGD
SQAGKTGFGMPRQVFTNYTDDIRGELPEELARMPDVPFWSGMEKLASQAGMTAMGMPRDV
KGTYLRRLW

MRF results:

Region 1: 45-348, 67 aa length, 7 units
ANLAAFGKP-SALPK------------E---------K---LMRSEGI-----IPIQ-SGTNKYASQ
KGMTGFGRPRDVIDKVKCENLKPIEDES---------K---IQSLRDV-----LPLQ-SGTNKFASQ
KGMTGFGCPRDVINKTK---------GTGGTGEIEEEK---AKATDGV-----IPLQ-AGTNKLASQ
AGMTGFGMPRSVLHRFN---------PD---------Q---DRQSQGF-----VHLQ-AGTNKLATQ
QGMTSFGSPRTNVTKYK---------DS---------QRGEMANDESV-----IPRQTTGYKEGANQ
AGMTGFGMPRNTTIMQL---------SR---------Q---EQKSQGL-----IPYQ-MGINWGDSQ
AGKTGFGMPRQVFTNYT---------DDI--------R---GELPEELARMPDVPFW-SGMEKLASQ


Region 2: 15-29, 8 aa length, 2 units
DQEDMEEL
DQEAIEE-
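
The region headers in the MRF results follow a regular pattern (region number, start, end, unit length, unit count), so they can be read programmatically. The sketch below is an illustration only: the regular expression `REGION_RE` and the function `parse_region` are hypothetical and are not part of MRF; the pattern is written to tolerate missing or extra whitespace around the separators.

```python
import re

# Hypothetical pattern for the "Region" header lines shown in the MRF results.
REGION_RE = re.compile(
    r"Region\s+(?P<index>\d+)\s*:\s*"
    r"(?P<start>\d+)\s*-?\s*(?P<end>\d+)\s*,?\s*"
    r"(?P<unit_len>\d+)\s*aa length\s*,\s*"
    r"(?P<units>\d+)\s*units"
)


def parse_region(line):
    """Parse one header line into integers, or return None if it does not match."""
    match = REGION_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Number of residues covered, assuming 1-based inclusive coordinates.
    fields["span"] = fields["end"] - fields["start"] + 1
    return fields


for header in (
    "Region 1: 45-348, 67 aa length, 7 units",
    "Region 2: 15-29, 8 aa length, 2 units",
):
    print(parse_region(header))
```

The computed `span` counts residues, while the reported unit length appears to count alignment columns, so the span need not equal unit length times unit count when the alignment contains gaps.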

TAPAS results:

_images/PF00402_A0A077YXR0.png

P37397

P37397 Interpro sequence information

P37397 AlphafoldDB sequence information

Sequence:

>sp|P37397|CNN3_RAT Calponin-3 OS=Rattus norvegicus OX=10116 GN=Cnn3 PE=1 SV=1
MTHFNKGPSYGLSAEVKNKIASKYDQQAEEDLRNWIEEVTGMGIGTNFQLGLKDGIILCE
LINKLQPGSVKKVNESSLNWPQLENIGNFIKAIQAYGMKPHDIFEANDLFENGNMTQVQT
TLVALAGLAKTKGFHTTIDIGVKYAEKQTRRFDEGKLKAGQSVIGLQMGTNKCASQAGMT
AYGTRRHLYDPKMQTDKPFDQTTISLQMGTNKGASQAGMSAPGTRRDIYDQKLTLQPVDN
STISLQMGTNKVASQKGMSVYGLGRQVYDPKYCAAPTEPVIHNGSQGTGTNGSEISDSDY
QAEYPDEYHGEYPDEYPREYQYGDDQGIDY

MRF results:

Region 1: 5-324, 100 aa length, 4 units

NKGPSY-GLSAE-VK-NKIASKYD-----QQA-EEDLRNWIEEVT-GMGIGTN------FQLGLKDGIILCEL--IN--KLQPGSVKKVNESSLNWPQLE
NIGNFIKAIQAYGMKPHDI---FEANDLFENG-NM---TQVQTTLVAL-AGLAKTKGFHTTIDIGVKYAEKQTRRFDEGKLKA------GQSVIGLQMGT
NKCASQAGMTAYGTR-RHL---YD-----PKMQTD---KPFDQTTISLQMGTNKGA---SQAGMSAPGTRRDI--YDQ-KLTLQPV---DNSTISLQMGT
NKVASQKGMSVYGLG-RQV---YD-----PKY-CA---APTEPVIHNGSQGTGTNG---SE--ISDSDYQAEY--PDE-YHGEYP----DEYPREYQYGD


Region 2: 303-322, 4 aa length, 5 units
EYPD
EYHG
EYPD
EYPR
EYQY

TAPAS results:

_images/PF00402_P37397.png

PF00414 - Neuraxin and MAP1B repeat

PF00414 Protein family information

The family shows several domain architectures. In some of them we found regions that are not annotated but are well predicted, and new annotations have been created at the N-term and C-term.

A0A7J8F152 C-term

A0A7J8IZT6 C-term

A0A1V4L0S5 C-term

A0A834QI41 N-term

The proteins are usually longer than 2000 aa; we tried to predict the structure in the cluster.

We found a group of proteins that are shorter than the rest, and a structure prediction was made for them.

A0A1V4L0S5

A0A1V4L0S5 Interpro sequence information

A0A1V4L0S5 AlphafoldDB sequence information

Sequence:

>tr|A0A1V4L0S5|A0A1V4L0S5_PATFA Microtubule-associated protein 1B OS=Patagioenas fasciata monilis OX=372326 GN=MAP1B PE=4 SV=1
MSISEGTVSDKSATPVDEVVAEDTYSHIEGVASVSTASVATSSFPEPTTDDVSPSLHAEV
GSPHSTEVDDSLSVSVVQTPTTFQETEMSPSKEECPRPMSISPPDFSPKTAKSRTLVHDH
RSPEQSTMSVEFGQESPEQSLAMDFSRQSPEYPTLGTSMQHISENGPTEVDYSPSDIQEP
TYARKISPVEQSSYSQEKDISEIISVSQIEASSSTSSAHTPSQVTSPLPEETFSGVVPPT
DMSLHSFTSEKVQSLGEKLSPKSDLSPLTPRESSPLYSPSFPDSPPEITGAVSASHTPSL
SLQMSSVTAFGYQESLTKHSPEPLLSPEKEDSEKSSRSPEDLSYSYEATEKTTRSPEDIS
YSYEADGKPTRSLQTTVYSYETTGKTTRSPEVADYSYEKIAKDMRTSETTDYSYEMPGKT
TRSPEVMDYSYEMTGKTTRSPEAKDYSYETTGKTIKSSEATDYAYEITGKSTKSPEATDY
SYERIGKATRSPDTMDYSYETTGKSTKSPEAISPCYETTGRTTMSPEAVAYSYETTEKVS
SSPEVTDYSFETTGRATRSPKATSYSYEATAHFTPGKSLAESRQDVDLCLVSSCEYKHPK
TELSPSFINPNPLEWFASEEQPQDQEKPLTQSGGAQPPSGGKQQGRQCDETPPTSVSESA
PSQTDSDVPPETEECPSITADANIDSEDESETIPTDKTITYKHIDPPPVPMQDRSPSPRH
PDVSMVDPEALPVDQNLGKSLKKDLKEKTKTKKQGTKTKSSSPVKKSDGKSKQGASPKPA
TKESLDKISKTVSSKKKESVEKATKNISTPEVKSRVEEKDKDTKNAANTTTSKSAKTATP
GPGNTKVAKSTAVPPGPPVYLDLVYIPNHSNSKNVDVEFFKRVRSSYYVVSGNDAAAEEP
SRAVLDSLLEGKAQWESNLQVTLIPTHDSEVMREWYQETHEKQQDLNIMVLASSSTVVMQ
DESFPACKIEL

MRF results:

Region 1: 337-574, 17 aa length, 14 units
RSPEDLSYSYEATEKTT
RSPEDISYSYEADGKPT
RSLQTTVYSYETTGKTT
RSPEVADYSYEKIAKDM
RTSETTDYSYEMPGKTT
RSPEVMDYSYEMTGKTT
RSPEAKDYSYETTGKTI
KSSEATDYAYEITGKST
KSPEATDYSYERIGKAT
RSPDTMDYSYETTGKST
KSPEAISPCYETTGRTT
MSPEAVAYSYETTEKVS
SSPEVTDYSFETTGRAT
RSPKATSYSYEATAHFT

Region 2: 705-807, 56 aa length, 2 units
DPPPVPMQDRSPSPRHPDVSMVDPEAL-PVDQNLGKSLKKDLKEKTKTKKQGTKTK
SSSPVKKSD-GKSKQGASPKPATKESLDKISKTVSSKKKESVEKATKN-------I

TAPAS results:

_images/PF00414_A0A1V4L0S5.png

A0A250Y8D3

A0A250Y8D3 Interpro sequence information

A0A250Y8D3 AlphafoldDB sequence information

Sequence:

>A0A250Y8D3 1-2341
MITDAARHKLLVLTGQCFENTGELILQSGSFSFQNFIEIFTDQEIGELLSTTHPANKASLTLFCPEEGDWKNSNLDRHNL
QDFINIKLNSASILPEMEGLSEFTEYLSESVEVPSPFDILEPPTSGGFLKLSKPCCYIFPGGRGDSALFAVNGFNMLING
GSERKSCFWKLIRHLDRVDSILLTHIGDDNLPGINSVLQRKIAELEEEQSQGSTTNSDWMKNLISPDLGVVFLNVPENLK
NPEPNIKMKRGIEEACFTLQYLTKLSMKPEPLFRSVGNTIDPVILFQKMGVGKLEMYVLNPVKNSKEMQYFMQQWTGTNK
DKAELILPNGQEVDIPIPYLTSVSSLIVWHPANPAEKIIRVLFPGNSTQYNILEGLEKLKHLDFLKQPLATQKDLTGQVP
TPTVKQVKLKQRADSRESLKPAAKPLPSKSVRKDSKEEAPDVSKANLVEKPPKVESKEKVIVKKDKPVKTETKPPVTEKE
VPSKEEQPPAKVEVPEKPATDVKPKITKEKVVKKETKAKVEEKKEEKEKPKKEVAKKEEKTPVKKEEKPKKEEVKKEVKK
EIKKEEKKEFKKEVKKETPMKEAKKEIKKEEKKEVKKEEKEPKKEVKKLSKDTKKTSTPLSDTKKPAALKPKVPKKEEPV
KKESVTAGKPKEKGKIKVVKKESKPTEAAAAAAIGTVAATAAVAGIVAAGPAKELEAERSLMSSPEDLTKDFEELKAEEI
DVAKDIKPQLELVDDEEKLKETESVEAYVIQKETEVIKGPAESPDEGITTTEGEGECEQTPEELEPVEKQAVDDIEKFED
EGAGFEESSETGDYEEKAETEEAEEPEEDGEENVCESTSKLSPTEDEESGKAEADVHIKEKRESVASADDRAEEDMEEGV
EKGEAEQSEEEGEEDKAEDAREEEYEPEKAEAEDYVRAVVDKAAEAGGTEDQYGFPTMPPKQPGAQSPGREPASSIHDET
LPGGSESEATASDEENREDQPEEFTATSGYTQSTIEISSEPTPMDEMSTPRDVMSDETNNEETESPSQEFVNITKYESSL
YSQEFSKPVVASFNGLSDGSKTDATEGKDYSATASTISPPSSMEEDKFSKSALRDAYCSEEKAEKASAMLDIKGTVSPVS
DERLSPAKSPSLSPSPPSPIEKTPLGERSVNFSLTPNEIKVSTEAEAVSVSPEVTQEVVEEHCASPEEKTLEVVSPSQSV
TGSAGHTPYYQSPTEEKSSHLPTEVTGKPQAVPVSFEFGDAKDESERASISPMDEPVPDSESPIEKVLSPLRSPPLFGSE
SAYESFLSADGTAPERCTESPFEGKDGKPSSPDQISPISEMTSTGLYQDEREGKSTDFIPIKEDFGPEKKSDDMEAMGAQ
PALALDERKLGGDVSPTQIDVSQFGSFKEDTKMSISEGTVSDKSATPVDEGIAEDTHSHMEGVASVSTASVATSSFPEPT
TDDVSPSLHAEVGSPHSTEVDDSLSVSVVQTPTTFQETEMSPSKEECPRPMSISPPDFSPKTAKSRTPVQDHRSEQSSMS
IEFGQESPEHSLAMDFSRQSPDHSTVGAGVLHITENGPTEVDYSPSDMQDSSLSHKIPPTEEPSYTQDNDLSEFISVSQV
EASPSTSSAHTPSQIASPLQEDTLSDVAPPRDMSLYASLASEKVQSLEGEKLSPKSDISPLTPRESSPLYSPEFSDSTSA
VKESAAACHTSSSPPGDATSAEPYGFRASMLFDTMQHHLALNRDMTASGLEDSGGKTPGDFSYAYQKSEKTTRSPDEEDY
DYESYEKSTRTPDMGSYYYEKTEQTIKSPCDSGYLYETVEKTTKTPEDGGYACEITEKTTRTPEEGGYSYEVTEKTTRTP
EVGGYSYEKTERSRKLLDDISNGYDDSEDAAHTFGDSSYSYETTEKLSSFPESESYSYETSTKTTRSPESAAYCYETTEK
ITKTPQASTYSYETSDRCYTTEKKSPSEARQDVDLCLVSSCEYKHPKTELSPSFINPNPLEWFASEDPIEESEKPLTQSG
GAPPPPGEKQQGRQCDETPPTSVSESAPSQTDSDVPPETEECPSITADANIDSEDESETIPTDKTVTYKHMDPPPAPLQD
RSPSPRHPDVSMVDPEALAIEQNLGKALKKDLKEKTKTKKPGTKTKSSSPVKKADGKPKPLAASPKPGALKESSDKVSRV
ASPKKKDSVEKATKTTTTPEVKATRGEEKDKETKNAANASTSKSVKTAAAGPGTTKTAKSSAVPPGLPVYLDLCYIPNHS
NSKNVDVEFFKRVRSSYYVVSGNDPAAEEPSRAVLDALLEGKAQWGSNMQVTLIPTHDSEVMREWYQETHEKQQDLNIMV
LASSSTVVMQDESFPACKIEL

MRF results:

_images/PF00414_A0A250Y8D3_MRF.png

TAPAS results:

_images/PF00414_A0A250Y8D3.png

PF00624 - Flocculin repeat

PF00624 Protein family information

In some cases the predictor identifies flat beta solenoids with low model confidence (A7TTI5), while in others the model confidence of the predicted unit ranges from confident to very high (A0A1Q3ALI5).

A7TTI5

A7TTI5 Interpro sequence information

A7TTI5 AlphafoldDB sequence information

Sequence:

>tr|A7TTI5|A7TTI5_VANPO Uncharacterized protein (Fragment) OS=Vanderwaltozyma polyspora (strain ATCC 22028 / DSM 70294 / BCRC 21397 / CBS 2163 / NBRC 10782 / NRRL Y-8283 / UCD 57-17) OX=436907 GN=Kpol_249p1 PE=4 SV=1
MKHFTRLLTFLNFVLFACSLSNHENNQALSLSELIDHEAILEGNTALVGDNPKSKLHSEK
KLLSIPLNINQNESIYTSVPSTKNQTYFISDHLATNVKNVDKKDITIKSNDISIITIRTQ
NLNILAETTSTELTWVTGHNGIESKLFIYYIEYPVDHFSFTFIRPMTVNNLEKRLVENED
ISSSSIVKPIVTESTKTIVNTITKSDNALVVETTYIVYSRSPYTSTNSKKTYWTGSYTTT
TKTEITTYIGTNGGVTTETIYFIATPTTAFETTSYTYWTGSTANTLSTVTTTFTGTDGIE
TTETIYIVETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAF
ETTSYTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAFETTSFTYWTGSTANT
LSTVTTTFTGTDGIETTETIYIVETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTDGIE
TTETIYIVETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAF
ETTSYTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAFETTSYTYWTGSTANT
LSTVTTTFTGTDGIETTETIYIVETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTDGIE
TTETIYIVETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAF
ETTSYTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAFETTSFTYWTGSTANT
LSTVTTTFTGTDGIETTETIYIVETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTDGIE
TTETIYIVETPTTAFETTSFTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAF
ETTSYTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAFETTSYTYWTGSTANT
LSTVTTTFTGTDGIETTETIYIVETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTDGIE
TTETIYIVETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAF
ETTSFTYWTGSTANTLSTVTTTFTGTDGIETTETIYIVETPTTAFETTSYTYWTGSTANT
LSTVTTTFTGTDGIETTETIYIVETPTTAFETTSFTYWTGSTANTLSTVTTTFTGTDGIE
TTETIYIV

MRF results:

Region 1: 207-1197, 60 aa length, 47 units

NALVVETTYIVYSRSPYTSTNSKK-TYWTGSYTTTTKTEITTYIGTN
GGVTTETIYFI--ATPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSFTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSFTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSFTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSFTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSYTYWTGSTANTLSTVTTTFTGTD
GIETTETIYIV--ETPTTAFETTSFTYWTGSTANTLSTVTTTFTGTD

TAPASS results:

_images/PF00624_A7TTI5.png

A0A1Q3ALI5

Repeat units annotated: 207-307, 314-353

A0A1Q3ALI5 Interpro sequence information

A0A1Q3ALI5 AlphafoldDB sequence information

Sequence:

>tr|A0A1Q3ALI5|A0A1Q3ALI5_ZYGRO PA14 domain-containing protein (Fragment) OS=Zygosaccharomyces rouxii OX=4956 GN=ZYGR_0BQ00100 PE=4 SV=1
MVSHKSIFQWLLWFSVLGITKALAATACLPANGAQSGFKANFFQYNYGDMTTLRQPSFIA
GGYAKRQLLGTQNNVNNILIAYGMECQLSNGEVVTPTEPWNFDYSQCKNKRYFSQRHNGT
IFGFELTATNFTVELTGYLLAPQTGTYTFTFDHVDDSAILNFGEGIAFDCCNQDAAANGN
TQFSINAIKPDYGPTAHMNYSVDLVGNYYYPMRIVYTNRHVFGWLFTTLTLPDGTNIDND
FTGYVYSFVSEPEQPNCTVTSPLPFVTSTSTTPWTGSFTSTYSTQTNVNTDSDGDNAGTV
IIDVETPTTPPVLTTEYTGYSGSETSTYSTESTWVTGTDGKTTPETIYHVETPTIPPV

MRF results:

Region 1: 326-334, 3 aa length, 3 units, regex_SX3 0.86
STY
STE
STW

TAPAS results:

_images/PF00624_A0A1Q3ALI5.png

PF00880 - Nebulin repeat

PF00880 Protein family information

In the literature we can observe a very high model confidence (https://www.mpg.de/18283745/nebulin-no-longer-nebulous)

A0A0S7IV57

A0A0S7IV57 Interpro sequence information

A0A0S7IV57 AlphafoldDB sequence information

Sequence:

>tr|A0A0S7IV57|A0A0S7IV57_9TELE NEBU (Fragment) OS=Poeciliopsis prolifica OX=188132 GN=NEBU PE=4 SV=1
SNDVVQARLAYDLQSDAVYKADLKWLQGLGWVPIGSLDVEKAKKAAEVLSDRKYRQHPST
VKFTSPIDAMNIVLAKSNAMTMNKRLYTEAWENEKTKLHIKPDTPEIVLSQQNAINMSKK
LYKQGFEETISKGYFLPPDAVSVKAAKTSRDIISDYKYKTG

MRF results:

Region 1: 3-141, 43 aa length, 4 units
DVVQARLAYDLQSDA--VYK---A---DLKWLQGLGWVPIGSL
DVEKAKKAAEVL--SDRKYR---Q---HPSTVKFTS--PIDAM
NIVLAKSNAMTMN--KRLYTEAWE---NEKTKLHIK--P-DTP
EIVLSQQNAINM--SKKLYK---QGFEETISKGYFL--PPDAV

TAPAS results:

_images/PF00880_A0A0S7IV57.png

PF00904 - Involucrin repeat

PF00904 Protein family information

P14591

P14591 Interpro sequence information

P14591 AlphafoldDB sequence information

Sequence:

>sp|P14591|INVO_PANPA Involucrin OS=Pan paniscus OX=9597 GN=IVL PE=2 SV=1
MSQQHTLPVTLSPALSQELLKTVPPPVNTQQEQMKQPTPLPPPCQKMPVELPVEVPSKQE
EKHMTAVKGLPEQECEQQQQEPQEQELQQQHWEQHEEYQKAENPEQQLKQEKAQRDPQLN
KQLEEEKKLLDQQLDQELVKRDEQLGMKKEQLLELPEQQEGHLKHLEQREGQLELPEQQE
GQLKHLEQQKGQLELPEQQEGQLELPEQQEGQLKHLEQQEGQLKHLEHQEGQLEVPEEQV
GQLKYLEQQEGQLKHLDQQEKQPELPEQQVGQLKHLEQQEGQPKHLEQQKGQLEHLEEQE
GQLKHLEQQEGQLEHLEHQEGQLGLPEQQVQQLKQLEKEEGQPKHLEEEEGQLKHLVQQE
GQLEHLVQQEGQLEHLVQQEGQLEQQEGQVEHLEQQVEHLEQLGQLKHLEEQEGQLKHLE
QQQGQLGVPEQVGQPKNLEQEEKQLELPEQQEGQLKHLEKQEAQLELPEQQVGQPKHLEQ
QEKQLEPPEQQDGQLKHLEQQEGQLKDLEQQKGQLEQPVFAPAPGQVQDIQSALPTKGEV
LLPLEHQQQKQEVQWPPKHK

MRF results:

_images/PF00904_P14591_MRF.png

TAPAS results:

_images/PF00904_P14591.png

B4DWR5

B4DWR5 Interpro sequence information

B4DWR5 AlphafoldDB sequence information

Sequence:

>B4DWR5 1-449
MKKEQLLELPEQQEGHLKHLEQQEGQLKHPEQQEGQLELPEQQEGQLELPEQQEGQLELPEQQEGQLELPEQQEGQLELP
EQQEGQLELPEQQEGQLELSEQQEGQLELSEQQEGQLELSEQQEGQLKHLEHQEGQLEVPEEQMGQLKYLEQQEGQLKHL
DQQEKQPELPEQQMGQLKHLEQQEGQPKHLEQQEGQLEQLEEQEGQLKHLEQQEGQLEHLEHQEGQLGLPEQQVLQLKQL
EKQQGQPKHLEEEEGQLKHLVQQEGQLKHLVQQEGQLEQQERQVEHLEQQVGQLKHLEEQEGQLKHLEQQQGQLEVPEQQ
VGQPKNLEQEEKQLELPEQQEGQLKHLEKQEAQLELPEQQVGQPKHLEQQEKHLEHPEQQDGQLKHLEQQEGQLKDLEQQ
KGQLEQPVFAPAPGQVQDIQPALPTKGEVLLPVEHQQQKQEVQWPPKHK

MRF results:

Region 1: 46-217, 20 aa length, 16 units
-------QLELPEQQEG---
-------QLELPEQQEG---
-------QLELPEQQEG---
-------QLELPEQQEG---
-------QLELPEQQEG---
-------QLELSEQQEG---
-------QLELSEQQEG---
-------QLELSEQQEG---
-------QLKHLEHQEG---
-------QLEVPEEQMG---
-------QLKYLEQQEG---
-------QLKHLDQQEKQPE
LPEQQMGQLKHLE-------
---QQEGQPKHLE-------
---QQEGQLEQLEEQE----
------GQLKHLEQQEGQL-

Region 2: 222-398, 20 aa length, 9 units
HQEGQLGLPEQQVLQLKQLE
KQQGQPKHLEEEEGQLKHLV
QQEGQLKHLVQQEGQ---LE
QQERQVEHLEQQVGQLKHLE
EQEGQLKHLEQQQGQLEVPE
QQVGQPKNLEQEEKQLELPE
QQEGQLKHLEKQEAQLELPE
QQVGQPKHLEQQEKHLEHPE
QQDGQLKHLEQQEGQLKDLE

TAPAS results:

_images/PF00904_B4DWR5.png

AlphaFold trimer results:

_images/PF00904_B4DWR5_trimer.png

PF09528 - Ehrlichia_rpt

PF09528 Protein family information

T1L1A4

T1L1A4 Interpro sequence information

T1L1A4 AlphafoldDB sequence information

Sequence:

>tr|T1L1A4|T1L1A4_TETUR Uncharacterized protein OS=Tetranychus urticae OX=32264 GN=107369337 PE=4 SV=1
MRFTIVLALCFIGAASASSLNKRSFLDDIQNNTQNAFHAFEQFGQTFNEKVQEALKNLLS
AFGNKNSSAEASVVVEKRATNPLQLINDLDDPAQFAQTFLKVLLDLATGQGRRKRDIAED
LKKFSEEAKHNAEEALKKLFSFLEQFKSQSSESTEASVVVEKRATNPLQLINDLDDPAQF
AQTLLKVLADIATGQGRRKRDIAEDLKKFSDEAKHNAEEALKKLFSFLEQFKPQSSESTE
APVVVEKRATNPLVLFNDLSQQDLGKFAQDFLKVLADIATAQG

MRF results:

Region 1: 35-283, 99 aa length, 3 units

NAFHAFEQFGQTFNEKVQEALKNLLSAFGNKNS-SAEASVVVEKRATNPLQLINDL--DDPAQFAQTFLKVLLDLATGQGRRKRDIAEDLKKFSEEAKH
NA------------EEALKKLFSFLEQFKSQSSESTEASVVVEKRATNPLQLINDL--DDPAQFAQTLLKVLADIATGQGRRKRDIAEDLKKFSDEAKH
NA------------EEALKKLFSFLEQFKPQSSESTEAPVVVEKRATNPLVLFNDLSQQDLGKFAQDFLKVLADIATAQG-------------------

TAPAS results:

_images/PF00904_P14591.png

AlphaFold results of the cut region:

_images/T1L1A4_cutted.png

CrossBeta results:

_images/T1L1A4_predictor.png

AlphaFold trimer results:

_images/PF09528_T1L1A4_trimmer.png

Q6W7F7

A dimer model was tried, but no representative conserved model was obtained.

Q6W7F7 Interpro sequence information

Q6W7F7 AlphafoldDB sequence information

Sequence:

>tr|Q6W7F7|Q6W7F7_EHRCH 120 kDa immunodominant surface protein (Fragment) OS=Ehrlichia chaffeensis OX=945 PE=4 SV=1
VSQPSLEPFVAESEVSKVEQEETNPEVLIKDLQDVASHESGVSDQPAQVVTERENEIESH
QGETEKESGITESHQKEDEIVSQSSSEPFVAESEVSKVEQEKTNPEVLIKDLQDVASHES
GVSDQPAQVVTERENEIESHQGETEKESGITESHQKEDEIVSQSSSEPFVAESEVSKVEQ
EETNPEVLIKDLQDVASHESGVSDQPAQVVTERENEIESHQGETEKESGITESHQKEDEI
VSQSSSEPFVAESEVSKVEQEETNPEVLIKDLQDVASHESGVSDQPAQVVTERESEIESH
QGETEKESGITESNQKEDEIVSQPSSEPFVAESEVSKVEQEETNPEVLIKDLQDVASHES
GVSDQPAQVVTERESEIESHQGETEKESGITESHQKEDEIVSQPSSEPFVAESEVSKVEQ
EETNPEILVEDLPLGQV

MRF results:

Region 1: 2-401, 80 aa length, 5 units

SQPSLEPFVAESEVSKVEQEETNPEVLIKDLQDVASHESGVSDQPAQVVTERENEIESHQGETEKESGITESHQKEDEIV
SQSSSEPFVAESEVSKVEQEKTNPEVLIKDLQDVASHESGVSDQPAQVVTERENEIESHQGETEKESGITESHQKEDEIV
SQSSSEPFVAESEVSKVEQEETNPEVLIKDLQDVASHESGVSDQPAQVVTERENEIESHQGETEKESGITESHQKEDEIV
SQSSSEPFVAESEVSKVEQEETNPEVLIKDLQDVASHESGVSDQPAQVVTERESEIESHQGETEKESGITESNQKEDEIV
SQPSSEPFVAESEVSKVEQEETNPEVLIKDLQDVASHESGVSDQPAQVVTERESEIESHQGETEKESGITESHQKEDEIV

TAPAS results:

_images/PF09528_Q6W7F7.png

PF10529 - Hist_rich_Ca-bd

PF10529 Protein family information

We found a group of proteins in which the repeat region spans most of the protein length.

P23327

P23327 Interpro sequence information

Sequence:

>P23327 1-699
MGHHRPWLHASVLWAGVASLLLPPAMTQQLRGDGLGFRNRNNSTGVAGLSEEASAELRHHLHSPRDHPDENKDVSTENGH
HFWSHPDREKEDEDVSKEYGHLLPGHRSQDHKVGDEGVSGEEVFAEHGGQARGHRGHGSEDTEDSAEHRHHLPSHRSHSH
QDEDEDEVVSSEHHHHILRHGHRGHDGEDDEGEEEEEEEEEEEEASTEYGHQAHRHRGHGSEEDEDVSDGHHHHGPSHRH
QGHEEDDDDDDDDDDDDDDDDVSIEYRHQAHRHQGHGIEEDEDVSDGHHHRDPSHRHRSHEEDDNDDDDVSTEYGHQAHR
HQDHRKEEVEAVSGEHHHHVPDHRHQGHRDEEEDEDVSTERWHQGPQHVHHGLVDEEEEEEEITVQFGHYVASHQPRGHK
SDEEDFQDEYKTEVPHHHHHRVPREEDEEVSAELGHQAPSHRQSHQDEETGHGQRGSIKEMSHHPPGHTVVKDRSHLRKD
DSEEEKEKEEDPGSHEEDDESSEQGEKGTHHGSRDQEDEEDEEEGHGLSLNQEEEEEEDKEEEEEEEDEERREERAEVGA
PLSPDHSEEEEEEEEGLEEDEPRFTIIPNPLDRREEAGGASSEEESGEDTGPQDAQEYGNYQPGSLCGYCSFCNRCTECE
SCHCDEENMGEHCDQCQHCQFCYLCPLVCETVCAPGSYVDYFSSSLYQALADMLETPEP

MRF results:

_images/PF10529_P23327_MRF.png

TAPAS results:

_images/PF10529_P23327.png

B4DUM3

B4DUM3 Interpro sequence information

Sequence:

>B4DUM3 1-454
MGHHRPWLHASVLWAGVASLLLPPAMTQQLRGDGLGFRNRNKDVSTENGHHFWSHPDREKEDEDVAKEYGHLLPGHRSQD
HKVGDEGVSGEEVFAEHGGQARGHRGHGSEDTEDSAEHRHHLPSHRSHSHQDEDEDEVVSSEHHHHILRHGHRGHDGEDD
EGEEEEEEEEEEEEEASTEYGHQAHRHRGHGSEEDEDVSDGHHHHGPSHRHQGHEEDDDDDDDDDDDDDDDVSIEYRHQA
HRHQGHGIEEDEDVSDGHHHRDPSHRHRSHEEDDNDDDDVSTEYGHQAHRHQDHRKEEVEAVSGEHHHHVPDHRHQGHRD
EEEDEDVSTERWHQGPQHVHHGLVDEEEEEEEITVQFGHYVASHQPRGHKSDEEDFQDEYKTEVPHHHHHRVPREEDEEV
SAELGHQAPSHRQSHQDEETGHGQRGSIKEMSHHPPGHTVVKDRSHLRKDDSEE

MRF results:

_images/PF10529_B4DUM3_MRF.png

TAPAS results:

_images/PF10529_B4DUM3.png

AlphaFold trimer results:

_images/PF10529_B4DUM3_trimer.png

PF12778 - PXPV repeat (3 copies)

PF12778 Protein family information

Q127N3

Q127N3 Interpro sequence information

Sequence:

>Q127N3 1-147
MKTAIKTNRSVATAGAAAALAVAALGFAGAAQARDDVYWSVGVGSPGVSVNVGNAYPVYTPAPVYVQPAPVYYQPAPV
YVRPAPVYYQPAPVFVQPRPYYGPPQVVYVQPGNRHGWHKKHGRDHDDDRGYRGGYGYRQGYAPVYYQR

MRF results:

Region 1: 61-97, 9 aa length, 5 units

PAPVYVQ--
PAPVYYQ--
PAPVYVR--
PAPVYYQ--
PAPVFVQPR


Region 2: 129-140, 6 aa length, 2 units

GYRGGY
GYRQGY
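
Because the aligned units contain gap characters, the residue range of each unit is not simply the region start plus a multiple of the unit length. The sketch below shows one way to recover per-unit coordinates for Region 1. The function name `unit_coordinates` is hypothetical, and the code assumes that coordinates are 1-based and inclusive and that the units are contiguous in the sequence, with no unaligned residues between them.

```python
def unit_coordinates(region_start, aligned_units, gap="-"):
    """Return 1-based, inclusive (start, end) pairs for each aligned unit.

    Assumes the units follow each other in the sequence without
    unaligned residues in between.
    """
    coordinates = []
    position = region_start
    for unit in aligned_units:
        residues = len(unit.replace(gap, ""))  # gaps do not consume residues
        if residues:
            coordinates.append((position, position + residues - 1))
        position += residues
    return coordinates


# Region 1 of Q127N3 as listed in the MRF results (start position 61).
REGION_1 = ["PAPVYVQ--", "PAPVYYQ--", "PAPVYVR--", "PAPVYYQ--", "PAPVFVQPR"]

coords = unit_coordinates(61, REGION_1)
for unit, (start, end) in zip(REGION_1, coords):
    print(f"{start}-{end}\t{unit}")
```

For this region the last unit ends at position 97, which agrees with the region end reported in the header.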

TAPAS results:

_images/PF12778_Q127N3.png

AlphaFold trimer results:

_images/PF12778_Q127N3_trimer.png

PF14585 - CagY_I

PF14585 Protein family information

A structure already exists; the protein forms an X-mer to be stable.

6odi

https://www.ebi.ac.uk/interpro/structure/PDB/6odi/#table

_images/6odi.png

PF14912 - Testicular haploid expressed repeat

PF14912 Protein family information

A structure already exists; the protein forms an X-mer to be stable.

8snb

https://www.ebi.ac.uk/interpro/entry/pfam/PF14912/structure/PDB/#table

_images/8snb.png

PF15287 - KRBA1

PF15287 Protein family information

A0A452ID02

A0A452ID02 Interpro sequence information

Sequence:

>A0A452ID02 1-1256
MEENYQLLISLGQPVPTLALLALAVESEAAGSQIRGVSGEPELASDSSSSEGEELAFPEDDPDVGGFWDSRQTEEDGCPT
GDEEGAHQGSLHLSALMKLVKEIPEFLLGNLKAPVEPAEAADSEAEMGSERAYADVKPEVTPETPPPLDLENCLVEASVN
RPNHPDTPSSCLSTSSTERAPLRRLYAEVAPENSPLQGLLNCLKEIPVHRPRHPNMQSPGAQGDVEHKGVVGEVKSLCAA
AGTAENSPLQGLLNCLKEIIVHKPHHPHPSPCKSAKGSTRGDSGKRRLESEDGSSSVEVKTEVTAGDPQDPGLESCPSAR
SVSKASPPAMPASRSPRNRAEGEAGGRSLFGEGAVKREGAAESSLAQGLLSCWRDVISPCHPARPACSSPTSSAHWRTEQ
RGLEPGPWRSPGEEAVLEDSPLRGLENCLKDIAVTSPCCSHLPASRSAQRALGERPGRGAGSLSAEEMMPRTSPLHDLAN
CLQETPVSTSWASRVLARNAGADVSSRRPATVTVRSCCGEDISTETSPLRGLENCLRDIPVNSPHMPASISLTSATQRDM
GQRRPGAGTRRSLREDINAENSPLQGLENCLKDIPVPCHNQSNTPSSRSSLNSSPAPQGRLETAGWPVKTEGSVSEVTPP
LQGLENCLKDIPMVRLRGSRETPSSFCTSKAPGEVEQRRPVPRPWRACAAELSPENSPLRGLEKCLQEISVPRPQTPSAP
ATAGVVGSRQGDTARRRPETGHWDWHGSDKRQAENLPGLEEVAAPSSCPAQTPPSSTRGDAERQEQDVHTRSSGSKDVTV
RSSPLSWLNCVKELTADRVTPSSPPACAAQGDVTLQGAESRGRTLSEEEVMPADAPRYGPATCPQGTHGTGPSRSRTPSD
TLPARDAPGHRCPKRPGTGVKRPHTEARTDAARPPSTSSCTSSEDGGQKADAEWEEFPKRHCSTAALSPWECFRWESRTP
LDLRVERSMIEAVLSEKLDRVSQDFMAMCRDVSSMQSRVAQLERDSRGWALELAALQKGNKRLSETVRRLESRCHMLENR
AHRNSLRLAGLPEGAEGGDPVAFLQRTLPTVLNLPADWPPLEIESVRRVHGGAHWDPATRPRALLFRLLRFSDKLAIMRA
VRKRTEPLTCGGAKVALFPDVCPKLCRRRGAQYAAVRRLWRAAELRLGTQPSGCCHDRARGHWEPLPSPLGRAPTADXCR
RTGEQSHQRAVTESGGLGAAGSAHPPLSLKVRLHSAPEITAPAGSGLELSRFPDCS

MRF results:

Region 1: 93-914, 80 aa length, 15 units
LSALMKL--VKEI--PEFLL----GNL---KAP---VEPAEAADSEAEMGSERA-YA----D-VKP---EV-TPETPP--
PLDLENC--LVEA--S---V----NRP---NHP---DTPSSCLST--SSTERAP-----LRR-LYA---EV-APENSP--
LQGLLNC--LKEI--P---V----HRP---RHP---NMQSP--GAQGDVEHK-G-VVGEVKS-LCA---AAGTAENSP--
LQGLLNC--LKEI--I---V----HKP---HHPHPSPCKSAKGSTRGDSGKRRL-ESEDGSS-SVEVKTEV-TA-GDPQD
-PGLESCPSARSV--SK--A----SPP---AMP---ASRSPRNRAEGEAGGRSLFGEGAVK----R---EG-AAESS-LA
-QGLLSC--WRDVISP---C----H-P---ARP---ACSSPTSSAHWRTEQRGL-EPGPWRS-PGE---EA-VLEDSP--
LRGLENC--LKDI--A---V----TSPCCSHLP---ASRS----AQRALGE-RP-GRGAGSL-SAE---EM-MPRTSP--
LHDLANC--LQET--P---V----STS---W-----ASRVLARNAGADVSSRRP-ATVTVRSCCGE---DI-STETSP--
LRGLENC--LRDI--P---V----NSP---HMP---ASISLTSATQRDMGQRRP-GAGTRRS-LRE---DI-NAENSP--
LQGLENC--LKDI--P---VPCH-NQS---NTP---SSRSSLNSSPAPQGRLET-AGWPVKT-EGS----V-SEVTPP--
LQGLENC--LKDI--P---MVRLRGSR---ETP---SSFC-TSKAPGEVEQRRP-VPRPWRA-CAA---EL-SPENSP--
LRGLEKC--LQEI--S---VPRP-QTP---SAP---ATAGVVGSRQGDTARRRP-ETGHWDW-HGS---DKRQAENLP--
--GLE------EV--A---A----PSS----CP---A-QTPPSSTRGDAERQEQ-DVHTRSS-GSK---DV-TVRSSPL-
--SWLNC--VKEL--T---A----DRV----TP---SS-PPACAAQGDVTLQGA-ESRGRTL-SEE---EV-MPADAPRY
--GPATC--PQGT--H---G----TGPSRSRTP---SDTLPARDAPGHRCPKRP-GTGVKRP-HTE---AR-TDAARP--

Region 2: 1021-1143, 46 aa length, 3 units

KRLSETVRRLESRCHMLENRAHRNSLRLAGLPE-GAE-GGDPVAFL
QRTLPTVLNLPADWPPLEIESVR---RVHGGAH------WDPATRP
RALLFRLLRFSDKLAIM--RAVRK--RTEPLTCGGAKVALFPDVCP

TAPAS results:

_images/PF15287_A0A452ID02.png

A5PL33

A5PL33 Interpro sequence information

Sequence:

>A5PL33 1-1030
MRENYETLVSVGTAELLPLSAFLSPSEPGRAVGGGSHADEGQEPAGCGDPQGGQPRHSLHLTALVQLVKEIPEFLFGEVK
GAMDSPESESRGASLDGERASPEAAAAREPCPLRGLLSCLPDGPTSQPHLATTPTDSSCSSGPTGDGVQGSPLPIKTADK
PWPTRKEGPGALGGEPSPPTHSPSRRKSHRGQERGTSEAGISPGNSPLQGLINCLKEILVPGPRHPETSPSFLPPLPSLG
TSRLTRADLGPGSPPWAVKTEAVSGDCPLQGLLHCLKELPEAQDRHPSPSGVGNRRLQENPGAWKRGSGGPGYLLTPPPH
PDLGAGGLLSVKMENSWVQSPPGPASCQPGRQPLSPSATGDTRGVPQPSWGPEAQAASASSSPLEALEACLKGIPPNGSS
PSQLPPTSCSQNPQPGDSRSQKPELQPHRSHSEEATREPVLPLGLQSCVRDGPSRPLAPRGTPTSFSSSSSTDWDLDFGS
PVGNQGQHPGKGSPPGSSPLQGLENCLKEIPVPVLRPAWPCSSAADRGPRRAEPRNWTADKEGLRAEACESARLGQGRGE
APTRSLHLVSPQVFTSSCVPACHQRGFKDPGATRPGVWRWLPEGSAPKPSPLHCLESALRGILPVRPLRFACVGGPSPSP
SPGSSSSFSGSEGEDPRPEPDLWKPLPQERDRLPSCKPPVPLSPCPGGTPAGSSGGSPGEDPRRTEPRYCSGLGAGTAQD
PCPVSQLEKRPRVSEASRGLELGHGRPRVAAKTHERLLPQGPPELPSESPPPELPPPEAAPPVLPASSLQPPCHCGKPLQ
QELHSLGAALAEKLDRLATALAGLAQEVATMRTQVNRLGRRPQGPGPMGQASWMWTLPRGPRWAHGPGHRHLPYWRQKGP
TRPKPKILRGQGESCRAGDLQGLSRGTARRARPLPPDAPPAEPPGLHCSSSQQLLSSTPSCHAAPPAHPLLAHTGGHQSP
LPPLVPAALPLQGASPPAASADADVPTSGVAPDGIPERPKEPSSLLGGVQRALQEELWGGEHRDPRWGAH

MRF results:

Region 1: 60-717, 15 aa length, 6 units

HLTALVQLVKEI-P--EFLFGEVKGA----MDSPES-ESRG--ASLD--G--E-RAS--PEAAAAREP-CP-L--RGLLSC----LPD----G----P--TSQPH-L-AT--T-PTDSSCSSG--PTGDGVQGSPLPIKTADKPWPTRKEG-PG--
-----------------ALGGEPSPP----THSPSR---RK--SHRG--Q--E-RGT--SEAGISPGN-SP-L--QGLINC----LKEILVPGPRH-P--ETSPSFL-PP--L-PSLGT-SRL--TRADLGPGSP--------PWAVKTEAVSGDC
PLQGLLHCLKEL-P--EAQD---RHP----SPSGVG-NRRL--QENP--GAWK-RGSGGPGYLLTPPP-HPDLGAGGLLSV----KME----N----SWVQSPPG-P-AS--CQPGRQPLSPS--ATGDT-RGVPQP---SWGPEAQAASA-SS-S
PLEALEACLKGI-P--PNGSSPSQLPPTSCSQNPQPGDSRSQKPELQ--P--H-RSH--SE-EATREPVLP-L---GLQSC----VRD----G----P---SRP--L-APRGT-PTSFSSSSS--TDWDLDFGSPVG-NQGQHPGKGSP---PGSS
PLQGLENCLKEI-P--V--------P----VLRPAW-PCSS--AADR--G--PRRAE--PRN-WTADK-EG-L--RA-EACESARLGQ----GRGEAP--TRSLH-LVSP--Q-VFTSSCVPACHQRGFKDPGATRPGVWRWLPEGSAPK--P--S
PLHCLESALRGILPVRPLRFACVGGP----SPSPSP-GSSS--SFSGSEG--E-DPR--PEPDLWK-P-LP-Q--E----------RD----RL---P--SCKP-----P--V-PL-SPCPGG--TPAGSSGGSPGEDPRRTEPRYCSGLG-AG-T

Region 2: 636-649, 2 aa length, 7 units
PS
PS
PS
PG
SS
SS
FS

TAPAS results:

_images/PF15287_A5PL33.png

PF15788 - DUF4705

PF15788 Protein family information

B4DF06

B4DF06 Interpro sequence information

Sequence:

>B4DF06
MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNGLFRPSSCLPAFSPGPELSQV
DLTRPRSCFFAASPGPAPASWWPLQAQPLPPVSLYSPNVCLTADSSRPASTSLWTPQAKLPTFQQLLHTQLLPPSGLFRP
SSCFTRAFPGPTFVSWQPSLARFLPVSQQPRQAQVLPHTGLSTSSLCLTVASPRPTPVPGRHLRAQNLLKSDSLVPTAAS
WWPMKAQNLLKLTCSGPAPASCQHLQAQPLPHGGFSRPTSSSWLGLQAQLLPHNSLFWPSSCPAHGGQCRPKTSSSQTLQ
AHLLLPGGINRPSFDLRTASAGPALASQGLFPGPALASWQLPQAKFLPACQQPQQAQLLPHSGPFRPNS

MRF results:

Region 1: 61-145, 32 aa length, 3 units

LFRPSSCLPAFSPGPE-----------LSQVD
LTRPRSCFFAASPGPAPASWWPLQAQPLPPVS
LYSPNVCLTADSSRPASTSLWTPQAKLPTFQQ

Region 2: 342-363, 11 aa length, 2 units
GPALASQGLFP
GPALASWQLPQ

TAPAS results:

_images/PF15788_B4DF06.png

Q6ZQT7

Q6ZQT7 Interpro sequence information

Sequence:

>Q6ZQT7 1-251
MQPGGTAGPEEAPMREAEAGPPQVGLSRPTCSLPASSPGPALPPGCVSRPDSGLPTTSLDSAPAQLPAALVDPQLPEAKL
PRPSSGLTVASPGSAPALRWHLQAPNGLRSVGSSRPSLGLPAASAGPKRPEVGLSRPSSGLPAAFAGPSRPQVGLELGLE
EQQVSLSGPSSILSAASPGAKLPRVSLSRPSSSCLPLASFSPAQPSSWLSAAFPGPAFDFWRPLQAQNLPSSGPLQARPR
PRPHSGLSTPS

MRF results:

Region 1: 111-152, 21 aa length, 2 units

VGSSRPSLGLPAASAGPKRPE
VGLSRPSSGLPAAFAGPSRPQ

TAPAS results:

_images/PF15788_Q6ZQT7.png

PF18727 - ALMS_repeat

PF18727 Protein family information

The sequences are long (more than 3500 amino acids), so there is no AlphaFold model.

A0A8I3P0L2

An AlphaFold model for A0A8I3P0L2 does not exist.

A0A8I3P0L2 Interpro sequence information

Sequence:

>A0A8I3P0L2 1-4373
TYISINKFLFLGDTSKGGIAEITQSSLKPGITTTRESDTGSLLSLFPEDFPQLALRSPQEITIGQHSDTLHQQELVGSHK
TEETPKVSTVPKLDDQNTGISTVPSSSYSQRGKPSILHQQSLPDSYLAEEALKVAAVPEPTDQKTSISTVLPGSYSLGEK
HCIFYPQTLPESHLTEEAVRVSAFSGLADQKTDIPTVLPSSYSLREKHNIFYQQALPDSHLTEEAVRVSAVPGPADQKTR
IHIVLPGSHSLGEKHKIFCQQALPNSHLTKETLKVSAVPGPVEQKSVIPIVLPGSYLLGEKRNIFHPPTLPESHLTEEAV
RVSAAVPGSVDQKTGIPTVLPGSFSLGEKASIFHQQALPESHLTKEALRVSAVPGPIDQKTGIPTVLPGSYSLGENCNIF
QPQTLPDGHLTGEAVRVSTVPGPVDQKTGIPTVLSGSYSLGEKRNIFHPQTLPGIHLTEEAQRVLAVPGPADEKTGIPTV
LPGSYSLGEKRNIFHPQTLPSIHLTEEAQRVLAVPGPADEKTGIPTGLAGSYSLGEKRNIFYPQTLPQSHLTEEALKVLA
GPGPVDQKTGIPTILPGSYSLGERRNIFHPENLRDSHLTEEALRVSGVPSPADQKTDIPAGLAGSYSLGEKRNIFYPQTL
SQSHLIEEAIRVSAFPGPADQKTGIRTGLAGSYSLGEKCNIVHSETLPDNHLTEGTQRVLAVPGPVDQKTGIPTGLAGSY
SLGEKRNIFHPENLPESPLTEEALKVLAGPGPADQKTGIPIGLPGSYSLGEKHHIFHSENLPDSHLTEEAVRVSAVPSPA
DQKTGIPTVLPGSYSLGEKCNIFHPENLPDSHLTEEALKVLAVPGLADQKTGIPTVLPGSYSLGEKHHIFHTKNLADNPL
TEEAIRVSAFPGPVDQKTDIPTGFPGSYSLEEKSNIVHPEILLDSLLTEEAVRVLAVPGPDDQKADVPTGLPGSYSLGEK
CNIVQPETLPDSHLTEEAVRVSAVPGPVDQKTGIPTGLPGSYSLGEKHSIFHPEILPDNHLTEEAVRVLTVPGPPDQKTD
RPTGHPGSYSPREKHNIFYPQTLPESPRTEEALRVSAVPGPVDQKTGRPTVLPGSYSPGEKHHILHPETLPDSHLTEESL
KISTVPVPTDQRTEKIIVPSASLSQREKHVIFSQQQLSDGDLTAQVLKASVAPGPADQNIGLPTLSSSSYSLGEKHCICY
QQALLDSHLIEQAQKVAAVPRPADQKTRIPLASSTSYLQGERPHIFCQQTLPESDLTEQALKYSAPGSAEQKTGIPTLTS
TSYSHREKSSISNQQELPDSPLAEQAPKVPAVPGPAEKKSGSLSEASNFSSRREKHSIFYQQEFLGSSLIEPAQKVSPVP
GPTDQKPEIPTVTSTYSHVEKPFIFYPQGLPDSPLPEEALKVTAVSEPTDQQTGTPVVPSSSYSPGEKPIIFYPQGLTDV
YLTKEALKVSAISGSADWKTGIPTVSSTSYSNREKPIIFYPQGLTDSQLPQEALNISAIPGPADQKTGLPSEPSSSYSLR
EKPIIFYPQDLTNSQVPQAALKVSAIPGPADQKTGLPLEDSSSYSPREKPIIFYPQGLTDSQLPQAALKLSAIPGPADQK
TRLPSEPSSSYSFREKPIIFYPQGLTDSQLPQEALNVSAIPEPADQKTELPSEPSTSYSPREKPSIFYPQDLTDSQLPQE
PLNISAIPGPADQKTGLPSESSSSYSPREKPIIFYSQGLIDGQVPQVALKVSATPGLADQKTGLPSEPSSSYSPREKPII
FYPQGLTDSQLPQEALKVSAIPGPGDQKTGLPSEPSSSYKPSIFYPQDLTDSQLPQEPLNISAIPGPADQKTGLPSESSS
SYSPREKPIIFYSQGLIDGQVPQVALKVSATPGLADQKTGLPSEPSSSYSPREKPIIFYPQGLTDSHLPQKALKVSAILG
PGDQKTGLPSEPSSSYSHREKSNIFYAQEFPGSHLTEEALKVSAFSGIGDQKTGIPTVLSSSYSLGGKPIIFYQQALSDR
HLTDEALNVSASSGPADQETGIPTVSSVSYSHRERPSILYQQPFSDNQLAIAALKVSAVSGSDDQKTRKPTITSASYSER
EKPIIYHQQLPDLTQESLNVFRIPGLGDQRTGITAVTSTTYSHREKPVISYQQELPAPNEGALKVLGAPGSADQQSGIRF
GPSTSYSHRKNPIFSYLESPDITEETLKISAVSGPGDQKTGIHIIPSSSYSYREKDSIFYQEELPDVTEAALKVFALPGP
ADQKTEIPIGPSSSYSHEEKLKISPVILPDDQETELLTAPLSFYSKREKPKISTVIGSDNQKTPLLTVLHNSYSQKVKPG
IFLQHQLSDKHQSENILKISAVSEPIDVNSGIPISLSSSYSHREKSNNFYPQELPDKHLGKGALKVSTIPLPADQKSLLP
TAPSSFSHREQPDIFCQQDFPDRHLTQDALMFSSGVGQADQITGLSTVTPGTYSYSEKQKLVSDHVQMLIDNLDSSNSSV
TSNSMPLNSQADGRVIISKPESSSFEDVRSEEIQDRSSGSKTLKEIRTLLMEAENIALKRCNFPAPLVPFRDVSDISFIQ
SKKVVCFKEPLTADEYNGDLPQRQPFIEESPSNKCIQKDISTQTNLKCQRGIENWEFISSTTVRSPLQEAESKARVTVDE
TCRQYRAAKSVMRSEPEGYSGTIGNKIVIPMMTIIKSDSSSDASSCSWDSNSLESVSDVLLNFFPYSSPKTSLTDSREEG
VSESDDGGGSSVDSLAAHVRNLLKCESSLNHAKQILRNAEEEECRVRARAWNLKFNLAHECGYSISELNEDDRRKVEEIK
AKLFSHERTTDLSKGLQSPRGIGCKPEAVCSHIIIESHEKGCFRTLTAEQPQLDSHPCVFRSADPSDMIRGQRSPSSWRT
RHIDLSKSLDQCNPHFKVWNSLQLRSHSPFQNFAADDFRISQGLRMPFHEKIDPWLSELVEPASVPLEEMDCHSSSQMLP
PEPMKKFTTSITFSSHRHSKCFSDSSVLKVGVTEGSQCTGASVGVFNSHFTEEQNPPRDLEQRTSSPSSFKIVSHSPDKA
VTILAESSRQSPKLSVEHSQQEEKFLERSDFKSSDSEPSTSTKCSNVKEVHFSDNHTFISMSRPSSTLGVKEKNVTITPD
LSSHIILEQRQLFEQSKAPHADHHVRKHHSPPPQHQDYVAPNLPCRIFLEKQELFEQSKAPHLDHQMRENHSPFLQGQDY
IASDLPSSIFLEQRQLFEQSKAPDVDHMGKYHSPLPQVQDYVVEKNNQHKFKSYISNMINVEAKFDNVISQSAPSQCTLV
TSTSASTPPSNRKALSCFRITLYPKTPSKLDSGTLDKRFHTLDPASKTRMNSEFNSDLQTISSRSLEPTSKLLASKPIAQ
NQESLGFVGPKSSPDFQVVQSPLPDSNDISQDLKSILFQNNQIVTSKQTQVNISDLEGYSSPEGTPVSADRSSEGIKAPF
SAFPGKLSSDAVTQITTESPGKTMFSSEIFINTKDRGLAISEPSTQKLGKGPVKFASSSSVQQITHPHGTDGSNDAIAPD
FPAEVLGTRDDDLTVPANIKHKEGIYSKRVVPKASLLVGRKTPQKDNADAQVQVSITDDENLSDKNQKKEIYTKKAVTKA
AQPEEESLQKASKGSSDAAAAEHSARLQDIKLESLPDTKAIKQKEEILNKRTFPKEAWKEDKESLQIDIAESRCHSEFEN
TTHSVFRSAKFYFHHPVHLPSDQDFCHESLGRSVFMRHSLKDFFQHHPDKQREHTSLPSPRQNVEKTKTDYTRIESLSIN
VNLENDVMHTAKSRARDNPKSDKQLNDQKRDHKVTPEPTAQHTVSLNELWNRYQERQRQQRPPQFGDRKELSLVDRLDRL
AKLLQNPITYSLRTSESTQDDSRGERDVKEWSGRQQQQKSKLQKKKRYKSLEKFHKNAGELKKSKMLSTHQAGKSNQIKI
EQIKFDKYILRKQPDFHYRNNTSSDSRPSEESELLTDTATNLLSTTTSPVESDILTQTDREVTLQERSSSISTIDTARLI
QAFGHERVCLSPRQIKLYSSITDHQRRYLERRSKKNKKALNMNHPQMTSEHTRRKHIQVADHVISSDSVSSSTSSFWSSS
STLCNMQNVQMLNKAVQAGNLEIVNGVKKHTRDVGMTFPTPSSSEARIEEDSDMTSWSEEKIEEKRLLTNYLGDKKLRKN
KHSCCEGVSWFVPVENVKSEPKKENLPKLHGPGICWFAPITNTKPWREPLREQNWQGQHVDGHRPLAGPDRERLRPFVRA
TLQESLHLHRPDFISRSGERIKRLKLIVQERKLQNMLESEREALFNVSREWQGYRDPTHLLPKKGFLDARKSRPIGKKEM
IQRSKRIYEQLPEVQRKREEEKRRLEYKSYRLRAQLFKKKVTNQLLGRKVPWN

MRF results:

_images/PF18727_A0A8I3P0L2_MRF.png

TAPAS results:

The TAPAS output could not be shown for the full-length protein because of its length, so some amino acids at the C-terminus were removed.

_images/PF18727_A0A8I3P0L2.png

PF02095 - Extensin-like protein repeat

PF02095 Protein family information

P13993

P13993 Interpro sequence information

Sequence:

>P13993 1-230
MASLSSLVLLLAALILSPQVLANYENPPVYKPPTEKPPVYKPPVEKPPVYKPPVENPPIYKPPVEKPPVYKPPVEKPPVY
KPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVY
KPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPIYKPPVEKPPVYKPPYGKPPYPKYPPTDDTHF

MRF results:

Region 1: 44 - 223, 10 aa length, 18 units
VEKPPVYKPP
VENPPIYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPVYKPP
VEKPPIYKPP
VEKPPVYKPP
YGKPPYPKYP
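
The region header above can be cross-checked against the sequence. The following is a minimal Python sketch, assuming that MRF reports 1-based, inclusive coordinates (an assumption, although it is consistent with the units listed here). The helper name `split_region` is hypothetical and is not part of MRF.

```python
# P13993 sequence as listed above (230 residues).
SEQ = (
    "MASLSSLVLLLAALILSPQVLANYENPPVYKPPTEKPPVYKPPVEKPPVYKPPVENPPIYKPPVEKPPVYKPPVEKPPVY"
    "KPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVY"
    "KPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPIYKPPVEKPPVYKPPYGKPPYPKYPPTDDTHF"
)


def split_region(seq, start, end, unit_len):
    """Split a 1-based, inclusive region into consecutive units of unit_len."""
    region = seq[start - 1:end]  # convert to 0-based, end-exclusive slicing
    if len(region) % unit_len != 0:
        raise ValueError("region length is not a multiple of the unit length")
    return [region[i:i + unit_len] for i in range(0, len(region), unit_len)]


# Coordinates and unit length taken from the "Region 1" header above.
units = split_region(SEQ, 44, 223, 10)
print(len(units))
for unit in units:
    print(unit)
```

Under that coordinate assumption, the slice yields 18 ungapped units that can be compared line by line with the listing above.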

TAPAS results:

_images/PF02095_P13993.png

Q43414

Q43414 Interpro sequence information

Sequence:

>Q43414 1-227
PVYKPPVEKPPVYKPPIEKPPVYKPPVEKPPVYKPPVEKPPVYKPPIEKPPVYKPPVEKPPIYKPPVEKPPVYKPPVEKP
PVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPVYKPPVEKPPIYKPPVEKPPVYKPPIEKP
PVYTPPVEKPPVYKPPIEEPPVYKPPVEKPPVYGPPYEKPPHYPGYPPYEKPPHHPGYPPADDDNRF

MRF results:

Region 1: 2 - 211, 12 aa length, 21 units
VYK--PPVEKPP
VYK--PPIEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPIEKPP
VYK--PPVEKPP
IYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
IYK--PPVEKPP
VYK--PPIEKPP
VYT--PPVEKPP
VYK--PPIEEPP
VYK--PPVEKPP
VYG--PPYEKPP
HYPGYPPYEK--
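
Because these units are aligned with gaps, the unit length in the header (12) is the alignment width rather than the number of residues per unit. A minimal Python sketch of a consistency check follows, assuming 1-based, inclusive region coordinates and `-` as the gap character. The function name `ungapped_length` is hypothetical.

```python
# Aligned units for Q43414 as listed above; "-" marks an alignment gap.
ALIGNED_UNITS = """
VYK--PPVEKPP
VYK--PPIEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPIEKPP
VYK--PPVEKPP
IYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
VYK--PPVEKPP
IYK--PPVEKPP
VYK--PPIEKPP
VYT--PPVEKPP
VYK--PPIEEPP
VYK--PPVEKPP
VYG--PPYEKPP
HYPGYPPYEK--
""".split()


def ungapped_length(aligned_units, gap="-"):
    """Count residues across all units, ignoring gap characters."""
    return sum(len(unit) - unit.count(gap) for unit in aligned_units)


# Coordinates taken from the "Region 1" header above.
start, end = 2, 211
span = end - start + 1                      # residues covered by the region
residues = ungapped_length(ALIGNED_UNITS)   # residues present in the units
widths = {len(unit) for unit in ALIGNED_UNITS}

print(len(ALIGNED_UNITS), widths, span, residues, span == residues)
```

If the span and the residue count agree and every row has the same width, the header and the alignment describe the same stretch of sequence.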

TAPAS results:

_images/PF02095_Q43414.png

PF02218 - Repeat in HS1/Cortactin

PF02218 Protein family information

Q9VDF4

Q9VDF4 Interpro sequence information

Sequence:

>Q9VDF4 1-559
MWKASAGHQIQATSAASAEDDDWETDPDFVNDVSEQEQRWGSKTIDGSGRTAGTIDMDKLREETEQADLDKKKQLLKDQN
AGYGYGGKFGVEKDRMDKSAVGHDYQGKVGKHASQKDYSDGFGGKFGVQEDRKDKSAVGWDHVEKVEKHASQKDYATGFG
GKFGVQSDRVDKSAVGWDHIEKVEKHESQKDYSKGFGGKFGVQEDRKDKSAVGWDHKEAPQKHASQVDHKVKPVIEGAKP
SNLRAKFENLAKNSEEESRKRAEEQKRLREAKDKRDREEAAKKTVAENTPRTSTEAPPPKGSRAAIQTGRTGGIGNAISA
FNQMQSPVSETPPARKEPIIIPKAQPVKIELEAKEEPTASTTSAAVAPTPTVVPAREPETAPVAKAAAPPPDVVPQIEVE
TVDTPPRSEPQSPVYVPTPQPEVHAQVQVQPEPQPQADPEPVVEEEPLYQNQAEIKAASPLPPTNGTVSEAVAPSGTATV
PEEAIYANSDNLADYLEDTGIHAIALYDYQAADDDEISFDPDDVITHIEKIDDGWWRGLCKNRYGLFPANYVQVVGQNS

MRF results:

Region 1: 70 - 247, 37 aa length, 5 units
DKKKQLL---KDQNAGYGYGGKFGVEKDRMDKSAVGH
DYQGKVGKHASQKDYSDGFGGKFGVQEDRKDKSAVGW
DHVEKVEKHASQKDYATGFGGKFGVQSDRVDKSAVGW
DHIEKVEKHESQKDYSKGFGGKFGVQEDRKDKSAVGW
DHKEAPQKHASQVDHKV----KPVIEGAKPSNLRAKF
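
One way to cross-check a region header against its unit listing is to strip the gap characters from the units and locate the resulting string in the sequence. The sketch below does this in Python for Q9VDF4, assuming the units are contiguous in the sequence and that coordinates are 1-based and inclusive. The function name `locate_units` is hypothetical and is not part of MRF.

```python
# First 320 residues of Q9VDF4 (the first four sequence lines listed above).
SEQ_PREFIX = (
    "MWKASAGHQIQATSAASAEDDDWETDPDFVNDVSEQEQRWGSKTIDGSGRTAGTIDMDKLREETEQADLDKKKQLLKDQN"
    "AGYGYGGKFGVEKDRMDKSAVGHDYQGKVGKHASQKDYSDGFGGKFGVQEDRKDKSAVGWDHVEKVEKHASQKDYATGFG"
    "GKFGVQSDRVDKSAVGWDHIEKVEKHESQKDYSKGFGGKFGVQEDRKDKSAVGWDHKEAPQKHASQVDHKVKPVIEGAKP"
    "SNLRAKFENLAKNSEEESRKRAEEQKRLREAKDKRDREEAAKKTVAENTPRTSTEAPPPKGSRAAIQTGRTGGIGNAISA"
)

# Aligned units as listed above; "-" marks an alignment gap.
ALIGNED_UNITS = """
DKKKQLL---KDQNAGYGYGGKFGVEKDRMDKSAVGH
DYQGKVGKHASQKDYSDGFGGKFGVQEDRKDKSAVGW
DHVEKVEKHASQKDYATGFGGKFGVQSDRVDKSAVGW
DHIEKVEKHESQKDYSKGFGGKFGVQEDRKDKSAVGW
DHKEAPQKHASQVDHKV----KPVIEGAKPSNLRAKF
""".split()


def locate_units(seq, aligned_units, gap="-"):
    """Return the 1-based, inclusive (start, end) of the ungapped units in seq."""
    ungapped = "".join(unit.replace(gap, "") for unit in aligned_units)
    idx = seq.find(ungapped)
    if idx < 0:
        raise ValueError("units do not occur contiguously in the sequence")
    return idx + 1, idx + len(ungapped)


start, end = locate_units(SEQ_PREFIX, ALIGNED_UNITS)
unit_width = len(ALIGNED_UNITS[0])  # alignment width, gaps included
print(start, end, unit_width, len(ALIGNED_UNITS))
```

The printed start, end, alignment width, and unit count can then be compared with the values in the "Region 1" header.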

TAPAS results:

_images/PF02218_Q9VDF4.png

PF03057 - Repeat in HS1/Cortactin

PF03057 Protein family information

A0A0B2V1U5

A0A0B2V1U5 Interpro sequence information

Sequence:

>A0A0B2V1U5 1-535
MFSLVIGSSFQQLYQAATPTGPVLGPSRNTHLPQSWVIKPKRSTPLDEKRTAPIACRGRQMTAFLEPVALLDGLSIWLLI
ALLLTSFVEALYSSCCCCRRKKKKKKKVKKKTNDNEKSGNKDGEQENDGQADAGAPPAAPPAAPKPPDKGGIAGTFDPNY
QTLAGMGQDIFGADKKAGGGGGGAVGGGGPPKPPAAGGMAGTYDPNYQTLAGMGQDIFGADKKCGGGGGAAPQVPQAPKP
GAGGMAGTYDPNYQTLAGLGQDVFGADKKVGGGGGGPPQAPKPGGGGMAGTYDPNYQTLAGLGQDVFGADKKAAGGGGGG
AGPIRAPENAGAKAGTYDPNYQTLAGIGGDVFGADKKKPAAFGGADGIKVPQNAGAKAGTYDPNYQTLAALDNNVFGEDK
KAKAGGGGGAANIKVPQNAGQKAGTYDPNYQTLAALDNNVFGEDKKAKGGGGGGAGGGIRAPENIGAKAGTYDPNYQTLA
AVGGDVFGADKKKPAGGGGFRTPENQAAKAGTYDPNYQTLAALGNDVFGADKKKF

MRF results:

_images/PF03057_A0A0B2V1_MRF.png

TAPAS results:

_images/PF03057_A0A0B2V1.png

PF03991 - Copper binding octapeptide repeat

PF03991 Protein family information

Q7KYY8

Q7KYY8 Interpro sequence information

Sequence:

>Q7KYY8 1-81
PQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWG
Q

MRF results:

_images/PF03991_Q7KYY8_MRF.png

TAPAS results:

_images/PF03991_Q7KYY8.png

PF04649 - Mycoplasma hyorhinis VlpA repeat

PF04649 Protein family information

Q9L8V9

Q9L8V9 Interpro sequence information

Sequence:

>Q9L8V9 1-384
MKKSIFSKKLLVSFGSLVALAAIPLIAISCGQTDNNSSQSQQPGSGTTNTSGGTNSSGSTNGTAGTNSSGSTNGSGNGSN
SETNTGNKTTSESNSGSSTGSQAGTTTNTGSGSNSESGMNSEKTENTQQSEAPGTNTGNKTTSESNSESSTGSQAGTTTN
TGSGSNSESGMNSEKTENTQQSEAPGTKTENTQQSEAPGTKTENTQQSEAPGTNTGNKTTSESNSGSSTGSQAGTTTNTG
SGSNSESGMNSEKTENTQQSEAPGTKTENTQQSEAPGTKTENTQQSEALGTNTGNKTTSESNSGSSTGSQAGTTTNTGSG
SNSESGMNSEKTENTQQSEAPGTNTGNKTTSESNSESGMNSEKTENTQQSEAPGTKTENTQHTS

MRF results:

_images/PF04649_Q9L8V9_MRF.png

TAPAS results:

_images/PF04649_Q9L8V9.png

PF04671 - Erythrocyte membrane-associated giant protein antigen 332

PF04671 Protein family information

W7FNF1

W7FNF1 Interpro sequence information

Sequence:

>W7FNF1 1-518
STTEEIVEKVGSVSEEIIVEEVSASEEIVEEGSVTEEVVEEEKLINEVGETESVTEEIVQKEVSDAEEVLGQEGSMNEEI
LEKESIVEEIVGPEGSVTEEIVDHGSFAEEVKEEELVTEEAVQYEGSVTEEIKEEESITENEAIEESAFAEIIEEKGPNT
DEIVKEEGLDTEEIVNEVSVTDEVIEEEKLVNEQIVGEERSVTEKPVEVERSATEDLVEEEASVTEKVSVHEGSTTEQIL
DESVAEEIVEEEVSVDDKIIEEEVSVDEVVEEEGSVIEEIVEEEESVPEEILEEELSGSEEVLEDEWVTDAFMGQEGSVI
EEIEEIVDGEGSITEEIVEDGSANEKIVEEEPSRVEEVLGKEGFVIEEIIEEGSVIEQVEDTKTVSEKSEESSAIEEVKE
VKEEESISEKIVEKEESVTEEIVRQEESTTEKIVKDVSPTEDFVEQTDSVTEKVIEQEGSNTEVAEDVEEKESASDEHEQ
EDVSVNAQVTYEKKSVTKEIVDEVSRTEEIVEENGSKS

MRF results:

_images/PF04671_W7FNF1_MRF.png

TAPAS results:

_images/PF04671_W7FNF1.png

PF03482 - sic protein repeat

PF03482 Protein family information

Q9JNA7

Q9JNA7 Interpro sequence information

Sequence:

>Q9JNA7 1-363
MNIRNKIENSKTLLFTSLVAVALLGATQPVSAETYTSRNFDWSGDDWPEDDWSGDGLSKYDRSGVGLSQYGWSKYGWSSD
KEEWPEDWPEDDWSSDKKDETEDKTRPPYGGALGTGYEKRDDWRGPGTVATDPYTPPYGGALGTGYEKRDDWGGPGTVAT
DPYTPPYGGALGTGYEKRDDWRGPGTVATDPYTPPYGGALGTGYEKRDDWGGPGTVATDPYTPPYGGALGTGYEKRDDWR
GPGHIPKPENEQSPNPSHIPEPPQIEWPQWNGFDGLSSGPSDWGQSEDTPRFPSEPRVTEKPQHTPQKNPQESDFDRGFS
AGLKAKNSGRGIDFEGFQYGGWSDEYKKGYMQAFGTPYTPSAT

MRF results:

_images/PF03482_Q9JNA7_MRF.png

TAPAS results:

_images/PF03482_Q9JNA7.png