Genetic code P36671


Statement

Write a program that converts messenger RNA sequences (derived from DNA sequences) into proteins using the genetic code.

The genetic code is a set of rules that translates messenger RNA sequences into proteins. A messenger RNA sequence is a sequence of bases, and there are four possible bases: A, C, G and U. The bases of a gene are grouped in threes, forming codons. Every codon corresponds to an amino acid, except for the stop codons described below. A protein is a sequence of amino acids.
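As an illustration of how bases group into codons, the sketch below splits a sequence into consecutive triplets starting at a chosen position. The function name `split_codons` and its `offset` parameter are hypothetical, not part of the statement; the offset plays the role of the reading frame, which the rules below fix at the first AUG.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits an mRNA sequence into codons (groups of three bases), starting at
// position `offset`. The offset fixes the reading frame; trailing bases that
// do not complete a codon are dropped.
std::vector<std::string> split_codons(const std::string& rna, std::size_t offset) {
    std::vector<std::string> codons;
    for (std::size_t i = offset; i + 3 <= rna.size(); i += 3) {
        codons.push_back(rna.substr(i, 3));
    }
    return codons;
}
```

For example, `split_codons("AUGACUAAG", 0)` yields AUG, ACU, AAG, while an offset of 1 yields UGA, CUA: the same bases read in a different frame give different codons.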

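The following figure shows the genetic code. It can be seen, for instance, that the codon GGA corresponds to glycine and the codon AUC to isoleucine. There are also three special codons (UAA, UAG and UGA), marked with the stop symbol, that do not encode any amino acid but signal the end of translation. Once a stop codon is found, the gene is finished (there is no need to look for another AUG afterwards). Moreover, protein synthesis only starts after the first occurrence of the codon AUG: codons are read three bases at a time starting right after that AUG, and the AUG itself is not printed. Thus, the imaginary gene GCCAAUGACUAAGGCCUAAAGA would correspond to the protein ThrLysAla, since it reads as GCCA AUG ACU AAG GCC UAA AGA: ACU, AAG and GCC encode Thr, Lys and Ala, and UAA is a stop codon. (The UAA spread across ACU AAG does not count, because it is not aligned with the AUG.)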
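The figure is not reproduced here, so the following C++ sketch encodes the standard genetic code instead, as a 64-character string of one-letter amino acid codes in the conventional U, C, A, G order (with `*` for stop codons), and then applies the rules above. The names `translate`, `base_value`, `kCode` and `kName` are hypothetical. The sketch assumes the sequence contains only the bases A, C, G and U, and it simply stops at the end of the sequence if no stop codon is found.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Value of a base in the conventional U, C, A, G ordering of the codon table.
int base_value(char b) {
    switch (b) {
        case 'U': return 0;
        case 'C': return 1;
        case 'A': return 2;
        case 'G': return 3;
    }
    return -1;  // not a valid RNA base
}

// Standard genetic code as one-letter amino acid codes, indexed by
// 16*first + 4*second + third base value; '*' marks the stop codons.
const std::string kCode =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// One-letter code -> standard three-letter name.
const std::map<char, std::string> kName = {
    {'A', "Ala"}, {'R', "Arg"}, {'N', "Asn"}, {'D', "Asp"}, {'C', "Cys"},
    {'Q', "Gln"}, {'E', "Glu"}, {'G', "Gly"}, {'H', "His"}, {'I', "Ile"},
    {'L', "Leu"}, {'K', "Lys"}, {'M', "Met"}, {'F', "Phe"}, {'P', "Pro"},
    {'S', "Ser"}, {'T', "Thr"}, {'W', "Trp"}, {'Y', "Tyr"}, {'V', "Val"}};

// Reading starts right after the first AUG (which is not output) and
// proceeds in frame until the first stop codon or the end of the sequence.
std::string translate(const std::string& rna) {
    std::string protein;
    std::size_t start = rna.find("AUG");
    if (start == std::string::npos) return protein;  // no start codon
    for (std::size_t i = start + 3; i + 3 <= rna.size(); i += 3) {
        int a = base_value(rna[i]);
        int b = base_value(rna[i + 1]);
        int c = base_value(rna[i + 2]);
        if (a < 0 || b < 0 || c < 0) break;  // not a valid codon: stop reading
        char aa = kCode[16 * a + 4 * b + c];
        if (aa == '*') break;  // stop codon ends the gene
        protein += kName.at(aa);
    }
    return protein;
}
```

On the small example, `translate("GCCAAUGACUAAGGCCUAAAGA")` returns `"ThrLysAla"`. Keeping the table as a one-letter string makes it compact and easy to check; a `std::map` keyed by the codon string would work equally well.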

Input

The input is a gene obtained from GenBank, a genome database that can be consulted on the Internet. It consists of a brief description ending in ‘:’, followed by the sequence of messenger RNA bases of the gene, which may span several lines. An AUG codon always appears before a stop codon.
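Reading the input amounts to discarding the description and concatenating the bases that follow it. A minimal sketch, assuming the description itself contains no ‘:’; the hypothetical function `extract_sequence` takes the whole input as a single string and keeps only the characters A, C, G and U found after the first ‘:’.

```cpp
#include <cstddef>
#include <string>

// Returns the mRNA sequence: every A, C, G or U after the first ':'.
// Line breaks and any other characters are skipped.
std::string extract_sequence(const std::string& input) {
    std::string rna;
    std::size_t colon = input.find(':');
    if (colon == std::string::npos) return rna;  // malformed input
    for (std::size_t i = colon + 1; i < input.size(); ++i) {
        char c = input[i];
        if (c == 'A' || c == 'C' || c == 'G' || c == 'U') rna += c;
    }
    return rna;
}
```

Skipping to the colon first matters because a description may contain capital letters that are also base symbols, such as the C in "Hepatitis C virus". In a full program, the whole of standard input could be read beforehand, for example by constructing a `std::string` from a pair of `std::istreambuf_iterator<char>` over `std::cin`.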

Output

The output must be the protein synthesized by this gene according to the rules above. Print the sequence using the standard three-letter name of each amino acid, with 26 amino acids per line, except for the last line, which may contain fewer.
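Since every three-letter name is exactly three characters long, 26 amino acids always take 78 characters, so the line breaking can be done directly on the concatenated protein string. A sketch under two assumptions the statement does not spell out: every line, including the last, ends with a newline, and an empty protein prints nothing. The names `format_protein` and `kPerLine` are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Number of amino acids printed per line, as required by the statement.
const std::size_t kPerLine = 26;

// Breaks a protein, given as concatenated three-letter names, into lines of
// at most kPerLine amino acids; every line, including the last, ends in '\n'.
std::string format_protein(const std::string& protein) {
    const std::size_t width = 3 * kPerLine;  // each name has 3 letters
    std::string out;
    for (std::size_t i = 0; i < protein.size(); i += width) {
        out += protein.substr(i, width);  // last chunk may be shorter
        out += '\n';
    }
    return out;
}
```

For instance, a protein of 27 amino acids produces one full line of 78 characters followed by a line holding the last amino acid, matching the layout of the second example's output.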

Observation

The second example is an artificial extract of the hepatitis C virus genome. The private test data contain the complete genome (10 kilobases).

Public test cases
  • Input

    Small test:
    GCCAAUGACUAAGGCCUAAAGA
    

    Output

    ThrLysAla
    
  • Input

    Hepatitis C virus, partial genome:
    UUGUGGUACUGCCUGAUAGGGUGCUUGCGAGUGCCCCGGGAGGUCUCGUAGACCGUGCACCAUGAGCACG
    AAUCCUAAACCUCAAAGAAAAACCAAACGUAACACCAACCGUCGCCCACAGGACGUCAAGUUCCCGGGUG
    GCGGUCAGAUCGUUGGUGGAGUUUACUUGUUGCCGCGCAGGGGCCCUAGAUUGGGUGUGCGCGCGACGAG
    GAAGACUUCCGAGCGGUCGCAACCUCGAGGUAGACGUCAGCCUAUCCCCAAGGCACGUCGGCCCGAGGGC
    AGGACCUGGGCUCAGCCCGGGUACCCUUGGCCCCUCUAUGGCAAUGAGGGUUGCGGGUGGGCGGGAUGGC
    UCCUGUCUCCCCGUGGCUCUCGGCCUAGCUGGGGCCCCACAGACCCCCGGCGUAGGUCGCGCAAUUUGGG
    UAAGGUCAUCGAUACCCUUACGUGCGGCUUCGCCGACCUCAUGGGGUACAUACCGCUCGUCGGCGCCCCU
    CUUGGAGGCGCUGCCAGGGCCCUGGCGCAUGGCGUCCGGGUUCUGGAAGACGGCGUGAACUAUGCAACAG
    GGAACCUUCCUGGUUGCUCUUUCUCUAUCUUCCUUCUGGCCCUGCUCUCUUGCCUGACUGUGCCCGCUUC
    AGCGUUGGUGGUAGCUCAGCUGCUCCGGAUCCCACAAGCCAUCAUGGACAUGAUCGCUGGUGCUCACUGG
    GGAGUCCUGGCGGGCAUAGCGUAUUUCUCCAUGGUGGGGAACUGGGCGAAGGUCCUGGUAGUGCUGCUGC
    UAUUUGCCGGCGUCGACGCGGAAACCCACGUCACCGGGGGAAGUGCCGGCCGCACCACGGCUGGGCUUGU
    UGGUCUCCUUACACCAGGCGCCAAGCAGAACAUCCAACUGAUCAACACCAACGGCAGUUGGCACAUCAAU
    AGCACGGCCUUGAACUGCAAUGAAAGCCUUAACACCGGCUGGUUAGCAGGGCUCUUCUAUCAGCACAAAU
    UCAACUCUUCAGGCUGUCCUGAGAGGUUGGCCAGCUGCCGACGCCUUACCGAUUUUGCCCAGGGCUGGGG
    UCCUAUCAGUUAUGCCAACGGAAGCGGCCUCGACGAACGCCCCUACUGCUGGCACUAACCUCCAAGACCU
    

    Output

    SerThrAsnProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPheProGlyGly
    GlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeuGlyValArgAlaThrArgLysThrSer
    GluArgSerGlnProArgGlyArgArgGlnProIleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnPro
    GlyTyrProTrpProLeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArgPro
    SerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAspThrLeuThrCysGlyPheAla
    AspLeuMetGlyTyrIleProLeuValGlyAlaProLeuGlyGlyAlaAlaArgAlaLeuAlaHisGlyValArgVal
    LeuGluAspGlyValAsnTyrAlaThrGlyAsnLeuProGlyCysSerPheSerIlePheLeuLeuAlaLeuLeuSer
    CysLeuThrValProAlaSerAlaLeuValValAlaGlnLeuLeuArgIleProGlnAlaIleMetAspMetIleAla
    GlyAlaHisTrpGlyValLeuAlaGlyIleAlaTyrPheSerMetValGlyAsnTrpAlaLysValLeuValValLeu
    LeuLeuPheAlaGlyValAspAlaGluThrHisValThrGlyGlySerAlaGlyArgThrThrAlaGlyLeuValGly
    LeuLeuThrProGlyAlaLysGlnAsnIleGlnLeuIleAsnThrAsnGlySerTrpHisIleAsnSerThrAlaLeu
    AsnCysAsnGluSerLeuAsnThrGlyTrpLeuAlaGlyLeuPheTyrGlnHisLysPheAsnSerSerGlyCysPro
    GluArgLeuAlaSerCysArgArgLeuThrAspPheAlaGlnGlyTrpGlyProIleSerTyrAlaAsnGlySerGly
    LeuAspGluArgProTyrCysTrpHis
    
  • Information
    Author
    Jordi Petit
    Language
    English
    Translator
    Carlos Molina
    Original language
    Catalan
    Official solutions
    C++