A nucleic acid or amino acid sequence can be seen as composed of a number of possibly overlapping k-mers, or words of length k, for a certain k. The k-mer composition of a sequence is given by the frequency with which each possible k-mer occurs within the sequence. The 1-mer composition is related to the GC content of a DNA sequence, and the 2-mer, 3-mer, and 4-mer compositions are also known as the di-nucleotide, tri-nucleotide, and tetra-nucleotide compositions of a DNA sequence. For example, the di-nucleotide composition of TATAAT is given by one occurrence of AA, two occurrences of AT, and two occurrences of TA.
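As a minimal sketch of the definition above, the following Python function counts overlapping words with a sliding window of length k. The name word_composition and the dictionary return type are assumptions made for illustration; the statement does not prescribe them.

```python
def word_composition(s, k):
    """Count every overlapping word of length k in s.

    Hypothetical helper: the name and the dict return type are
    illustrative choices, not taken from the problem statement.
    """
    counts = {}
    # A sequence of length n has n - k + 1 windows of length k;
    # range() is empty when k > n, so the result is then an empty dict.
    for i in range(len(s) - k + 1):
        word = s[i:i + k]
        counts[word] = counts.get(word, 0) + 1
    return counts


# The example from the text: AA once, AT twice, TA twice.
print(word_composition("TATAAT", 2))
```

The function is iterative and performs no input or output itself; the final print call belongs to the caller.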
Write pseudocode, Python code, and C++ code for the word composition problem. The program must implement and use the word composition function given in the pseudocode. This function must be iterative and is not allowed to perform input/output operations. Make two submissions, including the pseudocode as a comment in both the Python and the C++ code.
The input is a string s (a genomic sequence) over the DNA alphabet and an integer k, the word length.
The output is a sorted list of all the k-mers of s and their frequencies.
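One possible reading of this output specification is sketched below in Python. It assumes that "sorted" means lexicographic order of the words and that only words actually occurring in the sequence are listed. Both are assumptions, as are the function names and the one-pair-per-line print format, none of which the statement fixes.

```python
def word_composition(s, k):
    """Iteratively count the overlapping words of length k in s (no I/O)."""
    counts = {}
    for i in range(len(s) - k + 1):
        word = s[i:i + k]
        counts[word] = counts.get(word, 0) + 1
    return counts


def sorted_composition(s, k):
    """Return (word, frequency) pairs in lexicographic order of the word.

    Hypothetical wrapper: the sort key is an assumption about what
    "sorted" means in the statement.
    """
    counts = word_composition(s, k)
    return [(word, counts[word]) for word in sorted(counts)]


# Output is kept outside the counting function, as the task requires.
for word, frequency in sorted_composition("TATAAT", 2):
    print(word, frequency)
```

Keeping the counting, the sorting, and the printing in separate steps lets the counting function stay free of input/output operations.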
Author: Gabriel Valiente
Generation: 2026-01-25T17:28:15.490Z
© Jutge.org, 2006–2026.
https://jutge.org