# Word composition

A nucleic acid or amino acid sequence can be seen as composed of a
number of possibly overlapping $k$-mers or words of length $k$, for a
certain $k \geq 1$. The $k$-mer composition of a sequence is given by
the frequency with which each possible $k$-mer occurs within the
sequence. The 1-mer composition is related to the GC content of a DNA
sequence, and the 2-mer, 3-mer, and 4-mer compositions are also known as
the di-nucleotide, tri-nucleotide, and tetra-nucleotide compositions of
a DNA sequence. For example, the di-nucleotide composition of TATAAT is
given by one occurrence of AA, two occurrences of AT, and two occurrences
of TA.
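The definition above can be sketched as a sliding window over the sequence. This is a minimal illustration, not a reference solution: the function name `word_composition` and the choice of a dictionary for the counts are assumptions made here for the example.

```python
def word_composition(s, k):
    """Count every (possibly overlapping) k-mer of s.

    Hypothetical helper for illustration: it is iterative and performs
    no input/output. It assumes 1 <= k <= len(s).
    """
    counts = {}
    # A sequence of length n has n - k + 1 windows of length k.
    for i in range(len(s) - k + 1):
        word = s[i:i + k]
        counts[word] = counts.get(word, 0) + 1
    return counts
```

For the sequence TATAAT and $k = 2$, the windows are TA, AT, TA, AA, and AT, so the function returns a count of 1 for AA and 2 for each of AT and TA, matching the example in the text.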

Write pseudocode, Python code, and C++ code for the word composition
problem. Each program must implement and use the word composition
function given in the pseudocode. This function must be iterative and
must not perform input/output operations. Make two submissions, including
the pseudocode as a comment in both the Python and the C++ code.

## Input

The input is a string $s$ (a genomic sequence) over the alphabet
$\Sigma=\{A,C,G,T\}$ and an integer $k$ with $1 \leq k \leq |s|$.

## Output

The output is a sorted list of all the $k$-mers of $s$ and their
frequencies.
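One way to produce such a listing is sketched below. The statement does not fix the sort order or the exact line layout, so two assumptions are made here: the $k$-mers are sorted lexicographically, and each line holds a $k$-mer followed by a space and its count. The helper name `format_composition` is also hypothetical.

```python
def format_composition(counts):
    """Return one 'kmer count' line per k-mer, in lexicographic order.

    Assumed output layout; the problem statement only asks for a
    sorted list of k-mers with their frequencies.
    """
    return [f"{word} {counts[word]}" for word in sorted(counts)]


if __name__ == "__main__":
    # Counts for the di-nucleotide composition of TATAAT.
    for line in format_composition({"TA": 2, "AT": 2, "AA": 1}):
        print(line)
```

Keeping the formatting in its own function leaves the counting function free of input/output, as the statement requires.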

## Problem information

Author: Gabriel Valiente

Generation: 2026-01-25T17:28:15.490Z

© *Jutge.org*, 2006--2026.\
<https://jutge.org>
