A nucleic acid or amino acid sequence of length $n$ can be seen as composed of a number of possibly overlapping $k$-mers, or words of length $k$, for $1 \le k \le n$. An interesting problem is the generation of all the words of length $k$ contained in a genomic sequence with $n$ nucleotides, for all $k$ with $1 \le k \le n$; that is, the generation of all the subwords of a genomic sequence of length $n$.
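To make the notion of overlapping $k$-mers concrete, the following Python sketch lists every word of length $k$ in a sequence, in order of position and keeping repeats. The helper name `kmers` is hypothetical and is not part of the assignment. A sequence of length $n$ has $n - k + 1$ such words, and none when $k > n$.

```python
def kmers(s: str, k: int) -> list[str]:
    """Return all k-mers of s in order of position, keeping repeats.

    Consecutive k-mers overlap in k - 1 characters, so a sequence of
    length n yields n - k + 1 of them (an empty list when k > n).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # One window of width k starting at each position 0 .. n - k.
    return [s[i:i + k] for i in range(len(s) - k + 1)]
```

For example, `kmers("TATAAT", 2)` returns `['TA', 'AT', 'TA', 'AA', 'AT']`. `TA` and `AT` each occur twice, which is why the subwords problem asks for the words without repetitions.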
Write a program that solves the subwords problem. The program must implement and use the SUBWORDS function from the pseudocode discussed in class. That function is iterative and must not perform any input/output operations. Make one submission in Python and another in C++.
The input is a string $s$ over the alphabet $\{A, C, G, T\}$.
The output is the list of all the nonempty subwords of $s$, without repetitions, in lexicographic order, printed on one line and separated by spaces.
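The following Python sketch shows one way to meet the requirements above. It is not meant to reproduce the SUBWORDS pseudocode from class, which is not given here. The hypothetical `subwords` function is iterative and does no input/output. A separate `main` reads the sequence from standard input and prints the result. The function enumerates every start and end position, collects the substrings in a set to drop repetitions, and sorts them.

```python
import sys


def subwords(s: str) -> list[str]:
    """Return the distinct nonempty subwords of s in lexicographic order.

    Iterative and free of input/output. This is a sketch, not necessarily
    the SUBWORDS pseudocode from class.
    """
    found = set()                      # distinct subwords seen so far
    n = len(s)
    for i in range(n):                 # start position of the subword
        for j in range(i + 1, n + 1):  # end position (exclusive)
            found.add(s[i:j])
    return sorted(found)               # lexicographic order


def main() -> None:
    # All input/output is kept here, outside subwords().
    data = sys.stdin.read().split()
    if data:
        print(" ".join(subwords(data[0])))


if __name__ == "__main__":
    main()
```

On the sample input `TATAAT`, `subwords` returns the 15 distinct subwords shown in the sample output. Python compares strings by code point, and A < C < G < T, so `sorted` produces the required order. The method builds $n(n+1)/2$ substrings of total length $O(n^3)$, which is adequate for short sequences.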
Input
TATAAT
Output
A AA AAT AT ATA ATAA ATAAT T TA TAA TAAT TAT TATA TATAA TATAAT