Protein Domain Overlap Z99185


Statement
 


Proteins contain different domains (structural or functional units responsible for a particular function).

We need a program that, given a list of the domains in a protein and their positions within it, determines which domains overlap in the sequence.

Input

The input is a list of protein domains, which may belong to several proteins. For each domain, the ID of the protein, the name of the domain, and the span of positions it covers in the protein are given.

The input consists of an integer N (the number of protein domain records), followed by N lines, each consisting of two strings and two integers:

    protein_id domain_name start_position end_position

where:

  • protein_id (string): The protein’s identifier.
  • domain_name (string): The name of the domain.
  • start_position, end_position (integers): The span of the domain within the protein sequence.
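The record format above can be read with a few lines of Python. This is only a minimal sketch, not the official solution: the function name read_domains is hypothetical, and it assumes that neither protein IDs nor domain names contain whitespace, which holds in the public test cases but is not stated explicitly.

```python
from collections import defaultdict


def read_domains(text):
    """Parse the input text into {protein_id: [(domain_name, start, end), ...]}.

    Assumption: every field is whitespace-free, so the whole input can be
    split into tokens without caring about line boundaries.
    """
    tokens = text.split()
    n = int(tokens[0])  # number of domain records
    proteins = defaultdict(list)
    for i in range(n):
        # Each record occupies four consecutive tokens after the count.
        protein_id, name, start, end = tokens[1 + 4 * i : 5 + 4 * i]
        proteins[protein_id].append((name, int(start), int(end)))
    return dict(proteins)
```

A program would typically call read_domains(sys.stdin.read()) after importing sys. Domains are kept in input order within each protein, which matters later if ties in starting position are to be broken by order of appearance.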

Output

List the proteins in alphabetical order of their IDs. For each protein, list its domains in order of starting position. If a domain overlaps the one listed immediately before it, mark it with "OVERLAP".

After the domains of each protein, print a summary line listing its pairs of overlapping domains. If the protein has no overlaps, print "No overlaps" instead.

Follow the format of the examples.
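The output rules can be sketched as follows. This is one possible reading of the statement, not the official solution, and it relies on several assumptions taken from the public test cases rather than from the text: positions are treated as inclusive, so a domain overlaps the previous one when its start is less than or equal to the previous end; domains with the same start keep their input order (Python's sort is stable); each domain is compared only with the one listed immediately before it; and domain lines are indented two spaces relative to the protein header. The function name overlap_report is hypothetical.

```python
def overlap_report(proteins):
    """Build the output lines from {protein_id: [(name, start, end), ...]}."""
    lines = []
    for protein_id in sorted(proteins):
        # Stable sort: domains with equal starts keep their input order.
        domains = sorted(proteins[protein_id], key=lambda d: d[1])
        lines.append(f"{protein_id}:")
        pairs = []
        previous = None
        for name, start, end in domains:
            mark = ""
            # Assumed inclusive spans: touching positions count as overlap.
            if previous is not None and start <= previous[2]:
                mark = " OVERLAP"
                pairs.append(f"{previous[0]}-{name}")
            lines.append(f"  {name} ({start}-{end}){mark}")
            previous = (name, start, end)
        if pairs:
            summary = " ".join(pairs)
            lines.append(f"  Overlapping domains in protein {protein_id}: {summary}")
        else:
            lines.append("  No overlaps")
    return lines
```

Printing "\n".join(overlap_report(proteins)) produces the report. Comparing against the immediately preceding domain only, rather than against the furthest end seen so far, follows the wording "the previous one"; the public test cases do not distinguish between the two interpretations.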

Public test cases
  • Input

    6
    P12X43 Kinase 5 50
    T5678A Phosphatase 10 40
    T5678A Transmembrane 50 90
    P12X43 SH3 30 70
    T5678A Immunoglobulin 35 60
    P12X43 Pleckstrin 80 100
    

    Output

    P12X43:
      Kinase (5-50)
      SH3 (30-70) OVERLAP
      Pleckstrin (80-100)
      Overlapping domains in protein P12X43: Kinase-SH3
    T5678A:
      Phosphatase (10-40)
      Immunoglobulin (35-60) OVERLAP
      Transmembrane (50-90) OVERLAP
      Overlapping domains in protein T5678A: Phosphatase-Immunoglobulin Immunoglobulin-Transmembrane
    
  • Input

    8
    P12X31 RRM 5 20
    R3012Y Collagen 35 45
    H2F127 FN3 35 50
    R3012Y EGF 5 15
    R3012Y Cadherin 20 30
    P12X31 Catalase 25 40
    P12X31 Kinase 45 60
    H2F127 RRM 10 30
    

    Output

    H2F127:
      RRM (10-30)
      FN3 (35-50)
      No overlaps
    P12X31:
      RRM (5-20)
      Catalase (25-40)
      Kinase (45-60)
      No overlaps
    R3012Y:
      EGF (5-15)
      Cadherin (20-30)
      Collagen (35-45)
      No overlaps
    
  • Input

    12
    P448X1 SH3 85 110
    P59S87 Catalase 5 25
    P59S87 HTH 10 35
    P59S87 Immunoglobulin 75 90
    P448X1 Transmembrane 10 40
    P448X1 Kinase 30 60
    P59S87 7TM 40 50
    P59S87 PDZ 45 70
    M32101 Pleckstrin 5 15
    P448X1 SH2 55 90
    P59S87 WD40 5 25
    M32101 Porin 20 30

    Output

    M32101:
      Pleckstrin (5-15)
      Porin (20-30)
      No overlaps
    P448X1:
      Transmembrane (10-40)
      Kinase (30-60) OVERLAP
      SH2 (55-90) OVERLAP
      SH3 (85-110) OVERLAP
      Overlapping domains in protein P448X1: Transmembrane-Kinase Kinase-SH2 SH2-SH3
    P59S87:
      Catalase (5-25)
      WD40 (5-25) OVERLAP
      HTH (10-35) OVERLAP
      7TM (40-50)
      PDZ (45-70) OVERLAP
      Immunoglobulin (75-90)
      Overlapping domains in protein P59S87: Catalase-WD40 WD40-HTH 7TM-PDZ
    
  • Information
    Author: Lluís Padró
    Language: English
    Official solutions: Python