A protein might have several sites of modification, that is, sites where particular residues are observed or predicted to be modified in some way. In a particular instance of a given protein, the modified sites are active and the unmodified sites are inactive. This instance is called a proteoform: a distinct molecular form of a protein product that arises from a single gene. It is defined by its exact amino acid sequence combined with any PTMs on that sequence.
Because you might not know which sites are simultaneously active in a living organism, the BioPharma Finder application computes the masses and identities of the possible proteoforms of a given protein. The result is up to 2^n combinations of proteoforms, where n is the maximum number of sites of modifications on the protein. Some of these generated proteoforms might not exist in nature or in living organisms.
For example, for a protein that has only three phosphorylation sites and no other modifications, the application generates up to the following eight (2^3) records in the protein sequence:
- One record for the unmodified sequence with no variable modifications
- Three records for the three proteoforms, each containing one modification
- Three records for the possible combinations of two phosphorylations
- One record for the proteoform with all three phosphorylations
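The eight records listed above can be reproduced with a short enumeration. The following Python sketch is illustrative only and is not how the application itself is implemented: the site labels (S10, T25, Y40) are hypothetical, and the mass shift assumes the standard monoisotopic delta of one phosphorylation (HPO3, 79.96633 Da).

```python
from itertools import combinations

# Hypothetical site labels; real positions come from your protein sequence.
SITES = ["S10", "T25", "Y40"]

# Monoisotopic mass added by one phosphorylation (HPO3), in Da.
PHOSPHO_DELTA = 79.96633


def enumerate_proteoforms(sites):
    """Return every subset of sites, from unmodified to fully modified."""
    records = []
    for k in range(len(sites) + 1):
        # All ways to pick k modified sites out of the available sites.
        records.extend(combinations(sites, k))
    return records


proteoforms = enumerate_proteoforms(SITES)
for combo in proteoforms:
    label = "+".join(combo) if combo else "unmodified"
    print(f"{label:12s} mass shift = {len(combo) * PHOSPHO_DELTA:.5f} Da")
```

For three sites, the list holds eight entries: one unmodified, three singly modified, three doubly modified, and one with all three sites modified.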
The actual number of generated proteoforms depends on the minimum and maximum number of modification sites that you set. The minimum number of modifications per proteoform is one and the highest maximum number is three.
For the above example, if you set the minimum sites to two and the maximum sites to three, the application generates a total of four proteoforms (or five proteoforms if you opt to include the unmodified sequence as a proteoform):
- Three records for the possible combinations of two phosphorylations
- One record for the proteoform with all three phosphorylations
From this generated list of proteoforms for this protein, you can then select which proteoforms you want to save with the protein sequence for a search.
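The effect of the minimum and maximum site settings on the number of generated proteoforms can be sketched the same way. This Python example is a rough illustration under stated assumptions: the function name `proteoforms_in_range`, its parameters, and the site labels are hypothetical and do not correspond to any application setting or API.

```python
from itertools import combinations

# Hypothetical site labels for the three phosphorylation sites.
SITES = ["S10", "T25", "Y40"]


def proteoforms_in_range(sites, min_sites, max_sites, include_unmodified=False):
    """List site combinations whose size lies between min_sites and max_sites."""
    if not 1 <= min_sites <= max_sites:
        raise ValueError("expected 1 <= min_sites <= max_sites")
    # Optionally keep the unmodified sequence as its own record.
    records = [()] if include_unmodified else []
    for k in range(min_sites, max_sites + 1):
        records.extend(combinations(sites, k))
    return records


selected = proteoforms_in_range(SITES, min_sites=2, max_sites=3)
with_unmodified = proteoforms_in_range(SITES, 2, 3, include_unmodified=True)
print(len(selected), len(with_unmodified))
```

With a minimum of two and a maximum of three, the sketch yields four combinations, or five when the unmodified sequence is included, which matches the counts in the example.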