The ProSight Annotator (formerly the Protein Annotator) can now expand database isoforms into individual proteoforms. Once an isoform is expanded, you can add, remove, and modify individual proteoforms to control the search space precisely.
To expand and curate proteoforms, do the following:
Prerequisites
- You have the ProSightPD Database Manager open.
Procedure
- Select Create ProSightPD Database.
- In the File Path(s) area, select Browse to open an existing database file.
NOTE
Thermo Fisher Scientific recommends creating a database from a UniProt formatted .xml file as it contains additional annotated information not found in a .fasta file.
- Locate the database file.
- The file appears in the File Path(s) box.
NOTE
(Optional) If you create a new database, you can expand the isoforms to proteoforms at that time.
- Select ProSight Annotator.
- The ProSight Annotator dialog box opens.
- Before expanding, edit the isoform to include all proteoforms of interest.
- Double-click Expand Isoform in the Modify column.
- The Expand Isoform to Proteoforms dialog box opens.
- Add modifications. Any modifications added here are included once the isoform is expanded into proteoforms.
- (Optional) To refine proteoform expansion, select the gear icon in the upper right corner of the Expand Isoform to Proteoforms dialog box.
- The Proteoform Expansion Parameters dialog box opens. Refer to the Proteoform Expansion Parameters table for information on expansion parameters.
NOTE
Expanding a large number of proteoforms (more than 1,000) can be slow to complete and may reduce performance while you edit the proteoform list.
Parameter | Description |
---|---|
Maximum Mass (Da.) To Search | Proteoforms with a mass above this setting are not included in the database. The default is 70,000 Da. Take instrument performance into account when setting this parameter; large numbers of proteoforms greater than 70 kDa are unlikely to be identified. |
Maximum PTMs Per Isoform | This parameter limits the number of PTMs expanded into proteoforms. Including a large number of PTMs increases the number of proteoforms exponentially, resulting in long search times and large database files. This parameter should reflect experimental conditions. For example, in a targeted experiment with few sequences in the database, a larger number of PTMs per isoform can be considered while retaining good performance. However, for complex samples containing hundreds of sequences, a moderate number such as 11 ensures acceptable performance and search time. If specific proteoforms must be included, Thermo Fisher Scientific recommends using the default expansion parameters and adding each specific proteoform after expansion. |
Maximum SNPs Per Isoform | It is uncommon to find a sequence with many mutations, so the default number is set to 2. However, if a sequence of interest is known to have many mutations, adjust accordingly to reflect all possible proteoforms in the database. |
Maximum PTMs Per Proteoform | This parameter limits the total number of possible modifications on a single proteoform. The default value is 4. However, in certain cases, such as histones, it is possible to encounter heavily modified proteoforms. As the maximum number of PTMs per proteoform increases, the size of the database increases as well. A proteoform with more than 10 modifications is unlikely to be identified. |
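
The effect of the PTM limits on database size can be illustrated with a short calculation. The following is a minimal sketch, not the ProSight Annotator's actual expansion algorithm: it assumes each annotated site is either unmodified or carries one possible modification, and it ignores the mass and SNP limits. The function name `count_proteoforms` and the 20-site isoform are hypothetical.

```python
from math import comb


def count_proteoforms(num_sites: int, max_ptms: int) -> int:
    """Count site combinations with at most max_ptms modified sites.

    Assumption (not from the product documentation): every site is
    either unmodified or carries exactly one possible modification.
    """
    upper = min(num_sites, max_ptms)
    return sum(comb(num_sites, k) for k in range(upper + 1))


# Hypothetical isoform with 20 annotated PTM sites.
for cap in (2, 4, 10):
    print(f"max PTMs per proteoform = {cap}: {count_proteoforms(20, cap)} proteoforms")
```

Under these assumptions, raising the cap from 4 to 10 on a 20-site isoform takes the count from 6,196 to 616,666, which is why a modest limit followed by manual addition of specific proteoforms keeps the database manageable.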
- After you set all modifications and Proteoform Expansion Parameters, select Save and Exit in the lower right corner of the ProSight Annotator.
- A list of proteoforms appears. You can modify this list: highlight a proteoform to delete it, copy it, or modify it by adding Point Features or Range Features.
NOTE
If you want to keep a proteoform and add a modified form of the proteoform, you must copy that proteoform first.
- After you have curated the database with proteoforms, select Save and Exit in the lower right corner.
- The database now contains isoform and proteoform level information.
- Select Create Database in the ProSightPD Database Manager just as you would when creating a database without proteoform level annotation.
- After creating the database, select the database from the dropdown menu in the search node.
- The Candidate Source parameters determine how your database is searched.
IMPORTANT
In order to take advantage of a proteoform curated database, you must set the Candidate Source parameters to include proteoforms. See the Candidate Source parameters table for a list of parameters.
Parameter | Description |
---|---|
Isoforms | This parameter only searches isoforms (legacy setting). |
Proteoforms | This parameter only searches proteoforms that you expanded. |
Both-Exclude Isoforms with Proteoforms | This parameter searches both isoforms and proteoforms, but excludes isoforms that have expanded proteoforms. This search ignores proteoforms from expanded isoforms that are not explicitly added by the user. For example, for all expanded isoforms, only user-curated proteoforms are searched; unexpanded isoforms are searched as usual. |
Both-Include Isoforms with Proteoforms | This parameter searches both isoforms and proteoforms and includes searching for isoforms that have expanded proteoforms. This search includes proteoforms from expanded isoforms that are not explicitly added by the user. This search includes all user-expanded proteoforms and any proteoforms that would normally be searched using an isoform search. Note: Max PTM and Max SNP parameters dictate how large the isoform search space is. |
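
The four Candidate Source settings can be read as a selection rule over the database entries. The sketch below is only an illustration of the table above, not ProSightPD code: the `Isoform` class, the `select_candidates` function, and the accession labels are hypothetical, and an isoform is treated as "expanded" when it has at least one curated proteoform.

```python
from dataclasses import dataclass, field
from enum import Enum


class CandidateSource(Enum):
    ISOFORMS = "Isoforms"
    PROTEOFORMS = "Proteoforms"
    BOTH_EXCLUDE = "Both-Exclude Isoforms with Proteoforms"
    BOTH_INCLUDE = "Both-Include Isoforms with Proteoforms"


@dataclass
class Isoform:
    accession: str
    # User-curated proteoforms; empty if the isoform was never expanded.
    proteoforms: list = field(default_factory=list)


def select_candidates(database, mode):
    """Return the entries that a search would consider under the given mode."""
    candidates = []
    for iso in database:
        expanded = bool(iso.proteoforms)
        # Every mode except the legacy isoform-only mode searches curated proteoforms.
        if mode is not CandidateSource.ISOFORMS:
            candidates.extend(iso.proteoforms)
        # Decide whether the isoform itself is also searched.
        if (
            mode is CandidateSource.ISOFORMS
            or mode is CandidateSource.BOTH_INCLUDE
            or (mode is CandidateSource.BOTH_EXCLUDE and not expanded)
        ):
            candidates.append(iso.accession)
    return candidates


# Hypothetical database: ISO_A was expanded and curated, ISO_B was not.
db = [Isoform("ISO_A", ["ISO_A/pf1", "ISO_A/pf2"]), Isoform("ISO_B")]
for mode in CandidateSource:
    print(mode.value, "->", select_candidates(db, mode))
```

In this model, Both-Exclude drops ISO_A itself because it has curated proteoforms, while Both-Include keeps it alongside them, so the isoform-level search space (governed by the Max PTM and Max SNP parameters) still applies to it.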