You can also use the following steps to build a database from a UniProt-formatted FASTA database. However, when you browse for the file, change the file type from .xml to .fasta.

To reduce the search space and remove implausible search candidates, Thermo Fisher Scientific recommends that you use the Max Isoform Mass and Max Isoform Expansion mass thresholds. The Max Isoform Mass limits the size of the sequences imported from your input database files. The Max Isoform Expansion limits the size of the potential proteoform candidates. For example, if the Max Isoform Mass is 100 kDa and the Max Isoform Expansion mass is 50 kDa, then a 90 kDa sequence is included in your ProSightPD database. However, only proteoforms less than 50 kDa can be expanded and searched.
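The interaction between the two thresholds can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the Database Manager's actual implementation: the names `MassThresholds`, `is_imported`, and `is_expanded` are hypothetical, and the handling of a mass that is exactly equal to a threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MassThresholds:
    """Hypothetical container for the two Advanced Parameters values (kDa)."""
    max_isoform_mass_kda: float
    max_isoform_expansion_kda: float


def is_imported(sequence_mass_kda: float, t: MassThresholds) -> bool:
    # A sequence is kept in the database only if it does not exceed
    # Max Isoform Mass (inclusive boundary is an assumption).
    return sequence_mass_kda <= t.max_isoform_mass_kda


def is_expanded(proteoform_mass_kda: float, t: MassThresholds) -> bool:
    # Only proteoforms below Max Isoform Expansion are expanded and searched
    # (the text says "less than", so the boundary is treated as exclusive).
    return proteoform_mass_kda < t.max_isoform_expansion_kda


# The example from the text: 100 kDa import limit, 50 kDa expansion limit.
thresholds = MassThresholds(max_isoform_mass_kda=100, max_isoform_expansion_kda=50)
print(is_imported(90, thresholds))   # the 90 kDa sequence is stored
print(is_expanded(90, thresholds))   # but it is too large to be expanded
print(is_expanded(45, thresholds))   # a 45 kDa proteoform is expanded
```

The same check explains why a truncated proteoform cannot be found when its parent sequence exceeds the upper mass limit: if `is_imported` is false for the parent, nothing derived from it is in the database.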

NOTE

For select proteins from a specific genome or proteins from different genomes, download .xml files from UniProt.
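If you prefer to script the download instead of using the website, the request URL might be built as in the sketch below. It assumes the public UniProt REST service at `rest.uniprot.org` with its `uniprotkb/stream` endpoint and the `organism_id` and `reviewed` query fields; confirm these against the current UniProt API documentation before relying on them. The helper name `uniprot_xml_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed UniProtKB streaming endpoint; check the UniProt API docs.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def uniprot_xml_url(organism_id: int, reviewed_only: bool = True) -> str:
    """Build a download URL for UniProtKB entries of one organism as XML."""
    query = f"organism_id:{organism_id}"
    if reviewed_only:
        # Restrict the result to reviewed (Swiss-Prot) entries.
        query += " AND reviewed:true"
    params = urlencode({"query": query, "format": "xml"})
    return f"{UNIPROT_STREAM}?{params}"


# Example: NCBI taxonomy ID 9606 (Homo sapiens), reviewed entries only.
print(uniprot_xml_url(9606))
```

The resulting URL can be opened in a browser or passed to a download tool, and the saved .xml file is then used in the procedure that follows.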

Procedure

  1. Go to the following website: https://www.uniprot.org/
  2. Locate and download the XML file.
  3. Save the file.
  4. In the Proteome Discoverer application, open the ProSightPD Database Manager by doing one of the following:
    • Select ProSightPD Database Manager.
    • Select Help > Open Database Manager.

    The ProSightPD Database Manager dialog box opens.

  5. To adjust the upper mass limit for sequences to be added from your database input file to your ProSightPD formatted database, select the icon in the upper right. The Advanced Parameters dialog box opens. Do the following:
    a. Enter the Max Isoform Mass (kDa).
    b. (Optional) Enter the Max Isoform Expansion Mass (kDa).
    c. Select Save.

    TIP

    The default setting for the upper mass limit is 250 kDa. Consider setting this limit to the mass of the largest proteoform of interest.

    IMPORTANT

    Subsequences or truncations of parent sequences larger than the upper mass limit are not detected. For example, if the upper mass limit is set to 70 kDa, a 25 kDa truncated proteoform from a 75 kDa parent sequence is not detected. To detect this truncated species, you must increase the upper mass limit to greater than 75 kDa.

  6. In the ProSightPD Database Manager dialog box, do the following:
    a. Select Create ProSightPD Database.
    b. Select Browse. The Open dialog box opens.
    c. Locate the saved XML file.
    d. Select Open.

    The database file appears in the dialog box. The application displays the number of proteins found in the imported file, as shown in the following figure.

If the imported database is correctly formatted, a green check mark and the number of correctly formatted proteins appear. If the file is incorrectly formatted, a red X appears.

If the number of entries shown is fewer than you expected, the lower number might be due to the Max Isoform Mass setting being too low. For information on changing that setting, see Change the isoform mass threshold.

Other reasons that the number of isoforms might be unexpectedly low:

  • A sequence contains an unrecognized amino acid (B, Y, Z, and so on) or a wildcard.
  • The entry header is incorrectly formatted (that is, not UniProt format).
  • Some of the entries were duplicated.

The Database Manager removes incorrectly formatted and duplicate entries.
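Before importing a FASTA file, you can estimate how many entries might be dropped for the reasons listed above. The following is a minimal sketch, not the Database Manager's own validation: the function names are hypothetical, the header pattern assumes the UniProt FASTA convention `>db|Accession|EntryName description`, the accepted residue set is assumed to be the 20 standard one-letter codes (adjust it to match what your version accepts), and a duplicate is assumed to mean an identical header and sequence.

```python
import re
from collections import Counter

# Assumption: only these one-letter codes are accepted; adjust as needed.
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

# Assumed UniProt-style header, e.g. ">sp|P00001|TEST_HUMAN Example protein ..."
UNIPROT_HEADER = re.compile(r"^>(sp|tr)\|[A-Z0-9]+(-\d+)?\|[A-Z0-9_]+\s")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def precheck(text, allowed=ALLOWED_RESIDUES):
    """Count entries per category: ok, bad_header, bad_residue, duplicate."""
    report = Counter()
    seen = set()
    for header, seq in parse_fasta(text):
        report["total"] += 1
        if not UNIPROT_HEADER.match(header):
            report["bad_header"] += 1
        elif set(seq.upper()) - allowed:
            report["bad_residue"] += 1
        elif (header, seq) in seen:
            report["duplicate"] += 1
        else:
            seen.add((header, seq))
            report["ok"] += 1
    return report


# Made-up entries for illustration only.
example = """>sp|P00001|TEST_HUMAN Example protein OS=Homo sapiens OX=9606 PE=1 SV=1
MKWVTF
>sp|P00001|TEST_HUMAN Example protein OS=Homo sapiens OX=9606 PE=1 SV=1
MKWVTF
>my protein without UniProt header
MKW
>tr|A0A000|X_HUMAN Example protein OS=Homo sapiens OX=9606 PE=4 SV=1
MKBX
"""
print(dict(precheck(example)))
```

If the `ok` count from a check like this is close to the number the Database Manager reports, the missing entries are probably explained by formatting or duplicates rather than by the Max Isoform Mass setting.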