Proteome Discoverer uses a protein grouping inference process to group proteins when you use the Protein Grouping node in the consensus workflow.

The application initially collects all peptide spectrum matches (PSMs) that the Peptide and Protein Filter node did not filter out.

Protein grouping inference process in the Proteome Discoverer application

[Flowchart. Input: all PSMs not filtered out. Step 1: Group all proteins that share the same set or subset of identified peptides (result: preliminary protein groups). Step 2: Filter out protein groups that have no unique peptides among the considered peptides. Step 3: Iterate through all spectra and select which PSM to use in ambiguous cases. Step 4: Resolve cases where protein groups form circular rings of identified peptides. Output: final protein groups.]

Steps 2–4 are performed only when you set the Apply Strict Parsimony Principle parameter of the Protein Grouping node in the consensus workflow to True.

Step 1:

Procedure

  1. Proteome Discoverer creates preliminary protein groups from the PSMs collected.
  2. It combines all proteins that share the same set or subset of identified peptides into one protein group.
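
The grouping rule in step 1 can be illustrated with a minimal Python sketch. It is not the application's implementation: the function name `preliminary_groups`, the input mapping, and the protein and peptide labels are all hypothetical. The sketch merges proteins with identical peptide sets and attaches any protein whose peptides form a subset of a larger set to that larger group. A protein that fits under several larger sets is listed in each of them here, which is an assumption; the source does not describe how the application handles that case.

```python
from collections import defaultdict

def preliminary_groups(protein_peptides):
    """Group proteins whose peptide sets equal, or are subsets of, a maximal set."""
    # Merge proteins that have identical peptide sets.
    by_set = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        by_set[frozenset(peptides)].append(protein)

    # A maximal set is one that is not a strict subset of any other set.
    sets = list(by_set)
    maximal = [s for s in sets if not any(s < other for other in sets)]

    groups = []
    for anchor in maximal:
        # Collect every protein whose peptide set is contained in the anchor set.
        members = sorted(p for s, ps in by_set.items() if s <= anchor for p in ps)
        groups.append({"peptides": set(anchor), "proteins": members})
    return groups

# Hypothetical input: protein name -> identified peptides.
example = {
    "P1": {"a", "b", "c"},
    "P2": {"a", "b", "c"},
    "P3": {"a", "b"},
    "P4": {"d"},
}
groups = preliminary_groups(example)
for group in groups:
    print(group["proteins"], sorted(group["peptides"]))
```

In this example P1 and P2 share the same peptide set and P3 holds a subset of it, so all three land in one preliminary group, while P4 forms its own group.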

The application performs the remaining steps of the protein grouping process (steps 2 through 4) only if you set the Apply Strict Parsimony Principle parameter of the Protein Grouping node in the consensus workflow to True.

Step 2:

Procedure

  1. The application removes all protein groups that have no unique peptides among the peptides that it considers for the protein grouping process.
  2. If a protein group does not contain at least one unique peptide, all of its peptides are also included in other protein groups, so there is no supporting evidence for the existence of this protein group.

The application explicitly retains all protein groups that form circular rings of overlapping shared peptides. For example, suppose a circular ring comprises these protein groups:

  • ABCD (identified by peptides a, b, c, and d)
  • CDEF (identified by peptides c, d, e, and f)
  • EFAB (identified by peptides e, f, a, and b)

To explain all identified peptides, Proteome Discoverer needs only two of the three protein groups, but at this point it is not clear which to take and which to reject. The application postpones the resolution of this issue until step 4.
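
The ring example above can be checked with a short Python sketch. It only verifies the two properties stated in the text: none of the three groups has a unique peptide, and any two of them explain all six peptides. The helper name `unique_peptides` and the dictionary layout are hypothetical. The sketch does not show how the application detects or retains rings.

```python
from itertools import combinations

def unique_peptides(groups):
    """Map each group name to the peptides that occur in no other group."""
    result = {}
    for name, peptides in groups.items():
        others = set().union(*(p for n, p in groups.items() if n != name))
        result[name] = peptides - others
    return result

# The circular ring from the example: group name -> identifying peptides.
ring = {
    "ABCD": {"a", "b", "c", "d"},
    "CDEF": {"c", "d", "e", "f"},
    "EFAB": {"e", "f", "a", "b"},
}
all_peptides = set().union(*ring.values())

# No group in the ring has a peptide of its own.
no_unique = all(not unique for unique in unique_peptides(ring).values())

# Every pair of groups already covers all identified peptides.
covering_pairs = [
    pair for pair in combinations(sorted(ring), 2)
    if ring[pair[0]] | ring[pair[1]] == all_peptides
]

print(no_unique)
print(covering_pairs)
```

Because every pair covers the full peptide set, a filter based only on unique peptides would remove all three groups, which is why the ring has to be retained and resolved separately.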

Step 3:

Procedure

  1. Proteome Discoverer collects all spectra with more than one peptide match to consider for the protein grouping process.
  2. For each of these spectra, the application selects one PSM to use for the protein grouping process and rejects the remaining peptide matches of the spectrum.
  3. The application selects the PSM that is connected to the “best” protein group, which is the group with the highest number of unambiguous and unique peptides.
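
The selection rule in step 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's code. The function `select_psms`, the spectrum numbers, the peptide sequences, the group names, and the `group_scores` mapping (the number of unambiguous and unique peptides for each protein group) are all hypothetical. The source does not say how ties are broken; this sketch keeps the first candidate with the top score.

```python
def select_psms(spectrum_psms, group_scores):
    """For each spectrum, keep the PSM connected to the best-scoring protein group."""
    selected = {}
    for spectrum, psms in spectrum_psms.items():
        # Each candidate is a (peptide, protein_group) pair.
        # max() returns the first candidate with the highest score (assumed tie rule).
        selected[spectrum] = max(psms, key=lambda psm: group_scores[psm[1]])
    return selected

# Hypothetical data: spectrum 20 has two candidate PSMs, spectrum 21 has one.
spectrum_psms = {
    20: [("PEPTIDEK", "G1"), ("PEPTLDEK", "G2")],
    21: [("AAAK", "G2")],
}
# Hypothetical count of unambiguous and unique peptides for each protein group.
group_scores = {"G1": 3, "G2": 5}

selected = select_psms(spectrum_psms, group_scores)
print(selected)
```

Here spectrum 20 keeps the PSM connected to G2 because that group has more unambiguous and unique peptides than G1.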

Step 4:

Proteome Discoverer resolves the cases where protein groups form circular rings of overlapping identified peptides. This is the last step of the protein group inference process, and it produces the final list of protein groups that the application reports on the Proteins page of the result file.

The PSM Ambiguity column on the PSMs and MS/MS Spectrum Info pages can help you understand the process of selecting PSMs for the protein group. This column is available for every PSM, every search input entry (representing the searched spectra), and every peptide group. For the search input entries and the peptide groups, this column displays the best PSM ambiguity from all connected PSMs. See The PSMs page for more information.

In the following example, the application identifies eight different PSMs for search input 20. Only the seven PSMs ranked 1 through 7 are of high confidence. However, all eight PSMs meet the specified protein grouping criteria because, on the basis of user-specified criteria in the Peptide and Protein Filter node, all PSMs for the top-scored proteins are retained. Because the search input cannot be unambiguously assigned to a single protein, the PSM ambiguity is set to Ambiguous.

PSMs shown for search input