Some column headings include an indication of the search node. The indication includes the following:

  • A letter that represents the position of the MSF file among the files used in the consensus workflow that generated the data. For example, the first MSF file used is A, the second MSF file used is B, and so on.
  • A number that represents the node number of the search engine node that was used in the processing workflow that generated the MSF file. For example, if the search engine node is the third node added to the workflow, its number is 3; if it was the fourth to be added, its number is 4, and so on.

Column headings can include the following:

  • A search node name, for example, Sequest HT
  • A search node name and a letter, for example, Sequest HT A
  • A search node name, a letter, and a number, for example, Sequest HT A4
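When you process exported tables in a script, it can help to split such a heading back into its parts. The following is a minimal sketch, not part of the application: the function name parse_heading is hypothetical, and the pattern assumes that the MSF file letter is a single uppercase letter separated from the node name by a space and optionally followed by the node number.

```python
import re

# Assumed heading layout: "<node name>[ <letter>[<number>]]", e.g. "Sequest HT A4".
# A node name that itself ends in a lone capital letter would be misread.
HEADING = re.compile(r"^(?P<name>.+?)(?: (?P<letter>[A-Z])(?P<number>\d+)?)?$")

def parse_heading(heading):
    """Return (node name, MSF file letter or None, node number or None)."""
    match = HEADING.match(heading.strip())
    if match is None:
        raise ValueError(f"unrecognized heading: {heading!r}")
    number = match.group("number")
    return (
        match.group("name"),
        match.group("letter"),
        int(number) if number is not None else None,
    )

for text in ("Sequest HT", "Sequest HT A", "Sequest HT A4"):
    print(text, "->", parse_heading(text))
```

The three example headings from the list above yield the node name alone, the node name with the letter A, and the node name with the letter A and the node number 4.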

The Merge Mode parameter of the MSF Files node determines which items the application includes in the column headings.

The Analysis Settings page of the Result Summaries view displays the letters that represent the MSF files, for example, Processing Step A. This page also displays the node numbers in the workflow tree.

The following table describes the columns on the Proteins page.

Column

Description

# AAs

Displays the length of the protein sequence.

# Decoy Proteins

Displays the number of the higher-ranked decoy or reverse proteins.

This column appears when the workflow includes the Protein FDR Validator node. For more information, see Protein FDR Validator node.

# Peptides

Displays the total number of distinct peptide sequences identified from all included searches.

# Peptides (by Search Engine)

Displays the number of distinct peptide sequences in the protein.

This column appears when the consensus workflow includes the Protein Scorer node. For information on this node, see Protein Scorer node.

# Protein Groups

Displays the total number of protein groups.

# Protein Unique Peptides

Displays the total number of peptides that are unique to a particular protein.

# PSMs

Displays the number of peptide spectrum matches (PSMs) identified from all included searches, including those redundantly identified.

# PSMs (by Search Engine)

Displays the number of peptide spectrum matches (PSMs) identified from all included searches, including those redundantly identified.

This column appears when the consensus workflow includes the Protein Scorer node. For information on this node, see Protein Scorer node.

# Razor Peptides

Displays the number of razor peptides (that is, peptides shared among multiple protein groups or proteins) used to quantify the protein when you use razor peptides for quantification.

This column appears when you set the Peptides to Use parameter of either the Precursor Ions Quantifier node or the Reporter Ions Quantifier node to All or to Unique + Razor.

# Unique Peptides

Displays the total number of distinct peptide sequences unique to the protein group.

Abundance Ratio Adj. P-Values

Displays the p-values adjusted by using the Benjamini-Hochberg correction for the false discovery rate. For more information on p-values, see Calculate p-values and adjusted p-values for quantification results.
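For readers who want to reproduce the adjustment outside the application, the standard Benjamini-Hochberg procedure can be sketched as follows. This is a textbook version written for illustration. The function name benjamini_hochberg is hypothetical, and the application's own implementation might differ in details such as the handling of missing values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value (rank n) down to the smallest (rank 1),
    # scaling each by n / rank and keeping the adjusted values monotonic.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value is at least as large as the raw p-value it comes from and never exceeds 1.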

The color-coding of the adjusted p-values is as follows:

Abundance Ratio P-Value

Displays the p-value of the sample group calculated by running the Tukey HSD test (post hoc) after an analysis of variance (ANOVA) test. For more information on p-values, see Calculate p-values and adjusted p-values for quantification results.

The p-value is a number between 0 and 1. If the null hypothesis states that there is no difference between the sample groups for the variance tests, one of the following applies:

  • A low p-value means there is a low probability that the null hypothesis is true; that is, the sample groups show a significant difference.
  • A high p-value means there is a high probability that the null hypothesis is true; that is, the sample groups do not show a significant difference.

The color-coding of the p-values is as follows:

Abundance Ratios

Displays abundance ratios as normal space values.

This column appears when there are sample ratios defined in the analysis setup.

Abundance Ratios (by Bio. Rep.)

Displays the abundance ratios of the biological replicates as normal space values.

Abundance Ratios (log2)

Displays the abundance ratios as log2 values.
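The relationship between the normal space values and the log2 values can be illustrated with a short sketch. The function name to_log2 is hypothetical, and the treatment of missing or non-positive ratios is an assumption made for this example only.

```python
import math

def to_log2(ratio):
    """Convert a normal space abundance ratio to its log2 value.

    Returns None for missing or non-positive ratios (an assumption for this
    sketch; the logarithm is undefined for such values).
    """
    if ratio is None or ratio <= 0:
        return None
    return math.log2(ratio)

for ratio in (2.0, 1.0, 0.5):
    print(ratio, "->", to_log2(ratio))
```

On the log2 scale, a two-fold increase and a two-fold decrease are symmetric around zero (1 and -1), which is why the log2 columns are convenient for plots and statistics.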

Abundance Ratios (log2) (by Bio. Rep.)

Displays the abundance ratios of the biological replicates as log2 values.

Abundances

Displays the abundance values of the samples before scaling and normalization.

Abundances (by Bio. Rep.)

Displays the abundance values of the biological replicates.

Abundances (by Bio. Rep.) Counts

Displays the number of abundance values used to calculate the abundances of the biological replicates.

Abundances (Grouped)

Displays the abundance values of the sample groups.

A grouped abundance value is calculated as the arithmetic mean of all the replicate abundance values within a sample group. You can specify the sample grouping on the Grouping and Quantification page when you set up an analysis.

This column appears when you group samples in the analysis setup, and there is at least one sample group consisting of at least two samples.
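The arithmetic mean described for the grouped abundance can be sketched as follows. The function name grouped_abundance is hypothetical, and skipping samples without a detected abundance value (represented here as None) is an assumption of this example, not a documented behavior of the application.

```python
from statistics import mean

def grouped_abundance(sample_abundances):
    """Return (mean abundance, number of values used) for one sample group.

    sample_abundances holds one value per sample in the group; None marks a
    sample without a detected abundance (assumed to be skipped).
    """
    detected = [value for value in sample_abundances if value is not None]
    if not detected:
        return None, 0
    return mean(detected), len(detected)

print(grouped_abundance([100.0, 200.0, None]))
```

In this sketch, the second element of the result corresponds to the kind of count that the Abundances (Grouped) Counts column reports.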

Abundances (Grouped) Counts

Displays the number of samples with detected abundance values used to calculate the abundance of the whole sample group.

Abundances (Grouped) Standard Errors [%]

Displays the standard error of the abundance values of the samples in a sample group, normalized to the group’s median abundance.

Abundances (Normalized)

Displays the normalized abundance values of the samples.

This column appears when you set the Normalization Mode parameter of the Precursor Ions Quantifier node or the Reporter Ions Quantifier node to Total Peptide Amount or Specific Protein Amount.

Abundances (Scaled)

Displays the normalized and scaled abundance values of the samples.

This column appears when you set the Scaling Mode parameter of the Precursor Ions Quantifier node or the Reporter Ions Quantifier node to On All Average or On Controls Avg.

Abundances Counts

Displays the number of abundance values used to calculate the sample abundance.

Accession

Displays by default the unique identifier assigned to the protein by the FASTA database used to generate the report.

Biological Process

Displays the GO Slim categories of the protein’s biological processes as colored boxes.

This column appears when the consensus workflow includes the Protein Annotation node.

calc. pI

Displays the theoretically calculated isoelectric point of the protein, which is the pH at which the protein carries no net electrical charge.

The amino acids that make up proteins can be positive, negative, neutral, or polar in nature, and together they give a protein its overall charge. At a pH below their isoelectric point, proteins carry a net positive charge; at a pH above their isoelectric point, they carry a net negative charge. Gel electrophoresis can then separate proteins according to their isoelectric point (overall charge) with a polyacrylamide gel, using a technique called isoelectric focusing. This technique uses a pH gradient to separate proteins and is the first step in two-dimensional polyacrylamide gel electrophoresis.

When you have searched the fractions resulting from isoelectric focusing, you can use the calc. pI value to estimate whether you might expect to find a particular protein in the given fraction.

Protein FDR Confidence

Displays the level of confidence for the identified protein as determined by the Protein FDR Validator node.

This column appears when the consensus workflow includes the Protein FDR Validator node. For more information on this node, see Protein FDR Validator node.

Master

Indicates whether the protein is the master protein of a protein group. For some peptides, a list of proteins might contain this peptide sequence, but none of them is a master protein. This situation can occur if the peptide contains isoleucine at a position where the master protein has leucine or vice versa.

Unique Sequence ID

Displays a numeric identifier unique to each protein. When you export the protein data from multiple searches and combine them during data processing in Python™, R, or a similar program, you can use the unique sequence IDs to identify the duplicate protein sequences from different runs by fast integer comparison instead of slow sequence comparison.
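A minimal sketch of such a comparison in Python follows. It assumes that each exported table has already been read into a list of dictionaries keyed by the column headings Unique Sequence ID and Accession. The function name merge_by_sequence_id and the example values are hypothetical.

```python
def merge_by_sequence_id(*runs):
    """Combine protein rows from several runs, keyed by integer sequence ID."""
    merged = {}
    for run_index, rows in enumerate(runs):
        for row in rows:
            # Integer keys make the duplicate check a fast dictionary lookup.
            sequence_id = int(row["Unique Sequence ID"])
            entry = merged.setdefault(
                sequence_id, {"accessions": set(), "runs": set()}
            )
            entry["accessions"].add(row["Accession"])
            entry["runs"].add(run_index)
    return merged

# Made-up example rows standing in for two exported Proteins tables.
run_a = [
    {"Unique Sequence ID": "101", "Accession": "P1"},
    {"Unique Sequence ID": "102", "Accession": "P2"},
]
run_b = [
    {"Unique Sequence ID": "101", "Accession": "P1"},
    {"Unique Sequence ID": "103", "Accession": "P3"},
]

merged = merge_by_sequence_id(run_a, run_b)
for sequence_id in sorted(merged):
    print(sequence_id, sorted(merged[sequence_id]["runs"]))
```

Proteins that appear in more than one run end up in a single entry whose runs set lists every run that contained them.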

Protein Group IDs

Displays the identification numbers of the reference protein groups.

Cellular Component

Displays the GO Slim categories of the protein’s cellular components as colored boxes.

This column appears when the consensus workflow includes the Protein Annotation node.

Checked

Indicates whether the item is selected.

Chromosome

Displays chromosome information from the Ensembl genome database.

This column appears when the consensus workflow includes the Protein Annotation node.

Coverage [%]

Displays the percentage of the protein sequence covered by identified peptides.
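As an illustration of what a coverage percentage means, the following sketch marks every residue of a protein sequence that lies inside at least one identified peptide and reports the covered fraction. The function name coverage_percent and the example sequence are hypothetical, and exact substring matching is an assumption. The application might match peptides differently, for example when treating isoleucine and leucine as equivalent.

```python
def coverage_percent(protein, peptides):
    """Return the percentage of residues covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein.find(peptide)
        while start != -1:
            for position in range(start, start + len(peptide)):
                covered[position] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)

# Made-up 10-residue sequence; the first two peptides overlap by one residue.
print(coverage_percent("MKTAYIAKQR", ["MKT", "TAY", "KQR"]))
```

Residues covered by several peptides are counted only once, so overlapping peptides do not inflate the result.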

Coverage [%] (by Search Engine)

Displays the percentage of the protein sequence covered by identified peptides by each search approach.

Description

Provides the name of the protein exclusive of the identifier that appears in the Accession column. This description appears in the table by default.

Ensembl Gene ID

Displays annotations from the Ensembl genome database.

This column appears when the consensus workflow includes the Protein Annotation node.

Entrez Gene ID

Displays the Entrez Gene database identification of the gene that the protein is derived from. If the gene is not stored in the Entrez Gene database, the value displayed is 0.

This column appears when the consensus workflow includes the Protein Annotation node.

Exp. q-value

Displays the q-values derived from the validation. The values must be greater than the thresholds set by the Protein FDR Validator node.

This column appears when the consensus workflow includes the Protein FDR Validator node. For more information on this node, see Protein FDR Validator node.

FASTA Title Lines

Displays the FASTA title of the protein.

Found in Files

Represents the best confidence of the PSMs of the protein that the application identified in the files:

  • Green: High confidence
  • Yellow: Medium confidence
  • Red: Low confidence
  • Blue: Found but unidentified PSM. Only the results from precursor ion quantification searches contain blue boxes.

This column appears only when you include the Data Distributions node in the consensus workflow and set its Show Found in Files parameter to True.

Found in Fractions

Represents the best confidence of the PSMs of the protein that the application identified in the fractions:

  • Green: High confidence
  • Yellow: Medium confidence
  • Red: Low confidence
  • Blue: Found but unidentified PSM. Only the results from precursor ion quantification searches contain blue boxes.

This column appears only when you include the Data Distributions node in the consensus workflow and set its Show Found in Fractions parameter to True.

Found in Sample Groups

Represents the best confidence of the PSMs of the protein that the application identified in the sample groups:

  • Green: High confidence
  • Yellow: Medium confidence
  • Red: Low confidence
  • Blue: Found but unidentified PSM. Only the results from precursor ion quantification searches contain blue boxes.
  • Grey: n/a

This column appears only when you include the Data Distributions node in the consensus workflow and set its Show Found in Sample Groups parameter to True.

Found in Samples

Represents the best confidence of the PSMs of the protein that the application identified in the samples:

  • Green: High confidence
  • Yellow: Medium confidence
  • Red: Low confidence
  • Blue: Found but unidentified PSM. Only the results from precursor ion quantification searches contain blue boxes.

This column appears only when you include the Data Distributions node in the consensus workflow and set its Show Found in Samples parameter to True.

Gene Symbol

Displays the official gene name that is used in publications. This information is taken from the second line of the General page of the ProteinCard page (see The ProteinCard Page).

This column appears when the consensus workflow includes the Protein Annotation node.

GO Accessions

Displays the GO terms contained in the graph of the annotated GO term of a protein. When you move the cursor over the GO term, the application displays the annotated GO term and all ancestor terms.

This column appears when the consensus workflow includes the Protein Annotation node.

Master

Indicates whether a protein is a master protein in a protein group.

Modifications

Displays the modifications identified in the protein, consolidated from all PSMs. The column shows a confidence value if the IMP-ptmRS node was used in the processing workflow.

Molecular Function

Displays the GO Slim categories of the protein’s molecular functions as colored boxes. See Figure 132.

This column appears when the consensus workflow includes the Protein Annotation node.

MW [kDa]

Displays the calculated molecular weight of the protein. The application calculates the molecular weight without considering PTMs.

Separating proteins by molecular weight can be one of the steps in two-dimensional gel electrophoresis. You can use the protein’s molecular weight as a rough constraint to estimate whether it is reasonable to identify a particular protein in a certain fraction that was searched.

Pfam IDs

Displays the identification numbers of families of proteins. A sequence comparison algorithm based on hidden Markov models groups proteins into families by comparing their sequences. Each family has its own ID number that starts with Pf ….

This column appears when the consensus workflow includes the Protein Annotation node.

Protein FDR Confidence

Displays the level of confidence of the identified protein groups as determined by the Protein FDR Validator node.

This column appears when the consensus workflow includes the Protein FDR Validator node. For more information on this node, see Protein FDR Validator node.

Protein Group IDs

Displays the identification numbers of the referenced protein groups.

Score Sequest HT

Displays the protein score, which is the sum of the scores of the individual peptides.

This column appears when the consensus workflow includes the Protein Scorer node. For information on this node, see Protein Scorer node.

Sequence

Displays the sequence of amino acids that compose the peptide in the protein.

Sequence Coverage

Displays the parts of the protein sequence that were identified. The column ToolTip shows the number of amino acids composing the protein.

You cannot export this column to a text file.

Sum PEP Score

Displays the scores that the Protein FDR Validator node calculates on the basis of the PEP values of the PSMs. The application uses these scores to rank the list of proteins.

Unique Sequence ID

Displays a unique identifier for the protein sequence.

WikiPathway Accessions

Displays the accessions from the Wiki Pathways database.

This column appears on the Proteins page when you include the Protein Annotation node in the consensus workflow.

WikiPathways

Displays the descriptions from the Wiki Pathways database.

This column appears on the Proteins page when you include the Protein Annotation node in the consensus workflow.

Contaminant

Displays an X symbol next to the proteins marked as contaminants in the searched FASTA file or files.

This column appears when the consensus workflow includes the Protein Marker node. For more information, see Protein Marker node.

Species Map

Displays the species names of the proteins, extracted from the FASTA database, as colored entries in a distribution map.

This column appears only when you include the Protein Marker node in the consensus workflow and set its As Species Map parameter to True.

Species

Displays the species names of the proteins, extracted from the FASTA database, as semicolon-separated text.

This column appears only when you include the Protein Marker node in the consensus workflow and set its As Species Names parameter to True.