Some column headings include an indication of the search node. The indication includes the following:
- A letter that represents the position of the MSF file used in the consensus workflow that generated the data. For example, the first MSF file used is A, the second MSF file used is B, and so on.
- A number that represents the node number of the search engine node that was used in the processing workflow that generated the MSF file. For example, if the search engine node is the third node added to the workflow, its number is 3; if it was the fourth to be added, its number is 4, and so on.
Column headings can include the following:
- A search node name, for example, Sequest HT
- A search node name and a letter, for example, Sequest HT A
- A search node name, a letter, and a number, for example, Sequest HT A4
The Merge Mode parameter of the MSF Files node determines which items the application includes in the column headings.
The Analysis Settings page of the Result Summaries view displays the letters that represent the MSF files, for example, Processing Step A. This page also displays the node numbers in the workflow tree.
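The following Python sketch illustrates the labeling scheme described above. It splits a column heading such as `Sequest HT A4` into three parts: the search node name, the position of the MSF file (A = 1, B = 2, and so on), and the search node number. It assumes that the letter is a single uppercase character separated from the node name by a space, and that the node number follows the letter directly. A node name that itself ends in a standalone capital letter would therefore be misread. The function and type names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed heading pattern: "<node name>[ <letter>[<node number>]]", e.g. "Sequest HT A4".
HEADING_PATTERN = re.compile(r"^(?P<node>.+?)(?:\s+(?P<letter>[A-Z])(?P<number>\d+)?)?$")


class SearchNodeLabel(NamedTuple):
    node_name: str
    msf_file_index: Optional[int]  # 1 for A, 2 for B, ...; None if no letter is shown
    node_number: Optional[int]     # search node number in the processing workflow


def parse_search_node_label(label: str) -> SearchNodeLabel:
    """Split a column heading into node name, MSF file index, and node number."""
    match = HEADING_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized heading: {label!r}")
    letter = match.group("letter")
    number = match.group("number")
    return SearchNodeLabel(
        node_name=match.group("node"),
        msf_file_index=ord(letter) - ord("A") + 1 if letter else None,
        node_number=int(number) if number else None,
    )
```

For example, `parse_search_node_label("Sequest HT A4")` returns `SearchNodeLabel(node_name='Sequest HT', msf_file_index=1, node_number=4)`. Which parts are actually present depends on the Merge Mode parameter, so the sketch treats both the letter and the number as optional.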
The following table describes the columns on the Proteins page.
Column | Description |
---|---|
# AAs | Displays the length of the protein sequence. |
# Decoy Proteins | Displays the number of higher-ranked decoy (reverse) proteins. This column appears when the workflow includes the Protein FDR Validator node. For more information, see Protein FDR Validator node. |
# Peptides | Displays the total number of distinct peptide sequences identified from all included searches. |
# Peptides (by Search Engine) | Displays the number of distinct peptide sequences identified in the protein by each search engine. This column appears when the consensus workflow includes the Protein Scorer node. For information on this node, see Protein Scorer node. |
# Protein Groups | Displays the total number of protein groups. |
# Protein Unique Peptides | Displays the total number of peptides that are unique to a particular protein. |
# PSMs | Displays the number of peptide spectrum matches (PSMs) identified from all included searches, including redundant identifications. |
# PSMs (by Search Engine) | Displays the number of peptide spectrum matches identified by each search engine, including redundant identifications. This column appears when the consensus workflow includes the Protein Scorer node. For information on this node, see Protein Scorer node. |
# Razor Peptides | Displays the number of razor peptides (that is, peptides shared among multiple protein groups or proteins) used to quantify the protein when you use razor peptides for quantification. This column appears when you set the Peptides to Use parameter of either the Precursor Ions Quantifier node or the Reporter Ions Quantifier node to All or to Unique + Razor. |
# Unique Peptides | Displays the total number of distinct peptide sequences unique to the protein group. |
Abundance Ratio Adj. P-Values | Displays the p-values adjusted by using the Benjamini-Hochberg correction for the false discovery rate. For more information on p-values, see Calculate p-values and adjusted p-values for quantification results. The color-coding of the adjusted p-values is as follows: |
Abundance Ratio P-Value | Displays the p-value of the sample group, calculated by running a post hoc Tukey HSD test after an analysis of variance (ANOVA) test. For more information on p-values, see Calculate p-values and adjusted p-values for quantification results. The p-value is a number between 0 and 1. If the null hypothesis states that there is no difference between the sample groups for the variance tests, one of the following pertains: The color-coding of the p-values is as follows: |
Abundance Ratios | Displays abundance ratios as normal space values. This column appears when there are sample ratios defined in the analysis setup. |
Abundance Ratios (by Bio. Rep.) | Displays the abundance ratios of the biological replicates as normal space values. |
Abundance Ratios (log2) | Displays the abundance ratios as log2 values. |
Abundance Ratios (log2) (by Bio. Rep.) | Displays the abundance ratios of the biological replicates as log2 values. |
Abundances | Displays the abundance values of the samples before scaling and normalization. |
Abundances (by Bio. Rep.) | Displays the abundance values of the biological replicates. |
Abundances (by Bio. Rep.) Counts | Displays the number of abundance values used to calculate the abundances of the biological replicates. |
Abundances (Grouped) | Displays the abundance values of the sample groups. A grouped abundance value is calculated as the arithmetic mean of all the replicate abundance values within a sample group. You can specify the sample grouping on the Grouping and Quantification page when you set up an analysis. This column appears when you group samples in the analysis setup, and there is at least one sample group consisting of at least two samples. |
Abundances (Grouped) Counts | Displays the number of samples with detected abundance values used to calculate the abundance of the whole sample group. |
Abundances (Grouped) Standard Errors [%] | Displays the standard error of the abundance values of the samples in a sample group, normalized to the group’s median abundance. |
Abundances (Normalized) | Displays the normalized abundance values of the samples. This column appears when you set the Normalization Mode parameter of the Precursor Ions Quantifier node or the Reporter Ions Quantifier node to Total Peptide Amount or Specific Protein Amount. |
Abundances (Scaled) | Displays the normalized and scaled abundance values of the samples. This column appears when you set the Scaling Mode parameter of the Precursor Ions Quantifier node or the Reporter Ions Quantifier node to On All Average or On Controls Avg. |
Abundances Counts | Displays the number of abundance values used to calculate the sample abundance. |
Accession | Displays the unique identifier assigned to the protein by the FASTA database used to generate the report. This column appears in the table by default. |
calc. pI | Displays the theoretically calculated isoelectric point (pI) of the protein, which is the pH at which the protein carries no net electrical charge. The amino acids that make up proteins can be positive, negative, neutral, or polar in nature, and together they give a protein its overall charge. At a pH below its isoelectric point, a protein carries a net positive charge; at a pH above its isoelectric point, it carries a net negative charge. Isoelectric focusing uses this property to separate proteins according to their isoelectric point (overall charge) along a pH gradient in a polyacrylamide gel; it is the first step in two-dimensional polyacrylamide gel electrophoresis. When you have searched the fractions resulting from isoelectric focusing, you can use the calc. pI value to estimate whether you might expect to find a particular protein in a given fraction. |
Master | Indicates whether the protein is the master protein of a protein group. For some peptides, a list of proteins might contain this peptide sequence, but none of them is a master protein. This situation can occur if the peptide contains isoleucine at a position where the master protein has leucine or vice versa. |
Unique Sequence ID | Displays a numeric identifier unique to each protein. When you export the protein data from multiple searches and combine them during data processing in Python™, R, or a similar program, you can use the unique sequence IDs to identify the duplicate protein sequences from different runs by fast integer comparison instead of slow sequence comparison. |
Biological Process | Displays the GO Slim categories of the protein’s biological processes as colored boxes. This column appears when the consensus workflow includes the Protein Annotation node. |
Cellular Component | Displays the GO Slim categories of the protein’s cellular components as colored boxes. This column appears when the consensus workflow includes the Protein Annotation node. |
Checked | Indicates whether the item is selected. |
Chromosome | Displays chromosome information from the Ensembl genome database. This column appears when the consensus workflow includes the Protein Annotation node. |
Coverage [%] | Displays the percentage of the protein sequence covered by identified peptides. |
Coverage [%] (by Search Engine) | Displays the percentage of the protein sequence covered by identified peptides by each search approach. |
Description | Provides the name of the protein exclusive of the identifier that appears in the Accession column. This description appears in the table by default. |
Ensembl Gene ID | Displays annotations from the Ensembl genome database. This column appears when the consensus workflow includes the Protein Annotation node. |
Entrez Gene ID | Displays the Entrez Gene database identification of the gene that the protein is derived from. If the gene is not stored in the Entrez Gene database, the value displayed is 0. This column appears when the consensus workflow includes the Protein Annotation node. |
Exp. q-value | Displays the q-values derived from the validation. The application compares these values with the thresholds set in the Protein FDR Validator node. This column appears when the consensus workflow includes the Protein FDR Validator node. For more information on this node, see Protein FDR Validator node. |
FASTA Title Lines | Displays the FASTA title of the protein. |
Found in Files | Represents the best confidence of the PSMs of the protein that the application identified in the files: This column appears only when you include the Data Distributions node in the consensus workflow and set its Show Found in Files parameter to True. |
Found in Fractions | Represents the best confidence of the PSMs of the protein that the application identified in the fractions: This column appears only when you include the Data Distributions node in the consensus workflow and set its Show Found in Fractions parameter to True. |
Found in Sample Groups | Represents the best confidence of the PSMs of the protein that the application identified in the sample groups: This column appears only when you include the Data Distributions node in the consensus workflow and set its Show Found in Sample Groups parameter to True. |
Found in Samples | Represents the best confidence of the PSMs of the protein that the application identified in the samples: This column appears only when you include the Data Distributions node in the consensus workflow and set its Show Found in Samples parameter to True. |
Gene Symbol | Displays the official gene name that is used in publications. This information is taken from the second line of the General page of the ProteinCard page (see The ProteinCard Page). This column appears when the consensus workflow includes the Protein Annotation node. |
GO Accessions | Displays the GO terms contained in the graph of the annotated GO term of a protein. When you move the cursor over the GO term, the application displays the annotated GO term and all ancestor terms. This column appears when the consensus workflow includes the Protein Annotation node. |
Modifications | Displays the modifications identified in the protein, consolidated from all PSMs. The column also shows the confidence values of the modification sites if the IMP-ptmRS node was used in the processing workflow. |
Molecular Function | Displays the GO Slim categories of the protein’s molecular functions as colored boxes. See Figure 132. This column appears when the consensus workflow includes the Protein Annotation node. |
MW [kDa] | Displays the calculated molecular weight of the protein. The application calculates the molecular weight without considering PTMs. Separating proteins by molecular weight can be one of the steps in two-dimensional gel electrophoresis. You can use the protein’s molecular weight as a rough constraint to estimate whether it is reasonable to identify a particular protein in a certain fraction that was searched. |
Pfam IDs | Displays the identification numbers of protein families. Pfam groups proteins into families by comparing their sequences with profile hidden Markov models (HMMs). Each family has its own ID number that starts with Pf …. This column appears when the consensus workflow includes the Protein Annotation node. |
Protein FDR Confidence | Displays the level of confidence of the identified protein groups as determined by the Protein FDR Validator node. This column appears when the consensus workflow includes the Protein FDR Validator node. For more information on this node, see Protein FDR Validator node. |
Protein Group IDs | Displays the identification numbers of the referenced protein groups. |
Score Sequest HT | Displays the protein score, which is the sum of the scores of the individual peptides. This column appears when the consensus workflow includes the Protein Scorer node. For information on this node, see Protein Scorer node. |
Sequence | Displays the amino acid sequence of the protein. |
Sequence Coverage | Displays the parts of the protein sequence that were identified. The column ToolTip shows the number of amino acids composing the protein. You cannot export this column to a text file. |
Sum PEP Score | Displays the scores that the Protein FDR Validator node calculates on the basis of the PEP values of the PSMs. The application uses these scores to rank the list of proteins. |
WikiPathway Accessions | Displays the accessions from the WikiPathways database. This column appears on the Proteins page when you include the Protein Annotation node in the consensus workflow. |
WikiPathways | Displays the descriptions from the WikiPathways database. This column appears on the Proteins page when you include the Protein Annotation node in the consensus workflow. |
Contaminant | Displays an X symbol next to the proteins marked as contaminants in the searched FASTA file or files. This column appears when the consensus workflow includes the Protein Marker node. For more information, see Protein Marker node. |
Species Map | Displays the species names of the proteins, extracted from the FASTA database, as colored entries in a distribution map. This column appears only when you include the Protein Marker node in the consensus workflow and set its As Species Map parameter to True. |
Species | Displays the species names of the proteins, extracted from the FASTA database, as semicolon-separated text. This column appears only when you include the Protein Marker node in the consensus workflow and set its As Species Names parameter to True. |
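The Abundance Ratio Adj. P-Values column is described as applying the Benjamini-Hochberg correction. If you want to reproduce that kind of adjustment on exported p-values, the following is a minimal sketch of the standard Benjamini-Hochberg procedure. It is not the application's own code. It assumes that all p-values of one ratio column are adjusted together as a single family of tests.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-values decrease (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the input `[0.01, 0.04, 0.03, 0.005]`, the sketch returns values of approximately 0.02, 0.04, 0.04, and 0.02, in the input order.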
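The Abundances (Grouped) column is described as the arithmetic mean of the replicate abundance values in a sample group. The Abundances (Grouped) Counts column is described as the number of samples with detected abundance values. The following is a minimal sketch of both calculations. It assumes that a missing abundance is represented as `None` and is excluded from the mean. How the application actually treats missing or imputed values is not stated here, and the function name is hypothetical.

```python
from statistics import mean
from typing import Optional


def grouped_abundance(
    replicate_abundances: list[Optional[float]],
) -> tuple[Optional[float], int]:
    """Return (grouped abundance, count of detected values) for one sample group."""
    # Keep only the replicates that have a detected abundance value.
    detected = [value for value in replicate_abundances if value is not None]
    if not detected:
        # No detected values, so no grouped abundance can be calculated.
        return None, 0
    return mean(detected), len(detected)
```

For example, `grouped_abundance([1.2e6, None, 1.6e6])` returns `(1400000.0, 2)`. The second element corresponds to the count reported alongside the grouped value.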
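The following sketch shows how a calculated pI can be obtained from a sequence alone. It sums the Henderson-Hasselbalch charge fractions of the ionizable groups and then finds the pH of zero net charge by bisection. The pKa values are an assumption (an EMBOSS-style set). The application's own pKa table and algorithm are not documented here, so its calc. pI values can differ from the results of this sketch.

```python
# Assumed pKa values (EMBOSS-style set); not necessarily those used by the application.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a sequence at a given pH (Henderson-Hasselbalch)."""
    # One free N-terminus and one free C-terminus, plus ionizable side chains.
    counts = {"Nterm": 1, "Cterm": 1}
    for residue in sequence.upper():
        if residue in PKA_POSITIVE or residue in PKA_NEGATIVE:
            counts[residue] = counts.get(residue, 0) + 1
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        # Fraction of basic groups that are protonated (positively charged).
        charge += counts.get(group, 0) / (1.0 + 10 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        # Fraction of acidic groups that are deprotonated (negatively charged).
        charge -= counts.get(group, 0) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge decreases as pH rises, so a positive charge means the pI is higher.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

With these pKa values, a lysine-rich sequence such as `KKKK` gives a basic pI between 11 and 12, and `DDDD` gives an acidic pI between 3 and 3.5.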
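The Unique Sequence ID column is described as a way to match identical protein sequences across exports by integer comparison. The following is a minimal Python sketch of that idea. It assumes that the Proteins page is exported as tab-delimited text that includes `Accession` and `Unique Sequence ID` columns. It also assumes that the IDs are comparable across the runs being combined. The function names, run names, and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_export(text: str) -> list[dict[str, str]]:
    """Parse a tab-delimited Proteins-page export (assumed file layout)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def runs_per_sequence(runs: dict[str, list[dict[str, str]]]) -> dict[int, list[str]]:
    """Map each Unique Sequence ID to the names of the runs that report it."""
    found_in: dict[int, list[str]] = defaultdict(list)
    for run_name, rows in runs.items():
        for row in rows:
            # Hashing and comparing an integer is cheaper than comparing full sequences.
            found_in[int(row["Unique Sequence ID"])].append(run_name)
    return dict(found_in)


# Hypothetical exports from two runs.
run_a = load_export("Accession\tUnique Sequence ID\nACC1\t101\nACC2\t202\n")
run_b = load_export("Accession\tUnique Sequence ID\nACC1\t101\nACC3\t303\n")
shared = {
    seq_id: names
    for seq_id, names in runs_per_sequence({"run A": run_a, "run B": run_b}).items()
    if len(names) > 1
}
print(shared)  # {101: ['run A', 'run B']}
```

The same grouping could be done with a data-frame library. The standard-library version keeps the sketch free of extra dependencies.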
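The Coverage [%] column reports the share of the protein sequence that identified peptides cover. The following is a minimal sketch of that percentage. It assumes exact string matching of each peptide against the protein sequence, so it does not treat isoleucine and leucine as interchangeable and ignores any other matching rules the application may apply. Residues covered by overlapping peptides are counted only once, and the function name is hypothetical.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percentage of protein residues covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein.find(peptide)
        while start != -1:
            for position in range(start, start + len(peptide)):
                covered[position] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


print(sequence_coverage("MKTAYIAKQR", ["TAYIAK", "QR"]))  # 80.0
```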