The Protein Marker node identifies the proteins, including their associated peptide groups and PSMs, from a specified contaminants FASTA database and marks them as contaminants with an X in the columns of the result file. When a sample contains other kinds of contamination, such as carryover from the previous sample in the HPLC column or proteins used in the preparation of the sample, you can use the Protein Marker node to specify up to three additional FASTA files and mark the proteins found in these FASTA files in one or more extra columns. The name of each extra column is the name of the FASTA file or an optional name that you can specify with one of the node’s parameters. The X always appears if the protein (or peptide group or PSM) is found in the FASTA file associated with the column. A protein marked with an X in more than one column is present in each of the corresponding FASTA files.
Separating contaminants into different FASTA files has these advantages:
- You can mark standard contaminants in all searches.
- You can see which FASTA file contains each identified protein, because the Protein Marker node marks the proteins by FASTA file.
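The marking behavior described above can be illustrated with a short Python sketch. It is not the application's implementation: matching proteins by accession, the helper names `read_fasta_accessions` and `mark_proteins`, and the column names `Contaminant` and `Carryover` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def read_fasta_accessions(lines: Iterable[str]) -> Set[str]:
    """Collect one accession per FASTA header line (lines starting with '>')."""
    accessions: Set[str] = set()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        if not header:
            continue
        token = header.split()[0]
        # UniProt-style tokens look like 'sp|P02769|ALBU_BOVIN'; use the middle field.
        parts = token.split("|")
        accessions.add(parts[1] if len(parts) >= 3 else token)
    return accessions


def mark_proteins(
    identified: List[str], marker_sets: Dict[str, Set[str]]
) -> Dict[str, Dict[str, str]]:
    """For each identified protein, put 'X' in every column whose FASTA set contains it."""
    return {
        acc: {col: ("X" if acc in members else "") for col, members in marker_sets.items()}
        for acc in identified
    }


# Hypothetical inputs: a contaminants database and one additional marker database.
contaminants = read_fasta_accessions(
    [">sp|P02769|ALBU_BOVIN Albumin", "MKWV", ">sp|P00761|TRYP_PIG Trypsin", "FPTD"]
)
carryover = read_fasta_accessions([">sp|P02769|ALBU_BOVIN Albumin", "MKWV"])

marks = mark_proteins(
    ["P02769", "P00761", "P68871"],
    {"Contaminant": contaminants, "Carryover": carryover},
)
for accession, row in marks.items():
    print(accession, row)
```

In this sketch, a protein present in both files receives an X in both columns, which mirrors the rule that an X in more than one column indicates presence in each corresponding FASTA file.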
The Protein Marker node automatically attaches to the Peptide and Protein Filter node.
For more information on identifying contaminants and on creating a consensus workflow that uses this node, see Displaying Species Names for Proteins and Peptide Groups.
The following table describes the parameters for the Protein Marker node.
Parameter | Definition |
---|---|
Protein Database | Specifies the name of a FASTA file that contains proteins marked as contaminants. |
Column Name | Specifies the names of additional columns in the result file that display the proteins from additional FASTA files that contain proteins marked as contaminants. If you do not specify a column name, the application uses the name of the FASTA file as the column name. |
Protein Database | Specifies the name of an additional FASTA file that contains another set of proteins marked as contaminants. |
As Species Map | Determines whether to annotate and display species names extracted from a FASTA database search as colored entries in a distribution map. This information is available for NCBI (RefSeq), UniProt (Swiss-Prot, TrEMBL) FASTA formats, and sequence databases downloaded from the Annotation Server. The application can display up to 30 species names in a distribution map. If the number exceeds 30, it does not display any species in the map; instead, it lists all species, separating proteins found in more than one species by semicolons. |
As Species Names | Determines whether to annotate and display species names extracted from a FASTA database search as strings in a text column. Names are separated by semicolons. This information is available for NCBI (RefSeq), UniProt (Swiss-Prot, TrEMBL) FASTA formats, and sequence databases downloaded from the Annotation Server. |
Annotation Groups | If set to true, the application creates the selected contaminant, marker database, and species information columns in the Annotation Protein Groups entity of the Protein Annotation node (if any). This process might take several minutes, depending on the size of the annotation groups, which is generally large. |
Pathway Groups | If set to true, the application creates the selected contaminant, marker database, and species information columns in the Pathway Protein Groups entity of the Protein Annotation node (if any). This process might take several minutes, depending on the size of the pathway groups, which is generally large. |
Modification Sites | If set to true, the application creates the selected contaminant, marker database, and species information columns in the Modification Sites entity (if any). |
Peptide Isoform Groups | If set to true, the application creates the selected contaminant, marker database, and species information columns in the Peptide Isoform Groups entity (if any). |
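
As a rough illustration of the As Species Names and As Species Map parameters, the following Python sketch extracts species names from UniProt-style FASTA headers, joins them with semicolons, and checks them against the 30-species map limit. It is a sketch under stated assumptions, not the application's code: it handles only the UniProt `OS=` header field, and the names `species_from_headers`, `species_text`, `fits_in_map`, and `MAX_MAP_SPECIES` are hypothetical.

```python
import re
from typing import Iterable, List

MAX_MAP_SPECIES = 30  # limit stated in the As Species Map description

# In UniProt headers the organism follows 'OS=' and ends at the next 'XX=' field or end of line.
_OS_PATTERN = re.compile(r"\bOS=(.+?)(?=\s+[A-Z]{2}=|$)")


def species_from_headers(headers: Iterable[str]) -> List[str]:
    """Return the distinct species names in first-seen order."""
    seen: List[str] = []
    for header in headers:
        match = _OS_PATTERN.search(header)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def species_text(headers: Iterable[str]) -> str:
    """Join species names with semicolons, as in a text column."""
    return ";".join(species_from_headers(headers))


def fits_in_map(species: List[str]) -> bool:
    """True when the number of species does not exceed the map limit."""
    return len(species) <= MAX_MAP_SPECIES


# Hypothetical headers for a peptide group shared by two species.
example_headers = [
    ">sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens OX=9606 GN=HBB PE=1 SV=2",
    ">sp|P02070|HBB_BOVIN Hemoglobin subunit beta OS=Bos taurus OX=9913 GN=HBB PE=1 SV=1",
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2",
]
print(species_text(example_headers))
print(fits_in_map(species_from_headers(example_headers)))
```

NCBI (RefSeq) headers place the organism differently, so a real parser would need a separate rule for that format.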