The protein descriptions and accessions shown on the Proteins page of the result file are taken from the title lines of the added FASTA file. The rules for extracting these values are defined as regular expressions (see http://en.wikipedia.org/wiki/Regular_expression).
To use unsupported FASTA files, which might contain descriptions or accessions that are difficult to read, you might need to edit existing parsing rules or add new parsing rules to the system. This can be the case, for example, if you want to use FASTA files from the Saccharomyces Genome Database (SGD) or the Arabidopsis Information Resource (TAIR) web pages.
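To illustrate what such a rule does, the following Python sketch extracts an accession and a description from a single title line. The pattern, the function name `parse_title_line`, and the example title line are assumptions made for illustration; they are not the rules shipped with the application.

```python
import re

# Hypothetical rule (not a built-in rule of the application):
# the accession is the token between the first two pipes, and the
# description is everything after the first run of whitespace.
TITLE_RULE = re.compile(
    r"^>(?:sp|tr)\|(?P<accession>[^|\s]+)\|\S+\s+(?P<description>.+)$"
)

def parse_title_line(line):
    """Return (accession, description), or None if the rule does not match."""
    match = TITLE_RULE.match(line.strip())
    if match is None:
        return None
    return match.group("accession"), match.group("description")

# Illustrative title line in the UniProt/SwissProt style.
example = ">sp|P12345|AATM_RABIT Aspartate aminotransferase, mitochondrial"
print(parse_title_line(example))
# prints ('P12345', 'Aspartate aminotransferase, mitochondrial')
```

A title line in a different layout, such as one that starts with the accession directly after `>`, returns `None` here, which is the situation in which a new or edited parsing rule is needed.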
Prerequisites
- You have downloaded a FASTA file from an appropriate website.
Procedure
- Select Administration > Maintain FASTA Parsing Rules.
  The FASTA Parsing Rules view opens.
- Do the following:
  - Add or edit the parsing rule.
  - Test the parsing rule.
  - Correct the parsing rule until it meets your needs.
- Select Apply.
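The add, test, and correct cycle can also be rehearsed outside the application before you enter a rule. The following Python sketch collects the title lines of a FASTA text and reports which ones a candidate expression matches. The helper names, the candidate pattern, and the sample entries (made up in the style of SGD systematic names, with placeholder sequences) are all assumptions for illustration.

```python
import re

def title_lines(fasta_text):
    """Return only the title lines (those starting with '>')."""
    return [ln for ln in fasta_text.splitlines() if ln.startswith(">")]

def check_rule(pattern, lines):
    """Split title lines into (line, accession) matches and unmatched lines."""
    rule = re.compile(pattern)
    matched, unmatched = [], []
    for line in lines:
        m = rule.search(line)
        if m:
            matched.append((line, m.group(1)))
        else:
            unmatched.append(line)
    return matched, unmatched

# Made-up sample data; the sequences are placeholders.
SAMPLE = """>YAL001C TFC3 example description
MVLTIYPDELVQIVSDKIAS
>YAL002W VPS8 example description
MEQNGLDHDS
>malformed-entry
MKT
"""

# Hypothetical candidate rule: 'Y', chromosome letter, arm, number, strand.
CANDIDATE = r"^>(Y[A-P][LR]\d{3}[WC])\b"

matched, unmatched = check_rule(CANDIDATE, title_lines(SAMPLE))
for line, accession in matched:
    print("match:", accession)
for line in unmatched:
    print("no match:", line)
```

Title lines reported as unmatched show where the expression still needs correcting before you apply it.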
No. Description
1 Parsing rule category selection area: Displays the four categories (Title Line Rules, Accession Rules, Taxonomy Rules, and Avoid Expression Rules) into which the application groups parsing rules.
2 Parsing rule text area: Displays the regular expression of the parsing rule. Each line corresponds to a single regular expression. These expressions are tested as alternatives or connected.
3 List of parsing rules in the selected category: Displays all available parsing rules in the selected category. This list corresponds to the available values of the appropriate parameter of the MSF Files node. If you select a single entry, the FASTA Parsing Rules view displays the parsing rule in the parsing rule area on the right.
4 Test area: Loads the title lines of a sample FASTA file to test the matching of the expression.
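As a rough illustration of a rule made of several lines, the following Python sketch treats each non-empty line of a rule text as one regular expression and tries the lines in order as alternatives, returning the first match. This covers only the "alternatives" case; how the application actually combines the lines is not specified here, and the rule text and function names are assumptions.

```python
import re

# Hypothetical rule text: one regular expression per line.
RULE_TEXT = r"""
^>sp\|([^|]+)\|
^>tr\|([^|]+)\|
^>(\S+)
"""

def compile_rule(text):
    """Compile each non-empty line of the rule text as its own expression."""
    return [re.compile(line.strip()) for line in text.splitlines() if line.strip()]

def first_match(expressions, title_line):
    """Try the expressions in order; return (line index, captured value) or None."""
    for index, expression in enumerate(expressions):
        m = expression.search(title_line)
        if m:
            return index, m.group(1)
    return None

expressions = compile_rule(RULE_TEXT)
print(first_match(expressions, ">sp|P12345|AATM_RABIT Aspartate aminotransferase"))
# prints (0, 'P12345')
print(first_match(expressions, ">YAL001C TFC3 example description"))
# prints (2, 'YAL001C')
```

Placing the most specific expressions first keeps the general fallback on the last line from capturing title lines that a more precise expression should handle.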
The following figure shows regular expressions for SwissProt accessions in the parsing rule text area.