To analyze the statistical significance of changes in protein and peptide abundances, you usually apply statistical tests, such as t-tests or ANOVA, to your matrix of abundance values. This matrix traditionally shows the peptides or proteins as columns and the samples as rows. Most statistical tests do not handle missing values well, so before further statistical analysis you usually close these gaps in one of two ways: by removing the variables (proteins or peptides) that have missing values for some observations (samples), or by filling the gaps with sensible values. Filling the gaps with reasonable values is called missing value imputation.
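The first strategy, removing every variable that has a gap, can be sketched in a few lines of Python. This is a minimal illustration, not code from the application: the function name `drop_incomplete_columns` and the use of `None` to mark a missing abundance are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows are samples, columns are proteins or peptides


def drop_incomplete_columns(matrix: Matrix, names: List[str]) -> Tuple[Matrix, List[str]]:
    """Keep only the columns that have a detected value in every row (sample)."""
    keep = [j for j in range(len(names)) if all(row[j] is not None for row in matrix)]
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, [names[j] for j in keep]


# Three samples, three proteins; None marks a missing abundance.
abundances = [
    [1200.0, None, 830.0],
    [1150.0, 95.0, 790.0],
    [1310.0, None, 805.0],
]
filtered, kept = drop_incomplete_columns(abundances, ["P1", "P2", "P3"])
print(kept)  # ['P1', 'P3']
```

The example also shows the cost of this strategy: protein P2 is discarded entirely even though one of its three measurements was detected, which is why imputation is often preferred.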
You can use a number of strategies to impute missing values. Imputed values are artificial and never optimal, but ideally they meet the following criteria:
- They are not all the same constant value but show some randomness.
- They do not lower the variance of the detected values.
- At best, they do not create significance that would not exist without imputation.
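The first two criteria are related: filling every gap with one constant always shrinks the variance. A minimal Python sketch, using illustrative log2-scale abundances for a single protein (an assumption, not data from the application), compares constant filling with random draws from a normal distribution that has the mean and standard deviation of the detected values.

```python
import random
import statistics

detected = [20.1, 21.4, 19.8, 22.0, 20.7]  # illustrative log2 abundances of one protein
n_missing = 2

mean = statistics.mean(detected)
sd = statistics.stdev(detected)

# Strategy A: fill every gap with the same constant (here, the mean).
constant_fill = detected + [mean] * n_missing

# Strategy B: draw each gap from a normal distribution with the detected mean and SD.
rng = random.Random(42)
random_fill = detected + [rng.gauss(mean, sd) for _ in range(n_missing)]

print(f"detected variance: {statistics.variance(detected):.3f}")
print(f"constant fill:     {statistics.variance(constant_fill):.3f}")  # always lower
print(f"random fill:       {statistics.variance(random_fill):.3f}")   # varies with the draw
```

Values placed exactly at the mean add nothing to the sum of squared deviations but increase the number of observations, so the sample variance must drop. The random draws add their own spread and therefore tend to preserve the variance instead of shrinking it.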
To impute missing values, set the Imputation Mode parameter of the Precursor Ions Quantifier node or the Reporter Ions Quantifier node in the consensus workflow to one of the following settings, which specify how the node treats missing values:
- (Default) None: Does not impute any missing values.
- Replicate-Based Resampling: Uses the detected values from replicate measurements. For each sample group, this method first fits a linear model that describes how the standard deviation of the replicate measurements of a protein or peptide depends on its median abundance. Then, for each protein or peptide with missing values, it takes the median of the detected values and draws random values from a normal distribution centered on this median, with the standard deviation predicted by the model from the first step. When a protein or peptide has no detected values in a sample group, the method imputes low-abundance values instead, falling back on random sampling from the distribution of the lowest 5 percent of abundance values (the Low Abundance Resampling method).
- Low Abundance Resampling: Replaces missing values with random values sampled between the minimum and the 5th percentile of all detected values.
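The two resampling modes can be sketched in Python as follows. This is a hedged illustration of the steps described above, not the application's implementation. Several details are assumptions: the low-abundance draw is uniform between the minimum and the 5th percentile, the linear model is an ordinary least-squares line, the data layout (a dictionary of replicate values per sample group) and all function names are hypothetical, and a negative predicted standard deviation is clamped to zero.

```python
import random
import statistics
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: protein -> replicate abundances in one sample group; None marks a gap.
Group = Dict[str, List[Optional[float]]]


def low_abundance_resample(all_detected: List[float], rng: random.Random) -> float:
    """Draw a value between the minimum and the 5th percentile of all detected values.

    Assumption: the draw is uniform; the source does not name the distribution.
    """
    p5 = statistics.quantiles(all_detected, n=20, method="inclusive")[0]
    return rng.uniform(min(all_detected), p5)


def fit_sd_vs_median(group: Group) -> Tuple[float, float]:
    """Least-squares fit of SD = intercept + slope * median.

    Uses proteins with at least two detected replicates; needs at least two
    such proteins with different medians.
    """
    xs: List[float] = []
    ys: List[float] = []
    for values in group.values():
        detected = [v for v in values if v is not None]
        if len(detected) >= 2:
            xs.append(statistics.median(detected))
            ys.append(statistics.stdev(detected))
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def replicate_resample(group: Group, all_detected: List[float],
                       rng: random.Random) -> Dict[str, List[float]]:
    """Fill the gaps of one sample group; detected values are returned unchanged."""
    intercept, slope = fit_sd_vs_median(group)
    filled: Dict[str, List[float]] = {}
    for protein, values in group.items():
        detected = [v for v in values if v is not None]
        if not detected:
            # Nothing detected in this group: fall back to low-abundance resampling.
            filled[protein] = [low_abundance_resample(all_detected, rng) for _ in values]
            continue
        center = statistics.median(detected)
        sd = max(intercept + slope * center, 0.0)  # guard against a negative predicted SD
        filled[protein] = [v if v is not None else rng.gauss(center, sd) for v in values]
    return filled


# Illustrative log2 abundances for one sample group with three replicates.
group = {
    "P1": [24.1, 24.5, 23.9],
    "P2": [20.2, None, 20.9],
    "P3": [18.0, 18.6, 17.5],
    "P4": [None, None, None],
    "P5": [22.3, 22.0, 22.8],
}
# For brevity the low-abundance pool here is this group only, not the whole data set.
all_detected = [v for values in group.values() for v in values if v is not None]
result = replicate_resample(group, all_detected, random.Random(7))
```

P2 has one gap and is filled from a normal distribution around its median, whereas P4 has no detected value in the group and receives low-abundance values. With Low Abundance Resampling selected, every gap would instead be filled by `low_abundance_resample`.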
The application imputes missing values after the optional normalization and before the optional scaling of abundance values. If the application performs normalization, the raw abundances keep their gaps; the application fills the gaps only at the level of the normalized abundances, according to the chosen imputation method.
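The order of the processing steps can be sketched as a small Python pipeline. This is only an illustration of the sequence normalize, impute, scale: the per-sample normalization factors are hypothetical, the imputation step is a simple stand-in rather than one of the application's methods, and scaling to an average of 100 is an assumed scaling scheme.

```python
import random
from typing import List, Optional


def normalize(values: List[Optional[float]], factors: List[float]) -> List[Optional[float]]:
    """Multiply each sample's abundance by its normalization factor; gaps stay gaps."""
    return [None if v is None else v * f for v, f in zip(values, factors)]


def impute(values: List[Optional[float]], fill_low: float, rng: random.Random) -> List[float]:
    """Stand-in imputation: fill each gap with a random value below the smallest detected value."""
    lowest = min(v for v in values if v is not None)
    return [v if v is not None else rng.uniform(fill_low, lowest) for v in values]


def scale_to_average(values: List[float], target: float = 100.0) -> List[float]:
    """Assumed scaling scheme: rescale so that the values average to `target`."""
    mean = sum(values) / len(values)
    return [v * target / mean for v in values]


raw = [5.0e6, None, 4.2e6, 6.1e6]   # one protein across four samples
factors = [1.10, 0.95, 1.02, 0.88]  # hypothetical per-sample normalization factors
rng = random.Random(0)

normalized = normalize(raw, factors)                    # still contains the gap
imputed = impute(normalized, fill_low=1.0e6, rng=rng)   # gap filled at the normalized level
scaled = scale_to_average(imputed)                      # scaling runs last, on gap-free values

print(raw)  # [5000000.0, None, 4200000.0, 6100000.0]: the raw abundances keep their gap
```

Because each step returns a new list, the raw abundances are never modified, which mirrors the behavior described above: only the normalized abundances receive imputed values.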
Imputation has little to no effect when you use the pairwise ratio-based method, so use imputation only when you select the summed abundance ratio method in the Precursor Ions Quantifier or the Reporter Ions Quantifier node in the consensus workflow. When the application imputes missing values, it adds an Abundance Origin column to the Proteins and Peptide Groups pages that shows where each abundance value came from:
- Det: The value is derived from a detected signal.
- Imp: The value was imputed.
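Recording the origin of each value alongside the imputation can be sketched in Python as follows. The function `impute_with_origin` and its `draw` callback are hypothetical and only mirror the Det and Imp labels; they do not describe how the application stores the Abundance Origin column.

```python
import random
from typing import Callable, List, Optional, Tuple


def impute_with_origin(values: List[Optional[float]],
                       draw: Callable[[], float]) -> Tuple[List[float], List[str]]:
    """Fill gaps with draw() and label each value as detected (Det) or imputed (Imp)."""
    filled: List[float] = []
    origin: List[str] = []
    for v in values:
        if v is None:
            filled.append(draw())
            origin.append("Imp")
        else:
            filled.append(v)
            origin.append("Det")
    return filled, origin


rng = random.Random(3)
values = [812.0, None, 790.5, None]
filled, origin = impute_with_origin(values, lambda: rng.uniform(10.0, 50.0))
print(origin)  # ['Det', 'Imp', 'Det', 'Imp']
```

Keeping the labels next to the values lets later steps, such as significance testing, report how many of the abundances behind a result were imputed.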