SLiMSuite & SeqSuite: open-source bioinformatics in Python: January 2014

Tuesday, 14 January 2014

Using SLiMFinder on Phage Display Data (or other peptides)

Although SLiMFinder is designed with whole protein sequences in mind, it can also be used to identify statistically over-represented motifs in peptide data, including phage display results. Indeed, phage display analysis is the third example application in the original SLiMFinder paper.

Unfortunately, the SLiMFinder webserver is currently not set up for phage display analysis, so if you are interested in this kind of work then you will need to download SLiMSuite.

Suggested settings for phage display data are below. If you have a go, I’d be interested to hear how well it works, and please get in touch if you want more advice. Similarly, if you want some advice/ideas on how to combine the peptides with interaction data and full-length protein sequences for a more sophisticated analysis, send me a bit more info and I’d be happy to make some suggestions.

Custom settings for phage display data

Here is an overview of the settings that should be tweaked for phage display analysis:

Amino acid frequencies. One thing you will want to try is changing the way that the amino acid frequencies are used. By default, SLiMFinder uses the amino acid frequencies of the input dataset, but this is not an appropriate background for phage display peptides because their composition is clearly biased by the motifs they contain. Instead, you probably want to set the amino acid frequencies for the background model to those of the human proteome (for human peptides) or even a uniform amino acid distribution. The aim is to select frequencies that model the pre-screening amino acid frequencies. This is done using the aafreq=FILE option, where FILE can be a FASTA file of protein sequences or a delimited file of amino acid frequencies with the headings “AA” and “FREQ”. (See the manual for details.) If in doubt, try a few runs with different amino acid frequencies.
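As an illustration of the delimited-file route, here is a minimal Python sketch that builds a uniform amino acid frequency table with the “AA” and “FREQ” headings. It is not part of SLiMSuite: the function names, the tab delimiter and the four-decimal formatting are assumptions made for this example, so check the manual for the exact format that aafreq=FILE expects.

```python
# Sketch only: builds a uniform background frequency table for aafreq=FILE.
# Assumptions (not from SLiMSuite itself): tab-delimited output, one row per
# standard amino acid, frequencies written to four decimal places.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def uniform_aa_frequencies():
    """Return a dict giving every standard amino acid the same frequency."""
    freq = 1.0 / len(AMINO_ACIDS)
    return {aa: freq for aa in AMINO_ACIDS}


def format_aafreq_table(frequencies, delimiter="\t"):
    """Format frequencies as delimited text with AA and FREQ headings."""
    lines = [delimiter.join(["AA", "FREQ"])]
    for aa in sorted(frequencies):
        lines.append(delimiter.join([aa, f"{frequencies[aa]:.4f}"]))
    return "\n".join(lines) + "\n"


def write_aafreq_file(frequencies, path, delimiter="\t"):
    """Write the formatted table to path (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write(format_aafreq_table(frequencies, delimiter))
```

Calling write_aafreq_file(uniform_aa_frequencies(), "uniform_aafreq.tdt") would write the table to a file (the filename is just an example) that could then be tried with aafreq=uniform_aafreq.tdt. The same functions work for any other background if you pass in your own frequency dictionary.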

Evolutionary Filtering. Evolutionary filtering should be switched off (efilter=F) but you will also want to make sure that there is no redundancy in your peptides. (rje_seq.py can be used for this.)
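If you just want a quick check for repeated peptides before running, a few lines of standalone Python will do. This sketch is not rje_seq.py and does not use any SLiMSuite code: it only removes exact duplicate sequences (ignoring case) from FASTA-formatted text, keeping the first copy of each, and the function names are made up for the example. Near-identical peptides would need a proper redundancy filter.

```python
# Sketch only: drop exact duplicate peptides from FASTA-formatted text.
# This is a stand-in illustration, not the rje_seq.py redundancy filter.


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def remove_duplicate_peptides(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for name, seq in records:
        key = seq.upper()
        if key in seen:
            continue
        seen.add(key)
        unique.append((name, seq))
    return unique


def to_fasta(records):
    """Format (name, sequence) tuples back into FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

For example, to_fasta(remove_duplicate_peptides(read_fasta(text))) returns the input peptides with later exact repeats removed.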

SLiMChance. If you are not so interested in the statistical significance and primarily want to use SLiMFinder to return a ranked list of interesting motifs in the data, set sigcut=1.0 and choose the number of motifs to return with topranks=X.

Ambiguity. Peptide data is usually pretty quick to run, and so it is probably worth exploring the full range of ambiguity with combamb=T (combined amino acid and variable-length wildcards). The basic equiv=LIST set for amino acid degeneracy should be OK for most jobs but you can easily tweak it to add or remove ambiguity combinations as appropriate.

Masking. You will probably want to switch off all masking (masking=F). Low complexity masking might be useful, but if you do leave masking on then metmask=F posmask="" should be used, as the peptide N-termini are not true protein N-termini.
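To pull the settings above together, here is a hedged Python sketch that assembles a SLiMFinder command line as a list of arguments. The option names efilter, masking, metmask, posmask, combamb, sigcut, topranks and aafreq are the ones discussed in this post. The seqin=FILE input option, the slimfinder.py script name and location, and the function itself are assumptions for illustration, so adjust them to match your SLiMSuite download.

```python
# Sketch only: assemble (but do not run) a SLiMFinder command for peptides.
# Assumptions: the script is called slimfinder.py and input peptides are
# given with seqin=FILE; check both against your SLiMSuite installation.


def build_slimfinder_command(seqin, topranks, aafreq=None, rank_only=True,
                             script="slimfinder.py"):
    """Return the command as a list of arguments suitable for subprocess."""
    command = [
        "python", script,
        f"seqin={seqin}",
        "efilter=F",    # no evolutionary filtering for peptides
        "masking=F",    # switch off all masking
        "metmask=F",    # peptide N-termini are not true protein N-termini
        "posmask=",     # empty position mask, i.e. posmask="" in a shell
        "combamb=T",    # combined amino acid and wildcard ambiguity
        f"topranks={topranks}",
    ]
    if rank_only:
        # Return a ranked list regardless of statistical significance.
        command.append("sigcut=1.0")
    if aafreq is not None:
        # Background amino acid frequencies (FASTA or AA/FREQ table).
        command.append(f"aafreq={aafreq}")
    return command
```

The returned list can be printed with " ".join(...) to inspect it, or passed to subprocess.run once you are happy with the settings. Leaving out aafreq falls back on SLiMFinder's default of using the input dataset frequencies.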