SLiMSuite & SeqSuite: open-source bioinformatics in Python

New SLiMSuite Blog

2014-06-26T08:59:00.000+01:00

This Blog has now been retired. Please visit the new SLiMSuite blog (and update any bookmarks).

New SLiMSuite release now available

2014-06-23T03:13:00.001+01:00

SLiMSuite Short Linear Motif discovery and analysis: New SLiMSuite release now available: A new download of SLiMSuite (release 2014-06-22 ) is now available. As well as fixing the minor GOPHER output bug , a new Taxonomy proces...

SLiMSuite Short Linear Motif discovery and analysis: Minor bug in GOPHER output with BLAST+

2014-05-14T13:28:00.001+01:00

SLiMSuite Short Linear Motif discovery and analysis: Minor bug in GOPHER output with BLAST+: A bug has been identified with the current SLiMSuite release when using BLAST+ to generate orthologue alignments with GOPHER. Sequences extr...

SLiMSuite Short Linear Motif discovery and analysis: Blog switchover

2014-04-25T12:06:00.001+01:00

Posts and pages from this blog have now been imported into a new SLiMSuite Short Linear Motif discovery and analysis blog, which will take over as the main source of ongoing news, tips, documentation and updates. Posts will be cross-posted here for a while before eventually this blog is discontinued.

SLiMSuite 2014-04-22 now available

2014-04-23T14:00:00.000+01:00

A new download of SLiMSuite (release 2014-04-22) is now available. As well as fixing the gopher.py error, the download page and readme have had a slight makeover, which should make them load quicker.

As part of ongoing consolidation and documentation, SeqSuite has now been incorporated into a single SLiMSuite download. (Previously, SLiMSuite was available as a reduced set of programs and SeqSuite had the full set.) The intention is to retire the SeqSuite moniker over the coming months, although the programs themselves will still be available.

The latest release also features a new program, SLiMFarmer, for running (Q)SLiMFinder and SLiMProb batch jobs on parallel processors. SLiMFarmer is still under development; it should work with other SLiMSuite programs too, but this has not yet been tested.

Other miscellaneous updates are listed below.

Updates since last release:

• comparimotif_V3: Updated from Version 3.10.
→ Version 3.10: Added forking.
→ Version 3.11: Added additional overlap/matchfix checks during basic comparison to try and speed up.
→ Version 3.12: Replaced deprecated sets.Set() with set().

• gablam: Updated from Version 2.11.
→ Version 2.12: Consolidated use of BLAST V2.

• haqesac: Updated from Version 1.9.
→ Version 1.10: Added exceptions for BLAST failure.

• picsi: Updated from Version 1.1.
→ Version 1.2: Updated to BUDAPEST 2.3 and rje_mascot.

• pingu_V4: Created.
→ Version 4.0: Initial Compilation based on code from SLiMBench and PINGU 3.9 (inherited as pingu_V3).
→ Version 4.1: Adding compilation of PPI databases using new rje_xref V1.1 and older objects from PINGU V3.
→ Version 4.2: Bug fixes for use of PPISource to create PPI databases.

• qslimfinder: Updated from Version 1.6.
→ Version 1.7: Fixed "MustHave=LIST" correction of motif space.

• seqmapper: Updated from Version 2.0.
→ Version 2.1: Added catching of failure to read input sequences. Removed 'Run' from GABLAM table.

• slimbench: Updated from Version 2.0.
→ Version 2.1: Fixed memsaver=T unless in development mode (dev=T). Removed old Assessment. Tested with simbench analysis.
→ Version 2.2: Replaced searchini=LIST with searchini=FILE and moved to SimBench commands.
→ Version 2.2: Modified the FN/TN and ResNum calculations. No longer rate TP in random data as OT.

• slimfarmer: Created.
→ Version 0.0: Initial Compilation.
→ Version 1.0: Functional version using rje_qsub and rje_iridis to fork out SLiMSuite runs.
→ Version 1.1: Updated to use rje_hpc.JobFarmer and incorporate main SLiMSuite farming within SLiMFarmer class.

• slimfinder: Updated from Version 4.5.
→ Version 4.6: Minor modification to seqocc=T function. !Experimental! Added main occurrence output and modified savespace.

• slimmutant: Created.
→ Version 0.0: Initial Compilation.
→ Version 1.0: Working version with standalone functionality.

• slimprob: Updated from Version 1.0.
→ Version 1.1: Tidied import commands.
→ Version 1.2: Increased extras=X levels. Adjusted maxsize=X assessment to be post-masking.

• ned_rankbydistribution: Updated from Version 1.1.
→ Version 1.2: Replaced deprecated Set module.

• rje: Updated from Version 4.8.
→ Version 4.9: Added rje.slimsuite, which determines the slimsuite home directory from rje.py file path.
→ Version 4.10: Added osx=T/F option for Mac-specific running options.

• rje_blast_V2: Updated from Version 2.4.
→ Version 2.5: Minor modifications for SLiMCore UPC generation.
→ Version 2.6: Minor bug fixes.

• rje_db: Updated from Version 1.2.
→ Version 1.3: Minor modifications for SLiMCore FUPC development.
→ Version 1.4: Added list checking with addEmptyTable.

• rje_dismatrix_V2: Updated from Version 2.9.
→ Version 2.10: Minor modifications for SLiMCore UPC.

• rje_genemap: Updated from Version 1.4.
→ Version 1.5: Minor tweak of expected HGNC input following change to downloads.

• rje_hpc: Created.
→ Version 1.0: Initial Compilation based on rje_iridis V1.10.

• rje_iridis: Updated from Version 1.9.
→ Version 1.10: Modified freemem setting to run on Katana. Made rsh optional. Removed defunct IRIDIS3 option.

• rje_obj: Updated from Version 1.3.
→ Version 1.4: Added sourceDataFile() method from SLiMBench for wider use.
→ Version 1.5: Added 'basestr' and 'basefile' cmdlist types.
→ Version 1.6: Added osx=T/F option for Mac-specific running options.

• rje_qsub: Updated from Version 1.4.
→ Version 1.5: Added emailing of job stats after run. Added vmem limit.

• rje_seq: Updated from Version 3.17.
→ Version 3.18: Minor BLAST+ bug fixes. Added exceptions to readBLAST failure.

• rje_seqlist: Updated from Version 1.3.
→ Version 1.4: Added dna2prot reformat function.

• rje_slimcore: Updated from Version 1.12.
→ Version 1.13: Modified the savespace settings to reduce numbers of files. targz file now uses RunID not Build Info.
→ Version 1.14: Started adding code for Fragmented UPC (FUPC) clustering.

• rje_slimlist: Updated from Version 1.2.
→ Version 1.3: Added auto-download of ELM data.

• rje_uniprot: Updated from Version 3.14.
→ Version 3.14: Added dblist=LIST and dbsplit=T/F for additional DB link output control. Set unipath default to url.
→ Version 3.15: Added extraction of taxonomic groups. Add UniFormat to improve pure downloads.
→ Version 3.16: Added WBGene ID's from WormBase as one of the recognised DB XRef to parse.
→ Version 3.17: Efficiency tweak to URL-based extraction of acclist.
→ Version 3.18: Minor modification to database parsing.

• rje_xref: Updated from Version 1.0.
→ Version 1.1: Added output of ID lists to text files. Major reworking. Tested with HPRD and HGNC.

Missing gopher.py file

2014-04-08T08:35:00.000+01:00

There is a bug with the current software download, with a file missing from the libraries/ directory. The download will hopefully be updated soon but in the meantime please email richard.edwards[at]unsw.edu.au and I will send you the file.

Using SLiMFinder on Phage Display Data (or other peptides)

2014-01-14T11:02:00.000+00:00

Although SLiMFinder is designed with whole protein sequences in mind, it can also be used to identify statistically over-represented motifs in peptide data, including phage display results. Indeed, it is the third example application in the original SLiMFinder paper.

Unfortunately, the SLiMFinder webserver is currently not set up for phage display analysis, so if you are interested in this kind of work then you will need to download SLiMSuite.

Suggested settings for phage display data are below. If anyone has a go and/or wants more advice, please get in touch. (If you try it, I’d be interested to hear how well it works!) Similarly, if you want some advice/ideas on how to combine the peptides with interaction data and full length protein sequences for a more sophisticated analysis, send me a bit more info and I’d be happy to make some suggestions.

Custom settings for phage display data

Here is an overview of the settings that should be tweaked for phage display analysis:

Amino acid frequencies. One thing you will want to try is changing the way that the amino acid frequencies are used. By default, SLiMFinder will use the amino acid frequencies of the input dataset but for phage display peptides this is not really right as the peptides are clearly biased in their composition due to the motifs they contain. Instead, you probably want to set the amino acid frequencies for the background model to those of the human proteome (for human peptides) or even a uniform amino acid distribution. (Select frequencies that model the pre-screening amino acid frequencies.) This is done using the aafreq=FILE option, where FILE can be a fasta file of protein sequences or a delimited file of aa frequencies with the headings “AA” and “FREQ”. (See the manual for details.) If in doubt, try a few runs with different amino acid frequencies.

Evolutionary Filtering. Evolutionary filtering should be switched off (efilter=F) but you will also want to make sure that there is no redundancy in your peptides. (rje_seq.py can be used for this.)

SLiMChance. If you are not so interested in the statistical significance and primarily want to use SLiMFinder to return a ranked list of interesting motifs in the data, set sigcut=1.0 and choose the number of motifs to return with topranks=X.

Ambiguity. Peptide data is usually pretty quick to run, and so it is probably worth exploring the full range of ambiguity with combamb=T (combined amino acid and variable-length wildcards). The basic equiv=LIST set for aa degeneracy should be OK for most jobs but you can easily tweak it to add or remove ambiguity combinations as appropriate.

Masking. You will probably want to switch off all masking (masking=F). Low complexity masking might be useful but metmask=F posmask="" should be used as the N-termini are not true protein N-termini.
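To pull the settings above together, here is a rough Python sketch. It writes a uniform amino acid frequency table with the "AA" and "FREQ" headings and then assembles a SLiMFinder argument list. Several things are assumptions to check against the manual: the tab delimiter, the file names, the location of slimfinder.py, the topranks value of 50, and the use of seqin=FILE for the peptide input.

```python
import csv

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def write_uniform_aafreq(path):
    """Write an AA/FREQ table giving each of the 20 amino acids equal frequency."""
    freq = 1.0 / len(AMINO_ACIDS)
    with open(path, "w", newline="") as handle:
        # Tab delimiter is an assumption; the post only says "delimited".
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["AA", "FREQ"])
        for aa in AMINO_ACIDS:
            writer.writerow([aa, "%.4f" % freq])


def phage_display_args(seqin, aafreq, topranks=50):
    """Assemble the suggested phage display settings as an argument list."""
    return [
        "python", "slimfinder.py",     # script location is an assumption
        "seqin=%s" % seqin,            # assumed input option name
        "aafreq=%s" % aafreq,          # background amino acid frequencies
        "efilter=F",                   # no evolutionary filtering
        "masking=F", "metmask=F", "posmask=",
        "combamb=T",                   # full ambiguity search
        "sigcut=1.0",                  # rank motifs rather than filter on significance
        "topranks=%d" % topranks,
    ]
```

The returned list could be passed to subprocess.call() or joined with spaces to paste into a terminal (in which case the empty posmask should be written as posmask="").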

File management for large SLiMSuite runs

2013-12-03T06:37:00.000+00:00

The latest release of SLiMSuite features a slight modification to the way that files are generated and tidied, which can be beneficial for large runs.

Previously, a different results directory (resdir=PATH) was required for each different run to avoid dataset-specific results being over-written. The partial exception was the *.pickle.gz file, which included some SLiMBuild information in its name. (This is predominantly to speed up the ability of (Q)SLiMFinder to recognise when an intermediate pickle file can be used or not.) As of the latest release, the RunID (runid=X) is also now included in dataset-specific output, allowing results from several different runs (with different RunIDs) to go into the same results directory.

The exception is the files that are created as part of the initial setup/SLiMBuild process: *.slimdb, *.dis.tdt and *.upc. From a given Dataset and RunID, the following files will therefore be generated in ResDir/:

Dataset.RunID.cloud.txt
Dataset.RunID.mapping.fas
Dataset.RunID.maskaln.fas
Dataset.RunID.masked.fas
Dataset.RunID.motifaln.fas
Dataset.RunID.occ.csv
Dataset.dis.tdt
Dataset.#SLiMBuild-Text#.pickle.gz
Dataset.slimdb
Dataset.upc

Note that the default ResDir is SLiMFinder/, QSLiMFinder/ or SLiMProb/ and the default RunID is the date and time of the run.
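If you need to check or collect outputs in a script, the naming scheme above can be sketched in a few lines of Python. This is just an illustration: the function name is made up, and build_text stands in for the SLiMBuild text in the pickle name, whose exact content is not described here.

```python
def expected_outputs(dataset, runid, build_text, resdir="SLiMFinder/"):
    """List the file names expected for one Dataset/RunID combination."""
    # Files that carry the RunID and so differ between runs.
    run_specific = ["cloud.txt", "mapping.fas", "maskaln.fas",
                    "masked.fas", "motifaln.fas", "occ.csv"]
    # Setup/SLiMBuild files shared between runs on the same dataset.
    shared = ["dis.tdt", "%s.pickle.gz" % build_text, "slimdb", "upc"]
    files = ["%s%s.%s.%s" % (resdir, dataset, runid, ext) for ext in run_specific]
    files += ["%s%s.%s" % (resdir, dataset, ext) for ext in shared]
    return files
```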

TarGZ and SaveSpace

Obviously, the results directory can quickly fill up with files if there are multiple datasets and/or runs with different RunIDs. The way to get round this is to use the targz=T and savespace=X options.

targz=T will package up all of the files associated with a specific run into a single Dataset.RunID.tgz file. This does not work on Windows. (Note that previous versions generated a Dataset.tar.gz file.) The *.pickle.gz file associated with the run will not be included in the tar file unless savespace=2+ (see below).

Note: the tar file is actually generated from the run directory, not the results directory and will include the relative path to ResDir in the tarred files. This means that if you enter ResDir/ and then tar -xzf Dataset.RunID.tgz, an additional ResDir/ will be created in which the files can be found. This is actually pretty useful as it allows the user to unpack individual runs and then delete the whole directory when finished. To return individual results to their “rightful” place, simply run the tar command from the same directory that the SLiMSuite program was run from (e.g. tar -xzf ResDir/Dataset.RunID.tgz).
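The same unpacking can be done from Python with the standard tarfile module. In this minimal sketch the function name and its arguments are mine; run_dir should be the directory the SLiMSuite program was run from if you want the files back in their original place.

```python
import os
import tarfile


def unpack_run(tgz_path, run_dir="."):
    """Extract a Dataset.RunID.tgz archive relative to run_dir.

    Archive members keep their relative ResDir/ path, so files land in
    run_dir/ResDir/. Only unpack archives you created yourself.
    """
    with tarfile.open(tgz_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(run_dir)
    return [os.path.join(run_dir, name) for name in names]
```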

The savespace=X option saves space by deleting excess files. It is strongly recommended that this is used in conjunction with targz=T. There are now four levels of savespace=X:

0 = Delete no files
1 = Delete all bar *.upc and *.pickle (Pickle excluded from tar.gz with this setting)
2 = Delete all bar *.upc files (Pickle included in tar.gz with this setting)
3 = Delete all dataset-specific files including *.upc and *.pickle (not *.tar.gz)

Another way to think of this is that 0 will delete nothing, 1 will leave enough files to rerun the same dataset/SLiMBuild combination, 2 will leave enough to run the same dataset with additional SLiMBuild settings, whilst 3 will cleanup absolutely everything.

The recommended setting for running on a cluster or supercomputer is targz=T savespace=1 unless file numbers are an issue, in which case targz=T savespace=2 would be better. targz=T savespace=3 is only really recommended when you are confident that all datasets will run to completion without issues. If there is a chance of nodes going down or walltimes being reached, it is better to keep the pickle files accessible for re-runs.
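For anyone scripting large runs, the savespace levels can be written down as a small Python lookup. This only restates the levels described above and is not how SLiMSuite implements them; the function names are hypothetical.

```python
# What each savespace=X level deletes, as described in this post.
SAVESPACE_LEVELS = {
    0: "delete no files",
    1: "delete all bar *.upc and *.pickle",
    2: "delete all bar *.upc",
    3: "delete all dataset-specific files, including *.upc and *.pickle",
}


def pickle_in_archive(savespace, targz=True):
    """The pickle only goes into the tar file at savespace=2 or higher."""
    return targz and savespace >= 2


def pickle_left_on_disk(savespace):
    """The pickle stays directly accessible for re-runs at savespace=0 or 1."""
    return savespace <= 1
```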

New downloads and fixed webpages

2013-12-03T05:56:00.000+00:00

New releases of SeqSuite and SLiMSuite are now available. The webpages have now hopefully been fixed too, including the broken Manual links. (A bit of trouble parsing some of the docstrings had messed up the HTML, in case you care!) Please report any more anomalies.

There are not many major updates since the last release. The biggest is that SLiMFinder (and QSLiMFinder) now produce a single *.occ.csv containing motif instances for all datasets, in addition to the old dataset-specific files. This is to make the output more consistent with SLiMProb although do note that some of the column headers are different. The new file contains the same data as the old dataset-specific *.occ.csv files plus two additional columns: Dataset and RunID. (These match the main *.csv output.)
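The Dataset and RunID columns make it easy to pull one run back out of the combined file. The Python sketch below assumes a comma-separated file with a header row; the function name is hypothetical, and only the Dataset and RunID column names come from this post.

```python
import csv


def occurrences_for(handle, dataset, runid=None):
    """Return rows of a combined *.occ.csv for one Dataset (and optionally one RunID)."""
    reader = csv.DictReader(handle)
    return [
        row for row in reader
        if row["Dataset"] == dataset and (runid is None or row["RunID"] == runid)
    ]
```

Typical use would be something like: with open(occ_file, newline="") as handle: rows = occurrences_for(handle, "mydata").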

Dataset-specific results files have also been cleaned up a little for (Q)SLiMFinder and SLiMProb (i.e. the SLiMCore Class in libraries/rje_slimcore) to make the targz=T/F and savespace=X options a little more useful and consistent. This will be the subject of another post shortly.

Other miscellaneous updates are listed below.

Updates since last release:

• comparimotif_V3: Updated from Version 3.10.
→ Version 3.10: Added forking.
→ Version 3.11: Added additional overlap/matchfix checks during basic comparison to try and speed up.

• qslimfinder: Updated from Version 1.6.
→ Version 1.7: Fixed "MustHave=LIST" correction of motif space.

• slimfinder: Updated from Version 4.5.
→ Version 4.6: Minor modification to seqocc=T function. !Experimental! Added main occurrence output and modified savespace.

• rje_pydocs: Updated from Version 2.8.
→ Version 2.8: Added docsource=PATH : Input path for Python Module documentation (manuals etc.) ['../docs/']
→ Version 2.9: Attempts to fix some broken links and sort out manuals confusion

• rje_slimcore: Updated from Version 1.12.
→ Version 1.13: Modified the savespace settings to reduce numbers of files. targz file now uses RunID not Build Info.

Wonky webpages

2013-11-29T00:55:00.002+00:00

It has come to my attention that the formatting has got a bit messed up at the SLiMSuite download pages. A new release of the downloads will be made soon and hopefully these kinks can get ironed out at the same time. (I'm not sure what's happened!)

SLiMSuite Down Under

2013-11-15T09:51:00.000+00:00

Rich has recently moved to Sydney, Australia to take up a position at the University of New South Wales (UNSW). As a result, things are a bit disrupted at present but a better-than-normal service should resume shortly, as should continuing to update the documentation. There are also plans to mirror the Bioware servers in UNSW, so watch this space.

If you are in Sydney and fancy a SLiM-related job, Rich also has a postdoc opportunity at present.

A note on using BLAST+ with SLiMSuite

2013-08-23T16:18:00.002+01:00

One of the major changes in the last release was the incorporation of BLAST+ as a replacement for BLAST. It should be noted that BLAST+ has not been benchmarked with SLiMSuite and it is not clear how and when it will behave differently, particularly with regards to UPC generation (i.e. generating clusters of unrelated proteins).

Early indications are that BLAST+ has a greater tendency to return no hits for short sequences. This can cause issues with SLiMSuite programs if oldblast=F. This will be fixed in the next release but running with dev=T gets round this issue in the meantime.

Please note that UPC may be different with BLAST versus BLAST+. This will need to be the focus of further study.

Log Files

2013-08-22T14:30:00.000+01:00

Every program generates a log file when it is run. By default, this file will be named after the calling program (e.g. gasp.py will produce a log called gasp.log) but this can be changed with the log=FILE option. The basefile=X option will also set the base name of the log file, as well as the main results files (for most programs). Logs will be appended unless the newlog (or newlog=T) option is used.

The log file records information that may help subsequent interpretation of results or identify problems. Each line is tab delimited in the form:

#XXX    HH:MM:SS    Log Message.

Where #XXX is an identifier that can be used to parse out specific types of information, HH:MM:SS is the runtime in hours, minutes and seconds, and Log Message will be something (hopefully) informative.
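Because each line is tab delimited, log files are simple to parse. Here is a minimal Python sketch; the names are my own and it assumes the three fields are separated by real tab characters.

```python
from collections import namedtuple

LogLine = namedtuple("LogLine", "tag runtime message")


def parse_log_line(line):
    """Split one log line into (tag, runtime, message), or return None if it does not fit."""
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3 or not parts[0].startswith("#"):
        return None
    return LogLine(*parts)


def runtime_seconds(runtime):
    """Convert an HH:MM:SS runtime string to seconds."""
    hours, minutes, seconds = (int(part) for part in runtime.split(":"))
    return hours * 3600 + minutes * 60 + seconds
```

Note that separator lines also split into three fields, so check the tag before converting the runtime.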

All log files start with the same few lines:

#~~#    #~~#    #~~#
#LOG    00:00:00    Activity Log for PROGRAM X.X: DATE TIME YEAR
#DIR    00:00:00    Run from directory: RUNPATH
#ARG    00:00:00    Commandline arguments: ARGLIST
#CMD    00:00:00    Full Command List: [FULL ARGLIST]

This should contain all the information required to repeat the analysis:

PROGRAM X.X: DATE TIME YEAR will have the program name, version number and the date/time of the run.
RUNPATH is the directory from which the program was run.
ARGLIST is the list of command-line arguments given to the program.
FULL ARGLIST is the full list of command-line arguments including any arguments read in from ini files.

The last line can help identify the source of any unexpected behaviour due to default settings etc.

(The #~~# #~~# #~~# line is simply to act as a separator if appending an existing log file.)

If the program runs to completion successfully, it will end with another #LOG line:

#LOG    HH:MM:SS    PROGRAM V:X.X End: DATE TIME YEAR

If this line is not present then either something went wrong during the run (see Error Messages, below) or it is still in progress. Other information is also recorded along with the runtime (HH:MM:SS since the program started). For help interpreting log files, please check the relevant software manual or contact me if the information is missing. (Hopefully, the log content is mostly self-explanatory but I shall add any explanations I have to send people to the relevant manual’s appendix.)

Error Messages

One of the most important aspects of the log file is to register any error messages. These are marked by an #ERR line header. Hopefully, there will not be any but if there was a problem with the run then these lines should contain the details. To catch these lines separately, errorlog=FILE will output error messages to an additional file.
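If you would rather check an existing log than write a separate error file, a short Python sketch along these lines should do it. It is an illustration under assumptions: fields are tab separated, a finished run is recognised by a #LOG line containing " End: ", and the function name is hypothetical. Counts are reset at each separator line, so for an appended log it reports on the most recent run.

```python
def summarise_log(lines):
    """Return (finished, errors) for the last run recorded in a log."""
    finished = False
    errors = []
    for line in lines:
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) < 3:
            continue  # not a standard three-field log line
        tag, _runtime, message = fields
        if tag == "#~~#":
            # Separator: a new run has been appended, so start again.
            finished = False
            errors = []
        elif tag == "#ERR":
            errors.append(message)
        elif tag == "#LOG" and " End: " in message:
            finished = True
    return finished, errors
```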

New Software Release

2013-08-21T14:52:00.001+01:00

New releases of SLiMSuite and SeqSuite are now available. Please note that RJESuite has now been discontinued - for simplicity, all of the extra gubbins is now part of the SeqSuite release. SLiMSuite still represents a cut-down version that focuses on Short Linear Motif analysis tools.

There have been a number of updates since the last release, which will be the focus of future posts. The biggest change since the last release is the implementation of BLAST+ as the default in place of BLAST for most tools. The old BLAST can still be invoked using the oldblast=T switch. In addition to blastpath=PATH, a new blast+path=PATH parameter will need to be set.

Apart from some file organisation tweaks, the other major change is that CompariMotif now has a memsaver=T mode, which will process very large motif lists much quicker and avoid memory issues. The XGMML output is not (yet) available in this mode. For multi-processor CPUs and large searchdb motif lists, CompariMotif now also supports forking (forks=X).

Documentation is in the process of having an overhaul and is still lagging behind as a result. Please ask if anything is unclear and that section of documentation will be prioritised.

Updates since last release:

• aphid: Updated from Version 2.0.
→ Version 2.1: Reduced import commands.

• budapest: Updated from Version 2.1.
→ Version 2.2: Removed unrequired rje_dismatrix import.
→ Version 2.3: Updated to use rje_blast_V2. Needs further updates for BLAST+. Deleted obsolete OLDreadMascot() method.

• comparimotif_V3: Updated from Version 3.9.
→ Version 3.10: Added MemSaver option, which will read and process input motifs (not searchdb) one motif at a time.
→ Version 3.10: Added forking.

• fiesta: Updated from Version 1.5.
→ Version 1.6: Removed HAQESAC import (uses MultiHAQ).
→ Version 1.7: Updated to use rje_blast_V2. Needs work to make function with BLAST+.

• gablam: Updated from Version 2.10.
→ Version 2.11: Altered to use BLAST+ and rje_blast_V2.

• gasp: Updated from Version 1.3.
→ Version 1.4: Minor tweaks to imports.

• gfessa: Updated from Version 1.2.
→ Version 1.3: Tidied module imports.
→ Version 1.4: Switched to rje_blast_V2. More work needed for BLAST+.

• haqesac: Updated from Version 1.8.
→ Version 1.9: Added rje_blast_V2 implementation and BLAST+. Use oldblast=T for old BLAST.

• peptcluster: Updated from Version 1.3.
→ Version 1.4: Bug fixes for end of sequence characters and different length peptides.

• picsi: Updated from Version 1.0.
→ Version 1.1: Updated to blast_V2 and BLAST+.

• pingu: Updated from Version 3.8.
→ Version 3.9: Tidied imports.

• qslimfinder: Updated from Version 1.5.
→ Version 1.6: Removed excess module imports.

• slimbench: Updated from Version 1.8.
→ Version 1.9: Added memsaver option. Replaced SLiMSearch with SLiMProb. Altered default IO paths.
→ Version 1.9: Removed 3DID again: new ELM interaction_domains file has position-specific PPI details.
→ Version 2.0: Major overhaul of input options to standardise/clarify. Implemented auto-downloads and PPI datasets.

• slimprob: Updated from Version 1.0.
→ Version 1.1: Tidied import commands.

• slimsuite: Created.
→ Version 0.0: Initial Compilation with downloadelm function.

• rje_pydocs: Updated from Version 2.6.
→ Version 2.7: Added rje_ppi output for module links.
→ Version 2.8: Added parsing of commandline options from docstring and cmdRead calls.
→ Version 2.8: Added docsource=PATH : Input path for Python Module documentation (manuals etc.) ['../docs/']

• rje: Updated from Version 4.6.
→ Version 4.7: Added self.warn list and self.warnLog() functions to Log object. Modified i=-1 quitchoice to raise not quit.
→ Version 4.8: Added perc cmdtype = float that is multiplied by 100.0 if < 1.0. Removed server option from iniCmds().

• rje_ancseq: Updated from Version 1.2.
→ Version 1.3: Changed "biproblem" error handling in gaspProbs()

• rje_blast_V1: Updated from Version 1.14.
→ Version 1.15: Added OldBLAST/Legacy option to Object for compatibility with rje_blast_V2. (Always True!)

• rje_blast_V2: Updated from Version 2.1.
→ Version 2.2: Added gablamData() to return old-style GABLAM dictionary from table.
→ Version 2.3: Added blastCluster() method to return UPC clustering and GABLAM distance matrix from a file.
→ Version 2.4: Scrapped BLAST "Run" field to simplify code - keep a single run per BLASTRun object.

• rje_db: Updated from Version 1.0.
→ Version 1.1: Added sortedEntries() function.
→ Version 1.2: Added Table.hasField(field). Add openTable(), readEntry() and readSet() methods.

• rje_forker: Created.
→ Version 0.0: Initial Compilation.

• rje_iridis: Updated from Version 1.8.
→ Version 1.9: Added scanning of legacy folder - moving GOPHER_V2!

• rje_obj: Updated from Version 1.0.
→ Version 1.1: Added rje_zen import and self.zen() to call rje_zen.Zen().wisdom().
→ Version 1.2: Added warnLog functions.
→ Version 1.3: Added perc cmdtype = float that is multiplied by 100.0 if < 1.0. Also added cmdtype = date for YYYY-MM-DD.

• rje_ppi: Updated from Version 2.7.
→ Version 2.8: Tweaked Spring Layout. Stores original Hub and Spoke Field.

• rje_seq: Updated from Version 3.16.
→ Version 3.17: Updated to use BLAST+ and rje_blast_V2

• rje_sequence: Updated from Version 2.2.
→ Version 2.3: Added alternative self.info keys for sequence (for UniProt splice variants). Added SpliceVar dict.

• rje_slimcore: Updated from Version 1.10.
→ Version 1.11: Tidied some of the module imports.
→ Version 1.12: Upgraded BLAST to BLAST+. Can use old BLAST with oldblast=T.

• rje_slimlist: Updated from Version 1.1.
→ Version 1.2: Added some extra functions for CompariMotif Memsaver mode

• rje_tree: Updated from Version 2.9.
→ Version 2.10: Added cleanup of *.r.csv file following R-based PNG generation.

• rje_uniprot: Updated from Version 3.13.
→ Version 3.14: Added direct retrieval of UniProt entries from URL, including full proteomes. Updated output file naming.
→ Version 3.14: Added dblist=LIST and dbsplit=T/F for additional DB link output control. Set unipath default to url.

• rje_xml: Updated from Version 0.1.
→ Version 0.2: Added parsing from URL.

• rje_xref: Updated from Version 0.0.
→ Version 1.0: Added xfrom and xto fields and xMap() function for mapping from one ID set to another.

External Components of SeqSuite

2013-08-21T09:30:00.000+01:00

In addition to the Python modules included in the main downloads, some of the programs make use of additional published programs. Wherever possible, these are freely available for downloading and installing. It is recommended that the user downloads and installs these programs according to the instructions given on the appropriate website.

Common programs

Some of the more common programs are listed below. The websites and instructions listed are subject to change, so it is advisable to Google for updated information if in doubt.

ALIGN: This is part of the Fasta package (Pearson, 1994; Pearson, 2000) and can be downloaded from the University of Virginia. Make sure that align is part of the download. For some reason it seems to have been dropped from later packages. You may need to install an earlier package first (e.g. 2.1) and then a later package. ALIGN is not a core component of any SeqSuite program and need not be installed.

BLAST(+): BLAST (Altschul, et al., 1990) and BLAST+ are freely available for download from NCBI. BLAST has now largely been superseded by BLAST+ but some programs are still restricted to BLAST at the moment. Other tools can be made to use BLAST using oldblast=T.

CLUSTALW: ClustalW (Higgins and Sharp, 1988; Thompson, et al., 1994) is an old stalwart for bioinformatics and is freely available from EMBL: ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Note that CLUSTALW is used as a backup for ClustalO (below) and to draw trees. See Replacing Components with Other Programs (below) for details of how to incorporate other tree-drawing packages.

CLUSTAL Omega: CLUSTALO is a newer multiple alignment program from the Clustal team, available from clustal.org. (See below for more multiple alignment options.)

"The last alignment program you'll ever need."

R: The statistical programming language, R, is used for PNG visualisation by some SeqSuite programs. R is freely available from: http://cran.r-project.org/. Note that some installations of R can require a bit of tweaking of the R scripts provided (in libraries/r/). Please email seqsuite@gmail.com if you require some help with this and/or have problems with the R-coded PNG visualisations.

It is recommended that paths to these programs are placed into an INI file (see Command-line Options). These can usually be replaced with different programs if desired (Replacing Components with Other Programs).
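As a rough illustration, the Python snippet below writes program paths to an INI file. It assumes that an INI file is simply a list of option=value settings, one per line, in the same form as on the command line. The function names and all of the paths are made up and should be replaced with the real locations on your system.

```python
# Hypothetical install locations: replace with your own.
EXAMPLE_PATHS = [
    ("clustalo", "/usr/local/bin/clustalo"),
    ("clustalw", "/usr/local/bin/clustalw"),
    ("blast+path", "/usr/local/ncbi/blast/bin/"),
]


def ini_text(settings):
    """Format (option, value) pairs as one option=value line each (assumed INI layout)."""
    return "".join("%s=%s\n" % (option, value) for option, value in settings)


def write_ini(path, settings=EXAMPLE_PATHS):
    """Write the settings to an INI file at the given path."""
    with open(path, "w") as handle:
        handle.write(ini_text(settings))
```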

Replacing Components with Other Programs

The most important functions performed by the external programs are alignment and tree-drawing. This section lists some ways to incorporate alternative programs for these functions into RJE programs. I am always interested to add more functionality, so if there is a program you would like to use instead of those listed, then please contact me and I may be able to add them in a more controlled fashion than below.

Alignment programs

By default, Clustal Omega is used for alignments as I have found this to be both fast and accurate. There can be problems with memory allocation for larger datasets and so ClustalW (Higgins and Sharp, 1988; Thompson, et al., 1994) is used for large datasets above a certain total number of residues (as determined by the cwcut=X parameter). Either of these programs can be replaced, however, by another program that uses the same command-line call format.
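The switch between the two programs can be pictured with the Python sketch below. It is not the actual SeqSuite code: the function names are hypothetical, the comparison is assumed to be "strictly greater than cwcut", and no default value for cwcut is implied.

```python
def total_residues(fasta_lines):
    """Count sequence characters in fasta-format lines, ignoring name lines."""
    return sum(
        len(line.strip())
        for line in fasta_lines
        if line.strip() and not line.startswith(">")
    )


def choose_aligner(fasta_lines, cwcut):
    """Pick ClustalW for datasets above cwcut total residues, else Clustal Omega."""
    if total_residues(fasta_lines) > cwcut:
        return "clustalw"
    return "clustalo"
```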

For ClustalW, the system call is:

clustalw INFILE

where INFILE is in fasta format (*.fas) and the output file (*.aln) is in ClustalW align format. The path to ClustalW can be changed to redirect to another program using the clustalw=COMMAND option. (This may be written as clustalw=PATH in places but the full path including the clustalw program should be given.)

The following alignment program options can currently be used with SeqSuite programs:

clustalw=COMMAND : Path to CLUSTALW program ['clustalw']
clustalo=COMMAND : Path to CLUSTAL Omega program ['clustalo']
mafft=COMMAND    : Path to MAFFT alignment program ['mafft']
muscle=COMMAND   : Path to MUSCLE ['muscle']
fsa=COMMAND      : Path to FSA alignment program ['fsa']
pagan=COMMAND    : Path to PAGAN alignment program ['pagan']
alnprog=X        : Choice of alignment program to use (clustalw/clustalo/muscle/mafft/fsa/pagan) [clustalo]

Any of these could be replaced with another script or program with the same input/output. For example, muscle=PATH could be used to redirect to any program using the system call: program -in INFILE -out OUTFILE, where INFILE and OUTFILE are both fasta format. (Remember to set alnprog=muscle.)

Tree-drawing programs

The default for SeqSuite programs is to use the Neighbour-joining method implemented in ClustalW for drawing trees. Although this is not the most accurate phylogeny construction algorithm around, it is fast and efficient and reasonable for trees of closely-related sequences with high bootstrap support, such as those HAQESAC was designed to build and work with.

Again, this program can be replaced with another using the maketree=PATH option. The system call used is:

clustalw -infile=INFILE -bootstrap=X -seed=X [-kimura]

for UNIX, or

clustalw INFILE -bootstrap=X -seed=X [-kimura]

for Windows, where INFILE is in fasta format (*.fas) and the output file (*.phb) is in bootstrapped Phylip format (I think).
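A replacement program needs to cope with whichever of these two calls it will receive. The Python sketch below just rebuilds the two forms shown above as argument lists, which may help when testing a replacement; the function name and its parameters are my own.

```python
def clustalw_tree_command(infile, bootstrap, seed, kimura=False,
                          windows=False, clustalw="clustalw"):
    """Build the tree-drawing call in its UNIX or Windows form."""
    # UNIX uses -infile=INFILE; Windows passes INFILE as a bare argument.
    infile_arg = infile if windows else "-infile=%s" % infile
    command = [clustalw, infile_arg,
               "-bootstrap=%d" % bootstrap, "-seed=%d" % seed]
    if kimura:
        command.append("-kimura")  # optional switch shown in brackets above
    return command
```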

It should work to have a program output a Newick Standard Format tree as *.nsf but I have not tested that. Phylip tree-drawing is also implemented. See rje_tree module documentation for details. Other phylogenetics programs can be added on request - anything able to generate Phylip or Newick format trees should be easy to add.

Wrapper scripts

If the chosen program does not accept the same input/output commands/formats then a wrapper script should be written. It is suggested to use Perl or Python for this. Although I cannot promise to help in every case, you are welcome to e-mail me for help with this and I will see what I can do.
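As an illustration only, a Python wrapper that accepts the MUSCLE-style call (program -in INFILE -out OUTFILE) and passes the files on to a different aligner might look something like this. The aligner name myaligner and its --input/--output flags are invented for the example and would need replacing with those of the real program:

```python
import subprocess

# Hypothetical aligner command, used purely for illustration.
ALIGNER = "myaligner"


def build_command(argv):
    """Translate '-in INFILE -out OUTFILE' into the aligner's own call."""
    args = dict(zip(argv[0::2], argv[1::2]))  # pair each flag with the value after it
    if "-in" not in args or "-out" not in args:
        raise ValueError("usage: wrapper.py -in INFILE -out OUTFILE")
    # The --input/--output flags are assumptions, not those of a real program.
    return [ALIGNER, "--input", args["-in"], "--output", args["-out"]]


def main(argv):
    """Run the aligner and return its exit code."""
    return subprocess.call(build_command(argv))

# In a real script, finish with: sys.exit(main(sys.argv[1:]))
```

Saved as an executable script, this could then be given as muscle=COMMAND along with alnprog=muscle. If the aligner does not read and write fasta format, the wrapper would also need to convert the files.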

Incorporating Other Programs into the Python Code

If you are feeling brave, you can actually edit the Python modules themselves. The key methods for this are rje_seq.muscleAln(), rje_seq.clustalAln() and rje_tree.makeTree(). Obviously, I cannot promise to give technical support for any changes that are made but, if you know what you are doing, you should be OK and I will help where I can.

References

This reference list needs completing but references for the older software listed include:

Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990). Basic local alignment search tool. J Mol Biol, 215: 403-410.
Edgar RC (2004). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5: 113.
Higgins DG and Sharp PM (1988). CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 73: 237-244.
Pearson WR (1994). Using the FASTA program to search protein and DNA sequence databases. Methods Mol Biol, 24: 307-331.
Pearson WR (2000). Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol, 132: 185-219.
Thompson JD, Higgins DG and Gibson TJ (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 22: 4673-4680.

Command-line Options

2013-08-20T10:10:00.000+01:00

The behaviour of all of the programs is subject to modification via the setting of command-line options. Some of these are generic and apply to most/all SLiMSuite programs - see the rje.py documentation for these, or the section below - whereas others are program specific.

Setting commandline options

Commandline options have two parts: the argument and the value. These can be fed to programs in one of two formats:

argument=value
-argument value

These two lines have equivalent functions. The two styles can be mixed within a program call, e.g.

python program.py arg1=val1 -arg2 val2

Options can also be supplied within *.ini files (see below).
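To make the equivalence of the two formats concrete, here is a minimal sketch of how a mixed argument list could be normalised to the argument=value form. This is not the rje.py implementation. The function name is hypothetical, and values that themselves begin with a dash (e.g. -1) are deliberately not handled:

```python
def normalise_args(argv):
    """Convert '-argument value' pairs into 'argument=value' strings."""
    normalised = []
    position = 0
    while position < len(argv):
        arg = argv[position]
        following = argv[position + 1] if position + 1 < len(argv) else None
        if arg.startswith("-") and "=" not in arg:
            name = arg.lstrip("-")
            # Treat the next item as this argument's value unless it looks like
            # another argument (simplifying assumption for this sketch).
            if (following is not None and not following.startswith("-")
                    and "=" not in following):
                normalised.append("%s=%s" % (name, following))
                position += 2
                continue
            normalised.append(name)  # bare switch with no value
        else:
            normalised.append(arg)  # already in argument=value form
        position += 1
    return normalised
```

With this sketch, normalise_args(["arg1=val1", "-arg2", "val2"]) gives ["arg1=val1", "arg2=val2"].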

Option Types

There are essentially three types of command-line option:

Those that require a value (numerical or text), option=X. Those that require a filename as the value will be written: option=FILE. Those that require a directory path as the value will be written: option=PATH. Those that lead to an accessory application (rather than just its path) may also be listed as option=COMMAND. Paths and filenames should always use forward slash (/) separators, whatever the operating system.
True/False (On/Off) options, option=T/F. For these options:
- option=F and option=False are the same and turn the option off.
- option (or -option), option=T and option=True are the same and turn the option on.
List options. These are like the value options but have multiple values, separated by commas: option=X,Y. Where .. is used, the number of elements is optional, e.g. option=X,Y,..,Z could take option=X or option=A,B,C,D. Where option=LIST is used, the number of elements is optional and LIST could actually be the name of a file containing the list of elements.
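The three option types could be interpreted along the following lines. This is a sketch under stated assumptions rather than the code used by the programs: the function names are hypothetical, only the T/F spellings listed above are recognised, and a LIST file is assumed to hold one element per line:

```python
import os


def split_option(arg):
    """Split 'option=value' into (name, value); a bare option counts as 'T'."""
    name, sep, value = arg.lstrip("-").partition("=")
    return (name, value) if sep else (name, "T")


def as_bool(value):
    """Interpret the T/F values described in the text."""
    if value in ("T", "True"):
        return True
    if value in ("F", "False"):
        return False
    raise ValueError("Expected T/F or True/False, got %r" % value)


def as_list(value):
    """Return list elements from a comma-separated value or a file of elements."""
    if os.path.isfile(value):
        # Assumption: the file holds one element per line.
        with open(value) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [item for item in value.split(",") if item]
```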

Long option values, whitespace and special characters

Some characters, such as whitespace, commas, pipes (“|”) and ampersands, will be interpreted by UNIX in particular ways from the commandline. If you have such characters within the option value, then either place the settings in an INI file (see below) or enclose the option value in quotes. If the value contains whitespace, double quotes will be needed even within an INI file, as whitespace is used to delimit commandline options, e.g.

python program.py option="Two words" limits="2,3"

NB. For PATH variables, directories should be separated by a forward slash (/). If paths contain spaces, they must be enclosed in double quotes:

path="example path".

It is recommended that paths do not contain spaces as function cannot be guaranteed if they do.

INI Files

As well as feeding commands in on the command-line, any options listed can also be saved in a plain text file and called using the option ini=FILE. The precedence of loading default run settings from ini files is slightly complex but (hopefully) makes sense once it is clear that there are two kinds of precedence being invoked:

For each ini file there is a directory precedence determining where to look for that file. Once the file is found, commands from that file will be read in and the program will stop looking for other versions of the file. Each ini file is looked for:
- in the current directory from which the run command is being executed
- the directory containing the program being run. (Under usual circumstances, it is not recommended to put ini files in these directories; use the settings/ directory instead.)
- the settings/ directory of the distribution. This is the recommended location for default ini files and universal default values for all runs should be put here.
For each ini file that is read in, each command has a setting precedence as described below, such that later values will over-rule earlier values for the same argument. Default ini files (if present) are read in the following order:
- Global defaults are read from a defaults.ini file. (This is recommended.)
- System defaults are read from an rje.ini file. (This file is not recommended and is largely for development reasons.)
- Program defaults are read from the file named after the program (e.g. haqesac.ini for HAQESAC). (This will be the same root filename as the default *.log file if you are not sure.)

For example, if you are running haqesac.py in a directory containing haqesac.ini, the full list of commandline arguments will be any in PATH/settings/defaults.ini (if it exists) plus any in PATH/settings/rje.ini (if it exists) plus the contents of ./haqesac.ini plus the options given on the commandline. If, on the other hand, there is no ./haqesac.ini file, options will instead be read from PATH/settings/haqesac.ini (if it exists). (PATH/ is determined from the path given to haqesac.py.) If any of these files have been placed in tools/ instead (not recommended), these will be used in place of those from settings/.
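The lookup described above can be sketched as follows. This is an illustration of the two kinds of precedence, not the rje.py code itself. The function names are hypothetical, and settings/ is assumed to sit alongside the directory containing the program (as in the distributed tools/ and settings/ layout):

```python
import os


def find_ini(name, run_dir, program_path):
    """Return the first copy of an ini file found, or None if there is none."""
    program_dir = os.path.dirname(os.path.abspath(program_path))
    # Assumption: settings/ is a sibling of the directory holding the program.
    settings_dir = os.path.join(os.path.dirname(program_dir), "settings")
    # Directory precedence: run directory, then program directory, then settings/.
    for folder in (run_dir, program_dir, settings_dir):
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def default_ini_files(program_path, run_dir="."):
    """List the default ini files in the order in which they would be read."""
    program = os.path.splitext(os.path.basename(program_path))[0]
    names = ["defaults.ini", "rje.ini", program + ".ini"]
    found = [find_ini(name, run_dir, program_path) for name in names]
    return [path for path in found if path]
```

Because the files are returned in reading order, options from later files would over-rule those from earlier ones, with the commandline options applied last of all.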

It is recommended that a defaults.ini file is made and placed in the settings/ directory. This file should contain the paths to the External Programs used by RJE programs:

blastpath=PATH
blast+path=PATH
fastapath=PATH
clustalw=COMMAND
muscle=COMMAND

Note that the first three are just paths to the programs, while for ClustalW and MUSCLE the actual program commands themselves must be included. This is to make it easier to replace these programs with alternatives.

If running in Windows, it is also advisable to add the win32=T command to the defaults.ini file.

INI File formatting

INI files are simple plain text files. Several commands can be put on a single line, although it is generally clearer to stick to one command per line. Any text on a line following a hash (#) will be treated as a comment and ignored unless it is part of an option value in double quotes. This allows INI files to be documented.
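These formatting rules are close to shell-style tokenising, so a rough approximation can be written with the shlex module from the Python standard library. This is only a sketch and not how the programs themselves read INI files. The function name is hypothetical, and note that shlex also treats single quotes and backslashes as special, which the rules above do not mention:

```python
import shlex


def read_ini_text(text):
    """Return the list of commands found in the text of an INI file."""
    commands = []
    for line in text.splitlines():
        # comments=True drops anything after a hash unless it is inside quotes;
        # whitespace separates commands, so one line can hold several.
        commands.extend(shlex.split(line, comments=True))
    return commands
```

For example, the line i=1 v=0 # run settings yields the two commands i=1 and v=0, whereas a hash inside a double-quoted value is kept as part of that value.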

Option Precedence

Later options will supersede earlier ones if they are mutually exclusive. Options from an INI file will be inserted into the list at the point the ini=FILE command is called. (Default *.ini files are read in the order listed above, i.e. options from the defaults.ini file are read first, followed by the program.ini file.) This means that ini file options can be over-ruled, e.g. program.py ini=eg.ini i=1 will supersede any interactivity setting in eg.ini with i=1, whereas program.py i=1 ini=eg.ini will use any interactivity setting in eg.ini and over-rule i=1.
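The effect of inserting INI file options at the point of the ini=FILE command can be illustrated with a small sketch. The function names are hypothetical, read_ini stands in for whatever reads the commands from a file, and treating the options as a simple name-to-value mapping is a simplification:

```python
def expand_ini(args, read_ini):
    """Insert each ini file's options at the point where ini=FILE appears."""
    expanded = []
    for arg in args:
        if arg.startswith("ini="):
            expanded.extend(read_ini(arg[len("ini="):]))
        else:
            expanded.append(arg)
    return expanded


def resolve(args):
    """Apply options in order, so that later values replace earlier ones."""
    settings = {}
    for arg in args:
        name, _, value = arg.partition("=")
        settings[name] = value
    return settings
```

If eg.ini contains i=0, then resolving ini=eg.ini i=1 gives i=1, whereas resolving i=1 ini=eg.ini gives i=0.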

Interactivity and Verbosity settings

By default, the programs are generally set up to run through to completion without any user interaction if given all the options they need. For more interaction with a program as it runs, use the argument i=1.

python xxx.py commandlist i=1

Both the level of interactivity and the amount printed to screen can be altered, using the interactivity i=X and verbosity v=X command-line options, respectively, where X is the level from none (-1) to lots (2+). Although in theory i=-1 and v=-1 will ask for nothing and show nothing, there is a chance that some print statements will have escaped in these early versions of the program. There is also the possibility that accessory programs may print things to the screen beyond the control of the calling program. Please report any that you spot!

Please report any irritations and suggestions for changes to what is printed at different verbosity levels.

General Command-line Options

Along with some of the options listed above, there are a number of core options that are used in many or all of the SLiMSuite programs. Defaults are given in square brackets.

NOTE: Default settings might vary between programs. To set global defaults, it is recommended to put these options in the defaults.ini file.

Help and Program Logs

help            : Prints help documentation to screen.
v=X             : Sets screen verbosity (-1 for silent) [0]
i=X             : Sets interactivity (-1 for full auto) [0]
silent=T/F      : If set to True will not write to screen or log. [False]
log=FILE        : Redirect log to FILE [program.log]
newlog=T/F      : Create new log file. [False]
errorlog=FILE   : If given, will write errors to an additional error file. [None]

General Input/Output Options

outfile=FILE    : This will set the 'root' filename for (non-log) output files in most programs (FILE.*) [None]
basefile=FILE   : Equivalent of log=FILE outfile=FILE. [None]
force=T/F       : Force to regenerate data rather than keep old results. [False]
append=T/F      : Append to results files rather than overwrite. [False]
backups=T/F     : If True, option given to backup certain files if append=F. [True]
delimit=X       : Sets standard delimiter for results output files. [varies]
mysql=T/F       : “MySQL output” with lowercase headers that lack spacers. (Not all programs) [False]

System settings

win32=T/F       : Run in Win32 Mode for Windows operation. [False]
memsaver=T/F    : Run in “Memory Saver” mode. Varies with program. [False]
runpath=PATH    : Run program as if in given path (log files and some programs only) [PATH called from]
rpath=COMMAND   : Path to installation of R. ['R']
maxbin=X        : Maximum number of trials for using binomial (else use Poisson) [∞]

Forking Options

forks=X         : Number of forks. (Some programs only.) [0]
killforks=X     : Number of seconds of inactivity before killing forks. [3600]
noforks=T/F     : Over-ride and cancel forking if True. [False]

This information is also available by printing the __doc__ attribute of the rje.py module at a Python prompt (print rje.__doc__), or using the help option: python rje.py help. Please contact me if you want any further details of a specific option and/or advice as to when (not) to use it.

Updated programs coming soon...

2013-08-06T10:02:00.000+01:00

SLiMSuite and SeqSuite have been undergoing some tidying and additional tweaks, such as implementing BLAST+ in most programs. The documentation is also undergoing a bit of an overhaul (see the Documentation links in the left sidebar) and so the distribution of the latest code is being held back for a while. If you want access to the latest versions, however, feel free to get in touch. (Particularly if you want to use BLAST+ with SLiMSuite or HAQESAC.)

New look Bioware

2013-08-01T21:58:00.000+01:00

The Bioware server has a new(ish!) look! The function of the tools should be much the same (although various updates are in progress) but the feel of the site should hopefully be cleaner and more consistent on mobile devices. Feedback welcome!

Availability, Installation and Setup

2013-08-01T10:29:00.000+01:00

SLiMSuite and SeqSuite are currently available from http://bioware.soton.ac.uk as three packages:

SLiMSuite contains software for Short Linear Motif (SLiM) analysis.
SeqSuite contains all of the SLiMSuite programs plus some additional sequence analysis programs.
RJESuite contains SLiMSuite, SeqSuite and a bunch of other miscellaneous utilities and bits and bobs.

In future, it is envisaged that a single Git repository will contain all the relevant code and documentation.

All three packages have the same basic installation, directory structure and setup requirements. For basic functionality, no other setup should be necessary beyond downloading and unzipping the package in the desired directory if Python is installed on your system. Some programs will need to use external components or accessory applications, which may need additional installation.

If you do not have Python, you can download it free from www.python.org at http://www.python.org/download/. The modules are written in Python 2.x and most have been tested with 2.7. The Python website has good information about how to download and install Python but if you have any problems, please get in touch and I will help if I can.

All the required files should have been provided in the download zip file. The Python Modules are open source and may be changed if desired, although please give me credit for any useful bits you pillage. I cannot accept any responsibility if you make changes and the program stops working, however! If you want some help understanding the way the modules and classes are set up so you can edit them, just contact me.

Directory Structure

Once unzipped, the download will unpack a top level seqsuite/ or slimsuite/ directory with the following subdirectories:

data/ contains example data for testing programs. (Currently under development.)

docs/ contains documentation.

extras/ contains accessory programs that are not part of the main program suite.

legacy/ contains superseded programs that are no longer supported. (Currently under development.)

libraries/ contains all the python libraries used by the main tools (and extras), some of which have standalone functionality.

settings/ contains INI files that set default options.

tools/ contains the main program suite.

NOTE: It is recommended that analyses are performed outside these directories for ease of reinstallation.

Third party software

Many of the tools make use of third party software. Where possible, instructions will be provided for obtaining these programs but a quick Google is usually sufficient - wherever possible, third party software is free for academic use and (ideally) open source.

When third party software is used, SeqSuite will also need the path to the program, or suite of programs. This will be covered more in the Command-line Options section but BLAST and ClustalW deserve a special mention as examples because many of the programs use these as default programs for certain functions.

BLAST is actually a suite of programs and the path containing these executables should be provided using blastpath=PATH/, e.g.:

blastpath=/usr/ncbi/bin/

For BLAST, do not give the full path to the program (e.g. blastpath=/usr/ncbi/bin/blastp). BLAST cannot be replaced easily by other programs. BLAST has now largely been superseded by BLAST+, which needs its own path parameter:

blast+path=PATH

Some programs are still restricted to BLAST at the moment and other tools can be made to use the older BLAST with the oldblast=T switch.

Clustalw is a useful standalone program that is used as a default for alignments and trees in the absence of newer (better) programs. For this, and other single executables, the full path to the program is given:

clustalw=/usr/bioware/clustalw1.83/clustalw

In these situations, a different program with the same input and output can be substituted.

NOTE: Remember to set the relevant paths in an appropriate *.ini file in settings/. Where possible, error messages will identify issues with third party software but due to a lack of testing on a diversity of systems, this is not always possible. If a program crashes, please check the *.log file for signs that there may be a problem with the installation and/or path given for third party programs, such as BLAST.

Upgrading

At present, each upgrade is distributed as a separate package. You can check the current version by the date in the name of the distribution file (in ISO 8601 standard, YYYY-MM-DD format). Plans are afoot to switch to a Git repository, which will make upgrades easier.

Getting Help

2013-07-29T16:59:00.000+01:00

Much of the information here is also contained in the documentation of the Python modules themselves. A full list of command-line parameters can be printed to screen using the help option, with short descriptions for each one:

python program.py help
python program.py -help
python program.py -h

Details of command-line options specific to each program can also be found in the distributed readme.txt and readme.html files.

If stuck, or something is unclear, then please e-mail me (seqsuite@gmail.com) whatever question you have. If it is the results of an error message, then please send me that and the log file too.

SLiMScape: a protein short linear motif analysis plugin for Cytoscape.

2013-07-17T17:23:00.001+01:00

New paper published!

O’Brien KT, Haslam NJ & Shields DC (2013). SLiMScape: a protein short linear motif analysis plugin for Cytoscape. BMC Bioinformatics 14(1):224. [Epub ahead of print]

BACKGROUND: Computational protein short linear motif discovery can use protein interaction information to search for motifs among proteins which share a common interactor. Cytoscape provides a visual interface for protein networks but there is no streamlined way to rapidly visualize motifs in a network of proteins, or to integrate computational discovery with such visualizations.

RESULTS: We present SLiMScape, a Cytoscape plugin, which enables both de novo motif discovery and searches for instances of known motifs. Data is presented using Cytoscape’s visualization features thus providing an intuitive interface for interpreting results. The distribution of discovered or user defined motifs may be selectively displayed and the distribution of protein domains may be viewed simultaneously. To facilitate this SLiMScape automatically retrieves domains for each protein.

CONCLUSION: SLiMScape provides a platform for performing short linear motif analyses of protein interaction networks by integrating motif discovery and search tools in a network visualization environment. This significantly aids in the discovery of novel short linear motifs and in visualizing the distribution of known motifs.

PMID: 23855714

SLiMSuite at the OMICS Group 3rd International Conference on Proteomics & Bioinformatics

2013-07-13T22:58:00.001+01:00

If anyone is attending the OMICS Group 3rd International Conference on Proteomics & Bioinformatics this week then be sure to say hello. I am speaking on the last day in the “Computational Biology” track. (Never the best time to talk at a conference as there is limited time for follow up but at least it is before lunch!)

SLiM Pickings: mining structural and sequence data for the prediction of short linear protein interaction motifs

Short Linear Motifs (SLiMs) are short functional protein sequences that act as ligands to mediate transient protein-protein interactions (PPI) in critical biological pathways and signaling networks. SLiMs are short (3-15aa), generally tolerate considerable sequence variation and typically have fewer than five residues critical for function. These features result in a degree of evolutionary plasticity not seen in domains and SLiMs often add new functions to proteins by convergent evolution. They also present a challenge for computational identification, making it difficult to differentiate biological signal from stochastic patterns. Despite this, discovering new SLiMs is of great interest due to their potential as therapeutic targets.

In recent years, we have made great progress in SLiM discovery, particularly through development of the SLiMSuite package of bioinformatics tools. SLiMs generally occur in structurally disordered regions of proteins and exhibit evolutionary conservation relative to other disordered residues. SLiMFinder uses this knowledge and exploits patterns of convergent evolution to predict novel, over-represented motifs within a statistical framework with high specificity. Applying this approach to a comprehensive set of human PPI data has highlighted interactome complexity and quality as the next challenges for SLiM prediction. Our latest development, QSLiMFinder (“Query” SLiMFinder) tackles some of these issues by incorporating specific interaction data to restrict the motif search space, which improves both the sensitivity and biological relevance of predictions. We are now using QSLiMFinder to combine structurally defined domain-motif interactions with large-scale PPI data to perform large-scale de novo SLiM prediction.

Documentation

2013-07-10T16:41:00.000+01:00

SLiMSuite and SeqSuite have grown into rather unwieldy beasts since their origins as individual programs and the documentation has struggled to keep up. In particular, the original plan of a single PDF manual per program is getting creaky. Because of the shared reliance on common modules, multiple programs make use of the same sets of options for alignments and conservation scoring etc. and propagating tweaks and modifications through all the manuals can be a bit head-wrecking.

As a result of all of this, the documentation is currently undergoing a bit of a review and rethink. I am still keen to keep the PDF manuals (as I think they are useful) but will be working through an intermediate phase of online Markdown/HTML documentation of some kind. The current plan is to trickle out draft copies via the blog and then probably release a Git repository once sufficiently populated.

In the meantime, I would be interested to hear any thoughts regarding favoured documentation styles etc. (e.g. HTML vs PDF, large files vs small chunks) as well as bits that are particularly unclear or in need of attention.

New Software Release

2013-07-08T13:50:00.003+01:00

New releases of SeqSuite, SLiMSuite and RJESuite are now available.

The biggest change since the last release is the renaming of SLiMSearch to SLiMProb. This is to avoid confusion between the old SLiMSearch 1.x (now SLiMProb) and the newer SLiMSearch 2.x webserver, which has a different range of functions.

Updates since last release:

• cpppred: Created.

• gopher: Updated from Version 3.1.
→ Version 3.2: Minor tweak to prevent unwanted directory generation for programs using existing GOPHER alignments.
→ Version 3.3: Added rje_blast_V2 to use BLAST+. Run with legacy=T to stick with old NCBI BLAST. Started utilising rje_seqlist.

• pepbindpred: Created.

• slimprob: Created.
→ Version 1.0: SLiMProb 1.0 based on SLiMSearch 1.7. Altered output files to be *.csv and *.occ.csv.

• file_monster: Updated from Version 2.0.
→ Version 2.1: Added dirsum function.

• rje: Updated from Version 4.5.
→ Version 4.6: Added dev and warn options.

• rje_blast_V2: Created.
→ Version 2.0: Initial Compilation from rje_blast_V1 V1.14.
→ Version 2.1: Tweaking code to work with GOPHER 3.x - removing self.info etc. Added blastObj() method.

• rje_db: Updated from Version 0.4.
→ Version 0.5: Initial coding of index mode. (Not yet fully functional.)
→ Version 1.0: Working, so upgraded to version 1.0!

• rje_obj: Updated from Version 0.0.
→ Version 1.0: Fully working version, so upgraded to 1.0. Added dev and warn options.

• rje_seq: Updated from Version 3.15.
→ Version 3.16: Added BLAST+ path and seqFromBlastDBCmd()

• rje_slimcalc: Updated from Version 0.5.
→ Version 0.6: Minor tweak to avoid unwanted GOPHER directory generation.
→ Version 0.7: Added RLC to "All" conscore running.

• rje_slimcore: Updated from Version 1.9.
→ Version 1.10: Bypass UPC generation for single sequences.

Documentation is still in the process of development. BLAST+ implementation is ongoing - please get in touch if this is something you need.

Second QSLiMFinder poster now on F1000 Posters

2013-04-29T16:19:00.001+01:00

The second QSLiMFinder poster from the recent Cold Spring Harbor Laboratory "Systems Biology: Networks" meeting is now available at F1000 Posters:

Edwards RJ & Palopoli N. Computational prediction of short linear motifs mediating host-pathogen protein-protein interactions.

(I'm not sure why the last post about the other poster disappeared for a few days but it's back now!)