# $Id: INSTALL,v 1.37 2004/12/01 13:00:12 bosborne Exp $

   o BIOPERL INSTALLATION
   o SYSTEM REQUIREMENTS
   o OPTIONAL
   o ADDITIONAL INSTALLATION INFORMATION
   o THE BIOPERL BUNDLE
   o INSTALLING BIOPERL THE EASY WAY USING CPAN
   o INSTALLING BIOPERL THE EASY WAY USING 'make'
   o WHERE ARE THE MAN PAGES?
   o EXTERNAL PROGRAMS
   o INSTALLING BIOPERL SCRIPTS
   o INSTALLING BIOPERL IN A PERSONAL MODULE AREA
   o INSTALLING BIOPERL MODULES THE HARD WAY
   o USING MODULES NOT INSTALLED IN THE STANDARD LOCATION
   o THE TEST SYSTEM
   o BUILDING THE OPTIONAL bioperl-ext PACKAGE
   o DEPENDENCIES AND Bundle::BioPerl


o BIOPERL INSTALLATION

   Bioperl has been installed on many forms of Unix, Win9X/NT/2000/XP,
   and on Mac OS (see the PLATFORMS file for more details). For
   instructions on Unix and Mac OS, read on. For installation on
   Windows, please see the INSTALL.WIN file.


o SYSTEM REQUIREMENTS

   - perl 5.005 or later*.

   - External modules: Bioperl uses functionality provided in other
     Perl modules. Some of these are included in the standard perl
     package but some need to be obtained from the CPAN site. The
     list of external modules is included at the bottom of this
     INSTALL document.

     The CPAN Bioperl Bundle (Bundle::BioPerl) makes installation of
     these external modules easy. Simply install the bundle using
     your CPAN shell and all necessary modules will be installed.
     See THE BIOPERL BUNDLE, below.

   * Note that most modules will work with earlier versions of Perl.
     The only ones that will not are Bio::SimpleAlign.pm and the
     Bio::Index::* modules. If you don't need these modules and you
     want to install bioperl using an earlier version of Perl, edit
     the "require 5.005;" line in Makefile.PL as necessary.


o OPTIONAL

   - ANSI C or GNU C compiler for XS extensions (bioperl-ext package,
     see BUILDING THE OPTIONAL bioperl-ext PACKAGE, below).

o ADDITIONAL INSTALLATION INFORMATION

   - Additional information on Bioperl and Mac OS:

     OS 9 - http://bioperl.org/Core/mac-bioperl.html
     OS X - http://www.tc.umn.edu/~cann0010/Bioperl_OSX_install.html


o THE BIOPERL BUNDLE

   You typically need root privileges to install using CPAN. If you
   don't have these privileges, please see INSTALLING BIOPERL IN A
   PERSONAL MODULE AREA for additional information.

   Install the Bioperl Bundle using CPAN. One way:

     >perl -MCPAN -e "install Bundle::BioPerl"

   Another way:

     >perl -MCPAN -e shell
     cpan>install Bundle::BioPerl


o INSTALLING BIOPERL THE EASY WAY USING CPAN

   You can use the CPAN shell to install Bioperl. For example:

     >perl -MCPAN -e shell

   Then find the name of the Bioperl version you want:

     cpan>d /bioperl/
     CPAN: Storable loaded ok
     Going to read /home/bosborne/.cpan/Metadata
     Database was generated on Tue, 24 Feb 2004 23:55:23 GMT
     Distribution    B/BI/BIRNEY/bioperl-1.2.tar.gz
     Distribution    B/BI/BIRNEY/bioperl-1.4.tar.gz

   Now install:

     cpan>install B/BI/BIRNEY/bioperl-1.4.tar.gz

   If everything is installed correctly and all the network
   connections are working, you may pass all the tests run in the
   'make test' phase. It is also possible that some tests will fail.
   Possible explanations include problems with the local Perl
   installation, network problems, a previously undetected bug in
   Bioperl, a flawed test script, or problems with a CGI script used
   for sequence retrieval at a public database. Remember that there
   are over 700 modules in Bioperl and the test suite runs almost
   9000 individual tests, so a few failed tests may not affect your
   usage of Bioperl.

   If you decide that the failed tests will not affect how you intend
   to use Bioperl and you'd like to install anyway, do:

     cpan>force install B/BI/BIRNEY/bioperl-1.4.tar.gz

   This is what most experienced Bioperl users would do. However, if
   you're concerned about a failed test and need assistance or advice,
   contact bioperl-l@bioperl.org.

o INSTALLING BIOPERL THE EASY WAY USING 'make'

   The advantage of this approach is that it is stepwise, so it is
   easy to stop and analyze any problem.

   Download, then unpack the tar file. For example:

     >gunzip bioperl-1.2.tar.gz
     >tar xvf bioperl-1.2.tar
     >cd bioperl-1.2

   Now issue the make commands:

     >perl Makefile.PL
     >make
     >make test

   If everything is installed correctly and all the network
   connections are working, you may pass all the tests run in the
   'make test' phase. It is also possible that some tests will fail.
   Possible explanations include problems with the local Perl
   installation, network problems, a previously undetected bug in
   Bioperl, a flawed test script, or problems with a CGI script used
   for sequence retrieval at a public database. Remember that there
   are over 700 modules in Bioperl and the test suite runs almost
   9000 individual tests, so a few failed tests may not affect your
   usage of Bioperl.

   If you decide that the failed tests will not affect how you intend
   to use Bioperl and you'd like to install anyway, do:

     >make install

   This is what most experienced Bioperl users would do. However, if
   you're concerned about a failed test and need assistance or advice,
   contact bioperl-l@bioperl.org.

   To 'make install' you need write permission in the
   perl5/site_perl/ source area. Quite often this requires becoming
   root, so talk to your systems manager if you don't have the
   necessary privileges.

   It is possible to install the package outside of the standard
   Perl5 location. See INSTALLING BIOPERL IN A PERSONAL MODULE AREA,
   below.


o WHERE ARE THE MAN PAGES?

   We had to disable the automatic creation of man pages because this
   step was triggering a "line too long" error on some OSs due to
   shell constraints. If you'd like to try to create them, comment
   out or delete the MY::manifypods sub in Makefile.PL before you
   issue the 'perl Makefile.PL' step.


o EXTERNAL PROGRAMS

   Bioperl can interface with some external programs for executing
   analyses.
   These include clustalw and t_coffee for Multiple Sequence
   Alignments (Bio::Tools::Run::Alignment::Clustalw and
   Bio::Tools::Run::Alignment::TCoffee), blastall, blastpgp, and
   bl2seq for BLAST analyses (Bio::Tools::Run::StandAloneBlast), and
   all the programs in the EMBOSS suite (Bio::Factory::EMBOSS).

   - Environment Variables

   Some modules which run external programs need certain environment
   variables set. If you do not have a local copy of the specific
   executable you do not need to set these variables. Additionally,
   the modules will attempt to locate the specific applications in
   your runtime PATH variable. You may also need to set an
   environment variable to tell BioPerl about your network
   configuration if your site uses a firewall.

   Setting environment variables on unix means adding lines like the
   following to your shell *rc file.

   For bash or sh:

     export BLASTDIR=/data1/blast

   For csh or tcsh:

     setenv BLASTDIR /data1/blast

   The environment variables include:

   Bio::Tools::Run::StandAloneBlast

     BLASTDIR - specifies where the NCBI blastall, blastpgp, bl2seq,
     etc. are located. A 'data' directory is assumed to be present in
     this directory as well; it holds the blastable databases and the
     substitution matrices.

     BLASTDATADIR or BLASTDB - (either is optional) if you do not
     want to keep the data directory inside the directory that
     BLASTDIR points to, set BLASTDATADIR or BLASTDB to point to a
     directory where the BLAST database indexes are located.

   Bio::Tools::Run::Alignment::Clustalw

     CLUSTALDIR - points to the directory where the clustalw
     executable is located.

   Bio::Tools::Run::Alignment::TCoffee

     TCOFFEEDIR - points to the directory where the t_coffee
     executable is located.

   HTTP_PROXY - If you access the internet via a proxy server, you
   can tell the Bioperl modules which require network access about
   this by using the HTTP_PROXY environment variable. The value
   includes the proxy address and the port used (e.g.
   http://wwwcache.example.com:8080).

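   As a quick sanity check of the variables described above, the
   following sh sketch reports whether each directory variable is set
   and whether the expected executable is present in it. The helper
   name check_tool is hypothetical and not part of Bioperl, and the
   executable names are the ones mentioned in this section; adjust
   them if your local installation differs.

```shell
#!/bin/sh
# Sketch only: check_tool is a hypothetical helper, not part of Bioperl.
# It reports whether a directory variable is set and whether the named
# executable exists (and is executable) in that directory.
check_tool() {
    var_name=$1      # name of the environment variable, e.g. BLASTDIR
    exe_name=$2      # executable expected inside it, e.g. blastall
    eval "dir=\${$var_name:-}"
    if [ -z "$dir" ]; then
        echo "$var_name: not set (the modules may still search PATH)"
    elif [ -x "$dir/$exe_name" ]; then
        echo "$var_name: found $dir/$exe_name"
    else
        echo "$var_name: $exe_name not found in $dir"
    fi
}

check_tool BLASTDIR   blastall
check_tool CLUSTALDIR clustalw
check_tool TCOFFEEDIR t_coffee
```

   A "not set" line is not necessarily a problem, since the modules
   also try to locate the programs through PATH.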
o INSTALLING BIOPERL SCRIPTS

   Bioperl comes with a set of production-quality scripts that are
   kept in the scripts/ directory. To install them, simply answer the
   questions asked during 'make install'. The installation directory
   is specified by the INSTALLSCRIPT variable in the Makefile; the
   default location is /usr/bin. Installation will copy the scripts
   to the specified directory, change the 'PLS' suffix to 'pl', and
   prepend 'bp_' to all the script names if they aren't so named
   already.


o INSTALLING BIOPERL IN A PERSONAL MODULE AREA

   If you lack permission to install perl modules into the standard
   site_perl/ system area, you can configure bioperl to install
   itself anywhere you choose. Ideally this would be a personal perl
   directory or a standard place where you plan to put all your
   'local' or personal perl modules.

   Simply pass a parameter to perl as it builds your system specific
   makefile. Example:

     >perl Makefile.PL LIB=/home/users/dag/My_Local_Perl_Modules
     >make
     >make test
     >make install

   This tells perl to install bioperl in the desired place, e.g.:

     /home/users/dag/My_Local_Perl_Modules/Bio/Seq.pm

   Then in your Bioperl script you would write:

     use lib "/home/users/dag/My_Local_Perl_Modules";
     use Bio::Seq;

   The man pages will probably be installed in $LIB/man. For more
   information on these sorts of custom installs, see the
   documentation for ExtUtils::MakeMaker.

   You can also use CPAN to install accessory modules in your local
   directory. First enter the CPAN shell, then set the arguments for
   the command "perl Makefile.PL", like this:

     >perl -MCPAN -e shell
     cpan>o conf makepl_arg LIB=/home/users/dag/My_Local_Perl_Modules


o INSTALLING BIOPERL MODULES THE HARD WAY

   As a last resort, you can simply copy all files in Bio/ to any
   directory in which you have write privileges. This is generally
   NOT recommended since some modules may require special
   configuration (currently none do, but don't rely on this).
   You will need to set "use lib '/path/to/my/bioperl/modules';" in
   your perl scripts so that you can access these modules if they are
   not installed in the standard site_perl/ location. See above for
   an example.

   To get manpage documentation to work correctly you will have to
   configure man so that it looks in the proper directory. On most
   systems this just involves adding an additional directory to your
   $MANPATH environment variable.

   The installation of the Compile directory can be similarly
   redirected, but execute the make commands from the Compile/SW
   directory.

   If all else fails, or you are unable to access the perl
   distribution directories, ask your system administrator to place
   the files there for you. You can always execute perl scripts in
   the same directory as the location of the modules (Bio/ in the
   distribution) since perl always checks the current working
   directory when looking for modules.


o USING MODULES NOT INSTALLED IN THE STANDARD LOCATION

   You can explicitly tell perl where to look for modules by using
   the lib module which comes standard with perl. Example:

     #!/usr/bin/perl

     use lib "/home/users/dag/My_Local_Perl_Modules/";
     use Bio::Seq;

     <...insert whizzy perl code here...>

   Or, you can set the environment variable PERL5LIB:

   csh or tcsh:

     setenv PERL5LIB /home/users/dag/My_Local_Perl_Modules/

   bash or sh:

     export PERL5LIB=/home/users/dag/My_Local_Perl_Modules/


o THE TEST SYSTEM

   The Bioperl test system is located in the t/ directory and is
   automatically run whenever you execute the 'make test' command.
   Alternatively, if you want to investigate the behavior of a
   specific test such as the SeqIO test, you would type:

     >perl -I. -w t/SeqIO.t

   The -I. tells Perl to add the current directory to the include
   path. This makes sure you are testing the modules in this
   directory, not ones installed elsewhere in your PERL5LIB path.
   The -w tells Perl to print all warnings.

   If you are trying to learn how to use a module, often the test
   suite is a good place to look.
   All good extreme programmers try to write a test BEFORE they write
   the module, to ensure that their module behaves the way they
   expect. You'll notice some 'ok' and 'skip' commands in a test.
   These are part of the Perl test suite: a passed test is signified
   by an 'ok N', where N is the test number. Alternatively, you can
   tell Perl to skip tests. This is useful when, for example, your
   test detects that the network is not present and thus should skip,
   not fail, any tests that require a network connection.


o BUILDING THE OPTIONAL bioperl-ext PACKAGE

   The bioperl-ext package contains C code and XS extensions for
   various alignment and trace file modules (Bio::Tools::pSW for DNA
   Smith-Waterman, Bio::Tools::dpAlign for protein Smith-Waterman,
   Bio::SearchDist for extreme value distribution (EVD) fitting,
   Bio::SeqIO::staden).

   This installation works out of the box on all platforms except BSD
   and Solaris. On other platforms, skip the next paragraph.

   - CONFIGURING for BSD and Solaris boxes

   You should add -fPIC to the CFLAGS line in
   Compile/SW/libs/makefile. This makes the compiler generate
   position independent code, which is required for these
   architectures. In addition, on some Solaris boxes, the generated
   Makefile does not use the correct -fPIC/-fpic flag for the C
   compiler that is used. This requires manually editing the
   generated Makefile to switch the case of the flag. Try it out
   once, and if you get errors, try editing the -fpic line.

   - INSTALLATION

   Move to the directory bioperl-ext. This is available as a separate
   package released from ftp://bioperl.org/pub/DIST. This is where
   the C code and XS extension for the bp_sw module are held. Execute
   these commands (possibly after making the change for *BSD and
   Solaris, as detailed above):

     perl Makefile.PL   # makes the system specific makefile
     # Solaris/BSD users might need to edit the Makefile here
     make               # builds all the libraries
     make test          # runs a short test
     make install       # installs the package correctly.
   This should install the compiled extension. The Bio::Tools::pSW
   module will work cleanly now.


o DEPENDENCIES AND Bundle::BioPerl

   The following packages are used by Bioperl. Not all are required
   for Bioperl to operate properly, but some functionality will be
   missing without them. You can easily install all of these, except
   srsperl.pm, using the Bundle::BioPerl CPAN bundle.

   The DBD::mysql, DB_File and XML::Parser modules require other
   applications or databases: MySQL, Berkeley DB, and expat
   respectively.

Module                    Where it is Used
------------------------------------------------------------------
HTTP::Request::Common     GenBank+GenPept sequence retrieval,
                          remote http Blast jobs
                          Bio::DB::*
                          Bio::Tools::Run::RemoteBlast

LWP::UserAgent            GenBank+GenPept sequence retrieval,
                          remote http Blast jobs
                          Bio::DB::*
                          Bio::Tools::Run::RemoteBlast

AcePerl                   Access to ACeDB databases
                          Bio::DB::Ace
                          Available at http://stein.cshl.org

IO::String                IO handle to read or write to a string
                          Bio::SeqIO
                          Bio::Variation::*
                          Bio::DB::*
                          Bio::Index::Blast
                          Bio::Tools::*
                          Bio::Biblio::IO
                          Bio::Structure::IO

XML::Parser               Parsing of XML documents
                          Bio::Biblio::IO::medlinexml
                          Requires expat from
                          http://sourceforge.net/projects/expat/

XML::Writer               Parsing + writing of XML documents
                          Bio::SeqIO::game
                          Bio::Variation::*

XML::Parser::PerlSAX      Parsing of XML documents
                          Bio::SeqIO::game
                          Bio::Variation::*
                          Bio::SearchIO::blastxml
                          Bio::Biblio::IO::medlinexml

XML::Twig                 Parsing of XML documents
                          Bio::Variation::IO::xml

File::Temp                Temporary File creation
                          Bio::DB::FileCache
                          Bio::DB::XEMBL

SOAP::Lite                SOAP protocol, XEMBL Services
                          Bio::Biblio::*
                          Bio::DB::XEMBLService

HTML::Parser              HTML parsing of GDB page
                          Bio::DB::GDB

DBD::mysql                Mysql API for loading and querying of
                          Mysql-based GFF feature and BioSQL databases
                          Bio::DB::GFF
                          bioperl-db external package
                          bioperl-pipeline external package
                          Mysql DB free from www.mysql.org

GD                        GD graphical drawing library
                          Bio::Graphics
                          Requires GD library from www.boutell.com/gd

srsperl                   Sequence Retrieval System
                          (SRS) alternative way of retrieving sequences
                          Bio::LiveSeq::IO::SRS.pm
                          See README in Bio/LiveSeq/IO

Storable                  Persistent object storage & retrieval
                          Bio::DB::FileCache

Text::Shellwords          Text parser
                          Bio::Graphics::FeatureFile

XML::DOM                  XML parser
                          Bio::SeqIO::bsml
                          Bio::SeqIO::interpro

DB_File                   Perl access to Berkeley DB
                          Bio::DB::Flat
                          Bio::DB::Fasta
                          Bio::SeqFeature::Collection
                          Bio::Index::*
                          Requires Berkeley DB, from Linux RPM or
                          from www.sleepycat.com

Graph::Directed           Generic graph data and algorithms
                          Bio::Ontology::SimpleOntologyEngine

Data::Stag::ITextWriter   Structured Tags datastructures
                          Bio::SeqIO::chadoitext

Data::Stag::SxprWriter    Structured Tags datastructures
                          Bio::SeqIO::chadosxpr

Data::Stag::XMLWriter     Structured Tags datastructures
                          Bio::SeqIO::chadoxml

Text::Wrap                Very optional
                          Bio::SearchIO::Writer::TextResultWriter

HTML::Entities            If you want to run Web analysis modules
                          Bio::Tools::Analysis::DNA::*
                          Bio::Tools::Analysis::Protein::*

Class::AutoClass          Used to create objects
                          Bio::Graph::SimpleGraph*

Clone                     Used to clone objects
                          Bio::Graph::ProteinGraph

XML::SAX                  New style SAX parser
                          Bio::SeqIO::bsml_sax

XML::SAX::Base            New style SAX parser
                          Bio::SeqIO::tigrxml

XML::SAX::Writer
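
   To see which of the modules in the table above are already
   available to your perl, a small sh loop like the following can
   help. This is only a sketch: module_status is a hypothetical
   helper, not something shipped with Bioperl, and the module list is
   an arbitrary subset of the table that you can extend as needed.

```shell
#!/bin/sh
# Sketch only: module_status is a hypothetical helper, not part of Bioperl.
# It asks perl to load the module and reports whether that succeeded.
# If perl itself is not on PATH, every module is reported as missing.
module_status() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

# An arbitrary subset of the modules listed in the table.
for mod in IO::String LWP::UserAgent XML::Parser DB_File GD Storable; do
    module_status "$mod"
done
```

   A module reported as missing only matters if you need the Bioperl
   modules listed beside it in the table.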