File name: BioPython_Tutorial.pdf
  Category: Python
  Development tool:
  File size: 851kb
  Downloads: 0
  Upload date: 2019-08-18
  Provider: drji*****
 Description:

Chapter 1 Introduction

1.1 What is Biopython?

The Biopython Project is an international association of developers of freely available Python (http://www.python.org) tools for computational molecular biology. The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research.

Basically, we just like to program in Python and want to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and scripts.

1.1.1 What can I find in the Biopython package

The main Biopython releases have lots of functionality, including:

• The ability to parse bioinformatics files into Python utilizable data structures, including support for the following formats:
  – Blast output – both from standalone and WWW Blast
  – Clustalw
  – FASTA
  – GenBank
  – PubMed and Medline
  – ExPASy files, like Enzyme, Prodoc and Prosite
  – SCOP, including 'dom' and 'lin' files
  – UniGene
  – SwissProt
• Files in the supported formats can be iterated over record by record or indexed and accessed via a Dictionary interface.
• Code to deal with popular on-line bioinformatics destinations such as:
  – NCBI – Blast, Entrez and PubMed services
  – ExPASy – Prodoc and Prosite entries
• Interfaces to common bioinformatics programs such as:
  – Standalone Blast from NCBI
  – Clustalw alignment program.
• A standard sequence class that deals with sequences, ids on sequences, and sequence features.
• Tools for performing common operations on sequences, such as translation, transcription and weight calculations.
• Code to perform classification of data using k Nearest Neighbors, Naive Bayes or Support Vector Machines.
• Code for dealing with alignments, including a standard way to create and deal with substitution matrices.
• Code making it easy to split up parallelizable tasks into separate processes.
• GUI-based programs to do basic sequence manipulations, translations, BLASTing, etc.
• Extensive documentation and help with using the modules, including this file, on-line wiki documentation, the web site, and the mailing list.
• Integration with BioSQL, a sequence database schema also supported by the BioPerl and BioJava projects.
We hope this gives you plenty of reasons to download and start using Biopython.

    4.3.1 Specifying the dictionary keys
    4.3.2 Indexing a dictionary using the SEGUID checksum
  4.4 Writing Sequence Files
    4.4.1 Converting between sequence file formats
    4.4.2 Converting a file of sequences to their reverse complements
    4.4.3 Getting your SeqRecord objects as formatted strings
5 Sequence Alignment Input/Output
  5.1 Parsing or Reading Sequence Alignments
    5.1.1 Single alignments
    5.1.2 Multiple alignments
    5.1.3 Ambiguous Alignments
  5.2 Writing Alignments
    5.2.1 Converting between sequence alignment file formats
    5.2.2 Getting your Alignment objects as formatted strings
6 BLAST
  6.1 Running Blast locally
  6.2 Running Blast over the Internet
  6.3 Saving BLAST output
  6.4 Parsing BLAST output
  6.5 The BLAST record class
  6.6 Deprecated BLAST parsers
    6.6.1 Parsing plain-text BLAST output
    6.6.2 Parsing a file full of BLAST runs
    6.6.3 Finding a bad record somewhere in a huge file
  6.7 Dealing with PSI-BLAST
  6.8 Dealing with RPS-BLAST
7 Accessing NCBI's Entrez databases
  7.1 Entrez Guidelines
  7.2 EInfo: Obtaining information about the Entrez databases
  7.3 ESearch: Searching the Entrez databases
  7.4 EPost: Uploading a list of identifiers
  7.5 ESummary: Retrieving summaries from primary IDs
  7.6 EFetch: Downloading full records from Entrez
  7.7 ELink
  7.8 EGQuery: Obtaining counts for search terms
  7.9 ESpell: Obtaining spelling suggestions
  7.10 Specialized parsers
    7.10.1 Parsing Medline records
  7.11 Examples
    7.11.1 PubMed and Medline
    7.11.2 Searching, downloading, and parsing Entrez Nucleotide records
    7.11.3 Searching, downloading, and parsing GenBank records
    7.11.4 Finding the lineage of an organism
  7.12 Using the history and WebEnv
    7.12.1 Searching for and downloading sequences using the history
    7.12.2 Searching for and downloading abstracts using the history
8 Swiss-Prot, Prosite, Prodoc, and ExPASy
  8.1 Bio.SwissProt: Parsing Swiss-Prot files
    8.1.1 Parsing Swiss-Prot records
    8.1.2 Parsing the Swiss-Prot keyword and category list
  8.2 Bio.Prosite: Parsing Prosite records
  8.3 Bio.Prosite.Prodoc: Parsing Prodoc records
  8.4 Bio.ExPASy: Accessing the ExPASy server
    8.4.1 Retrieving a Swiss-Prot record
    8.4.2 Searching Swiss-Prot
    8.4.3 Retrieving Prosite and Prodoc records
9 Going 3D: The PDB module
  9.1 Structure representation
    9.1.1 Structure
    9.1.2 Model
    9.1.3 Chain
    9.1.4 Residue
    9.1.5 Atom
  9.2 Disorder
    9.2.1 General approach
    9.2.2 Disordered atoms
    9.2.3 Disordered residues
  9.3 Hetero residues
    9.3.1 Associated problems
    9.3.2 Water residues
    9.3.3 Other hetero residues
  9.4 Some random usage examples
  9.5 Common problems in PDB files
    9.5.1 Examples
    9.5.2 Automatic correction
    9.5.3 Fatal errors
  9.6 Other features
10 Bio.PopGen: Population genetics
  10.1 GenePop
  10.2 Coalescent simulation
    10.2.1 Creating scenarios
    10.2.2 Running SIMCOAL2
  10.3 Other applications
    10.3.1 FDist: Detecting selection and molecular adaptation
  10.4 Future Developments
11 Supervised learning methods
  11.1 The Logistic Regression Model
    11.1.1 Background and Purpose
    11.1.2 Training the logistic regression model
    11.1.3 Using the logistic regression model for classification
    11.1.4 Logistic Regression, Linear Discriminant Analysis, and Support Vector Machines
  11.2 k-Nearest Neighbors
    11.2.1 Background and purpose
    11.2.2 Initializing a k-nearest neighbors model
    11.2.3 Using a k-nearest neighbors model for classification
  11.3 Naive Bayes
  11.4 Maximum Entropy
  11.5 Markov models
12 Cookbook - Cool things to do with it
  12.1 Sequence parsing plus simple plots
    12.1.1 Histogram of sequence lengths
    12.1.2 Plot of sequence GC%
    12.1.3 Nucleotide dot plots
  12.2 Dealing with alignments
    12.2.1 Clustalw
    12.2.2 Calculating summary information
    12.2.3 Calculating a quick consensus sequence
    12.2.4 Position Specific Score Matrices
    12.2.5 Information Content
    12.2.6 Translating between Alignment formats
  12.3 Substitution matrices
    12.3.1 Using common substitution matrices
    12.3.2 Creating your own substitution matrix from an alignment
  12.4 BioSQL - storing sequences in a relational database
  12.5 InterPro
13 Advanced
  13.1 The SeqRecord and SeqFeature classes
    13.1.1 Sequence ids and Descriptions - dealing with SeqRecords
    13.1.2 Features and Annotations - SeqFeatures
  13.2 Regression Testing Framework
    13.2.1 Writing a Regression Test
  13.3 Parser Design
  13.4 Substitution Matrices
    13.4.1 SubsMat
    13.4.2 FreqTable
14 Where to go from here - contributing to Biopython
  14.1 Maintaining a distribution for a platform
  14.2 Bug Reports and Feature Requests
  14.3 Contributing Code
15 Appendix: Useful stuff about Python
  15.1 What the heck is a handle?
    15.1.1 Creating a handle from a string

1.2 Installing Biopython

All of the installation information for Biopython was separated from this document to make it easier to keep updated.

The short version is go to our downloads page (http://biopython.org/wiki/download), download and install the listed dependencies, then download and install Biopython. For Windows we provide pre-compiled click-and-run installers, while for Unix and other operating systems you must install from source as described in the included README file. This is usually as simple as the standard command:

    sudo python setup.py install

The longer version of our installation instructions covers installation of Python, Biopython dependencies and Biopython itself. It is available in PDF (http://biopython.org/dist/docs/install/installation.pdf) and HTML formats (http://biopython.org/dist/docs/install/installation.html).

1.3 FAQ

1. Which "Numerical Python" do I need?
   For Biopython 1.48 or earlier, you need the old Numeric module. For Biopython 1.49 onwards, you need the newer NumPy instead. Both Numeric and NumPy can be installed on the same machine fine. See also http://numpy.scipy.org/

2. Why is the Seq object missing the (back) transcription and translation methods described in this Tutorial?
   You need Biopython 1.49 or later. Alternatively, use the Bio.Seq module functions described in Section 3.11.

3. Why doesn't Bio.SeqIO work? It imports fine but there is no parse function etc.
   You need Biopython 1.43 or later. Older versions did contain some related code under the Bio.SeqIO name which has since been deprecated - and this is why the import "works".

4. Why doesn't Bio.SeqIO.read() work?
   The module imports fine but there is no read function!
   You need Biopython 1.45 or later.

5. Why isn't Bio.AlignIO present? The module import fails!
   You need Biopython 1.46 or later.

6. What file formats do Bio.SeqIO and Bio.AlignIO read and write?
   See http://biopython.org/wiki/seqio and http://biopython.org/wiki/alignio on the wiki for the latest listing.

7. Why don't the Bio.SeqIO and Bio.AlignIO input functions let me provide a sequence alphabet?
   You need Biopython 1.49 or later.

8. Why doesn't str(...) give me the full sequence of a Seq object?
   You need Biopython 1.45 or later. Alternatively, rather than str(my_seq), use my_seq.tostring() (which will also work on recent versions of Biopython).

9. Why doesn't Bio.Blast work with the latest plain text NCBI blast output?
   The NCBI keep tweaking the plain text output from the BLAST tools, and keeping our parser up to date is an ongoing struggle. We recommend you use the XML output instead, which is designed to be read by a computer program.

10. Why doesn't Bio.Entrez.read() work? The module imports fine but there is no read function!
    You need Biopython 1.46 or later.

11. Why doesn't Bio.PDB.MMCIFParser work? I see an import error about MMCIFlex.
    Since Biopython 1.42, the underlying Bio.PDB.mmCIF.MMCIFlex module has not been installed by default. It requires a third party tool called flex (fast lexical analyzer generator). At the time of writing, you'll have to install flex, then tweak your Biopython setup.py file and reinstall from source.

12. I looked in a directory for code, but I couldn't seem to find the code that does something. Where's it hidden?
    One thing to know is that we put code in __init__.py files. If you are not used to looking for code in this file this can be confusing. The reason we do this is to make the imports easier for users.
    For instance, instead of having to do a repetitive import like from Bio.GenBank import GenBank, you can just use from Bio import GenBank.

Chapter 2 Quick Start - What can you do with Biopython?

This section is designed to get you started quickly with Biopython, and to give a general overview of what is available and how to use it. All of the examples in this section assume that you have some general working knowledge of Python, and that you have successfully installed Biopython on your system. If you think you need to brush up on your Python, the main Python web site provides quite a bit of free documentation to get started with (http://www.python.org/doc/).

Since much biological work on the computer involves connecting with databases on the internet, some of the examples will also require a working internet connection in order to run.

Now that that is all out of the way, let's get into what we can do with Biopython.

2.1 General overview of what Biopython provides

As mentioned in the introduction, Biopython is a set of libraries to provide the ability to deal with "things" of interest to biologists working on the computer. In general this means that you will need to have at least some programming experience (in Python, of course!) or at least an interest in learning to program. Biopython's job is to make your job easier as a programmer by supplying reusable libraries so that you can focus on answering your specific question of interest, instead of focusing on the internals of parsing a particular file format (of course, if you want to help by writing a parser that doesn't exist and contributing it to Biopython, please go ahead!). So Biopython's job is to make you happy!

One thing to note about Biopython is that it often provides multiple ways of "doing the same thing". Things have improved in recent releases, but this can still be frustrating as in Python there should ideally be one right way to do something.
However, this can also be a real benefit because it gives you lots of flexibility and control over the libraries. The tutorial helps to show you the common or easy ways to do things so that you can just make things work. To learn more about the alternative possibilities, look in the Cookbook (Chapter 12, this has some cool tricks and tips), the Advanced section (Chapter 13), the built in "docstrings" (via the Python help command, or the API documentation) or ultimately the code itself.

2.2 Working with sequences

Disputably (of course!), the central object in bioinformatics is the sequence. Thus, we'll start with a quick introduction to the Biopython mechanisms for dealing with sequences, the Seq object, which we'll discuss in more detail in Chapter 3.

Most of the time when we think about sequences we have in mind a string of letters like 'AGTACACTGGT'. You can create such a Seq object with this sequence as follows - the ">>>" represents the Python prompt followed by what you would type in:

    >>> from Bio.Seq import Seq
    >>> my_seq = Seq("AGTACACTGGT")
    >>> my_seq
    Seq('AGTACACTGGT', Alphabet())
    >>> print my_seq
    AGTACACTGGT
    >>> my_seq.alphabet
    Alphabet()

What we have here is a sequence object with a generic alphabet - reflecting the fact we have not specified if this is a DNA or protein sequence (okay, a protein with a lot of Alanines, Glycines, Cysteines and Threonines!). We'll talk more about alphabets in Chapter 3.

In addition to having an alphabet, the Seq object differs from the Python string in the methods it supports. You can't do this with a plain string:

    >>> my_seq
    Seq('AGTACACTGGT', Alphabet())
    >>> my_seq.complement()
    Seq('TCATGTGACCA', Alphabet())
    >>> my_seq.reverse_complement()
    Seq('ACCAGTGTACT', Alphabet())

The next most important class is the SeqRecord or Sequence Record. This holds a sequence (as a Seq object) with additional annotation including an identifier, name and description. The Bio.SeqIO module for reading and writing sequence file formats works with SeqRecord objects, which will be introduced below and covered in more detail by Chapter 4.

This covers the basic features and uses of the Biopython sequence class. Now that you've got some idea of what it is like to interact with the Biopython libraries, it's time to delve into the fun, fun world of dealing with biological file formats!

2.3 A usage example

Before we jump right into parsers and everything else to do with Biopython, let's set up an example to motivate everything we do and make life more interesting. After all, if there wasn't any biology in this tutorial, why would you want to read it?

Since I love plants, I think we're just going to have to have a plant based example (sorry to all the fans of other organisms out there!). Having just completed a recent trip to our local greenhouse, we've suddenly developed an incredible obsession with Lady Slipper Orchids (if you wonder why, have a look at some Lady Slipper Orchids photos on Flickr, or try a Google Image Search).

Of course, orchids are not only beautiful to look at, they are also extremely interesting for people studying evolution and systematics. So let's suppose we're thinking about writing a funding proposal to do a molecular study of Lady Slipper evolution, and would like to see what kind of research has already been done and how we can add to that.

After a little bit of reading up we discover that the Lady Slipper Orchids are in the Orchidaceae family and the Cypripedioideae sub-family and are made up of 5 genera: Cypripedium, Paphiopedilum, Phragmipedium, Selenipedium and Mexipedium.

That gives us enough to get started delving for more information. So, let's look at how the Biopython tools can help us.
We'll start with sequence parsing in Section 2.4, but the orchids will be back later on as well - for example we'll search PubMed for papers about orchids and extract sequence data from GenBank in Chapter 7, extract data from Swiss-Prot from certain orchid proteins in Chapter 8, and work with Clustalw multiple sequence alignments of orchid proteins in Section 12.2.1.
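
The complement and reverse_complement calls shown for the Seq object in Section 2.2 can be pictured with a few lines of plain Python. What follows is only a sketch under stated assumptions, not Biopython's own code: it handles just the four unambiguous DNA letters, the functions complement and reverse_complement are hypothetical helpers defined here rather than library API, and it is written for Python 3 (the tutorial's own prompt sessions use Python 2 syntax).

```python
# Plain-Python sketch of the complement / reverse-complement operations
# from Section 2.2. NOT Biopython's implementation; the helper names and
# the restriction to A, C, G, T are assumptions made for illustration.

# Translation table mapping each DNA letter to its pairing partner.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Return the base-by-base complement of a DNA string."""
    return dna.translate(_DNA_COMPLEMENT)


def reverse_complement(dna):
    """Return the complement read in the opposite direction."""
    return complement(dna)[::-1]


my_seq = "AGTACACTGGT"
print(complement(my_seq))          # TCATGTGACCA
print(reverse_complement(my_seq))  # ACCAGTGTACT
```

The two printed strings match the sequences in the Section 2.2 session. A real Seq object also carries an alphabet alongside the letters, which a bare string like this one cannot express.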
