Building a multiple-alignment web service with ClustalW2: how to run batch alignments

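ClustalW is one of the most widely used multiple sequence alignment tools. When building a multiple-alignment web service on top of ClustalW, the second problem I ran into was how to run batch alignments according to the options each user selects. I had been mulling this over for a long time and searched the web now and then without finding a good solution. I did find plenty of services like this online, but none of them released their source code. This time it finally dawned on me: ClustalW accepts a large set of command-line parameters that make this possible out of the box. I had always assumed it could only be used interactively. The answer was right in front of me while I kept searching elsewhere.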

Oh, and why build a web service at all? A few reasons:

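  1. To give a group of users a convenient, secure and easy-to-use computing resource for multiple alignment;
  2. To offer a rich, user-friendly interface that lowers the learning curve;
  3. Or, well, as something to show for my work. Ha.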

Command examples

clustalw2 -infile=dna.fa
clustalw2 -infile=dna.fa -type=dna -output=gcg -outfile=align.gcg -align
clustalw2 -infile=test1.fas -type=dna -gapopen=10 -gapext=2 -output=gcg \
-outfile=align.gcg -align
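With every option available on the command line, batch alignment comes down to a loop that runs clustalw2 once per uploaded file, using -align as the verb so no interactive menu is involved. Below is a minimal sketch, not a production implementation. It assumes each job's uploads are saved as *.fa files in one directory; the function name batch_align and the CLUSTALW_BIN variable (the executable to call, defaulting to clustalw2 on PATH) are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch of batch alignment: one non-interactive clustalw2 run per
# uploaded FASTA file. CLUSTALW_BIN and the job-directory layout are
# hypothetical; adapt them to your own service.

# batch_align JOB_DIR TYPE FORMAT
#   JOB_DIR  directory holding one job's *.fa files (assumed layout)
#   TYPE     dna or protein (passed to -type)
#   FORMAT   an -output format such as gcg; also used as the file extension
# Returns non-zero if any individual alignment failed.
batch_align() {
    local job_dir="$1" type="$2" format="$3"
    local fa base failed=0
    for fa in "$job_dir"/*.fa; do
        [ -e "$fa" ] || continue   # glob matched nothing
        base="${fa%.fa}"
        # -align is the verb; -quiet keeps each per-file log short.
        if ! "${CLUSTALW_BIN:-clustalw2}" -infile="$fa" -type="$type" \
                -output="$format" -outfile="$base.$format" -align -quiet \
                >"$base.log" 2>&1; then
            echo "alignment failed: $fa" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Example (hypothetical path): batch_align /srv/jobs/1234 dna gcg
```

A web back end could call batch_align once per submitted job, for example from a job queue, and hand back the .gcg files. Keeping one .log per input makes it easy to tell the user which file failed and why.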

Detailed parameter reference

DATA (sequences)

-INFILE=file.ext                             :input sequences.
-PROFILE1=file.ext  and  -PROFILE2=file.ext  :profiles (old alignment).

VERBS (do things)

-OPTIONS            :list the command line parameters
-HELP  or -CHECK    :outline the command line params.
-FULLHELP           :output full help content.
-ALIGN              :do full multiple alignment.
-TREE               :calculate NJ tree.
-BOOTSTRAP(=n)      :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
-CONVERT            :output the input sequences in a different file format.
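Each verb turns clustalw2 into a one-shot command. As one more example, a service could offer plain format conversion next to alignment. This sketch assumes, based on the descriptions above, that -convert writes the input sequences in the format given by -output; convert_format and CLUSTALW_BIN are hypothetical names.

```bash
# convert_format INFILE FORMAT OUTFILE  (hypothetical helper)
# -convert rewrites the input sequences in another format without aligning.
# CLUSTALW_BIN is a hypothetical override for the executable name.
convert_format() {
    local infile="$1" format="$2" outfile="$3"
    "${CLUSTALW_BIN:-clustalw2}" -infile="$infile" -convert \
        -output="$format" -outfile="$outfile"
}

# Example (hypothetical file names): convert_format upload.aln pir upload.pir
```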

PARAMETERS (set things)

***General settings:***
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE   :use FAST algorithm for the alignment guide tree
-TYPE=       :PROTEIN or DNA sequences
-NEGATIVE    :protein alignment with negative values in matrix
-OUTFILE=    :sequence alignment file name
-OUTPUT=     :GCG, GDE, PHYLIP, PIR or NEXUS
-OUTORDER=   :INPUT or ALIGNED
-CASE        :LOWER or UPPER (for GDE output only)
-SEQNOS=     :OFF or ON (for Clustal output only)
-SEQNO_RANGE=:OFF or ON (NEW: for all output formats)
-RANGE=m,n   :sequence range to write starting m to m+n
-MAXSEQLEN=n :maximum allowed input sequence length
-QUIET       :Reduce console output to minimum
-STATS=      :Log some alignment statistics to file
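In a web service these general settings are exactly what the user picks in the form, so it is worth accepting only the documented values before building the command. A minimal sketch: the hypothetical helper build_general_flags checks -type, -output and -outorder against the values listed above (written in lowercase, as in the command examples) and prints one flag per line, or fails with status 1.

```bash
# build_general_flags TYPE FORMAT ORDER  (hypothetical helper)
# Whitelists web-form values against the options listed in the help text
# and prints one flag per line; anything else is rejected with status 1.
build_general_flags() {
    local type="$1" format="$2" order="$3"
    case "$type" in
        protein|dna) ;;
        *) echo "invalid -type: $type" >&2; return 1 ;;
    esac
    case "$format" in
        gcg|gde|phylip|pir|nexus) ;;
        *) echo "invalid -output: $format" >&2; return 1 ;;
    esac
    case "$order" in
        input|aligned) ;;
        *) echo "invalid -outorder: $order" >&2; return 1 ;;
    esac
    printf '%s\n' "-type=$type" "-output=$format" "-outorder=$order" "-quiet"
}

# Usage (bash 4+): capture first so a rejected value stops the job, then
# load the lines into an array and append them to the command:
#   flags_text="$(build_general_flags dna gcg aligned)" || exit 1
#   mapfile -t flags <<<"$flags_text"
#   clustalw2 -infile=dna.fa "${flags[@]}" -outfile=align.gcg -align
```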

***Fast Pairwise Alignments:***
-KTUPLE=n    :word size
-TOPDIAGS=n  :number of best diags.
-WINDOW=n    :window around best diags.
-PAIRGAP=n   :gap penalty
-SCORE       :PERCENT or ABSOLUTE
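For large submissions a service might expose a "fast" option built from these flags. The sketch below assumes the fast k-tuple method is selected with -quicktree, as its description says. align_fast and CLUSTALW_BIN are hypothetical, and the numeric values are illustrative placeholders to tune, not recommended settings.

```bash
# align_fast INFILE OUTFILE  (hypothetical helper)
# -quicktree switches the guide-tree stage to the fast k-tuple method;
# the numbers below are illustrative placeholders, not tuned recommendations.
align_fast() {
    local infile="$1" outfile="$2"
    "${CLUSTALW_BIN:-clustalw2}" -infile="$infile" -outfile="$outfile" \
        -quicktree -ktuple=2 -topdiags=4 -window=4 -pairgap=5 \
        -score=percent -align
}
```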

***Slow Pairwise Alignments:***
-PWMATRIX=    :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f  :gap opening penalty
-PWGAPEXT=f   :gap extension penalty
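Assuming the slow, more accurate pairwise method is used whenever -quicktree is absent, a protein preset only needs to set the weight matrix and pairwise gap penalties. In this sketch the matrix and penalty values are placeholders chosen for illustration, not settings taken from the original post, so check them against your ClustalW2 version's defaults before relying on them; align_slow_protein and CLUSTALW_BIN are hypothetical.

```bash
# align_slow_protein INFILE OUTFILE  (hypothetical helper)
# No -quicktree, so the guide tree comes from the slow pairwise alignments.
# Matrix and penalty values are illustrative placeholders.
align_slow_protein() {
    local infile="$1" outfile="$2"
    "${CLUSTALW_BIN:-clustalw2}" -infile="$infile" -outfile="$outfile" \
        -type=protein -pwmatrix=gonnet -pwgapopen=10 -pwgapext=0.1 -align
}
```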

***Multiple Alignments:***
-NEWTREE=      :file for new guide tree
-USETREE=      :file for old guide tree
-MATRIX=       :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX=    :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f     :gap opening penalty
-GAPEXT=f      :gap extension penalty
-ENDGAPS       :no end gap separation pen.
-GAPDIST=n     :gap separation pen. range
-NOPGAP        :residue-specific gaps off
-NOHGAP        :hydrophilic gaps off
-HGAPRESIDUES= :list hydrophilic res.
-MAXDIV=n      :% ident. for delay
-TYPE=         :PROTEIN or DNA
-TRANSWEIGHT=f :transitions weighting
-ITERATION=    :NONE or TREE or ALIGNMENT
-NUMITER=n     :maximum number of iterations to perform
-NOWEIGHTS     :disable sequence weighting
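The guide tree comes from the pairwise stage, so when a user only changes the multiple-alignment gap penalties, a service can save the tree on the first run with -newtree and reuse it afterwards with -usetree, skipping the pairwise step. A minimal sketch, assuming -newtree names the guide-tree file written during an -align run (verify this on your version). align_with_tree and CLUSTALW_BIN are hypothetical, and the tree file must belong to the same input sequences.

```bash
# align_with_tree INFILE OUTFILE TREEFILE GAPOPEN GAPEXT  (hypothetical helper)
# First run: no tree file yet, so ask clustalw2 to save one with -newtree.
# Later runs on the same sequences: reuse it with -usetree.
align_with_tree() {
    local infile="$1" outfile="$2" tree="$3" gapopen="$4" gapext="$5"
    local tree_flag
    if [ -s "$tree" ]; then
        tree_flag="-usetree=$tree"   # existing, non-empty tree: reuse it
    else
        tree_flag="-newtree=$tree"   # no tree yet: write it during this run
    fi
    "${CLUSTALW_BIN:-clustalw2}" -infile="$infile" -outfile="$outfile" \
        "$tree_flag" -gapopen="$gapopen" -gapext="$gapext" -align
}
```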

***Profile Alignments:***
-PROFILE      :Merge two alignments by profile alignment
-NEWTREE1=    :file for new guide tree for profile1
-NEWTREE2=    :file for new guide tree for profile2
-USETREE1=    :file for old guide tree for profile1
-USETREE2=    :file for old guide tree for profile2

***Sequence to Profile Alignments:***
-SEQUENCES   :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE=    :file for new guide tree
-USETREE=    :file for old guide tree
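Profile and sequence-to-profile alignment take the same two inputs and differ only in the verb, so one helper can serve both. A sketch with hypothetical names (profile_align, CLUSTALW_BIN): -profile merges two existing alignments, and -sequences adds the sequences of -profile2 one at a time to the -profile1 alignment, as described above.

```bash
# profile_align MODE ALN1 ALN2 OUTFILE  (hypothetical helper)
#   MODE=profile    merge two existing alignments (-profile)
#   MODE=sequences  add ALN2's sequences one by one to alignment ALN1 (-sequences)
profile_align() {
    local mode="$1" aln1="$2" aln2="$3" outfile="$4"
    case "$mode" in
        profile|sequences) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    "${CLUSTALW_BIN:-clustalw2}" -profile1="$aln1" -profile2="$aln2" \
        -"$mode" -outfile="$outfile"
}
```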

***Structure Alignments:***
-NOSECSTR1     :do not use secondary structure-gap penalty mask for profile 1
-NOSECSTR2     :do not use secondary structure-gap penalty mask for profile 2
-SECSTROUT=STRUCTURE or MASK or BOTH or NONE   :output in alignment file
-HELIXGAP=n    :gap penalty for helix core residues
-STRANDGAP=n   :gap penalty for strand core residues
-LOOPGAP=n     :gap penalty for loop regions
-TERMINALGAP=n :gap penalty for structure termini
-HELIXENDIN=n  :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
-STRANDENDIN=n :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n :number of residues outside strand to be treated as terminal

***Trees:***
-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n        :seed number for bootstraps.
-KIMURA        :use Kimura's correction.
-TOSSGAPS      :ignore positions with gaps.
-BOOTLABELS=node OR branch :position of bootstrap values in tree display
-CLUSTERING=   :NJ or UPGMA
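Tree building follows the same pattern, taking an existing alignment as -infile. A minimal sketch with the hypothetical names build_tree and CLUSTALW_BIN: without a bootstrap count it runs -tree with NJ clustering and PHYLIP output; with a count it runs -bootstrap with a fixed -seed so repeated runs are reproducible. The seed value 111 is an arbitrary placeholder.

```bash
# build_tree ALIGNMENT [NBOOT]  (hypothetical helper)
# Without NBOOT: a single NJ tree in PHYLIP format (-tree).
# With NBOOT > 0: a bootstrapped NJ tree (-bootstrap=n) with a fixed seed.
build_tree() {
    local aln="$1" nboot="${2:-0}"
    if [ "$nboot" -gt 0 ]; then
        "${CLUSTALW_BIN:-clustalw2}" -infile="$aln" -bootstrap="$nboot" \
            -seed=111 -kimura -tossgaps -bootlabels=node
    else
        "${CLUSTALW_BIN:-clustalw2}" -infile="$aln" -tree -outputtree=phylip \
            -kimura -tossgaps -clustering=nj
    fi
}
```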


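One thought on "Building a multiple-alignment web service with ClustalW2: how to run batch alignments"

  1. Hello, sir. I have also been working things out on my own and wrote a program for batch alignment, but I'm not sure how the commonly used parameters should be set. My sequences are proteins.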
