Blastz installation and usage guide

Installation (Linux)

Blastz is written in C. Download the source code, unpack it, and run make to build it.

 $ wget http://www.bx.psu.edu/miller_lab/dist/blastz-2004-12-27.tar.gz
 $ tar -zxvf blastz-2004-12-27.tar.gz
 $ cd blastz-source/
 $ make
cc -O bz_main.c bz_align.c bz_extend.c bz_chain.c bz_dna.c bz_print.c bz_table.c bz_census.c bz_hit19.c bz_inner.c util.c seq.c args.c edit.c dna.c charvec.c nib.c astack.c \
          \
         -lm \
         -o blastz
 $ ls
(Figure: file listing after compiling blastz)

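Multiple sequence alignment formats and their applications

This article summarizes multiple sequence alignment (MSA) formats. Phylogenetic analysis, sequence function analysis, gene prediction, and similar tasks all involve multiple sequence alignment. In particular, batch-processing alignments with different software means dealing with a variety of formats. How are these formats produced, how do they differ, how can one be converted to another, where can sequences in each format be downloaded, and what is each format specifically used for? This article summarizes and discusses these questions. The topic is broad, so additions and criticism are welcome.

Bioinformatics rests on one assumption: similar sequence implies similar structure, and similar structure implies similar function. A group of similar sequences may therefore belong to the same gene family, and the regions they share may be where the function resides; such a region is called a domain. This is one way to classify gene families: it links structure to function, so that function can be predicted from structure (the sequence being the primary structure).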
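
As a small illustration of converting between alignment formats, the sketch below reads the text of an aligned FASTA file and writes it as sequential, relaxed PHYLIP. It is a minimal sketch in Python and is not taken from the original article: the function names parse_fasta and to_phylip are hypothetical, and a real pipeline would normally rely on an established library or conversion tool.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs.

    The name is the first whitespace-delimited word of the header line.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Format aligned (name, sequence) pairs as sequential relaxed PHYLIP."""
    if not records:
        raise ValueError("no sequences given")
    length = len(records[0][1])
    # An alignment requires every row to have the same number of columns.
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences are not aligned to equal length")
    width = max(len(name) for name, _ in records)
    lines = [f"{len(records)} {length}"]
    for name, seq in records:
        lines.append(f"{name.ljust(width)}  {seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-sequence alignment, used only for demonstration.
    example = ">seqA\nACG-T\nAC\n>seqB\nACGGTAC\n"
    print(to_phylip(parse_fasta(example)), end="")
```

The header line of the output carries the number of sequences and the alignment length, which is why the sketch rejects input whose rows differ in length.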

(Figure: multiple sequence data analysis workflow)

The CGView Server: a comparative genomics tool for circular genomes.

Sample output from the CGView Server

The CGView Server generates graphical maps of circular genomes that show sequence features, base composition plots, analysis results and sequence similarity plots. Sequences can be supplied in raw, FASTA, GenBank or EMBL format. Additional feature or analysis information can be submitted in the form of GFF (General Feature Format) files. The server uses BLAST to compare the primary sequence to up to three comparison genomes or sequence sets. The BLAST results and feature information are converted to a graphical map showing the entire sequence, or an expanded and more detailed view of a region of interest. Several options are included to control which types of features are displayed and how the features are drawn. The CGView Server can be used to visualize features associated with any bacterial, plasmid, chloroplast or mitochondrial genome, and can aid in the identification of conserved genome segments, instances of horizontal gene transfer, and differences in gene copy number. Because a collection of sequences can be used in place of a comparison genome, maps can also be used to visualize regions of a known genome covered by newly obtained sequence reads. The CGView Server can be accessed at http://stothard.afns.ualberta.ca/cgview_server/
In brief: CGView is a drawing tool that produces circular genome maps from raw, FASTA, GenBank, or EMBL sequences, with extra feature or analysis information supplied as GFF files. It uses BLAST to compare the primary sequence against up to three comparison genomes or sequence sets, and display options control which features are shown and how they are drawn. The maps help identify conserved genome segments, horizontal gene transfer, and differences in gene copy number, and can also show which regions of a known genome are covered by newly obtained sequence reads.

Circos: An information aesthetic for comparative genomics.

We created a visualization tool, called Circos, to facilitate the identification and analysis of similarities and differences arising from comparisons of genomes. Our tool is effective in displaying variation in genome structure and, generally, any other kind of positional relationships between genomic intervals. Such data are routinely produced by sequence alignments, hybridization arrays, genome mapping, and genotyping studies. Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements. Circos is capable of displaying data as scatter, line and histogram plots, heat maps, tiles, connectors and text. Bitmap or vector images can be created from GFF-style data inputs and hierarchical configuration files, which can be easily generated by automated tools, making Circos suitable for rapid deployment in data analysis and reporting pipelines.
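
To make the point about automatically generated inputs concrete, the sketch below writes pairs of related genomic intervals as whitespace-separated text, one relationship per line. It is a minimal sketch in Python under stated assumptions: the function name circos_links and the sequence names chrA and chrB are hypothetical, and the six-column layout (two sets of name, start, end) is assumed to match the link data files that Circos reads, which should be checked against the Circos documentation for the installed version.

```python
def circos_links(pairs):
    """Return one link line per pair of related intervals.

    Each pair is ((name1, start1, end1), (name2, start2, end2)).
    The six-column layout is an assumption about the Circos link format.
    """
    lines = []
    for (name1, start1, end1), (name2, start2, end2) in pairs:
        lines.append(f"{name1} {start1} {end1} {name2} {start2} {end2}")
    # End the text with a newline only when there is at least one line.
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Hypothetical intervals, for example taken from pairwise alignments.
    demo = [
        (("chrA", 100, 200), ("chrB", 250, 300)),
        (("chrA", 500, 900), ("chrA", 1500, 1900)),
    ]
    print(circos_links(demo), end="")
```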

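Reconstructing microbial transcriptional regulatory networks

Comparative genomics makes it easy to build an organism's metabolic network. However, molecules differ across organisms in how well they are conserved through evolution, so regulatory networks built by comparative genomics have certain limitations.

Recent advances in experimental techniques have produced a steady stream of experimental data on transcriptional regulatory networks. Combined with the established understanding of these networks, this makes it possible to build genome-wide transcriptional regulatory networks.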