<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="/css/rss20.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:pheedo="http://www.pheedo.com/namespace/pheedo">
	<channel>
		<title>IEEE/ACM Transactions on Computational Biology and Bioinformatics</title>
		<link>http://www.computer.org/tcbb</link>
		<description>The IEEE/ACM Transactions on Computational Biology and Bioinformatics is a new quarterly that will publish archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development and optimization of biological databases; and important biological results that are obtained from the use of these methods, programs, and databases.</description>
		<language>en-us</language>
		<pubDate>Sat, 13 Mar 2010 11:00:01 GMT</pubDate>
		<image>
			<url>http://csdl.computer.org/common/images/logos/tcbb.gif</url>
			<title>IEEE Computer Society</title>
			<description>List of recently published journal articles</description>
			<link>http://www.computer.org/tcbb</link>
		</image>
		<item>
			<title>PrePrint: Prediction of Protein Functions with Gene Ontology and Inter-Species Protein Homology Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=a4f141fd56100c4edc0009f81b3b451b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.15</pheedo:origLink>
			<description>Accurate computational prediction of protein functions increasingly relies on network-inspired models for protein function transfer. This task can become challenging for proteins that are isolated in their own network or that have poorly characterized or uncharacterized neighborhoods. Here, we present a novel probabilistic chain-graph-based approach for predicting protein functions that builds on connecting the networks of two (or more) different species by links of high inter-species sequence homology. In this way, proteins are able to "exchange" functional information with their homologous neighbors in a different species. Knowledge of inter-species relationships, such as sequence homology, can become crucial when information from other data sources, including protein-protein interactions or the cellular locations of proteins, is limited. We further enhance our model to account for Gene Ontology dependencies by linking multiple but related functional ontology categories within and across multiple species. The resulting networks are of significantly higher complexity than most traditional protein network models. We comprehensively benchmark our method by applying it to the two largest protein networks, Yeast and Fly. The joint Fly-Yeast network provides substantial improvements in precision, accuracy, and false positive rate over networks that consider either source in isolation.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.15</guid>
		</item>
		<item>
			<title>PrePrint: Simultaneous Identification of Duplications and Lateral Gene Transfers</title>
			<link>http://www.pheedcontent.com/click.phdo?i=53d1bb1062943873c328aadf7e512d14</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.14</pheedo:origLink>
			<description>The incongruence between a gene tree and a corresponding species tree can be attributed to evolutionary events such as gene duplication and gene loss. This paper describes a combinatorial model in which so-called DTL-scenarios are used to explain the differences between a gene tree and a corresponding species tree, taking into account gene duplications, gene losses, and lateral gene transfers (also known as horizontal gene transfers). The reasonable biological constraint that a lateral gene transfer may only occur between contemporary species leads to the notion of acyclic DTL-scenarios. Parsimony methods are introduced by defining appropriate optimization problems. We show that finding a most parsimonious acyclic DTL-scenario is NP-hard. However, by dropping the acyclicity condition, the problem becomes tractable, and we provide a dynamic programming algorithm as well as a fixed-parameter tractable algorithm for finding most parsimonious DTL-scenarios.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.14</guid>
		</item>
		<item>
			<title>PrePrint: ICGA-PSO-ELM approach for Accurate Multiclass Cancer Classification Resulting in Reduced Gene Sets in which Genes Encoding Secreted Proteins are Highly Represented</title>
			<link>http://www.pheedcontent.com/click.phdo?i=215a91a7d7a22122b2bee81052781b34</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.13</pheedo:origLink>
			<description>A combination of an Integer Coded Genetic Algorithm (ICGA) and Particle Swarm Optimization (PSO), coupled with the neural-network-based Extreme Learning Machine (ELM), is used for gene selection and cancer classification. ICGA is used with PSO-ELM to select an optimal set of genes, which are then used to build a classifier; the resulting algorithm (ICGA-PSO-ELM) can handle sparse data and sample imbalance. We evaluate the performance of ICGA-PSO-ELM and compare our results with existing methods in the literature. An investigation into the functions of the selected genes, using a systems biology approach, revealed that many of the identified genes are involved in cell signaling and proliferation. An analysis of these gene sets shows a larger representation of genes that encode secreted proteins than is found in randomly selected gene sets. Secreted proteins constitute a major means by which cells interact with their surroundings. Mounting biological evidence has identified the tumor microenvironment as a critical factor that determines tumor survival and growth. Thus, the genes identified in this study that encode secreted proteins might provide important insights into the nature of the critical biological features in the microenvironment of each tumor type that allow these cells to thrive and proliferate.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.13</guid>
		</item>
		<item>
			<title>PrePrint: Semantics and Ambiguity of Stochastic RNA Family Models</title>
			<link>http://www.pheedcontent.com/click.phdo?i=0efae3459a6b9adfbe01d9d08249b024</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.12</pheedo:origLink>
			<description>Stochastic models such as HMMs or SCFGs fail to return the correct maximum-likelihood solution in the case of semantic ambiguity. This problem arises when the algorithm implementing the model inspects the same solution in different guises. It is a difficult problem in the sense that proving semantic non-ambiguity has been shown to be algorithmically undecidable, while compensating for it (by coalescing the scores of equivalent solutions) has been shown to be NP-hard. For SCFGs modeling RNA secondary structure, it has been shown that the distortion of results can be quite severe. Much less is known about the case where SCFGs model the matching of a query sequence to an implicit consensus structure for an RNA family. We find that three different, meaningful semantics can be associated with the matching of a query against the model: a structural, an alignment, and a trace semantics. Rfam models correctly implement the alignment semantics, and are ambiguous with respect to the other two semantics, which are more abstract. We show how provably correct models can be generated for the trace semantics. We propose that both the structural and the trace semantics are worthwhile concepts for further study, possibly better suited to capture remotely related family members.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.12</guid>
		</item>
		<item>
			<title>IEEE/ACM Transactions on Computational Biology and Bioinformatics - January-March 2010 (Vol. 7, No. 1)</title>
			<link>http://opac.ieeecomputersociety.org/opac?year=2010&amp;volume=7&amp;issue=01&amp;acronym=tcbb</link>
			<description>IEEE/ACM Transactions on Computational Biology and Bioinformatics</description>
			<guid isPermaLink="true">http://www.computer.org/portal/site/tcbb/</guid>
		</item>
		<item>
			<title>PrePrint: A Fast Algorithm for Computing Geodesic Distances in Tree Space</title>
			<link>http://www.pheedcontent.com/click.phdo?i=5ab2340914548dbe2f1c1a7b10f03a2e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.3</pheedo:origLink>
			<description>Comparing and computing distances between phylogenetic trees are important biological problems, especially for models where edge lengths play an important role. The geodesic distance measure between two phylogenetic trees with edge lengths is the length of the shortest path between them in the continuous tree space introduced by Billera, Holmes, and Vogtmann. This tree space provides a powerful tool for studying and comparing phylogenetic trees, both in exhibiting a natural distance measure and in providing a Euclidean-like structure for solving optimization problems on trees. An important open problem is to find a polynomial-time algorithm for finding geodesics in tree space. This paper gives such an algorithm, which starts with a simple initial path and moves through a series of successively shorter paths until the geodesic is attained.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.3</guid>
		</item>
		<item>
			<title>PrePrint: Exact Computation of Coalescent Likelihood for Panmictic and Subdivided Populations Under the Infinite Sites Model</title>
			<link>http://www.pheedcontent.com/click.phdo?i=240acf4b994fa98cd407ef040ec54a1c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.2</pheedo:origLink>
			<description>Coalescent likelihood is the probability of observing the given population sequences under the coalescent model. Computation of coalescent likelihood under the infinite sites model is a classic problem in coalescent theory. Existing methods are based on either importance sampling or Markov chain Monte Carlo and are inexact. In this paper, we develop a simple method that can compute the exact coalescent likelihood for many datasets of moderate size, including a real biological dataset whose likelihood was previously thought to be difficult to compute exactly. Our method works for both panmictic and subdivided populations. Simulations demonstrate that the practical range of exact coalescent likelihood computation for panmictic populations is significantly larger than previously believed. We investigate the application of our method to estimating mutation rates by maximum likelihood. A main application of the exact method is comparing the accuracy of approximate methods. To demonstrate the usefulness of the exact method, we evaluate the accuracy of the program Genetree in computing the likelihood for subdivided populations.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.2</guid>
		</item>
		<item>
			<title>PrePrint: Visual Exploration Across Biomedical Databases</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c3471af59b6b8de9e7b49d7d166f1709</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.1</pheedo:origLink>
			<description>Though biomedical research often draws on knowledge from a wide variety of fields, few visualization methods for biomedical data incorporate meaningful cross-database exploration. A new approach is offered for visualizing and exploring a query-based subset of multiple heterogeneous biomedical databases. Databases are modeled as an entity-relation graph containing nodes (database records) and links (relationships between records). Users specify a keyword search string to retrieve an initial set of nodes, and then explore intra- and inter-database links. Results are visualized with user-defined semantic substrates to take advantage of the rich set of attributes usually present in biomedical data. Comments from domain experts indicate that this visualization method is potentially advantageous for biomedical knowledge exploration.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.1</guid>
		</item>
		<item>
			<title>PrePrint: Identifying Relevant Data for a Biological Database: Handcrafted Rules Versus Machine Learning</title>
			<link>http://www.pheedcontent.com/click.phdo?i=f9caa50dd7c9ebde1f5715b074903561</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.83</pheedo:origLink>
			<description>With well over one thousand specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules handcrafted by a human expert. Because this is one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.83</guid>
		</item>
		<item>
			<title>PrePrint: Discriminative Motif Finding for Predicting Protein Subcellular Localization</title>
			<link>http://www.pheedcontent.com/click.phdo?i=f43cdd8ea0e6d89b9244b0ed4f44968a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.82</pheedo:origLink>
			<description>Many methods have been described to predict the subcellular location of proteins from sequence information. However, most of these methods either rely on global sequence properties or use a set of known protein targeting motifs to predict protein localization. Here we develop and test a novel method that identifies potential targeting motifs using a discriminative approach based on hidden Markov models (discriminative HMMs). These models search for motifs that are present in a compartment but absent in other nearby compartments by utilizing a hierarchical structure that mimics the protein sorting mechanism. We show that both discriminative motif finding and the hierarchical structure improve localization prediction on a benchmark dataset of yeast proteins. The motifs identified can be mapped to known targeting motifs, and they are more conserved than the average protein sequence. Using our motif-based predictions, we can identify potential annotation errors in public databases for the location of some of the proteins. A software implementation and the dataset described in this paper are available from http://murphylab.web.cmu.edu/software/2009_TCBB_motif/</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.82</guid>
		</item>
		<item>
			<title>PrePrint: Molecular Function Prediction Using Neighborhood Features</title>
			<link>http://www.pheedcontent.com/click.phdo?i=97bb93a4e0fc9f420927c816eeedaa3c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.81</pheedo:origLink>
			<description>The recent advent of high-throughput methods has generated large amounts of gene interaction data. This has allowed the construction of genome-wide networks. A significant number of genes in such networks remain uncharacterized, and predicting the molecular function of these genes remains a major challenge. A number of existing techniques assume that genes with similar functions are topologically close in the network. Our hypothesis is that genes with similar functions exhibit similar annotation patterns in their neighborhoods, regardless of the distance between them in the interaction network. We thus predict molecular functions of uncharacterized genes by comparing their functional neighborhoods to those of genes of known function. We propose a two-phase approach. First, we extract functional neighborhood features of a gene using Random Walks with Restarts. We then employ a k-nearest-neighbor (KNN) classifier to predict the function of uncharacterized genes based on the computed neighborhood features. We perform leave-one-out validation experiments on two S. cerevisiae interaction networks, revealing significant improvements over previous techniques. Our technique also provides a natural control of the trade-off between accuracy and coverage of prediction.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.81</guid>
		</item>
		<item>
			<title>PrePrint: GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics</title>
			<link>http://www.pheedcontent.com/click.phdo?i=d2e37747a8139c2b8512920d8b83a948</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.80</pheedo:origLink>
			<description>Graph data mining is an active research area. Graphs are general modeling tools for organizing information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the rapid accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. In this paper, we demonstrate a novel technique called graph pattern diffusion kernel (GPD) with applications in cheminformatics. Our idea is to leverage existing frequent pattern discovery methods and to explore the application of kernel classifiers (e.g., support vector machines) to building highly accurate graph classifiers. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the graph database and use a process we call &amp;#x201C;pattern diffusion&amp;#x201D; to label nodes in the graphs. Finally, we design a novel graph alignment algorithm to compute the inner product of two graphs. We have tested our algorithm on a number of chemical structure datasets. The experimental results demonstrate that our method is significantly better than competing methods such as kernel functions based on paths, cycles, and subgraphs.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.80</guid>
		</item>
		<item>
			<title>PrePrint: Microarray Time Course Experiments: Finding Profiles</title>
			<link>http://www.pheedcontent.com/click.phdo?i=5df44e363458a67000bed8e9b9f5417f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.79</pheedo:origLink>
			<description>Time-course studies with microarray techniques and experimental replicates are very useful in biomedical research. For replicated experiments, we present an alternative approach to selecting and clustering genes according to a new measure of association between genes. First, the procedure normalizes and standardizes the expression profile of each gene and then identifies scaling parameters that further minimize the distance between replicates of the same gene. Then, the procedure filters out genes with a flat profile, detects differences between replicates, and separates genes without significant differences from the rest. For the genes without significant differences, we define a mean profile for each gene and use it to compute the distance between two genes. Next, a hierarchical clustering procedure is proposed, a statistic is computed for each cluster to determine its compactness, and the total number of classes is determined. For the remaining genes, those with significant differences between replicates, the procedure detects where the differences between replicates lie and assigns each gene to the best-fitting previously identified profile or defines a new profile. We illustrate this new procedure using simulated data and a representative data set arising from a microarray experiment with replication, and we report interesting results.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.79</guid>
		</item>
		<item>
			<title>PrePrint: Estimating Haplotype Frequencies by Combining Data from Large DNA Pools with Database Information</title>
			<link>http://www.pheedcontent.com/click.phdo?i=5a189a3005455c2b139b1cb9a7176f08</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.71</pheedo:origLink>
			<description>We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of ascertained individuals. Our goal is to estimate the haplotype frequencies among the ascertained individuals by combining the pooled allele frequency data with prior knowledge about the possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the proposed method performs similarly to an EM algorithm that uses a multinormal approximation for the pooled allele frequencies but does not utilize prior information about the haplotypes. The method has been implemented in Matlab code, which is available upon request from the authors.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.71</guid>
		</item>
		<item>
			<title>PrePrint: A Markov Blanket-Based Model for Gene Regulatory Network Inference</title>
			<link>http://www.pheedcontent.com/click.phdo?i=ad112f4d804f9380d74fe1bce1243b09</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.70</pheedo:origLink>
			<description>An efficient two-step Markov blanket method for modeling and inferring complex regulatory networks from large-scale microarray datasets is presented. The inferred gene regulatory network (GRN) is based on time series gene expression data capturing the underlying gene interactions. To construct a highly accurate GRN, the proposed method performs (i) discovery of a gene's Markov blanket (MB), (ii) formulation of a flexible measure to determine the network's quality, (iii) efficient searching with the aid of a guided genetic algorithm, and (iv) pruning to obtain a minimal set of correct interactions. Investigations are carried out using both synthetic and yeast cell-cycle gene expression data sets. The realistic synthetic datasets validate the robustness of the method by varying topology, sample size, time delay, noise, vertex in-degree, and the presence of hidden nodes. It is shown that the proposed approach has excellent inferential capabilities and high accuracy even in the presence of noise. The gene network inferred from yeast cell-cycle data is investigated for its biological relevance using well-known interactions, sequence analysis, motif patterns, and GO data. Further, novel interactions are predicted for the unknown genes of the network, and their influence on other genes is also discussed.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.70</guid>
		</item>
		<item>
			<title>PrePrint: Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices</title>
			<link>http://www.pheedcontent.com/click.phdo?i=54bb8ab8a6ffa8713de61e2fcddb3bc8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.69</pheedo:origLink>
			<description>Pairwise sequence alignment is a central problem in bioinformatics that forms the basis of many other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for obtaining individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive the estimates of pairwise statistical significance, which is expected to use more sequence-specific information in the estimation. Experiments were conducted on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution. The results confirm that using sequence-specific substitution matrices to estimate pairwise statistical significance is significantly better than using a standard matrix like BLOSUM62, and better than the database statistical significance estimates reported by popular database search programs such as BLAST, PSI-BLAST (without pre-trained PSSMs), and SSEARCH; however, PSI-BLAST with pre-trained PSSMs performs significantly better. Further, using position-specific substitution matrices to estimate pairwise statistical significance gives significantly better results than even PSI-BLAST with pre-trained PSSMs.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.69</guid>
		</item>
		<item>
			<title>PrePrint: The Impact of Multiple Protein Sequence Alignment on Phylogenetic Estimation</title>
			<link>http://www.pheedcontent.com/click.phdo?i=af861cef704dd5e1eb5f419cfa6c8ed0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.68</pheedo:origLink>
			<description>Multiple sequence alignment is typically the first step in estimating phylogenetic trees, on the assumption that as alignments improve, so will phylogenetic reconstructions. Over the last decade or so, new multiple sequence alignment methods have been developed to improve comparative analyses of protein structure, but these methods have not typically been used in phylogenetic analyses. In this paper, we report a simulation study evaluating the consequences of using these new multiple sequence alignment methods for the resulting phylogenetic reconstruction. We find that while alignment accuracy is positively correlated with phylogenetic accuracy, the improvement in phylogenetic estimation that results from an improved alignment can range from quite small to substantial. We observe that phylogenetic accuracy is most highly correlated with alignment accuracy when sequences are most difficult to align, and that variation in alignment accuracy can have little impact on phylogenetic accuracy when alignment error rates are generally low. We discuss these observations and their implications for future work.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.68</guid>
		</item>
		<item>
			<title>PrePrint: A Weighted Principal Component Analysis and Its Application to Gene Expression Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=e28b161f244912118e66a1cdae7767e5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.61</pheedo:origLink>
			<description>In the first part of this work we introduce new developments in Principal Component Analysis (PCA), and in the second part a new method to select variables (genes, in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose a new correlation coefficient as an alternative to Pearson's, which leads to a so-called weighted PCA (WPCA). To illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analysing gene expression datasets. In the second part of this work we propose a new PCA-based algorithm that iteratively selects the most important genes in a microarray dataset. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.61</guid>
		</item>
		<item>
			<title>PrePrint: Topology Improves Phylogenetic Motif Functional Site Predictions</title>
			<link>http://www.pheedcontent.com/click.phdo?i=acaaecf5df2ae9b0bbf79ab5293a2a12</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.60</pheedo:origLink>
			<description>Prediction of protein functional sites from sequence-derived data remains an open bioinformatics problem. We have developed a phylogenetic motif (PM) functional site prediction approach that identifies functional sites from alignment fragments that parallel the evolutionary patterns of the family. In our approach, PMs are identified by comparing the tree topology of each alignment fragment to that of the complete phylogeny. Herein, we bypass the phylogenetic reconstruction step and identify PMs directly from distance matrix comparisons. To optimize the new algorithm, we consider three different distance matrices and thirteen different matrix similarity scores. We assess the performance of the various approaches on a structurally non-redundant dataset that includes three types of functional site definitions. Without exception, the original approach outperforms the distance matrix variants in predictive power. Although the distance matrix methods fail to improve upon the original approach, our results are important because they clearly demonstrate that the improved predictive power is based on the topological comparisons. In other words, phylogenetic trees are a straightforward yet powerful way to improve functional site prediction accuracy. While complementary studies have shown that topology improves predictions of protein-protein interactions, this report is the first demonstration that trees improve functional site predictions as well.
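As one generic example of a matrix similarity score (a stand-in for illustration; the abstract does not name the thirteen scores the authors evaluated), two distance matrices, say one from an alignment fragment and one from the full family, can be compared by the Pearson correlation of their off-diagonal entries, as in a Mantel-type statistic. The function names below are hypothetical.

```python
import math
from typing import List, Sequence


def upper_triangle(d: Sequence[Sequence[float]]) -> List[float]:
    """Entries d[i][j] with j greater than i (d is assumed symmetric)."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]


def matrix_correlation(d1: Sequence[Sequence[float]],
                       d2: Sequence[Sequence[float]]) -> float:
    """Pearson correlation of the off-diagonal entries of two distance matrices."""
    a, b = upper_triangle(d1), upper_triangle(d2)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

A score near 1 means the fragment's pairwise distances rise and fall with the family's; it is scale-invariant, so uniformly faster-evolving fragments are not penalized.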
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.60</guid>
		</item>
		<item>
			<title>PrePrint: On the Characterization and Selection of Diverse Conformational Ensembles, with Applications to Flexible Docking</title>
			<link>http://www.pheedcontent.com/click.phdo?i=e21e4bc7ed54dfb974f1fe04f1bb3bcb</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.59</pheedo:origLink>
			<description>To address challenging flexible docking problems, a number of docking algorithms pre-generate large collections of candidate conformers. A central problem in this context is to remove the redundancy from such ensembles by reporting a selection of conformers that maximizes some geometric diversity criterion. We make three contributions to this problem. First, we use geometric optimization to report selections that maximize the molecular volume or molecular surface area (MSA) of the selection, and we develop greedy strategies together with approximation bounds. Second, to assess the efficacy of our algorithms, we investigate two conformer ensembles corresponding to a flexible loop of four protein complexes. Focusing on the MSA of the selection, we show that our strategy matches the MSA of standard selection methods while using one to two orders of magnitude fewer conformers. This observation is explained qualitatively using the Betti numbers of the union of balls of the selection. Finally, we place the conformer selection problem back in the context of multiple-copy flexible docking. On the aforementioned systems, we show that using the loops selected by our strategy can improve the result of the docking process.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.59</guid>
		</item>
		<item>
			<title>PrePrint: Influence of Prior Knowledge in Constraint-Based Learning of Gene Regulatory Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=29ada1b4438423e66720d5d36f421759</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.58</pheedo:origLink>
			<description>Constraint-based structure learning algorithms generally perform well on sparse graphs. Although sparsity is not uncommon, in some domains the underlying graph can have dense regions; gene regulatory networks are one such domain, and they are the main motivation for the study described in this paper. We propose a new constraint-based algorithm that can both increase the quality of the output and decrease the computational requirements for learning the structure of gene regulatory networks. The algorithm is based on, and extends, the PC algorithm. Two different types of information are derived from prior knowledge: the probability that each edge exists, and the nodes that appear to depend on many more nodes than the other nodes in the graph do. We also propose a new Gene Ontology-based method for validating gene regulatory networks. We demonstrate the applicability and effectiveness of the proposed algorithms on both synthetic and real data sets.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.58</guid>
		</item>
		<item>
			<title>PrePrint: F&#xb2;Dock: Fast Fourier Protein-Protein Docking</title>
			<link>http://www.pheedcontent.com/click.phdo?i=f7381028fb6ab2813671f4e5f41f20d4</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.57</pheedo:origLink>
			<description>The functions of proteins are often realized through their mutual interactions. Docking is the problem of determining the relative transformation of a pair of proteins, and their conformations, that form a stable complex reproducible in nature. It is an important step in drug design, structure determination, and understanding structure-function relationships. We provide a scoring model for rigid docking and error-bounded approximation algorithms to predict docking sites. The translational search is accelerated by working in the Fourier domain. Shape-based interactions are shown to give good results for a wide range of protein pairs.
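The Fourier-domain speedup mentioned above rests on the correlation theorem: the overlap score for every cyclic translation of a ligand grid against a receptor grid can be obtained with one forward FFT per grid, one pointwise product, and one inverse FFT. The sketch below assumes plain real-valued 3D occupancy grids and a hypothetical function name; it shows the generic FFT correlation idea, not the paper's scoring model.

```python
import numpy as np


def translational_scores(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Score all cyclic translations of a real-valued `ligand` grid.

    Returns S with S[t] = sum_x receptor[x] * ligand[x - t] (indices taken
    modulo the grid size), computed in O(N log N) instead of O(N^2).
    """
    fr = np.fft.fftn(receptor)
    fl = np.fft.fftn(ligand)
    # Correlation theorem: correlation = IFFT(F(receptor) * conj(F(ligand))).
    return np.real(np.fft.ifftn(fr * np.conj(fl)))
```

The best translation is then `np.unravel_index(np.argmax(S), S.shape)`; a full docking search would repeat this for each sampled rotation of the ligand.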
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.57</guid>
		</item>
		<item>
			<title>PrePrint: Peak Tree: A New Tool for Multiscale Hierarchical Representation and Peak Detection of Mass Spectrometry Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=e045badaff523f1c6ea3b8696b263943</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.56</pheedo:origLink>
			<description>In mass spectrometry (MS) analysis, false peak detection results are unavoidable due to severe spectrum variations. However, most current peak detection methods are neither robust enough to resist spectrum variations nor flexible enough to revise false detection results. To improve flexibility, we introduce the peak tree to represent the peak information in MS spectra. Each tree node is a peak judgment on a range of scales, and each tree decomposition, as a set of nodes, is a candidate peak detection result. To improve robustness, we combine peak detection and common peak alignment into a closed-loop framework, which finds the optimal decomposition considering both peak intensity and common peak information. The common peak information is derived from the density clustering of peaks detected throughout the MS database and is iteratively refined within the loop to direct the peak tree decomposition. Finally, we present an improved ant colony optimization (ACO) biomarker selection method to build an MS analysis system based on the peak tree. Experiments show that our peak detection method better resists spectrum variations and provides higher sensitivity and lower false detection rates than conventional methods. The benefits of our peak-tree-based system for MS disease analysis are also demonstrated on real SELDI data.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.56</guid>
		</item>
		<item>
			<title>PrePrint: Predicting Metabolic Fluxes Using Gene Expression Differences as Constraints</title>
			<link>http://www.pheedcontent.com/click.phdo?i=a28537be4af70b9f9e89992c5afce29b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.55</pheedo:origLink>
			<description>A standard approach to estimating intracellular fluxes on a genome-wide scale is flux balance analysis (FBA), which optimizes an objective function subject to constraints on (relations between) fluxes. The performance of FBA models depends heavily on the relevance of the formulated objective function and the completeness of the defined constraints. Previous studies indicated that FBA predictions can be improved by adding regulatory on/off constraints. These constraints were imposed based on either absolute (Shlomi et al., 2007; Covert et al., 2004) or relative (Shlomi et al., 2008) gene expression values. We provide a new algorithm, tFBA, that directly uses regulatory up/down constraints based on gene expression data in FBA optimization. Our assumption is that if the activity of a gene changes drastically from one condition to another, the flux through the reaction controlled by that gene will change accordingly. The potential of tFBA is demonstrated through the analysis of fluxes in yeast under nine different cultivation conditions. We show that changes in gene expression are predictive of changes in fluxes, and we compare tFBA and FBA predictions to show that our approach yields more biologically relevant results.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.55</guid>
		</item>
		<item>
			<title>PrePrint: A Partial Set Covering Model for Protein Mixture Identification Using Mass Spectrometry Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c3608345a2a4d21dd36596e069c888a7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.54</pheedo:origLink>
			<description>Protein identification is a key step in mass spectrometry (MS)-based proteome research. To date, many protein identification strategies employ either MS data or MS/MS data for database searching. While MS-based methods provide wider coverage than MS/MS-based methods, their identification accuracy is lower because MS data carry less information than MS/MS data. More sophisticated algorithms are therefore needed to achieve higher identification accuracy from MS data. Peptide Mass Fingerprinting (PMF) has been widely used for many years to identify single purified proteins from MS data. In this paper, we extend this technology to protein mixture identification. First, we formulate protein mixture identification as a Partial Set Covering (PSC) problem. Then, we present several algorithms that solve the PSC problem efficiently. Finally, we extend the partial set covering model both to MS/MS data and to the combination of MS and MS/MS data. Experimental results on simulated and real data demonstrate the advantages of our method: (1) it significantly outperforms previous MS-based approaches; (2) it is useful for MS/MS-based protein inference; and (3) it combines MS and MS/MS data in a unified model that further improves identification performance.
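To make the Partial Set Covering formulation concrete: each candidate protein "covers" the observed peaks that its theoretical peptide masses match, and the goal is to pick few proteins that together explain at least a required fraction of the peaks. The classic greedy heuristic below is only an illustration of that model under a hypothetical input format; it is not one of the algorithms presented in the paper.

```python
from typing import Dict, List, Set


def greedy_partial_cover(candidates: Dict[str, Set[int]],
                         peaks: Set[int],
                         fraction: float) -> List[str]:
    """Greedily pick proteins until at least `fraction` of `peaks` is explained.

    `candidates` maps a protein ID to the observed peak indices that its
    theoretical peptide masses match (a hypothetical input format).
    """
    needed = fraction * len(peaks)
    # Ignore matches to peaks that were not actually observed.
    pool = {p: s.intersection(peaks) for p, s in candidates.items()}
    covered: Set[int] = set()
    chosen: List[str] = []
    while needed > len(covered) and pool:
        # Choose the protein explaining the most still-uncovered peaks.
        best = max(pool, key=lambda p: len(pool[p] - covered))
        gain = pool.pop(best) - covered
        if not gain:
            break  # no remaining protein explains any new peak
        covered |= gain
        chosen.append(best)
    return chosen
```

Requiring only a fraction of the peaks (rather than all of them) is what makes the cover "partial": noise peaks that no protein explains do not force extra, spurious identifications.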
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.54</guid>
		</item>
		<item>
			<title>PrePrint: Fast Surface-Based Travel Depth Estimation Algorithm for Macromolecule Surface Shape Description</title>
			<link>http://www.pheedcontent.com/click.phdo?i=33b26cc8a59813d0ba68230b804ad07f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.53</pheedo:origLink>
			<description>Travel Depth, introduced by Coleman and Sharp in 2006, is a physical interpretation of molecular depth, a term frequently used to describe the shape of a molecular active site or binding site. Travel Depth can be seen as the physical distance a solvent molecule would have to travel from a point on the surface, i.e., the Solvent Excluded Surface (SES), to the convex hull of the molecule. Existing algorithms for estimating Travel Depth are based on a regular sampling of the molecule volume and on Dijkstra&#x2019;s shortest path algorithm. Since Travel Depth is defined only on the molecular surface, this volume-based approach has a high computational complexity due to the processing of unnecessary samples lying inside or outside the molecule. In this paper, we propose a surface-based approach that restricts the processing to data defined on the SES. This algorithm significantly reduces the complexity of Travel Depth estimation and makes it possible to describe the surface shape of large macromolecules at high resolution. Experimental results show that, compared to existing methods, the proposed algorithm achieves accurate estimates with considerably reduced processing times.
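The Dijkstra-based idea behind these estimates can be sketched as a multi-source shortest-path search: vertices touching the convex hull start at depth 0, and every other vertex receives the length of its shortest path to the hull. This is a simplified sketch under stated assumptions (a precomputed surface graph with non-negative edge lengths and a precomputed set of hull vertices, both hypothetical inputs); it does not reproduce how the authors handle paths through the solvent.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: vertex -> list of (neighbor, edge_length) pairs.
Graph = Dict[int, List[Tuple[int, float]]]


def travel_depth(graph: Graph, hull_vertices: Iterable[int]) -> Dict[int, float]:
    """Multi-source Dijkstra: shortest path length from each vertex to the hull.

    Vertices in `hull_vertices` get depth 0; vertices unreachable from the
    hull are absent from the result.
    """
    depth: Dict[int, float] = {}
    heap = [(0.0, v) for v in hull_vertices]
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if v in depth:
            continue  # already settled with a shorter or equal distance
        depth[v] = d
        for w, length in graph.get(v, []):
            if w not in depth:
                heapq.heappush(heap, (d + length, w))
    return depth
```

Seeding the queue with all hull vertices at once computes every vertex's distance to the nearest hull point in a single pass, instead of running one search per hull vertex.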
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.53</guid>
		</item>
		<item>
			<title>PrePrint: Linear-Time Algorithms for the Multiple Gene Duplication Problems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=d5b323521528115af9755138842c7885</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.52</pheedo:origLink>
			<description>A fundamental problem in evolutionary molecular biology is to discover the locations of gene duplications and multiple gene duplication episodes based on phylogenetic information. Solutions to the Multiple Gene Duplication problems can provide useful clues for placing gene duplication events onto the locations of a species tree and for exposing multiple gene duplication episodes. In this paper, we study two variations of the Multiple Gene Duplication problems: the Episode-Clustering (EC) problem and the Minimum Episodes (ME) problem. For the EC problem, we improve on the results of Burleigh et al. with an optimal linear-time algorithm. For the ME problem, building on the algorithm presented by Bansal and Eulenstein, we propose an optimal linear-time algorithm.
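Problems of this kind build on the standard LCA reconciliation of a gene tree with a species tree: each gene-tree node maps to the lowest common ancestor of the species below it, and a node is a duplication when it maps to the same species node as one of its children. The sketch below implements only this classic mapping on tiny trees in a hypothetical encoding; it is not the linear-time EC or ME algorithm.

```python
from typing import Dict, List, Optional, Tuple, Union

# A gene tree is a species name (leaf) or a pair of subtrees.
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]
# A species tree is given as child -> parent, with the root mapped to None.
SpeciesParents = Dict[str, Optional[str]]


def path_to_root(parent: SpeciesParents, s: str) -> List[str]:
    """Species-tree nodes from s up to the root, inclusive."""
    path = [s]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def lca(parent: SpeciesParents, a: str, b: str) -> str:
    """Lowest common ancestor of species-tree nodes a and b (naive version)."""
    above_a = set(path_to_root(parent, a))
    for s in path_to_root(parent, b):
        if s in above_a:
            return s
    raise ValueError("nodes are not in the same species tree")


def count_duplications(gene: GeneTree, parent: SpeciesParents) -> Tuple[str, int]:
    """Return (LCA mapping of `gene`, number of duplication nodes in `gene`)."""
    if isinstance(gene, str):
        return gene, 0  # a leaf maps to its own species
    left_map, left_dups = count_duplications(gene[0], parent)
    right_map, right_dups = count_duplications(gene[1], parent)
    m = lca(parent, left_map, right_map)
    # A node is a duplication if it maps to the same species node as a child.
    is_dup = 1 if m in (left_map, right_map) else 0
    return m, left_dups + right_dups + is_dup
```

Because each naive LCA query walks to the root, this sketch is not linear time in general; it only shows where duplications come from.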
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.52</guid>
		</item>
		<item>
			<title>PrePrint: A General Framework for Analyzing Data from Two Short Time-Series Microarray Experiments</title>
			<link>http://www.pheedcontent.com/click.phdo?i=15c8195d5f41cb1334cce27518400c2b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.51</pheedo:origLink>
			<description>We propose a general theoretical framework for analyzing differentially expressed genes and behavior patterns from two homogeneous short time-course datasets. The framework generalizes the recently proposed Hilbert-Schmidt Independence Criterion (HSIC)-based framework, adapting it to the time-series scenario by using tensor analysis for data transformation. The proposed framework effectively yields criteria that can identify both the differentially expressed genes and the time-course patterns of interest between two time-series experiments without requiring the data to be explicitly clustered. Results obtained by applying the proposed framework with a linear kernel formulation to various datasets are both biologically meaningful and consistent with published studies.
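For reference, the biased empirical HSIC statistic over n paired samples is tr(KHLH)/(n-1)^2, where K and L are the kernel (Gram) matrices of the two variables and H = I - (1/n)11^T is the centering matrix; with linear kernels, K = XX^T and L = YY^T. The plain-NumPy sketch below computes only this generic statistic; the paper's tensor-based transformation of the time-course data is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def hsic_linear(x: np.ndarray, y: np.ndarray) -> float:
    """Biased empirical HSIC with linear kernels.

    Rows of `x` (n, p) and `y` (n, q) are the n paired samples. Computes
    tr(K H L H) / (n - 1)**2 with K = x x^T, L = y y^T, and H the n-by-n
    centering matrix I - (1/n) 1 1^T.
    """
    n = x.shape[0]
    gram_x = x @ x.T
    gram_y = y @ y.T
    h = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(gram_x @ h @ gram_y @ h)) / (n - 1) ** 2
```

With linear kernels the trace equals the squared Frobenius norm of (HX)^T(HY), i.e. of the cross-covariance of the centered data, so the statistic is zero whenever one variable is constant.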
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.51</guid>
		</item>
		<item>
			<title>PrePrint: Fuzzy ARTMAP Prediction of Biological Activities for Potential HIV-1 Protease Inhibitors Using A Small Molecular Dataset</title>
			<link>http://www.pheedcontent.com/click.phdo?i=2d791fcbd2169bb2c4e18e3954bd9a94</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.50</pheedo:origLink>
			<description>We focus on the neuro-fuzzy prediction of the biological activities of HIV-1 protease inhibitory compounds when inferring from small training sets. We propose two computational intelligence prediction techniques that are suitable for small training sets, at the expense of some computational overhead. Both techniques are based on the FAMR model, a Fuzzy ARTMAP (FAM) incremental learning system used for classification and probability estimation. During the learning phase, each sample pair is assigned a relevance factor proportional to the importance of that pair. The two algorithms proposed in this paper are: (1) GA-FAMR, a new algorithm that uses a genetic algorithm to optimize the relevances assigned to the training data; and (2) Ordered FAMR, derived from a known algorithm, which instead of optimizing relevances optimizes the order of data presentation using the algorithm of Dagher et al. In our experiments, we compare these two algorithms with an algorithm not based on the FAM, the FS-GA-FNN. We conclude that when inferring from small training sets, both techniques are efficient in terms of generalization capability and execution time, and the computational overhead they introduce is compensated by the better accuracy obtained. Finally, the proposed techniques are used to predict the biological activities of newly designed potential HIV-1 protease inhibitors.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.50</guid>
		</item>
		<item>
			<title>PrePrint: Model Reduction Using Piecewise-Linear Approximations Preserves Dynamic Properties of the Carbon Starvation Response in Escherichia coli</title>
			<link>http://www.pheedcontent.com/click.phdo?i=535ac49c52fde0e7f3d87da69b95edd7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.49</pheedo:origLink>
			<description>The adaptation of the bacterium Escherichia coli to carbon starvation is controlled by a large network of biochemical reactions involving genes, mRNAs, proteins, and signalling molecules. The dynamics of such networks are difficult to analyze, notably because quantitative information on parameter values is lacking. To overcome these limitations, model reduction approaches based on quasi-steady-state (QSS) and piecewise-linear (PL) approximations have been proposed, resulting in models that are easier to handle mathematically and computationally. The approximations are assumed not to affect the capability of the model to account for essential dynamical properties of the system, but the validity of this assumption has not been systematically tested. In this paper we carry out such a study by evaluating a large and complex PL model of the carbon starvation response in E. coli using an ensemble approach. The results show that, in comparison with conventional nonlinear models, the PL approximations generally preserve the dynamics of the carbon starvation response network, although with some deviations, notably in the quantitative precision of the model predictions. This encourages the application of PL models to the qualitative analysis of bacterial regulatory networks in situations where the reference time-scale is that of protein synthesis and degradation.
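In PL models of this kind, a steep sigmoidal regulation function, typically a Hill function h(x) = x^n / (x^n + theta^n), is replaced by a step function that switches at the threshold theta. The sketch below, assuming that standard Hill form and hypothetical sample points, measures how far the two differ and shows the gap shrinking as the Hill coefficient n grows; it is illustrative only and is not the model evaluated in the paper.

```python
from typing import Iterable


def hill(x: float, theta: float, n: float) -> float:
    """Increasing Hill function x^n / (x^n + theta^n)."""
    return x ** n / (x ** n + theta ** n)


def step(x: float, theta: float) -> float:
    """Step-function limit used in PL models: 1 above theta, 0 otherwise.

    PL models leave the value exactly at theta undefined; this sketch sets
    it to 0, and max_gap skips that point.
    """
    return 1.0 if x > theta else 0.0


def max_gap(theta: float, n: float, xs: Iterable[float]) -> float:
    """Largest absolute difference between hill and step over the points xs."""
    return max(abs(hill(x, theta, n) - step(x, theta)) for x in xs if x != theta)
```

The residual gap concentrates near the threshold, which is one reason PL models preserve qualitative dynamics better than quantitative precision.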
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.49</guid>
		</item>
		<item>
			<title>PrePrint: Learning Genetic Regulatory Network Connectivity From Time Series Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=242b7fb897a29dcf8088ca1213468476</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.48</pheedo:origLink>
			<description>Recent experimental advances facilitate the collection of time series data that indicate which genes in a cell are expressed. This information can be used to understand the genetic regulatory network that generates the data. Typically, Bayesian analysis approaches are applied; these neglect the time series nature of the experimental data, have difficulty determining the direction of causality, and do not perform well on networks with tight feedback. This paper presents a method to learn genetic network connectivity that exploits the time series nature of experimental data to achieve better causal predictions. The method partitions the data into bins and determines an initial set of potential influence vectors for each gene based on the probability of the gene&#x2019;s expression increasing in the next time step. These vectors are then combined to form new vectors with better scores, and the vectors compete against each other to determine the final influence vector for each gene. The result is a directed graph representation of the genetic network&#x2019;s repression and activation connections. Results for several synthetic networks with tight feedback show significant improvements over another dynamic Bayesian approach. Promising results are also reported for genes involved in the yeast cell cycle.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.48</guid>
		</item>
		<item>
			<title>PrePrint: Efficient Formulations for Exact Stochastic Simulation of Chemical Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=14b7eec17f212c44de1ffdd30ae033e5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.47</pheedo:origLink>
			<description>Trajectories that simulate a system of chemical reactions can be generated using either Gillespie's direct method or Gibson and Bruck's next reaction method. Because many trajectories are usually needed to understand the dynamics of a system, performance is important. In this paper we present new formulations of these methods that improve the computational complexity of the algorithms. We present optimized implementations, available from http://cain.sourceforge.net, that offer better performance than previous work. No single method is best for all problems: simple formulations often work best for systems with a small number of reactions, while some sophisticated methods offer the best performance for large problems and scale well asymptotically. We investigate the performance of each formulation on simple biological systems over a wide range of problem sizes. We also consider the numerical accuracy of the direct and next reaction methods. We have found that special precautions must be taken to ensure that randomness is not discarded during the course of a simulation.
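For orientation, Gillespie's direct method draws the waiting time to the next reaction from an exponential distribution whose rate is the total propensity a0, then picks reaction j with probability a_j / a0. The sketch below is the textbook formulation with a linear cost per step, written for a hypothetical interface (a stoichiometry list plus a propensity callback); it is not one of the optimized formulations from the paper or from Cain.

```python
import random
from typing import Callable, List, Sequence, Tuple


def direct_method(x0: Sequence[int],
                  stoich: Sequence[Sequence[int]],
                  propensities: Callable[[List[int]], List[float]],
                  t_end: float,
                  rng: random.Random) -> List[Tuple[float, List[int]]]:
    """Textbook Gillespie direct method.

    `stoich[j]` is the state change applied by reaction j and
    `propensities(x)` returns the propensities a_j(x). Returns (time, state)
    pairs, starting with (0.0, x0).
    """
    t, x = 0.0, list(x0)
    trajectory = [(t, list(x))]
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 == 0.0:
            break  # no reaction can fire any more
        # Time to the next reaction is exponentially distributed with rate a0.
        t += rng.expovariate(a0)
        if t > t_end:
            break
        # Pick reaction j with probability a_j / a0.
        j = rng.choices(range(len(a)), weights=a)[0]
        x = [xi + dxi for xi, dxi in zip(x, stoich[j])]
        trajectory.append((t, list(x)))
    return trajectory
```

For example, a reversible isomerization A -> B, B -> A with mass-action propensities is `direct_method([10, 0], [[-1, 1], [1, -1]], lambda x: [1.0 * x[0], 0.5 * x[1]], 5.0, random.Random(1))`. Summing the propensities and searching for j both cost time proportional to the number of reactions on every step, which is the cost that more sophisticated formulations target.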
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.47</guid>
		</item>
		<item>
			<title>PrePrint: Genetic Networks and Soft Computing</title>
			<link>http://www.pheedcontent.com/click.phdo?i=b0c6cf24aa461fd2581455282fc6f667</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.39</pheedo:origLink>
			<description>Analysis of gene regulatory networks provides a wealth of information about fundamental cellular processes such as growth, development, hormone secretion, and cellular communication. Extracting these networks from available gene expression profiles is a challenging problem. Such reverse engineering of genetic networks offers insight into cellular activity and can help predict adverse effects of new drugs or identify new drug targets. Tasks such as classification, clustering, and feature selection enable efficient mining of knowledge about gene interactions in the form of networks. Biological data are known to be prone to various kinds of noise and ambiguity. Soft computing tools such as fuzzy sets, evolutionary strategies, and neurocomputing have been found to provide low-cost, acceptable solutions in the presence of various types of uncertainty. In this article we survey the role of these soft methodologies, and their hybridizations, in generating genetic networks.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.39</guid>
		</item>
		<item>
			<title>PrePrint: Probabilistic Analysis of Probe Reliability in Differential Gene Expression Studies with Short Oligonucleotide Arrays</title>
			<link>http://www.pheedcontent.com/click.phdo?i=d3969f08c9f429a4e9dbca7468ab1214</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.38</pheedo:origLink>
			<description>Probe defects are a major source of noise in gene expression studies. While existing approaches detect noisy probes based on external information such as genomic alignments, we introduce and validate a targeted probabilistic method for analyzing probe reliability directly from expression data and independently of the noise source. This provides insights into the various sources of probe-level noise and gives tools to guide probe design.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.38</guid>
		</item>
		<item>
			<title>PrePrint: Identification and Modeling of Genes with Diurnal Oscillations from Microarray Time Series Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=b1d3a68d7a36910bed4511963c68fd03</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.37</pheedo:origLink>
			<description>The behavior of living organisms is strongly modulated by the day-night cycle, which gives rise to a cyclic pattern of activities. Such a pattern helps an organism coordinate its activities and balance what can be performed during the 'day' against what can be relegated to the 'night'. This cyclic pattern, called the 'circadian rhythm', is a biological phenomenon observed in a large number of organisms. In this paper, our goal is to analyze transcriptome data from Cyanothece to discover genes whose expression is rhythmic. We cluster these genes into groups with similar phases and show that genes from a specific metabolic functional category are tightly clustered, perhaps indicating a 'preferred time of day/night' at which the organism performs this function. The proposed analysis is applied to two sets of microarray experiments performed under varying incident light patterns. Subsequently, we propose a model consisting of a network of three phase oscillators together with a central master clock, and use it to closely approximate a set of 'circadian controlled genes'.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.37</guid>
		</item>
		<item>
			<title>PrePrint: Nonnegative Principal Component Analysis for Cancer Molecular Pattern Discovery</title>
			<link>http://www.pheedcontent.com/click.phdo?i=0ac0ff807f35bc85d082a839f6be6dc2</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.36</pheedo:origLink>
			<description>As a well-established feature selection algorithm, principal component analysis (PCA) is often combined with state-of-the-art classification algorithms to identify cancer molecular patterns in microarray data. However, its global feature selection mechanism prevents it from effectively capturing latent structures in high-dimensional data. In this study, we investigate the benefit of adding nonnegativity constraints to PCA and develop a nonnegative principal component analysis algorithm (NPCA) to overcome the global nature of PCA. A novel classification algorithm, NPCA-SVM, is proposed for microarray data pattern discovery. We report strong classification results for NPCA-SVM on five benchmark microarray datasets in direct comparison with other related algorithms. We also prove mathematically, and interpret biologically, that an SVM or PCA-SVM learning machine with a Gaussian kernel will inevitably overfit microarray data. In addition, we demonstrate that nonnegative principal component analysis can effectively capture meaningful biomarkers.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.36</guid>
		</item>
		<item>
			<title>PrePrint: Finding Significant Matches of Position Weight Matrices in Linear Time</title>
			<link>http://www.pheedcontent.com/click.phdo?i=9f8e942600e246c1f3be64d8d3b2898f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.35</pheedo:origLink>
			<description>Position weight matrices are an important method for modeling signals or motifs in biological sequences, in both DNA and protein contexts. In this paper we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the on-line type, and they generalize classical multi-pattern matching, filtering, and super-alphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple-matrix extensions that search for several matrices in a single scan through the sequence database. An experimental performance evaluation compares the new techniques against each other as well as against other on-line and index-based algorithms proposed in the literature. Compared to the brute-force $O(mn)$ approach, where $n$ is the length of the sequence, our solutions can be faster by a factor proportional to the matrix length $m$. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches ($p = 0.0001$) of the 123 JASPAR matrices in the human genome in about 18 minutes.
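For reference, the brute-force $O(mn)$ baseline that the abstract compares against fits in a few lines of Python. Every window of length m in a sequence of length n is scored against the m columns of the matrix. This is not the authors' on-line, filtering, or super-alphabet algorithm. The helper names, the toy +1/-1 matrix, and the use of a raw score threshold in place of a p-value are assumptions for illustration. Converting a p-value such as 0.0001 into a score threshold requires the score distribution of the matrix, which is not shown here.

```python
def pwm_scan(pwm, sequence, threshold):
    """Brute-force position weight matrix scan in O(m * n) time.

    pwm       -- list of m columns, each a dict from residue to score
    sequence  -- sequence string of length n over the matrix alphabet
    threshold -- report windows whose total score reaches this value
    Returns (start, score) pairs for every matching window.
    """
    m = len(pwm)
    hits = []
    for start in range(len(sequence) - m + 1):
        # m table lookups per window, n - m + 1 windows in total
        score = sum(column[sequence[start + j]] for j, column in enumerate(pwm))
        if score >= threshold:
            hits.append((start, score))
    return hits


def toy_pwm(consensus, match=1.0, mismatch=-1.0):
    """Hypothetical toy matrix: match score for the consensus base, mismatch otherwise."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]


print(pwm_scan(toy_pwm("ACG"), "TACGACGT", threshold=3.0))
# -> [(1, 3.0), (4, 3.0)]
```

Real matrices would hold log-odds scores estimated from aligned binding sites rather than the toy values used here.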
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.35</guid>
		</item>
		<item>
			<title>PrePrint: Twin Removal in Genetic Algorithms for Protein Structure Prediction Using Low Resolution Model</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c193c885d56205517c93cb80aeec2d0c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.34</pheedo:origLink>
			<description>This paper examines the impact of twins, and measures for removing them, in the population of a genetic algorithm (GA) applied to effective conformational search. It is conclusively shown that a twin removal strategy provides considerably enhanced GA performance on complex ab initio protein structure prediction (PSP) problems in a low-resolution model. Without twin removal, GA crossover and mutation operations can become ineffective as generations lose their ability to produce significant differences, which can cause the search to stall. The paper relaxes the definition of chromosomal twins in the removal strategy to encompass not only identical but also highly correlated chromosomes within the GA population, and empirical results consistently show significant improvements in solving PSP problems.
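The idea of removing both identical and highly correlated chromosomes can be illustrated with a short Python sketch. The abstract does not give the correlation measure or the threshold, so this example substitutes a simple stand-in. Two chromosomes count as twins when the fraction of positions carrying the same gene reaches a threshold, and a threshold of 1.0 removes only exact duplicates. The function name, the threshold values, and the move-string encoding (F, L, R for relative moves on a low-resolution lattice) are hypothetical.

```python
def remove_twins(population, similarity_threshold=0.9):
    """Drop chromosomes that are identical or nearly identical to one already kept.

    population           -- list of equal-length gene sequences
    similarity_threshold -- fraction of matching positions at or above which
                            two chromosomes count as twins (1.0: identical only)
    """
    kept = []
    for chromosome in population:
        is_twin = any(
            sum(g == h for g, h in zip(chromosome, other)) / len(chromosome)
            >= similarity_threshold
            for other in kept
        )
        if not is_twin:
            kept.append(chromosome)
    return kept


# Hypothetical move-string chromosomes (F = forward, L = left, R = right).
population = ["FLRFL", "FLRFL", "FLRFR", "RRRRR"]
print(remove_twins(population, similarity_threshold=1.0))  # ['FLRFL', 'FLRFR', 'RRRRR']
print(remove_twins(population, similarity_threshold=0.8))  # ['FLRFL', 'RRRRR']
```

A full GA would usually refill the freed slots, for example with new random chromosomes, to keep the population size constant. That refill policy is also an assumption here.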
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.34</guid>
		</item>
		<item>
			<title>PrePrint: Quantifying the Degree of Self-Nestedness of Trees: Application to the Structural Analysis of Plants</title>
			<link>http://www.pheedo.com/click.phdo?i=b26f39549853d4d1e7ae8ea7b8fdcf70</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.29</pheedo:origLink>
			<description>In this paper we address the problem of approximating trees by trees with a particular self-nested structure. Self-nested trees are trees in which all subtrees of a given height are isomorphic. We show that these trees have remarkable compression properties, with high compression rates. To measure how far a tree is from being self-nested, we then study how to quantify the degree of self-nestedness of any tree. To this end, we define a measure of self-nestedness by constructing the self-nested tree that minimizes the distance from the original tree among all self-nested trees that embed it. We show that this measure can be computed in polynomial time and describe the corresponding algorithm. The distance to this nearest embedding self-nested tree (NEST) is then used to define compression coefficients that reflect the compressibility of a tree. To illustrate the approach, we apply these notions to the analysis of plant branching structures, characterizing it on both a database of artificial plants with varying degrees of self-nestedness and a real plant structure. Finally, we show that the NEST may reveal important aspects of plant growth.
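The defining property of self-nested trees, that all subtrees of a given height are isomorphic, can be checked directly. The Python sketch below is a minimal illustration and not the paper's NEST algorithm. It assumes unordered rooted trees written as nested lists, where a node is the list of its children and a leaf is the empty list. It computes a canonical form for every subtree and then tests whether each height has a single form. The function name and the tree encoding are hypothetical.

```python
def is_self_nested(tree):
    """Return True if all subtrees of equal height are isomorphic.

    tree -- an unordered rooted tree as nested lists: each node is the list
            of its children, so a leaf is [].
    Isomorphic subtrees get the same canonical form: the sorted tuple of their
    children's canonical forms. Leaves have height 0.
    """
    forms_by_height = {}

    def visit(node):
        child_info = [visit(child) for child in node]
        height = 1 + max((h for h, _ in child_info), default=-1)
        form = tuple(sorted(f for _, f in child_info))
        forms_by_height.setdefault(height, set()).add(form)
        return height, form

    visit(tree)
    return all(len(forms) == 1 for forms in forms_by_height.values())


print(is_self_nested([[[]], []]))        # True
print(is_self_nested([[[], []], [[]]]))  # False: two different height-1 subtrees
```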
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.29</guid>
		</item>
		<item>
			<title>PrePrint: TRIAL: A Tool for Finding Distant Structural Similarities</title>
			<link>http://www.pheedo.com/click.phdo?i=7b1ee0cd68ffc86bfda67f86fb5f210b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.28</pheedo:origLink>
			<description>Finding structural similarities in distant proteins can reveal functional relationships that cannot be identified using sequence comparison. Given two proteins A and B and a threshold of &#x03B5; &#x00C5;, we develop an algorithm, TRiplet-based Iterative ALignment (TRIAL), that computes the transformation of B maximizing the number of aligned residues subject to the root mean square distance of the alignment being at most &#x03B5; &#x00C5;. Our algorithm is designed with the specific goal of effectively handling proteins with low similarity in primary structure, where existing algorithms perform particularly poorly. Experiments show that our method outperforms existing methods. TRIAL alignment brings the secondary structures of distant proteins to similar orientations. It also finds more secondary structure matches at lower RMSD (root mean square deviation) values, with longer overall alignments. Its classification accuracy is up to 63% better than that of other methods, including CE and DALI. TRIAL aligns 83% of the residues of the smaller protein in reasonable time, while other methods align only 29 to 65% of the residues for the same set of proteins.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.28</guid>
		</item>
		<item>
			<title>PrePrint: New Methods for Inference of Local Tree Topologies with Recombinant SNP Sequences in Populations</title>
			<link>http://www.pheedo.com/click.phdo?i=b1e067d11072eed44a65630fe46ab730</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.27</pheedo:origLink>
			<description>Partly due to recombination, the genealogical history of a set of DNA sequences in a population usually cannot be represented by a single tree. Instead, the genealogy is better represented by a genealogical network, a compact representation of a set of correlated local genealogical trees, each covering a short region of the genome and possibly having a different topology. Inference of the genealogical history of a set of DNA sequences under recombination has many potential applications, including association mapping of complex diseases. In this paper, we present two new methods for reconstructing local tree topologies in the presence of recombination, which extend and improve previous work. We first show that the "tree scan" method can be converted into a probabilistic inference method based on a hidden Markov model. We then develop a novel local tree inference method, called RENT, that is both accurate and scalable to larger data. Through simulation, we demonstrate the usefulness of our methods by showing that the hidden Markov model-based method is comparable in accuracy to the original method. We also show that RENT is competitive with other methods in inference accuracy, often has a lower inference error rate, and can handle large data.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.27</guid>
		</item>
		<item>
			<title>PrePrint: The Metropolized Partial Importance Sampling MCMC Mixes Slowly on Minimum Reversal Rearrangement Paths</title>
			<link>http://www.pheedo.com/click.phdo?i=907e3d881f65e5c9eae95575feacd359</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.26</pheedo:origLink>
			<description>Markov chain Monte Carlo has been the standard technique for inferring the posterior distribution of genome rearrangement scenarios under a Bayesian approach. We present here a negative result on the rate of convergence of the commonly used Markov chains. We prove that the relaxation time of the Markov chains walking on the optimal reversal sorting scenarios might grow exponentially with the size of the signed permutations, that is, with the number of synteny blocks.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.26</guid>
		</item>
		<item>
			<title>PrePrint: A Cluster Refinement Algorithm for Motif Discovery</title>
			<link>http://www.pheedo.com/click.phdo?i=28f5fb3bfec40228d8c5d246580f262a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.25</pheedo:origLink>
			<description>Finding transcription factor binding sites, i.e., motif discovery, is crucial for understanding gene regulatory relationships. Motifs are weakly conserved, and motif discovery is an NP-hard problem. We propose a new approach called the Cluster Refinement Algorithm for Motif Discovery (CRMD). CRMD employs a flexible statistical motif model that allows a variable number of motifs and motif instances. CRMD first uses a novel entropy-based clustering to find good and complete starting candidate motifs in the DNA sequences. It then uses an effective greedy refinement to search for optimal motifs among the candidates. The refinement is fast, and it adjusts the number of motif instances based on adaptive thresholds. The performance of CRMD is further enhanced when the problem has one motif occurrence per sequence. Using an appropriate similarity test of motifs, CRMD is also able to find multiple motifs. CRMD has been tested extensively on synthetic and real datasets. The experimental results show that CRMD usually outperforms four other state-of-the-art algorithms in solution quality, with competitive computing time. It strikes a good balance between finding true motif instances and screening out false ones, and is robust on problems of various levels of difficulty.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.25</guid>
		</item>
		<item>
			<title>PrePrint: A Genetic Optimization Approach for Isolating Translational Efficiency Bias</title>
			<link>http://www.pheedo.com/click.phdo?i=b73832bb5cd897347535b40e25be3cf0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.24</pheedo:origLink>
			<description>The study of codon usage bias is an important research area that contributes to our understanding of molecular evolution, phylogenetic relationships, respiratory lifestyle, and other characteristics. Translational efficiency bias is perhaps the best-studied codon usage bias, as it is frequently used to predict relative protein expression levels. We present a novel approach to isolating translational efficiency bias in microbial genomes. Several methods for isolating translational efficiency bias already exist, but they are susceptible to the confounding influences of other, potentially dominant, biases. Additionally, existing approaches generally require both genomic sequence information and prior knowledge of a set of highly expressed genes. Our approach provides more accurate results from sequence information alone by resisting the confounding effects of other biases. We validate this increase in accuracy in isolating translational efficiency bias on ten microbial genomes, five of which have proven particularly difficult for existing approaches due to the presence of strong confounding biases.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.24</guid>
		</item>
		<item>
			<title>PrePrint: A Sparse Learning Machine for High-Dimensional Data with Application to Microarray Gene Analysis</title>
			<link>http://www.pheedo.com/click.phdo?i=c6b0d2ceafd37073b8639dc3ff6573ab</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.8</pheedo:origLink>
			<description>Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations and contain significant noise, missing components, or outliers. Features extracted from high-dimensional data need to be discriminative and sparse, and to capture the essential characteristics of the data. In this paper, we present a way to construct multivariate features and then classify the data into the proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic step to correct these biases. For large-scale problems, we use conjugate-gradient-based primal-dual interior-point techniques. We apply our procedure to microarray gene analysis, and experimental results confirm the effectiveness of our method.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.8</guid>
		</item>
		<item>
			<title>PrePrint: Data Mining on DNA Sequences of Hepatitis B Virus</title>
			<link>http://www.pheedo.com/click.phdo?i=6788101bd48a123c676799880154416c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.6</pheedo:origLink>
			<description>In this study, we introduce a data mining framework that includes molecular evolution analysis, clustering, feature selection, classifier learning, and classification. Our research group collected HBV DNA sequences of genotype B or C from over 200 patients specifically for this project. Through molecular evolution analysis and clustering, three subgroups were identified in genotype C, and a clustering method was developed to separate them. In the feature selection process, potential markers are selected based on information gain for subsequent classifier learning. Meaningful rules are then learned by our Rule Learning algorithm, which is based on an evolutionary algorithm. We have also developed a new classification method based on a nonlinear integral. The good performance of this method comes from the use of a fuzzy measure and the corresponding nonlinear integral; the nonadditivity of the fuzzy measure reflects the importance of the feature attributes as well as their interactions. These two classifiers give explicit information on the importance of individual mutated sites, and of their interactions, for the classification (potential causes of liver cancer in our case). A thorough comparison of these two methods with existing methods is presented.
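Information gain, used above to select candidate markers, is the reduction in label entropy obtained by splitting the samples on a feature. The following is a minimal Python sketch. It assumes that each feature is the nucleotide observed at one aligned site and that each label is a class such as a genotype. The encoding, the function names, and the toy data are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on a feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: the base seen at two sites in four sequences.
labels = ["B", "B", "C", "C"]
print(information_gain(["A", "A", "G", "G"], labels))  # 1.0: site separates the classes
print(information_gain(["A", "G", "A", "G"], labels))  # 0.0: site is uninformative
```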
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.6</guid>
		</item>
		<item>
			<title>PrePrint: A Metric on the Space of Reduced Phylogenetic Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=3922845ce88b844a3b3074ca31802866</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.2</pheedo:origLink>
			<description>Phylogenetic networks are leaf-labeled, rooted, acyclic, directed graphs that model reticulate evolutionary histories. Several measures for quantifying the topological dissimilarity between two phylogenetic networks have been devised for various classes of phylogenetic networks. A biologically motivated class of phylogenetic networks, namely reduced phylogenetic networks, was recently introduced. None of the existing measures is a metric on the space of reduced phylogenetic networks. In this paper, we provide a polynomially computable
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.2</guid>
		</item>
		<item>
			<title>PrePrint: Information-Theoretic Model of Evolution over Protein Communication Channel</title>
			<link>http://www.pheedo.com/click.phdo?i=7ae7a12d8f6921eee5c5211cdf3c406a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.1</pheedo:origLink>
			<description>In this paper, we propose a communication model of evolution and investigate its information-theoretic bounds. Evolution is modeled as the retransmission of information over a protein communication channel, where the transmitted message is the organism&#x2019;s proteome encoded in its DNA. We compute the capacity and the rate-distortion functions of the protein communication system for the three domains of life: Archaea, Bacteria, and Eukaryotes, and we analyze the tradeoff between transmission rate and distortion in noisy protein communication channels. As expected, comparing the optimal transmission rate with the channel capacity shows that biological fidelity does not reach the Shannon-optimal distortion. However, the relationship between channel capacity and the rate distortion achieved in each domain provides insight into the evolutionary dynamics of the three domains of life. Building on these results, we propose a model of genome sequence evolution based on the two major evolutionary driving forces: mutations and unequal crossovers.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.1</guid>
		</item>
		<item>
			<title>PrePrint: Bayesian Models and Algorithms for Protein beta-Sheet Prediction</title>
			<link>http://www.pheedo.com/click.phdo?i=905dda8a2d6878dbad2f75616d70a971</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.140</pheedo:origLink>
			<description>Predicting a protein's three-dimensional structure benefits greatly from information on its secondary structure, solvent accessibility, and the non-local contacts that stabilize it. Predicting these components is vital to our understanding of a protein's structure and function. In this paper, we address the problem of beta-sheet prediction. We introduce a Bayesian approach for proteins with six or fewer beta-strands, in which we model the conformational features in a probabilistic framework. To select the optimum architecture, we analyze the space of possible conformations using efficient heuristics. Furthermore, we employ a dynamic programming algorithm that finds the optimum pairwise alignment between beta-strands. Allowing any number of gaps in an alignment enables us to model beta-bulges more effectively. Although our main focus is proteins with six or fewer beta-strands, we can also make predictions for proteins with more than six beta-strands by combining the predictions of BetaPro with the gapped alignment algorithm. We evaluated the accuracy of both our method and BetaPro in a 10-fold cross-validation experiment on the BetaSheet916 set and obtained significant improvements in prediction accuracy.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.140</guid>
		</item>
		<item>
			<title>PrePrint: RDCurve: A Nonparametric Method to Evaluate the Stability of Ranking Procedures</title>
			<link>http://www.pheedo.com/click.phdo?i=de5c9f4165385c776946785b1716438d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.138</pheedo:origLink>
			<description>Serious concerns have been raised about the reproducibility of gene signatures derived from high-throughput techniques such as microarrays. Studies analyzing similar samples often report poorly overlapping results, and the p-value usually lacks biological context. We propose a nonparametric Re-Discovery-Curve (RDCurve) method to estimate how frequently an identified gene signature is rediscovered. Given a ranking procedure and a dataset with replicated measurements, the RDCurve method bootstraps the dataset, repeatedly applies the ranking procedure, selects a subset of k important genes, and estimates the probability of rediscovering the selected subset. We also propose a permutation scheme that estimates the confidence band under the null hypothesis, which is used to assess the significance of the RDCurve. The method is nonparametric and model-independent. With the RDCurve, we can assess the signal-to-noise ratio of the data, compare ranking procedures in terms of their expected rediscovery rates, and choose the number of genes to report.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.138</guid>
		</item>
	</channel>
</rss>