<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="/css/rss20.xsl" type="text/xsl"?>
<rss xmlns:pheedo="http://www.pheedo.com/namespace/pheedo" version="2.0">
	<channel>
		<title>IEEE/ACM Transactions on Computational Biology and Bioinformatics</title>
		<link>http://www.computer.org/tcbb</link>
		<description>The IEEE/ACM Transactions on Computational Biology and Bioinformatics is a new quarterly that will publish archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development and optimization of biological databases; and important biological results that are obtained from the use of these methods, programs, and databases.</description>
		<language>en-us</language>
		<pubDate>Wed, 8 Jul 2009 10:00:02 GMT</pubDate>
		<image>
			<url>http://csdl.computer.org/common/images/logos/tcbb.gif</url>
			<title>IEEE Computer Society</title>
			<description>List of recently published journal articles</description>
			<link>http://www.computer.org/tcbb</link>
		</image>
		<item>
			<title>PrePrint: Efficient peak-labeling algorithms for whole-sample mass spectrometry proteomics</title>
			<link>http://www.pheedo.com/click.phdo?i=39c009bc37f8e019db33e2e55e8b9144</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.31</pheedo:origLink>
			<description>Whole-sample mass spectrometry (MS) proteomics allows for a parallel measurement of hundreds of proteins present in a variety of biospecimens. Unfortunately, the association between MS signals and these proteins is not straightforward. The need to interpret mass spectra demands the development of methods for accurate labeling of ion species in such profiles. To aid this process, we have developed a new peak-labeling procedure for associating protein and peptide labels with peaks. This computational method builds upon characteristics of proteins expected to be in the sample, such as the amino acid sequence, mass weight, and expected concentration within the sample. A new probabilistic score that incorporates this information is proposed. We evaluate and demonstrate our method's ability to label peaks first on simulated MS spectra and then on MS spectra from human serum with a spiked-in calibration mixture.&lt;br style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.31</guid>
		</item>
		<item>
			<title>PrePrint: On the Characterization and Selection of Diverse Conformational Ensembles, with Applications to Flexible Docking</title>
			<link>http://www.pheedcontent.com/click.phdo?i=e21e4bc7ed54dfb974f1fe04f1bb3bcb</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.59</pheedo:origLink>
			<description>To address challenging flexible docking problems, a number of docking algorithms pre-generate large collections of candidate conformers. To remove the redundancy from such ensembles, a central problem in this context is to report a selection of conformers maximizing some geometric diversity criterion. We make three contributions to this problem. First, we use geometric optimization to report selections maximizing the molecular volume or molecular surface area (MSA) of the selection. Greedy strategies are developed, together with approximation bounds. Second, to assess the efficacy of our algorithms, we investigate two conformer ensembles corresponding to a flexible loop of four protein complexes. By focusing on the MSA of the selection, we show that our strategy matches the MSA of standard selection methods while using between one and two orders of magnitude fewer conformers. This observation is qualitatively explained using the Betti numbers of the union of balls of the selection. Finally, we place the conformer selection problem in the context of multiple-copy flexible docking. On the aforementioned systems, we show that using the loops selected by our strategy can improve the result of the docking process.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.59</guid>
		</item>
		<item>
			<title>PrePrint: Influence of Prior Knowledge in Constraint-Based Learning of Gene Regulatory Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=29ada1b4438423e66720d5d36f421759</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.58</pheedo:origLink>
			<description>Constraint-based structure learning algorithms generally perform well on sparse graphs. Although sparsity is not uncommon, there are some domains where the underlying graph can have some dense regions; one of these domains is gene regulatory networks, which is the main motivation for the study described in this paper. We propose a new constraint-based algorithm that can both increase the quality of output and decrease the computational requirements for learning the structure of gene regulatory networks. The algorithm is based on and extends the PC algorithm. Two different types of information are derived from the prior knowledge: one is the probability of existence of edges, and the other is the set of nodes that seem to be dependent on a large number of nodes compared to other nodes in the graph. In addition, a new method based on Gene Ontology for gene regulatory network validation is proposed. We demonstrate the applicability and effectiveness of the proposed algorithms on both synthetic and real data sets.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.58</guid>
		</item>
		<item>
			<title>PrePrint: F&#xb2;Dock: Fast Fourier Protein-Protein Docking</title>
			<link>http://www.pheedcontent.com/click.phdo?i=f7381028fb6ab2813671f4e5f41f20d4</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.57</pheedo:origLink>
			<description>The functions of proteins are often realized through their mutual interactions. Determining a relative transformation for a pair of proteins and their conformations which form a stable complex, reproducible in nature, is known as docking. It is an important step in drug design, structure determination, and understanding function and structure relationships. We provide a scoring model for rigid docking and error-bounded approximation algorithms to predict docking sites. Translational search is sped up using the Fourier domain. Shape-based interactions are shown to give good results for a large range of pairs of proteins.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.57</guid>
		</item>
		<item>
			<title>PrePrint: Peak Tree: A New Tool for Multiscale Hierarchical Representation and Peak Detection of Mass Spectrometry Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=e045badaff523f1c6ea3b8696b263943</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.56</pheedo:origLink>
			<description>In mass spectrometry (MS) analysis, false peak detection results are unavoidable due to severe spectrum variations. However, most current peak detection methods are neither robust enough to resist spectrum variations nor flexible enough to revise false detection results. To improve flexibility, we introduce the peak tree to represent the peak information in MS spectra. Each tree node is a peak judgment on a range of scales, and each tree decomposition, as a set of nodes, is a candidate peak detection result. To improve robustness, we combine peak detection and common peak alignment into a closed-loop framework, which finds the optimal decomposition considering both peak intensity and common peak information. The common peak information is derived from the density clustering of peaks detected throughout the MS database and iteratively refined to direct peak tree decomposition. Finally, we present an improved ant colony optimization (ACO) biomarker selection method to build an MS analysis system based on the peak tree. Experiments show that our peak detection method can better resist spectrum variations and provide higher sensitivity and lower false detection rates than conventional methods. The benefits of our peak-tree-based system for MS disease analysis are also demonstrated on real SELDI data.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.56</guid>
		</item>
		<item>
			<title>PrePrint: Predicting Metabolic Fluxes Using Gene Expression Differences as Constraints</title>
			<link>http://www.pheedcontent.com/click.phdo?i=a28537be4af70b9f9e89992c5afce29b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.55</pheedo:origLink>
			<description>A standard approach to estimate intracellular fluxes on a genome-wide scale is flux balance analysis (FBA), which optimizes an objective function subject to constraints on (relations between) fluxes. The performance of FBA models heavily depends on the relevance of the formulated objective function and the completeness of the defined constraints. Previous studies indicated that FBA predictions can be improved by adding regulatory on/off constraints. These constraints were imposed based on either absolute (Shlomi 2007a, Covert 2004) or relative (Shlomi 2008) gene expression values. We provide a new algorithm that directly uses regulatory up/down constraints based on gene expression data in FBA optimization (tFBA). Our assumption is that if the activity of a gene drastically changes from one condition to the other, the flux through the reaction controlled by that gene will change accordingly. The potential of the proposed method, tFBA, is demonstrated through the analysis of fluxes in yeast under nine different cultivation conditions. We illustrate that changes in gene expression are predictive of changes in fluxes. We compare tFBA and FBA predictions to show that our approach yields more biologically relevant results.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.55</guid>
		</item>
		<item>
			<title>PrePrint: A Partial Set Covering Model for Protein Mixture Identification Using Mass Spectrometry Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c3608345a2a4d21dd36596e069c888a7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.54</pheedo:origLink>
			<description>Protein identification is a key step in mass spectrometry (MS) based proteome research. To date, there are many protein identification strategies that employ either MS data or MS/MS data for database searching. While MS-based methods provide wider coverage than MS/MS-based methods, their identification accuracy is lower since MS data have less information than MS/MS data. Thus, it is desirable to design more sophisticated algorithms that achieve higher identification accuracy using MS data. Peptide Mass Fingerprinting (PMF) has been widely used to identify single purified proteins from MS data for many years. In this paper, we extend this technology to protein mixture identification. First, we formulate the problem of protein mixture identification as a Partial Set Covering (PSC) problem. Then, we present several algorithms that can solve the PSC problem efficiently. Finally, we extend the partial set covering model to both MS/MS data and the combination of MS data and MS/MS data. The experimental results on simulated data and real data demonstrate the advantages of our method: (1) it outperforms previous MS-based approaches significantly; (2) it is useful in MS/MS-based protein inference; and (3) it combines MS data and MS/MS data in a unified model such that the identification performance is further improved.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.54</guid>
		</item>
		<item>
			<title>PrePrint: Fast Surface-Based Travel Depth Estimation Algorithm for Macromolecule Surface Shape Description</title>
			<link>http://www.pheedcontent.com/click.phdo?i=33b26cc8a59813d0ba68230b804ad07f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.53</pheedo:origLink>
			<description>Travel Depth, introduced by Coleman and Sharp in 2006, is a physical interpretation of molecular depth, a term frequently used to describe the shape of a molecular active site or binding site. Travel Depth can be seen as the physical distance a solvent molecule would have to travel from a point of the surface, i.e., the Solvent Excluded Surface (SES), to its convex hull. Existing algorithms providing an estimation of the Travel Depth are based on a regular sampling of the molecule volume and on the use of Dijkstra&#x2019;s shortest path algorithm. Since Travel Depth is only defined on the molecular surface, this volume-based approach is characterized by a large computational complexity due to the processing of unnecessary samples lying inside or outside the molecule. In this paper, we propose a surface-based approach that restricts the processing to data defined on the SES. This algorithm significantly reduces the complexity of Travel Depth estimation and makes possible the high-resolution analysis of the surface shape of large macromolecules. Experimental results show that, compared to existing methods, the proposed algorithm achieves accurate estimations with considerably reduced processing times.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.53</guid>
		</item>
		<item>
			<title>PrePrint: Linear-Time Algorithms for the Multiple Gene Duplication Problems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=d5b323521528115af9755138842c7885</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.52</pheedo:origLink>
			<description>A fundamental problem arising in evolutionary molecular biology is to discover the locations of gene duplications and multiple gene duplication episodes based on phylogenetic information. The solutions to the Multiple Gene Duplication problems can provide useful clues to place the gene duplication events onto the locations of a species tree and to expose the multiple gene duplication episodes. In this paper, we study two variations of the Multiple Gene Duplication problems: the Episode-Clustering (EC) problem and the Minimum Episodes (ME) problem. For the EC problem, we improve the results of Burleigh et al. with an optimal linear-time algorithm. For the ME problem, on the basis of the algorithm presented by Bansal and Eulenstein, we propose an optimal linear-time algorithm.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.52</guid>
		</item>
		<item>
			<title>PrePrint: A General Framework for Analyzing Data from Two Short Time-Series Microarray Experiments</title>
			<link>http://www.pheedcontent.com/click.phdo?i=15c8195d5f41cb1334cce27518400c2b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.51</pheedo:origLink>
			<description>We propose a general theoretical framework for analyzing differentially expressed genes and behavior patterns from two homogeneous short time-course data sets. The framework generalizes the recently proposed Hilbert-Schmidt Independence Criterion (HSIC) based framework, adapting it to the time-series scenario by utilizing tensor analysis for data transformation. The proposed framework is effective in yielding criteria that can identify both the differentially expressed genes and time-course patterns of interest between two time-series experiments without requiring explicit clustering of the data. The results, obtained by applying the proposed framework with a linear kernel formulation to various datasets, are found to be both biologically meaningful and consistent with published studies.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.51</guid>
		</item>
		<item>
			<title>PrePrint: Fuzzy ARTMAP Prediction of Biological Activities for Potential HIV-1 Protease Inhibitors Using A Small Molecular Dataset</title>
			<link>http://www.pheedcontent.com/click.phdo?i=2d791fcbd2169bb2c4e18e3954bd9a94</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.50</pheedo:origLink>
			<description>We focus on the neuro-fuzzy prediction of biological activities of HIV-1 protease inhibitory compounds when inferring from small training sets. We propose two computational intelligence prediction techniques which are suitable for small training sets, at the expense of some computational overhead. Both techniques are based on the FAMR model. The FAMR is a Fuzzy ARTMAP (FAM) incremental learning system used for classification and probability estimation. During the learning phase, each sample pair is assigned a relevance factor proportional to the importance of that pair. The two proposed algorithms in this paper are: 1. The GA-FAMR algorithm, which is new, uses a genetic algorithm to optimize the relevances assigned to the training data. 2. The Ordered FAMR is derived from a known algorithm. Instead of optimizing relevances, it optimizes the order of data presentation using the algorithm of Dagher et al. In our experiments, we compare these two algorithms with an algorithm not based on the FAM, the FS-GA-FNN. We conclude that when inferring from small training sets, both techniques are efficient, in terms of generalization capability and execution time. The computational overhead introduced is compensated by the better accuracy obtained. Finally, the proposed techniques are used to predict the biological activities of newly designed potential HIV-1 protease inhibitors.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.50</guid>
		</item>
		<item>
			<title>PrePrint: Model Reduction Using Piecewise-Linear Approximations Preserves Dynamic Properties of the Carbon Starvation Response in Escherichia coli</title>
			<link>http://www.pheedcontent.com/click.phdo?i=535ac49c52fde0e7f3d87da69b95edd7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.49</pheedo:origLink>
			<description>The adaptation of the bacterium Escherichia coli to carbon starvation is controlled by a large network of biochemical reactions involving genes, mRNAs, proteins, and signalling molecules. The dynamics of these networks is difficult to analyze, notably due to a lack of quantitative information on parameter values. To overcome these limitations, model reduction approaches based on quasi-steady-state (QSS) and piecewise-linear (PL) approximations have been proposed, resulting in models that are easier to handle mathematically and computationally. The approximations are not supposed to affect the capability of the model to account for essential dynamical properties of the system, but the validity of this assumption has not been systematically tested. In this paper we carry out such a study by evaluating a large and complex PL model of the carbon starvation response in E. coli using an ensemble approach. The results show that, in comparison with conventional nonlinear models, the PL approximations generally preserve the dynamics of the carbon starvation response network, although with some deviations concerning notably the quantitative precision of the model predictions. This encourages the application of PL models to the qualitative analysis of bacterial regulatory networks, in situations where the reference time-scale is that of protein synthesis and degradation.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.49</guid>
		</item>
		<item>
			<title>PrePrint: Learning Genetic Regulatory Network Connectivity From Time Series Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=242b7fb897a29dcf8088ca1213468476</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.48</pheedo:origLink>
			<description>Recent experimental advances facilitate the collection of time series data that indicate which genes in a cell are expressed. This information can be used to understand the genetic regulatory network that generates the data. Typically, Bayesian analysis approaches are applied which neglect the time series nature of the experimental data, have difficulty in determining the direction of causality, and do not perform well on networks with tight feedback. This paper presents a method to learn genetic network connectivity which exploits the time series nature of experimental data to achieve better causal predictions. This method breaks up the data into bins, and determines an initial set of potential influence vectors for each gene based upon the probability of the gene&#x2019;s expression increasing in the next time step. These vectors are then combined to form new vectors with better scores and are competed against each other to determine the final influence vector for each gene. The result is a directed graph representation of the genetic network&#x2019;s repression and activation connections. Results are reported for several synthetic networks with tight feedback showing significant improvements over another dynamic Bayesian approach. Promising results are reported for genes involved in the yeast cell cycle.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.48</guid>
		</item>
		<item>
			<title>PrePrint: Efficient Formulations for Exact Stochastic Simulation of Chemical Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=14b7eec17f212c44de1ffdd30ae033e5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.47</pheedo:origLink>
			<description>One can generate trajectories to simulate a system of chemical reactions using either Gillespie's direct method or Gibson and Bruck's next reaction method. Because one usually needs many trajectories to understand the dynamics of a system, performance is important. In this paper we present new formulations of these methods that improve the computational complexity of the algorithms. We present optimized implementations, available from http://cain.sourceforge.net, that offer better performance than previous work. There is no single method that is best for all problems. Simple formulations often work best for systems with a small number of reactions, while some sophisticated methods offer the best performance for large problems and scale well asymptotically. We investigate the performance of each formulation on simple biological systems using a wide range of problem sizes. We also consider the numerical accuracy of the direct and the next reaction method. We have found that special precautions must be taken in order to ensure that randomness is not discarded during the course of a simulation.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.47</guid>
		</item>
		<item>
			<title>IEEE/ACM Transactions on Computational Biology and Bioinformatics - April-June 2009 (Vol. 6, No. 2)</title>
			<link>http://opac.ieeecomputersociety.org/opac?year=2009&amp;volume=6&amp;issue=02&amp;acronym=tcbb</link>
			<description>IEEE/ACM Transactions on Computational Biology and Bioinformatics</description>
			<guid isPermaLink="true">http://www.computer.org/portal/site/tcbb/</guid>
		</item>
		<item>
			<title>PrePrint: Genetic Networks and Soft Computing</title>
			<link>http://www.pheedcontent.com/click.phdo?i=b0c6cf24aa461fd2581455282fc6f667</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.39</pheedo:origLink>
			<description>Analysis of gene regulatory networks provides enormous information on various fundamental cellular processes involving growth, development, hormone secretion and cellular communication. Their extraction from available gene expression profiles is a challenging problem. Such reverse engineering of genetic networks offers insight into cellular activity, and towards prediction of adverse effects of new drugs or possible identification of new drug targets. Tasks like classification, clustering and feature selection enable efficient mining of knowledge about gene interactions in the form of networks. It is known that biological data is prone to different kinds of noise and ambiguity. Soft computing tools like fuzzy sets, evolutionary strategies and neurocomputing have been found to help in providing low cost, acceptable solutions in the presence of various types of uncertainties. In this article we survey the role of these soft methodologies and their hybridizations, for the purpose of generating genetic networks.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.39</guid>
		</item>
		<item>
			<title>PrePrint: Probabilistic Analysis of Probe Reliability in Differential Gene Expression Studies with Short Oligonucleotide Arrays</title>
			<link>http://www.pheedcontent.com/click.phdo?i=d3969f08c9f429a4e9dbca7468ab1214</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.38</pheedo:origLink>
			<description>Probe defects are a major source of noise in gene expression studies. While existing approaches detect noisy probes based on external information such as genomic alignments, we introduce and validate a targeted probabilistic method for analyzing probe reliability directly from expression data and independently of the noise source. This provides insights into the various sources of probe-level noise and gives tools to guide probe design.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.38</guid>
		</item>
		<item>
			<title>PrePrint: Identification and Modeling of Genes with Diurnal Oscillations from Microarray Time Series Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=b1d3a68d7a36910bed4511963c68fd03</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.37</pheedo:origLink>
			<description>Behavior of living organisms is strongly modulated by the day and night cycle, giving rise to a cyclic pattern of activities. Such a pattern helps the organism to coordinate its activities and maintain a balance between what could be performed during the 'day' and what could be relegated to 'night'. This cyclic pattern, called the 'Circadian Rhythm', is a biological phenomenon observed in a large number of organisms. In this paper, our goal is to analyze transcriptome data from Cyanothece for the purpose of discovering genes whose expressions are rhythmic. We cluster these genes into groups that are close in terms of their phases and show that genes from a specific metabolic functional category are tightly clustered, indicating perhaps a 'preferred time of the day/night' when the organism performs this function. The proposed analysis is applied to two sets of microarray experiments performed under varying incident light patterns. Subsequently, we propose a model with a network of three phase oscillators together with a central master clock and use it to model the set of 'circadian controlled genes' that can be approximated closely.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.37</guid>
		</item>
		<item>
			<title>PrePrint: Nonnegative Principal Component Analysis for Cancer Molecular Pattern Discovery</title>
			<link>http://www.pheedcontent.com/click.phdo?i=0ac0ff807f35bc85d082a839f6be6dc2</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.36</pheedo:origLink>
			<description>As a well-established feature selection algorithm, principal component analysis (PCA) is often combined with state-of-the-art classification algorithms to identify cancer molecular patterns in microarray data. However, its global feature selection mechanism prevents it from effectively capturing the latent data structures in high-dimensional data. In this study, we investigate the benefit of adding nonnegative constraints on PCA and develop a nonnegative principal component analysis algorithm (NPCA) to overcome the global nature of PCA. A novel classification algorithm, NPCA-SVM, is proposed for microarray data pattern discovery. We report strong classification results from the NPCA-SVM algorithm on five benchmark microarray datasets by direct comparison with other related algorithms. We have also proved mathematically and interpreted biologically that microarray data will inevitably encounter over-fitting for an SVM/PCA-SVM learning machine under a Gaussian kernel. In addition, we demonstrate that nonnegative principal component analysis can be used to capture meaningful biomarkers effectively.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.36</guid>
		</item>
		<item>
			<title>PrePrint: Finding Significant Matches of Position Weight Matrices in Linear Time</title>
			<link>http://www.pheedcontent.com/click.phdo?i=9f8e942600e246c1f3be64d8d3b2898f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.35</pheedo:origLink>
			<description>Position weight matrices are an important method for modeling signals or motifs in biological sequences, both in DNA and protein contexts. In this paper we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the on-line type, and they generalize classical multi-pattern matching, filtering, and super-alphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple matrix extensions that perform the search for several matrices in one scan through the sequence database. Experimental performance evaluation is provided to compare the new techniques against each other as well as against some other on-line and index-based algorithms proposed in the literature. Compared to the brute-force $O(mn)$ approach, our solutions can be faster by a factor that is proportional to the matrix length $m$. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches ($p$ = 0.0001) of the 123 JASPAR matrices in the human genome in about 18 minutes.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.35</guid>
		</item>
		<item>
			<title>PrePrint: Twin Removal in Genetic Algorithms for Protein Structure Prediction Using Low Resolution Model</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c193c885d56205517c93cb80aeec2d0c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.34</pheedo:origLink>
			<description>This paper presents the impact of twins and the measures for their removal from the population of a genetic algorithm (GA) when applied to effective conformational searching. It is conclusively shown that a twin removal strategy for a GA provides considerably enhanced performance when investigating solutions to complex ab initio protein structure prediction (PSP) problems in a low-resolution model. Without twin removal, GA crossover and mutation operations can become ineffectual as generations lose their ability to produce significant differences, which can lead to the solution stalling. The paper relaxes the definition of chromosomal twins in the removal strategy to encompass not only identical but also highly correlated chromosomes within the GA population, with empirical results consistently exhibiting significant improvements in solving PSP problems.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.34</guid>
		</item>
		<item>
			<title>PrePrint: On Nakhleh's Metric for Reduced Phylogenetic Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=857054cd212bdf1733fe3b4c9a885b43</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.33</pheedo:origLink>
			<description>We prove that Nakhleh's metric for reduced phylogenetic networks is also a metric on the classes of tree-child phylogenetic networks, of semi-binary tree-sibling time consistent phylogenetic networks, and of multi-labeled phylogenetic trees. We also prove that it separates distinguishable phylogenetic networks. In this way, it becomes the strongest dissimilarity measure for phylogenetic networks available so far. Furthermore, we propose a generalization of that metric that separates arbitrary phylogenetic networks.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.33</guid>
		</item>
		<item>
			<title>PrePrint: Computing the Distribution of a Tree Metric</title>
			<link>http://www.pheedcontent.com/click.phdo?i=60576527e2b756135acb885d2a5f7c9f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.32</pheedo:origLink>
			<description>The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for twenty years, an algorithm that is explicitly polynomial time has yet to be described for computing this distribution (which is also the distribution of trees around a given tree under the popular Robinson-Foulds metric). In this paper we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in 'cherries' of the given tree. We also describe how our results can be used to derive normalization constants that are required in a recently-proposed maximum likelihood approach to supertree construction.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.32</guid>
		</item>
		<item>
			<title>PrePrint: Evaluation of Geometric Complementarity between Molecular Surfaces Using Compactly Supported Radial Basis Functions</title>
			<link>http://www.pheedo.com/click.phdo?i=ee6bf5bf6c0fbf0465797664cfba450e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.31</pheedo:origLink>
			<description>One of the challenges faced by all molecular docking algorithms is that of being able to discriminate between correct results and false positives obtained in the simulations. The scoring or energetic function is the one that must fulfill this task. Several scoring functions have been developed and new methodologies are still under development. In this paper we have employed the Compactly Supported Radial Basis Functions (CSRBF) to create analytical representations of molecular surfaces, which are then included as key components of a new scoring function for molecular docking. The method proposed here achieves a better ranking of the solutions produced by the program DOCK, as compared with the ranking done by its native contact scoring function. Our new analytical scoring function based on CSRBF can be easily included in different available docking programs as a reliable and quick filter in large-scale docking simulations.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.31</guid>
		</item>
		<item>
			<title>PrePrint: Heuristic Reusable Dynamic Programming: Efficient Updating of Local Sequence Alignment</title>
			<link>http://www.pheedo.com/click.phdo?i=6344ec1bb66c39f9d60ca9ad10234d71</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.30</pheedo:origLink>
			<description>Recomputing the similarity results previously evaluated among sequences is inevitable, as researchers discover errors in biological sequences or analyze multiple nearly similar sequences, e.g., in a family of proteins. We present an efficient update scheme for handling local sequence alignments with an affine gap model. Once a matching procedure between two amino acid sequences is done, we perform a forward-backward alignment to generate heuristic searching bands which are bounded by a set of suboptimal paths. Then, we run the Smith-Waterman algorithm in this confined space. Our heuristic alignment for an updated sequence can be further accelerated by using the prior work, "reusable dynamic programming" (rDP). In this study, we validate the "relative node tolerance bound" (RNTB) in the pruned searching space. Furthermore, we improve the performance by quantifying the successful RNTB tolerance probability and switching to rDP on perturbation-resilient columns only. In our searching space derived by a threshold value of 90% of the optimal alignment score, we find that 98.3% of contours contain correctly updated paths, while the contour consumes only 25.36% of the cost of the sparse dynamic programming (sDP) method, which corresponds to only 2.55% of a normal dynamic programming runtime with the Smith-Waterman algorithm.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.30</guid>
		</item>
		<item>
			<title>PrePrint: Quantifying the Degree of Self-Nestedness of Trees: Application to the Structural Analysis of Plants</title>
			<link>http://www.pheedo.com/click.phdo?i=b26f39549853d4d1e7ae8ea7b8fdcf70</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.29</pheedo:origLink>
			<description>In this paper we are interested in the problem of approximating trees by trees with a particular self-nested structure. Self-nested trees are such that all their subtrees of a given height are isomorphic. We show that these trees present remarkable compression properties, with high compression rates. In order to measure how far a tree is from being a self-nested tree, we then study how to quantify the degree of self-nestedness of any tree. For this, we define a measure of the self-nestedness of a tree by constructing a self-nested tree that minimizes the distance of the original tree to the set of self-nested trees that embed the initial tree. We show that this measure can be computed in polynomial time and depict the corresponding algorithm. The distance to this nearest embedding self-nested tree (NEST) is then used to define compression coefficients that reflect the compressibility of a tree. To illustrate this approach, we then apply these notions to the analysis of plant branching structures. The approach is characterized on both a database of artificial plants with varying degrees of self-nestedness and on a real plant structure. We finally show that the NEST may reveal important aspects of the plant growth.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.29</guid>
		</item>
		<item>
			<title>PrePrint: TRIAL: A Tool for Finding Distant Structural Similarities</title>
			<link>http://www.pheedo.com/click.phdo?i=7b1ee0cd68ffc86bfda67f86fb5f210b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.28</pheedo:origLink>
			<description>Finding structural similarities in distant proteins can reveal functional relationships that cannot be identified using sequence comparison. Given two proteins A and B and threshold &#x03B5; &#x00C5;, we develop an algorithm, TRiplet-based Iterative ALignment (TRIAL), for computing the transformation of B that maximizes the number of aligned residues such that the root mean square distance of the alignment is at most &#x03B5; &#x00C5;. Our algorithm is designed with the specific goal of effectively handling proteins with low similarity in primary structure, where existing algorithms perform particularly poorly. Experiments show that our method outperforms existing methods. TRIAL alignment brings the secondary structures of distant proteins to similar orientations. It also finds a larger number of secondary structure matches at lower RMSD (Root Mean Square Deviation) values and increased overall alignment lengths. Its classification accuracy is up to 63% better than other methods, including CE and DALI. TRIAL successfully aligns 83% of the residues from the smaller protein in reasonable time, while other methods align only 29 to 65% of the residues for the same set of proteins.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.28</guid>
		</item>
		<item>
			<title>PrePrint: New Methods for Inference of Local Tree Topologies with Recombinant SNP Sequences in Populations</title>
			<link>http://www.pheedo.com/click.phdo?i=b1e067d11072eed44a65630fe46ab730</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.27</pheedo:origLink>
			<description>Partly due to recombination, the genealogical history of a set of DNA sequences in a population usually cannot be represented by a single tree. Instead, genealogy is better represented by a genealogical network, which is a compact representation of a set of correlated local genealogical trees, each for a short region of genome and possibly with different topology. Inference of genealogical history for a set of DNA sequences under recombination has many potential applications, including association mapping of complex diseases. In this paper, we present two new methods for reconstructing local tree topologies in the presence of recombination, which extend and improve the previous work. We first show that the "tree scan" method can be converted to a probabilistic inference method based on a hidden Markov model. We then focus on developing a novel local tree inference method called RENT that is both accurate and scalable to larger data. Through simulation, we demonstrate the usefulness of our methods by showing that the hidden Markov model-based method is comparable with the original method in terms of accuracy. We also show that RENT is competitive with other methods in terms of inference accuracy: its inference error rate is often lower, and it can handle large data.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.27</guid>
		</item>
		<item>
			<title>PrePrint: The Metropolized Partial Importance Sampling MCMC Mixes Slowly on Minimum Reversal Rearrangement Paths</title>
			<link>http://www.pheedo.com/click.phdo?i=907e3d881f65e5c9eae95575feacd359</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.26</pheedo:origLink>
			<description>Markov chain Monte Carlo has been the standard technique for inferring the posterior distribution of genome rearrangement scenarios under a Bayesian approach. We present here a negative result on the rate of convergence of the generally used Markov chains. We prove that the relaxation time of the Markov chains walking on the optimal reversal sorting scenarios might grow exponentially with the size of the signed permutations, namely, with the number of synteny blocks.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.26</guid>
		</item>
		<item>
			<title>PrePrint: A Cluster Refinement Algorithm for Motif Discovery</title>
			<link>http://www.pheedo.com/click.phdo?i=28f5fb3bfec40228d8c5d246580f262a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.25</pheedo:origLink>
			<description>Finding Transcription Factor Binding Sites, i.e., motif discovery, is crucial for understanding the gene regulatory relationship. Motifs are weakly conserved and motif discovery is an NP-hard problem. We propose a new approach called Cluster Refinement Algorithm for Motif Discovery (CRMD). CRMD employs a flexible statistical motif model allowing a variable number of motifs and motif instances. CRMD first uses a novel entropy-based clustering to find complete and good starting candidate motifs from the DNA sequences. CRMD then uses an effective greedy refinement to search for optimal motifs from the candidate motifs. The refinement is fast, and it changes the number of motif instances based on the adaptive thresholds. The performance of CRMD is further enhanced if the problem has one motif instance per sequence. Using an appropriate similarity test of motifs, CRMD is also able to find multiple motifs. CRMD has been tested extensively on synthetic and real datasets. The experimental results verify that CRMD usually outperforms four other state-of-the-art algorithms in terms of the quality of the solutions, with competitive computing time. It finds a good balance between finding true motif instances and screening false motif instances, and is robust on problems of various levels of difficulty.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.25</guid>
		</item>
		<item>
			<title>PrePrint: A Genetic Optimization Approach for Isolating Translational Efficiency Bias</title>
			<link>http://www.pheedo.com/click.phdo?i=b73832bb5cd897347535b40e25be3cf0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.24</pheedo:origLink>
			<description>The study of codon usage bias is an important research area that contributes to our understanding of molecular evolution, phylogenetic relationships, respiratory lifestyle, and other characteristics. Translational efficiency bias is perhaps the best-studied codon usage bias, as it is frequently utilized to predict relative protein expression levels. We present a novel approach to isolating translational efficiency bias in microbial genomes. There are several existing methods for isolating translational efficiency bias. Previous approaches are susceptible to the confounding influences of other potentially dominant biases. Additionally, existing approaches to identifying translational efficiency bias generally require both genomic sequence information and prior knowledge of a set of highly expressed genes. This novel approach provides more accurate results from sequence information alone by resisting the confounding effects of other biases. We validate this increase in accuracy in isolating translational efficiency bias on ten microbial genomes, five of which have proven particularly difficult for existing approaches due to the presence of strong confounding biases.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.24</guid>
		</item>
		<item>
			<title>PrePrint: Model Reduction of Multiscale Chemical Langevin Equations: A Numerical Case Study</title>
			<link>http://www.pheedo.com/click.phdo?i=5a4dfa856179f318983f3604d18ef9fc</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.23</pheedo:origLink>
			<description>Two very important characteristics of biological reaction networks need to be considered carefully when modeling these systems. First, models must account for the inherent probabilistic nature of systems far from the thermodynamic limit. Often, biological systems cannot be modeled with traditional continuous-deterministic models. Second, models must take into consideration the disparate spectrum of time scales observed in biological phenomena, such as slow transcription events and fast dimerization reactions. In the last decade, significant efforts have been expended on the development of stochastic chemical kinetics models to capture the dynamics of biomolecular systems, and on the development of robust multiscale algorithms able to handle stiffness. In this article the focus is on the dynamics of reaction sets governed by stiff chemical Langevin equations, i.e., stiff stochastic differential equations. These are particularly challenging systems to model, requiring prohibitively small integration step sizes. We describe and illustrate the application of a semi-analytical reduction framework for chemical Langevin equations that results in significant savings in computational cost.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.23</guid>
		</item>
		<item>
			<title>PrePrint: Constructing Level-2 Phylogenetic Networks from Triplets</title>
			<link>http://www.pheedo.com/click.phdo?i=88b2316f05fb11741011f2baa5c30523</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.22</pheedo:origLink>
			<description>Jansson and Sung showed that, given a dense set of input triplets T (representing hypotheses about the local evolutionary relationships of triplets of taxa), it is possible to determine in polynomial time whether there exists a level-1 network consistent with T, and if so to construct such a network (Inferring a Level-1 Phylogenetic Network from a Dense Set of Rooted Triplets, Theoretical Computer Science, 363, pp. 60-68 (2006)). Here we extend this work by showing that this problem is polynomial-time solvable even for the construction of level-2 networks. This shows that, assuming density, it is tractable to construct plausible evolutionary histories from input triplets even when such histories are heavily non-tree-like. This further strengthens the case for the use of triplet-based methods in the construction of phylogenetic networks. We also implemented the algorithm and applied it to yeast data.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=88b2316f05fb11741011f2baa5c30523&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=88b2316f05fb11741011f2baa5c30523&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=88b2316f05fb11741011f2baa5c30523&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.22</guid>
		</item>
		<item>
			<title>PrePrint: A Sparse Learning Machine for High-Dimensional Data with Application to Microarray Gene Analysis</title>
			<link>http://www.pheedo.com/click.phdo?i=c6b0d2ceafd37073b8639dc3ff6573ab</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.8</pheedo:origLink>
			<description>Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations, and contain significant noise, missing components, or outliers. Features extracted from high-dimensional data need to be discriminative and sparse, and must capture essential characteristics of the data. In this paper, we present a way to construct multivariate features and then classify the data into proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic step to correct the biases. We use conjugate gradient based primal-dual interior-point techniques for large-scale problems. We apply our procedure to microarray gene analysis. The effectiveness of our method is confirmed by experimental results.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=c6b0d2ceafd37073b8639dc3ff6573ab&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=c6b0d2ceafd37073b8639dc3ff6573ab&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=c6b0d2ceafd37073b8639dc3ff6573ab&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.8</guid>
		</item>
		<item>
			<title>PrePrint: Data Mining on DNA Sequences of Hepatitis B Virus</title>
			<link>http://www.pheedo.com/click.phdo?i=6788101bd48a123c676799880154416c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.6</pheedo:origLink>
			<description>In this study, we introduce a data mining framework that includes molecular evolution analysis, clustering, feature selection, classifier learning, and classification. Our research group has collected HBV DNA sequences, either genotype B or C, from over 200 patients specifically for this project. In the molecular evolution analysis and clustering, three subgroups have been identified in genotype C and a clustering method has been developed to separate the subgroups. In the feature selection process, potential markers are selected based on Information Gain for further classifier learning. Meaningful rules are then learnt by our algorithm, called Rule Learning, which is based on an evolutionary algorithm. In addition, a new classification method based on the Nonlinear Integral has been developed. The good performance of this method comes from the use of the fuzzy measure and the relevant nonlinear integral. The nonadditivity of the fuzzy measure reflects the importance of the feature attributes as well as their interactions. These two classifiers give explicit information on the importance of the individual mutated sites and their interactions towards the classification (potential causes of liver cancer in our case). A thorough comparison of these two methods with existing methods is detailed.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=6788101bd48a123c676799880154416c&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=6788101bd48a123c676799880154416c&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=6788101bd48a123c676799880154416c&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.6</guid>
		</item>
		<item>
			<title>PrePrint: An Extended Kalman Filtering Approach to Modelling Nonlinear Dynamic Gene Regulatory Networks via Short Gene Expression Time Series</title>
			<link>http://www.pheedo.com/click.phdo?i=ee9f53c24c9121050225fe505bcef69a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.5</pheedo:origLink>
			<description>In this paper, the extended Kalman filter (EKF) algorithm is applied to model the gene regulatory network from gene time series data. The gene regulatory network is considered as a nonlinear dynamic stochastic model that consists of the gene measurement equation and the gene regulation equation. After specifying the model structure, we apply the EKF algorithm to identify both the model parameters and the actual values of gene expression levels. It is shown that the EKF algorithm is an online estimation algorithm that can identify a large number of parameters (including parameters of nonlinear functions) through an iterative procedure using a small number of observations. Four real-world gene expression data sets are employed to demonstrate the effectiveness of the EKF algorithm, and the obtained models are evaluated from the viewpoint of bioinformatics.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=ee9f53c24c9121050225fe505bcef69a&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=ee9f53c24c9121050225fe505bcef69a&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=ee9f53c24c9121050225fe505bcef69a&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.5</guid>
		</item>
		<item>
			<title>PrePrint: On Subset Seeds for Protein Alignment</title>
			<link>http://www.pheedo.com/click.phdo?i=e2da87df6f100eb306d2f49568784a96</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.4</pheedo:origLink>
			<description>We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard BLASTP seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in BLASTP and vector seeds, our seeds show a similar or even better performance than BLASTP on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds vs. BLASTP.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=e2da87df6f100eb306d2f49568784a96&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=e2da87df6f100eb306d2f49568784a96&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e2da87df6f100eb306d2f49568784a96&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.4</guid>
		</item>
		<item>
			<title>PrePrint: An Approximation Algorithm for the Minimum Breakpoint Linearization Problem</title>
			<link>http://www.pheedo.com/click.phdo?i=42f03ddf148020f032ea368133e4e0ad</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.3</pheedo:origLink>
			<description>In recent years there has been a growing interest in inferring the total order of genes or markers on a chromosome, since current genetic mapping efforts might only suffice to produce a partial order. Many interesting optimization problems have thus been formulated in the framework of genome rearrangement. An important one among them, the minimum breakpoint linearization (MBL) problem, is to find the total order of a partially ordered genome that minimizes its breakpoint distance to a reference genome whose genes are already totally ordered. It was previously shown to be NP-hard, and the algorithms proposed so far are all heuristic. In this paper, we present an $\frac{m^2+m}{2}$-approximation algorithm for the MBL problem, where $m$ is the number of gene maps that are combined together to form a partial order of the genome under investigation.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=42f03ddf148020f032ea368133e4e0ad&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=42f03ddf148020f032ea368133e4e0ad&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=42f03ddf148020f032ea368133e4e0ad&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.3</guid>
		</item>
		<item>
			<title>PrePrint: A Metric on the Space of Reduced Phylogenetic Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=3922845ce88b844a3b3074ca31802866</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.2</pheedo:origLink>
			<description>Phylogenetic networks are leaf-labeled, rooted, acyclic, directed graphs that model reticulate evolutionary histories. Several measures for quantifying the topological dissimilarity between two phylogenetic networks have been devised for various classes of phylogenetic networks. A biologically motivated class of phylogenetic networks, namely reduced phylogenetic networks, was recently introduced. None of the existing measures is a metric on the space of reduced phylogenetic networks. In this paper, we provide a polynomially computable&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=3922845ce88b844a3b3074ca31802866&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=3922845ce88b844a3b3074ca31802866&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=3922845ce88b844a3b3074ca31802866&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.2</guid>
		</item>
		<item>
			<title>PrePrint: Information-Theoretic Model of Evolution over Protein Communication Channel</title>
			<link>http://www.pheedo.com/click.phdo?i=7ae7a12d8f6921eee5c5211cdf3c406a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.1</pheedo:origLink>
			<description>In this paper, we propose a communication model of evolution and investigate its information-theoretic bounds. The process of evolution is modeled as the retransmission of information over a protein communication channel, where the transmitted message is the organism&#x2019;s proteome encoded in the DNA. We compute the capacity and the rate-distortion functions of the protein communication system for the three domains of life: Archaea, Bacteria and Eukaryotes. The tradeoff between the transmission rate and the distortion in noisy protein communication channels is analyzed. As expected, comparison between the optimal transmission rate and the channel capacity indicates that the biological fidelity does not reach the Shannon optimal distortion. However, the relationship between the channel capacity and rate distortion achieved for different biological domains provides tremendous insight into the dynamics of the evolutionary processes of the three domains of life. We rely on these results to provide a model of genome sequence evolution based on the two major evolutionary driving forces: mutations and unequal crossovers.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=7ae7a12d8f6921eee5c5211cdf3c406a&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=7ae7a12d8f6921eee5c5211cdf3c406a&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=7ae7a12d8f6921eee5c5211cdf3c406a&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.1</guid>
		</item>
		<item>
			<title>PrePrint: Bayesian Models and Algorithms for Protein beta-Sheet Prediction</title>
			<link>http://www.pheedo.com/click.phdo?i=905dda8a2d6878dbad2f75616d70a971</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.140</pheedo:origLink>
			<description>Prediction of the three-dimensional structure greatly benefits from the information related to secondary structure, solvent accessibility, and non-local contacts that stabilize a protein's structure. Prediction of such components is vital to our understanding of the structure and function of a protein. In this paper, we address the problem of beta-sheet prediction. We introduce a Bayesian approach for proteins with six or fewer beta-strands, in which we model the conformational features in a probabilistic framework. To select the optimum architecture, we analyze the space of possible conformations by efficient heuristics. Furthermore, we employ an algorithm that finds the optimum pairwise alignment between beta-strands using dynamic programming. Allowing any number of gaps in an alignment enables us to model beta-bulges more effectively. Though our main focus is proteins with six or fewer beta-strands, we are also able to perform predictions for proteins with more than six beta-strands by combining the predictions of BetaPro with the gapped alignment algorithm. We evaluated the accuracy of our method and BetaPro. We performed a 10-fold cross validation experiment on the BetaSheet916 set and we obtained significant improvements in the prediction accuracy.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=905dda8a2d6878dbad2f75616d70a971&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=905dda8a2d6878dbad2f75616d70a971&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=905dda8a2d6878dbad2f75616d70a971&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.140</guid>
		</item>
		<item>
			<title>PrePrint: RDCurve: A Nonparametric Method to Evaluate the Stability of Ranking Procedures</title>
			<link>http://www.pheedo.com/click.phdo?i=de5c9f4165385c776946785b1716438d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.138</pheedo:origLink>
			<description>Great concerns have been raised about the reproducibility of gene signatures based on high-throughput techniques such as microarrays. Studies analyzing similar samples often report poorly overlapping results, and the p-value usually lacks biological context. We propose a non-parametric Re-Discovery-Curve (RDCurve) method to estimate the frequency of rediscovery of an identified gene signature. Given a ranking procedure and a dataset with replicated measurements, the RDCurve bootstraps the dataset and repeatedly applies the ranking procedure, selects a subset of k important genes, and estimates the probability of rediscovery of the selected subset of genes. We also propose a permutation scheme to estimate the confidence band under the null hypothesis for the significance of the RDCurve. The method is non-parametric and model independent. With the RDCurve we can assess the signal-to-noise ratio of the data, compare the performance of ranking procedures in terms of their expected rediscovery rates, and choose the number of genes to be reported.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=de5c9f4165385c776946785b1716438d&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=de5c9f4165385c776946785b1716438d&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=de5c9f4165385c776946785b1716438d&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.138</guid>
		</item>
		<item>
			<title>PrePrint: Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction</title>
			<link>http://www.pheedo.com/click.phdo?i=4856814fde6946028bcee623f7192012</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.137</pheedo:origLink>
			<description>In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g. gene sequence, mRNA expression, interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to existing kernel weight optimization methods (i.e. (regularized) optimization of the SVM-criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=4856814fde6946028bcee623f7192012&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=4856814fde6946028bcee623f7192012&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=4856814fde6946028bcee623f7192012&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.137</guid>
		</item>
		<item>
			<title>PrePrint: Semi-Markov Models for Brownian Dynamics Simulation Algorithms in Biological Ion Channels</title>
			<link>http://www.pheedo.com/click.phdo?i=654d1d08b1f1737ccc40f4dcfaaf9c68</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.136</pheedo:origLink>
			<description>Constructing accurate computational models that explain how ions permeate through a biological ion channel is an important problem in biophysics and drug design. Brownian dynamics simulations are large scale interacting particle computer simulations for modeling ion channel permeation but can be computationally prohibitive. In this paper we show the somewhat surprising result that a small dimensional semi-Markov model can generate events (such as conduction events, dwell times at binding sites in the protein) that are statistically indistinguishable from Brownian dynamics computer simulation. This approach enables the use of extrapolation techniques to predict channel conduction when performing the actual Brownian dynamics simulation is computationally intractable. Numerical studies on the simulation of gramicidin A ion channels are presented.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=654d1d08b1f1737ccc40f4dcfaaf9c68&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=654d1d08b1f1737ccc40f4dcfaaf9c68&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=654d1d08b1f1737ccc40f4dcfaaf9c68&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.136</guid>
		</item>
		<item>
			<title>PrePrint: A Unified Approach for Reconstructing Ancient Gene Clusters</title>
			<link>http://www.pheedo.com/click.phdo?i=1f938a95b91c762c21aabda33d5fe426</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.135</pheedo:origLink>
			<description>The order of genes in genomes provides extensive information. In comparative genomics, differences or similarities of gene orders are determined to predict functional relations of genes or phylogenetic relations of genomes. For this purpose, various combinatorial models can be used to identify gene clusters &#x2014; groups of genes that are co-located in a set of genomes. We introduce a unified approach to model gene clusters and define the problem of labeling the inner nodes of a given phylogenetic tree with sets of gene clusters. Our optimization criterion in this context combines two properties: parsimony, i.e. the number of gains and losses of gene clusters has to be minimal, and consistency, i.e. for each ancestral node, there must exist at least one potential gene order that contains all the reconstructed clusters. We present and evaluate an exact algorithm to solve this problem. Despite its exponential worst-case time complexity, our method is suitable even for large scale data. We show the effectiveness and efficiency on both simulated and real data.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=1f938a95b91c762c21aabda33d5fe426&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=1f938a95b91c762c21aabda33d5fe426&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=1f938a95b91c762c21aabda33d5fe426&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.135</guid>
		</item>
		<item>
			<title>PrePrint: Multidimensional Profiling of Cell Surface Proteins and Nuclear Markers</title>
			<link>http://www.pheedo.com/click.phdo?i=a589ccc8b62e91997b9c6ca96c656c1c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.134</pheedo:origLink>
			<description>Cell membrane proteins play an important role in tissue architecture and cell-cell communication. We hypothesize that segmentation and multidimensional characterization of the distribution of cell membrane proteins, on a cell-by-cell basis, enable improved classification of treatment groups and identify important characteristics that can otherwise be hidden. We have developed a series of computational steps to (i) delineate cell membrane protein signals and associate them with specific nuclei; (ii) compute a coupled representation of the multiplexed DNA content with membrane proteins; (iii) rank computed features associated with such a multidimensional representation; (iv) visualize selected features for comparative evaluation; and (v) discriminate between treatment groups in an optimal fashion. The novelty of our method is in the segmentation of the membrane signal and the multidimensional representation of phenotypes on a cell-by-cell basis. To test the utility of the new method, the proposed computational steps were applied to images of cells that have been irradiated with different radiation qualities in the presence and absence of other small molecules. These samples are labeled for their DNA content and E-cadherin membrane proteins. We demonstrate that multidimensional representations of cell-by-cell phenotypes improve predictive and visualization capabilities among different treatment groups, and identify hidden variables.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=a589ccc8b62e91997b9c6ca96c656c1c&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=a589ccc8b62e91997b9c6ca96c656c1c&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=a589ccc8b62e91997b9c6ca96c656c1c&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.134</guid>
		</item>
		<item>
			<title>PrePrint: Quartets MaxCut: A Divide and Conquer Quartets Algorithm</title>
			<link>http://www.pheedo.com/click.phdo?i=8c477b8920dc5c73c15af1adbd26afaa</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.133</pheedo:origLink>
			<description>Supertree methods are used to construct a large tree over a large set of taxa, from a set of small trees over overlapping subsets of the complete taxa set. Since accurate reconstruction methods are currently limited to a maximum of a few dozen taxa, the use of a supertree method in order to construct the tree of life is inevitable. Perhaps the simplest version of this that is widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods, and theoretical as well as experimental results have been reported. Nevertheless, dealing with false, conflicting quartets remains problematic. In this paper, we describe an algorithm for constructing a tree from a set of input quartet trees even with a significant fraction of errors. Our algorithm is based on a divide and conquer algorithm where our divide step uses a semidefinite formulation of max cut. We remark that this builds on previous work of ours [28] for piecing together trees from rooted triples. The recursion for quartets, however, is more complicated in that even with completely consistent quartets the problem is NP-hard. This complexity leads to several issues and some solutions of possible independent interest.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=8c477b8920dc5c73c15af1adbd26afaa&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=8c477b8920dc5c73c15af1adbd26afaa&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=8c477b8920dc5c73c15af1adbd26afaa&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.133</guid>
		</item>
		<item>
			<title>PrePrint: On the Complexity of uSPR Distance</title>
			<link>http://www.pheedo.com/click.phdo?i=709b83de1ca185bfbc37e2a03d7b2fba</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.132</pheedo:origLink>
			<description>We show that the subtree prune and regraft (uSPR) distance on unrooted trees is fixed-parameter tractable with respect to the distance. We also make progress on a conjecture of Steel on the preservation of uSPR distance under chain reduction, improving on lower bounds of Hickey.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.132</guid>
		</item>
		<item>
			<title>PrePrint: Using Gaussian Process with Test Rejection to Detect T-Cell Epitopes in Pathogen Genomes</title>
			<link>http://www.pheedo.com/click.phdo?i=b64bf79ef063112e6afbfaeedf8a0a42</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.131</pheedo:origLink>
			<description>A major challenge in the development of peptide-based vaccines is finding the right immunogenic element, with efficient and long-lasting immunisation effects, among the large number of potential targets encoded by pathogen genomes. Computer models are convenient tools for scanning pathogen genomes to pre-select candidate immunogenic peptides for experimental validation. Current methods predict many false positives because of the low prevalence of true positives. We develop a test reject method based on the prediction uncertainty estimates determined by Gaussian process regression. This method filters false positives amongst predicted epitopes from a pathogen genome. The performance of stand-alone Gaussian process regression is compared to other state-of-the-art methods using cross-validation on eleven benchmark data sets. The results show that the Gaussian process method has the same accuracy as the top-performing algorithms. The combination of Gaussian process regression with the proposed test reject method is used to detect true epitopes from the Vaccinia virus genome. The test rejection increases the prediction accuracy by reducing the number of false positives without sacrificing the method's sensitivity. We show that the Gaussian process in combination with test rejection is an effective method for prediction of T-cell epitopes in large and diverse pathogen genomes, where false positives are of concern.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.131</guid>
		</item>
		<item>
			<title>PrePrint: CollHaps: A Heuristic Approach to Haplotype Inference by Parsimony</title>
			<link>http://www.pheedo.com/click.phdo?i=c5b6011b594aa6bd1b4917fef55cf28f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.130</pheedo:origLink>
			<description>Haplotype data play an important role in several genetic studies, e.g., mapping of complex disease genes, drug design, and evolutionary studies on populations. However, the experimental determination of haplotypes is expensive and time-consuming. This motivates the increasing interest in techniques for inferring haplotype data from genotypes, which can instead be obtained quickly and economically. Several such techniques are based on the maximum parsimony principle, which has been justified by both experimental results and theoretical arguments. However, the problem of haplotype inference by parsimony was shown to be NP-hard, thus limiting the applicability of exact parsimony-based techniques to relatively small datasets. In this paper we introduce the collapse rule, a generalization of the well-known Clark's rule, and describe a new heuristic algorithm for haplotype inference (implemented in a program called CollHaps), based on parsimony and the iterative application of collapse rules. The performance of CollHaps is tested on several datasets. The experiments show that CollHaps enables the user to process large datasets, obtaining very "parsimonious" solutions in short processing times. They also show a correlation, especially for large datasets, between parsimony and correct reconstruction, supporting the validity of the parsimony principle for producing accurate solutions.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.130</guid>
		</item>
	</channel>
</rss>