<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="/css/rss20.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:pheedo="http://www.pheedo.com/namespace/pheedo">
	<channel>
		<title>IEEE Transactions on Computers</title>
		<link>http://www.computer.org/tc</link>
		<description>The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers, brief contributions, and comments on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability;
g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.	</description>
		<language>en-us</language>
		<pubDate>Sat, 7 Nov 2009 11:00:04 GMT</pubDate>
		<image>
			<url>http://csdl.computer.org/common/images/logos/tc.gif</url>
			<title>IEEE Computer Society</title>
			<description>List of recently published journal articles</description>
			<link>http://www.computer.org/tc</link>
		</image>
		<item>
			<title>PrePrint: Minimizing Eavesdropping Risk by Transmission Power Control in Multihop Wireless Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=8c57ffbe2d336cd19d65e9d9f1b0d125</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2007.1066</pheedo:origLink>
			<description>To defend against reconnaissance activity in adhoc wireless networks, we propose transmission power control as an effective mechanism for minimizing the eavesdropping risk. Our main contributions are as follows: First, we cast the w-th order eavesdropping risk as the maximum probability of packets being eavesdropped when there are w adversarial nodes in the network. Second, we derive the closed-form solution of the first order eavesdropping risk as a polynomial function of the normalized transmission radius. This derivation assumes a uniform distribution of user nodes. Then we generalize the model to allow arbitrary user nodes distribution and prove that the uniform user distribution minimizes the first order eavesdropping risk. This result plays an essential role in deriving analytical bounds for the eavesdropping risk given arbitrary user distributions. Our simulation results show that for a wide range of non-uniform traffic patterns, the difference of their eavesdropping risk values from the corresponding lower bounds is 3dB or less.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=8c57ffbe2d336cd19d65e9d9f1b0d125&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=8c57ffbe2d336cd19d65e9d9f1b0d125&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2007.1066</guid>
		</item>
		<item>
			<title>PrePrint: Minimizing the Maximum Firewall Rule Set in a Network with Multiple Firewalls</title>
			<link>http://www.pheedcontent.com/click.phdo?i=8e41ca338ac074ff103b496566adba46</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.172</pheedo:origLink>
			<description>A firewall's complexity is known to increase with the size of its rule set. Empirical studies show that, as the rule set grows larger, the number of configuration errors on a firewall increases sharply, while the performance of the firewall degrades. When designing a security-sensitive network, it is critical to construct the network topology and its routing structure carefully in order to reduce the firewall rule sets, which helps lower the chance of security loopholes and prevent performance bottleneck. This paper studies the problems of how to place the firewalls in a topology during network design and how to construct the routing tables during operation, such that the maximum firewall rule set can be minimized. These problems have not been studied adequately despite their importance. We have two major contributions. First, we prove that the problems are NP-complete. Second, we propose a heuristic solution and demonstrate the effectiveness of the algorithm by simulations. The results show that the proposed algorithm reduces the maximum firewall rule set by 2 to 5 times when comparing with other algorithms.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=8e41ca338ac074ff103b496566adba46&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=8e41ca338ac074ff103b496566adba46&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.172</guid>
		</item>
		<item>
			<title>PrePrint: Secure and Efficient Broadcast Authentication in Wireless Sensor Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=6837676e8f3b0ab6bdf0b134b0a11339</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.171</pheedo:origLink>
			<description>Authenticated broadcast, enabling a base station to send commands and requests to low-powered sensor nodes in an authentic manner, is one of the core challenges for securing wireless sensor networks. &amp;#x03BC;TESLA and its multi-level variants based on delayed exposure of one-way chains are well known valuable broadcast authentication schemes, but concerns still remain for their practical application. To use these schemes on resource-limited sensor nodes, a 64-bit key chain is desirable for efficiency, but care must be taken. We will first show, by both theoretical analysis and rigorous experiments on real sensor nodes, that if &amp;#x03BC;TESLA is implemented in a raw form with 64-bit key chains, some of the future keys can be discovered through time memory data tradeoff techniques. We will then present X-TESLA, as a new member of the TESLA family, to remedy the fact that previous schemes do not consider problems arising from sleep modes, network failures, idle sessions, as well as the time memory data tradeoff risk, and to reduce their high cost of countering DoS attacks. In X-TESLA, short key chains are used to continue indefinitely, and new interesting strategies and management methods are possible, significantly reducing unnecessary computation and buffer occupation, to be efficient solutions to the raised problems.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=6837676e8f3b0ab6bdf0b134b0a11339&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=6837676e8f3b0ab6bdf0b134b0a11339&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.171</guid>
		</item>
		<item>
			<title>PrePrint: Coverage and Detection of a Randomized Scheduling Algorithm in Wireless Sensor Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=37559d733198f7f5506e881e7006495b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.170</pheedo:origLink>
			<description>In wireless sensor networks, some sensor nodes are put in sleep mode while other sensor nodes are in active mode for sensing and communication tasks in order to reduce energy consumption and extend network lifetime. This approach is a special case ($k=2$) of a randomized scheduling algorithm, in which $k$ subsets of sensors work alternatively. In this paper, we first study the randomized scheduling algorithm via both analysis and simulations in terms of network coverage intensity, detection delay, and detection probability. We further study asymptotic coverage and other properties. Finally, we analyze a problem of maximizing network lifetime under Quality of Service constraints such as bounded detection delay, detection probability, and network coverage intensity. We prove that the optimal solution exists, and we provide conditions of the existence of the optimal solutions.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=37559d733198f7f5506e881e7006495b&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=37559d733198f7f5506e881e7006495b&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.170</guid>
		</item>
		<item>
			<title>PrePrint: A Unified Architecture for the Accurate and High-Throughput Implementation of Six Key Elementary Functions</title>
			<link>http://www.pheedcontent.com/click.phdo?i=98dfde2c22ecbe466585ab07b410f754</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.169</pheedo:origLink>
			<description>This article presents a unified architecture for the compact implementation of several key elementary functions, including reciprocal, square root and logarithm, in single-precision floating-point arithmetic. The proposed high-throughput design is based on uniform domain segmentation and curve fitting techniques. Numerically-accurate least-squares regression is utilized to calculate the polynomial coefficients. The architecture is optimized by analyzing the trade-off between the size of the required memory and the precision of intermediate variables to achieve the minimum 23-bit accuracy required for single-precision floating-point representation. The generality of the proposed unified datapath is demonstrated on a common field-programmable gate array.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=98dfde2c22ecbe466585ab07b410f754&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=98dfde2c22ecbe466585ab07b410f754&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.169</guid>
		</item>
		<item>
			<title>PrePrint: Energy-Efficient Task Mapping for Data-Driven Sensor Network Macroprogramming</title>
			<link>http://www.pheedcontent.com/click.phdo?i=442c0b4028e54fd6ffaa218549aece0a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.168</pheedo:origLink>
			<description>Data-driven macroprogramming of wireless sensor networks (WSNs) provides an easy to use high-level task graph representation to the application developer. However, determining an energy-efficient initial placement of these tasks onto the nodes of the target network poses a set of interesting problems. We present a framework to model this task-mapping problem arising in WSN macroprogramming. Our model can capture placement constraints in tasks, as well as multiple possible routes in the target network. Using our framework, we provide mathematical formulations for the task-mapping problem for two different metrics &amp;#x2014; energy balance and total energy spent. For both metrics, we address scenarios where a) a single or b) multiple paths are possible between nodes. Due to the complex nature of the problems, these formulations are not linear. We provide linearization heuristics for the same, resulting in mixed-integer programming (MIP) formulations. We also provide efficient heuristics for the above. Our experiments show that our heuristics give the same results as the MIP for real-world sensor network macroprograms, and show a speedup of up to several orders of magnitude. We also provide worst-case performance bounds of the heuristics.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=442c0b4028e54fd6ffaa218549aece0a&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=442c0b4028e54fd6ffaa218549aece0a&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.168</guid>
		</item>
		<item>
			<title>PrePrint: Improved Design of High-Performance Parallel Decimal Multipliers</title>
			<link>http://www.pheedcontent.com/click.phdo?i=61ff78d526e8a2a6aaeba78a9a6600b6</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.167</pheedo:origLink>
			<description>The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non BCD) decimal coded number systems. We further detail these techniques and present the new improvements to reduce the latency of the previous designs which include: optimized digit recoders for the generation of 2^n-tuples (and 5-tuples), decimal CSAs (carrysave adders) combining different decimal coded operands and carry-free adders implemented by special designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay tradeoffs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=61ff78d526e8a2a6aaeba78a9a6600b6&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=61ff78d526e8a2a6aaeba78a9a6600b6&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.167</guid>
		</item>
		<item>
			<title>PrePrint: A Study of k-Coverage and Measures of Connectivity in Three-Dimensional Wireless Sensor Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=6387a9df9b5eb0bf2e309107199652e7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.166</pheedo:origLink>
			<description>In a wireless sensor network (WSN), connectivity enables the sensors to communicate with each other, while sensing coverage reflects the quality of surveillance. Although the majority of studies on coverage and connectivity in WSNs consider two-dimensional (2D) space, three-dimensional (3D) settings represent more accurately the network design for real-world applications. As an example, underwater sensor networks require design in 3D rather than 2D space. In this paper, we focus on the connectivity and k-coverage issues in 3D WSNs, where each location (or point) is covered by at least k sensors (the maximum value of is called the coverage degree).&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=6387a9df9b5eb0bf2e309107199652e7&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=6387a9df9b5eb0bf2e309107199652e7&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.166</guid>
		</item>
		<item>
			<title>PrePrint: Formal Reliability Analysis using Theorem Proving</title>
			<link>http://www.pheedcontent.com/click.phdo?i=a54fd3213ee53742e73af576e94c13ac</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.165</pheedo:origLink>
			<description>Reliability analysis has become a tool of fundamental importance to virtually all electrical and computer engineers because of the extensive usage of hardware systems in safety and mission critical domains, such as medicine, transportation and stock exchange markets. Due to the strong relationship between reliability theory and probabilistic notions, computer simulation techniques have been traditionally used to perform reliability analysis. However, simulation provides less accurate results and cannot handle large-scale systems due to its enormous CPU time requirements. To ensure accurate and complete reliability analysis and thus more reliable hardware designs, we propose to conduct a formal reliability analysis of systems within the sound core of a higher-order-logic theorem prover (HOL). In this paper, we present the higher-order-logic formalization of some core reliability theory concepts, which can be built upon to precisely analyze the reliability of various engineering systems. The proposed approach and formalization is then utilized to analyze the conditions under which a reconfigurable memory array is almost always repairable in the presence of stuck-at and coupling faults.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=a54fd3213ee53742e73af576e94c13ac&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=a54fd3213ee53742e73af576e94c13ac&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.165</guid>
		</item>
		<item>
			<title>IEEE Transactions on Computers - December 2009 (Vol. 58, No. 12)</title>
			<link>http://www.pheedo.com/click.phdo?i=b9b0cbeb664e1cc2fdb4b8c7691d9398</link>
			<pheedo:origLink>http://opac.ieeecomputersociety.org/opac?year=2009&amp;amp;volume=58&amp;amp;issue=12&amp;amp;acronym=tc</pheedo:origLink>
			<description>IEEE Transactions on Computers&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=b9b0cbeb664e1cc2fdb4b8c7691d9398&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=b9b0cbeb664e1cc2fdb4b8c7691d9398&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://www.computer.org/portal/site/tc/</guid>
		</item>
		<item>
			<title>PrePrint: Adaptive Power Management for Environmentally Powered Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=3ed1ba0f9744f163f940cd3b6082ed8f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.158</pheedo:origLink>
			<description>Recently, there has been a substantial interest in the design of systems that receive their energy from regenerative sources such as solar cells. In contrast to approaches that minimize the power consumption subject to performance constraints, we are concerned with optimizing the performance of an application while respecting the limited and time-varying amount of available power. In this paper, we address power management of, e.g., wireless sensor nodes which receive their energy from solar cells. Based on a prediction of the future available energy, we adapt parameters of the application in order to maximize the utility in a longterm perspective. The paper presents a formal specification of the corresponding optimization problem including constraints concerning buffer sizes, timing and rates. Instead of solving the optimization problem online which may be prohibitively complex in terms of running time and energy consumption, we apply multiparametric programming to pre-compute the application parameters offline for different environmental conditions and system states. In order to guarantee sustainable operation, we propose a hierarchical software design which comprises a worst-case prediction of the incoming energy. As a further contribution, we suggest a new method for approximate multiparametric linear programming which substantially lowers the computational demand and memory requirement of the embedded software.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=3ed1ba0f9744f163f940cd3b6082ed8f&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=3ed1ba0f9744f163f940cd3b6082ed8f&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.158</guid>
		</item>
		<item>
			<title>PrePrint: The Design and Evaluation of a Self-Organizing Super-Peer Network</title>
			<link>http://www.pheedcontent.com/click.phdo?i=ad199d98c4f165b36c1df4d852110911</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.157</pheedo:origLink>
			<description>Super-peer architectures exploit the heterogeneity of nodes in a peer-to-peer (P2P) network by assigning additional responsibilities to higher-capacity nodes. In the design of a super-peer network for file sharing, several issues have to be addressed: how client peers are related to super-peers, how super-peers locate files, how the load is balanced among the super-peers, and how the system deals with node failures. In this paper we introduce a self-organizing super-peer network architecture (SOSPNet) that solves these issues in a fully decentralized manner. SOSPNet maintains a super-peer network topology that reflects the semantic similarity of peers sharing content interests. Super-peers maintain semantic caches of pointers to files which are requested by peers with similar interests. Client peers, on the other hand, dynamically select super-peers offering the best search performance. We show how this simple approach can be employed not only to optimize searching, but also to solve generally difficult problems encountered in P2P architectures such as load balancing and fault tolerance. We evaluate SOSPNet using a model of the semantic structure derived from 8-month traces of two large file-sharing communities.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=ad199d98c4f165b36c1df4d852110911&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=ad199d98c4f165b36c1df4d852110911&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.157</guid>
		</item>
		<item>
			<title>PrePrint: Impact of Peripheral-Processor Interference on WCET Analysis of Real-Time Embedded Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=cb49329abe40fa28ae0fa99d7d0e5259</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.156</pheedo:origLink>
			<description>The integration phase of real-time COTS-based systems is challenging. When multiple tasks run concurrently, the interference at the bus level between cache fetching activities and I/O peripheral transactions is significant and causes unpredictable behaviors: experimentally, we show that tasks can have computation time variance up to 46% in a typical embedded system. In this work, we present a theoretical framework able to model the interaction between CPU and peripherals contending for shared main memory through the Front Side Bus (FSB). We first show how to compute worst case execution time (WCET) for a task given a trace of its cache activity and given an upper bound function that models peripheral activities. Then, we show how the analysis can be extended to a multitasking environment assuming a restricted-preemption model. Finally, we introduce the novel idea of "hardware server" as a means of controlling the unpredictable behavior of COTS peripheral components.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=cb49329abe40fa28ae0fa99d7d0e5259&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=cb49329abe40fa28ae0fa99d7d0e5259&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.156</guid>
		</item>
		<item>
			<title>PrePrint: Buffer Sizing for Rate-Optimal Single-Rate Dataflow Scheduling Revisited</title>
			<link>http://www.pheedcontent.com/click.phdo?i=790cce292aaec5244f843ceb248f96cb</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.155</pheedo:origLink>
			<description>Single-Rate Dataflow (SRDF) graphs, also known as Homogeneous Synchronous Dataflow (HSDF) graphs or Marked Graphs, are often used to model the implementation and do temporal analysis of concurrent DSP and multimedia applications. An important problem in implementing applications expressed as SRDF graphs is the computation of the minimal amount of buffering needed to implement a static periodic schedule that is optimal in terms of execution rate, or throughput. Ning and Gao propose in "A novel framework of register allocation for software pipelining, Proc. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, 1993." a linear-programming-based polynomial algorithm to compute this minimal storage amount, claiming optimality. We show via a counter-example that the proposed algorithm is not optimal. We prove that the problem is in fact NP-complete. We give an exact solution, and experimentally evaluate the degree of inaccuracy of the algorithm of Ning and Gao.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=790cce292aaec5244f843ceb248f96cb&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=790cce292aaec5244f843ceb248f96cb&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.155</guid>
		</item>
		<item>
			<title>PrePrint: Upper Bounds for Dynamic Memory Allocation</title>
			<link>http://www.pheedcontent.com/click.phdo?i=3c3518bb7190618034dfab423f3071e6</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.154</pheedo:origLink>
			<description>In this paper, we study the upper bounds of memory storage for two different allocators. In the first case, we consider a general allocator that can allocate memory blocks anywhere in the available heap space. In the second case, a more economical allocator constrained by the address-ordered first-fit allocation policy is considered. We derive the upper bound of memory usage for all allocators and present a systematic approach to search for allocation/deallocation patterns that might lead to the largest fragmentation. These results are beneficial in embedded systems where memory usage must be reduced and predictable because of lack of swapping facility. They are also useful in other types of computing systems.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=3c3518bb7190618034dfab423f3071e6&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=3c3518bb7190618034dfab423f3071e6&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.154</guid>
		</item>
		<item>
			<title>PrePrint: Dynamic Multiway Segment Tree for IP Lookups and the Fast Pipelined Search Engine</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c2a02f0197970c78795a57f52009580e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.153</pheedo:origLink>
			<description>A dynamic multiway segment tree (DMST) is proposed for IP lookups in this paper. DMST is designed for dynamic routing tables that can dynamically insert and delete prefixes. DMST is implemented as a B-tree that has all distinct endpoints of ranges as its keys. The complexities of search, insertion, deletion, and memory requirement are the same as the existing multiway range tree (MRT) and prefix in B-tree (PIBT) for prefixes. In addition, a pipelined DMST search engine is proposed to further speed up the search operations. The proposed pipelined DMST search engine uses off-chip SRAMs instead of on-chip SRAMs because the capacity of the latter is too small to hold large routing tables and the cost of the latter is too expensive. By utilizing current FPGA and off-chip SRAM technologies, our proposed 5-stage pipelined search engine can achieve the worst-case throughputs of 33.3 and 41.7 million packets per second (Mpps) with 144-bit and 288-bit wide SRAM blocks, respectively. Furthermore, a straightforward extension of the pipelined search engine with multiple independent off-chip SRAMs can achieve the throughput of 200 Mpps which is equivalent to 102 Gbps for minimal Ethernet packets of size 64 bytes.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=c2a02f0197970c78795a57f52009580e&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=c2a02f0197970c78795a57f52009580e&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.153</guid>
		</item>
		<item>
			<title>PrePrint: Redundant-Digit Floating-Point Addition Scheme Based on a Stored Rounding Value</title>
			<link>http://www.pheedcontent.com/click.phdo?i=bde466f0c1685480fc77609689c40608</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.152</pheedo:origLink>
			<description>Due to the widespread use and inherent complexity of floating-point addition, much effort has been devoted to its speedup via algorithmic and circuit techniques. We propose a new redundant-digit representation for floating-point numbers that leads to computation speedup in two ways: (1) Reducing the per-operation latency when multiple floating-point additions are performed before result conversion to nonredundant format, and (2) Removing the addition associated with rounding. The second advantage is unique to our approach, which replaces the power- and area-intensive rounding addition by low-latency insertion of a rounding two-valued digit, or twit. Instead of conventional sign-magnitude representation, we use a sign-embedded encoding that leads to lower hardware redundancy and, thus, reduced power dissipation. While our intermediate redundant representations remain incompatible with the IEEE 754-2008 standard, many application-specific systems can benefit from our designs. Description of our radix-16 redundant representation and its addition algorithm is followed by the architecture of a floating-point adder based on this representation. Detailed circuit designs are provided for many of the adder&amp;#x2019;s critical subfunctions. Simulation and synthesis based on a 0.13um CMOS standard process show a latency reduction of 15% or better, and both area and power savings of around 58%, compared with the best previous designs.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=bde466f0c1685480fc77609689c40608&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=bde466f0c1685480fc77609689c40608&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.152</guid>
		</item>
		<item>
			<title>PrePrint: QoS-Guaranteed Reed-Solomon Coding with Block Interleaving for Real-Time Broadcast Services in 3G Wireless Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=83204e1fc0a50dc0a83aca15de14d210</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.151</pheedo:origLink>
			<description>Providing high-quality broadcast services for soft real-time applications over wireless networks such as CDMA2000, which have high bit error-rates, requires the control of errors that occur during data transmission. Reed-Solomon (RS) forward error correction (FEC) in the medium access control (MAC) layer performs this role in 3G broadcast services. We propose new analytic models for predicting the performance of RS coding and its execution time, which take into account the memory property of a fading channel, different channel conditions, and a variable level of block interleaving. We identify RS decoding as a significant cause of variability in execution time, taking the form of jitter, which depends on the channel conditions. We analyze the size of buffer required to absorb the jitter under different channel conditions. We then formulate a trade-off between the performance of RS coding and the delay that it causes in transmitting a fixed amount of data with different levels of block interleaving. Finally we show how to balance the quality with which content is presented against an acceptable buffering delay, which is very important to soft real-time applications, by using an adequate level of block interleaving. This study offers a guide for the provision of efficient broadcast services in real time with stochastically guaranteed quality.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=83204e1fc0a50dc0a83aca15de14d210&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=83204e1fc0a50dc0a83aca15de14d210&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.151</guid>
		</item>
		<item>
			<title>PrePrint: ALV: A New Data Redistribution Approach to RAID-5 Scaling</title>
			<link>http://www.pheedcontent.com/click.phdo?i=8256c481701ee51f191594bcea072132</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.150</pheedo:origLink>
			<description>When a RAID-5 Volume is scaled up with added disks, data has to be redistributed from pre-scaled old disks to all disks including the old and the new. Existing online scaling techniques suffer from long redistribution time as well as negative impact on application performance. By leveraging our insight into a reordering window, this paper presents a new data redistribution approach to RAID-5 scaling. The reordering window is a result of the natural space hole that grows in size as data being redistributed. The data inside the reordering window can migrate in any order without overwriting other, in-use data chunks. The new redistribution approach, called ALV, exploits three novel techniques. First, ALV changes the movement order of data chunks to access multiple successive chunks via a single I/O. Second, ALV updates mapping metadata lazily to minimize the number of metadata writes while ensuring data consistency. Third, ALV uses an on/off logical valve to adaptively adjust the redistribution rate depending on the application workload. We implemented ALV in Linux Kernel 2.6.18, and evaluated its performance by replaying three real-system traces: TPC-C, Cello-99, and SPC-web. The results demonstrated that ALV consistently outperformed the conventional approach by 53.31-73.91% in user response time and by 24.07-29.27% in redistribution time.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=8256c481701ee51f191594bcea072132&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=8256c481701ee51f191594bcea072132&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.150</guid>
		</item>
		<item>
			<title>PrePrint: An Analytic Framework for Detailed Resource Profiling in Large and Parallel Programs and its Application for Memory Use</title>
			<link>http://www.pheedcontent.com/click.phdo?i=f77a2bfea18fa3f30f9549e182969c0b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.149</pheedo:origLink>
			<description>Profiling is an essential and widely used technique. While efficient tools for the profiling of execution time are available, the choices for detailed profiling of memory use or other hardware resources are very limited. We were unable to find tools that provided sufficiently accurate insight into e.g. memory use without adding unacceptable overhead in memory use and execution time for the performance analysis of very large applications. In this paper we present a highly efficient probabilistic method for profiling that provides detailed resource usage information. Importantly, we provide an analytical framework which provides error estimates and allows to analyze and quantitatively optimize a wide variety profiling scenarios. We employed the probabilistic approach to implement a memory profiling tool that adds minimal overhead. The tool provides the memory use $M_\psi(t)$ for all call chains $\psi$ over the execution time. Experimental results confirm that execution time and memory overhead is less than 10% of the unprofiled, optimized execution. Importantly, the technique is sufficiently general to be applicable to profiling of other hardware resources as cache or TLB misses and across multiple processes, threads and processors.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=f77a2bfea18fa3f30f9549e182969c0b&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=f77a2bfea18fa3f30f9549e182969c0b&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.149</guid>
		</item>
		<item>
			<title>PrePrint: Efficient Microarchitectural Vulnerabilities Prediction Using Boosted Regression Trees and Patient Rule Inductions</title>
			<link>http://www.pheedcontent.com/click.phdo?i=48ef3c44d7ad0ec8b5fb3a110cbb6b60</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.140</pheedo:origLink>
			<description>The shrinking processor feature size, lower threshold voltage and increasing clock frequency make modern processors highly vulnerable to transient faults. Architectural Vulnerability Factor (AVF) reflects the possibility that a transient fault eventually causes a visible error in the program output, and it indicates a system's susceptibility to transient faults. Therefore, the awareness of the AVF, especially at early design stage, is greatly helpful to achieve a trade-off between system performance and reliability. However, tracking the AVF during program execution is extremely costly, which makes accurate AVF prediction extraordinarily attractive to computer architects. In this paper, we propose to use Boosted Regression Trees (BRT), a nonparametric tree-based predictive modeling scheme, to identify the correlation across workloads, execution phases and processor configurations between a key processor structure's AVF and various performance metrics. The proposed method not only makes an accurate prediction but also quantitatively illustrates individual performance variable's importance to the AVF. Moreover, to reduce the prediction complexity, we also utilize a technique named Patient Rule Induction Method (PRIM) to extract some simple selecting rules on important metrics. Applying these rules during run time can fast identify execution intervals with a relatively high AVF.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=48ef3c44d7ad0ec8b5fb3a110cbb6b60&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=48ef3c44d7ad0ec8b5fb3a110cbb6b60&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.140</guid>
		</item>
		<item>
			<title>PrePrint: Microarchitectural On-line Testing for Failure Detection in Memory Order Buffers</title>
			<link>http://www.pheedcontent.com/click.phdo?i=fe7d939a4e18c679c6f866256cd41f8e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.139</pheedo:origLink>
			<description>Technology scaling leads to burn-in phase out and higher post-silicon test complexity, which increases in-the-field error rate due to both latent defects and actual errors respectively. As a consequence, current reliability methods will likely be infeasible. Microarchitecture knowledge of application runtime behavior offers a possibility to have low-cost continuous online testing techniques detect hard errors in the field. Whereas data can be protected with redundancy (like parity or ECC), there is a lack of mechanisms for control logic. This paper proposes a microarchitectural approach for validating that the memory order buffer logic works correctly. Our design relies on a small cache-like structure that keeps track of the last store to each cached address. Each load is checked to have obtained the data from the youngest older producing store. We present three different implementations of this idea, offering different trade-offs for error coverage, performance overhead and design complexity.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=fe7d939a4e18c679c6f866256cd41f8e&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=fe7d939a4e18c679c6f866256cd41f8e&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.139</guid>
		</item>
		<item>
			<title>PrePrint: PERFECTORY: A Fault-Tolerant Directory Memory Architecture</title>
			<link>http://www.pheedcontent.com/click.phdo?i=acd62c54caa16d443563fa6213096663</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.138</pheedo:origLink>
			<description>The number of CPUs in chip multiprocessors is growing at the Moore's Law rate, due to continued technology advances. However, new technologies pose serious reliability challenges, such as more frequent occurrences of degraded or even non-operational devices, and they threaten the cost-effectiveness and dependability of future computing systems. This work studies how to protect the on-chip coherence directory from fault occurrences. In a chip multiprocessor, cache coherence mechanisms such as directory memory are critical for offering consistent data view to all CPUs. We propose a novel on-line fault detection and correction scheme to enhance yield and resilience to run-time errors at a small performance cost. The proposed scheme uses smart encoding and coherence protocol adaptation strategies to salvage faulty directory entries. We also develop an on-line error recovery scheme that protects the directory memory from soft errors. We call our fault-tolerant directory memory architecture PERFECTORY. Evaluation results show that PERFECTORY achieves very high fault resilience: Over 99% chip yield at 0.05% hard error ratio and 1,934 years MTTF at 1,000 FIT using a 100-processor cluster configuration. PERFECTORY limits performance degradation to less than 1% at 0.05% hard error ratio and requires significantly smaller area overheads than existing redundancy approaches.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=acd62c54caa16d443563fa6213096663&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=acd62c54caa16d443563fa6213096663&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.138</guid>
		</item>
		<item>
			<title>PrePrint: Proofs of Correctness and Properties of Integer Adder Circuits</title>
			<link>http://www.pheedcontent.com/click.phdo?i=b53647a7ef5bc05b7064bb28e720bd93</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.137</pheedo:origLink>
			<description>Adder circuits have been extensively studied. Their formal properties are well known, but the proofs are either incomplete or difficult to find. This short contribution intends to integrate all formal proofs related to adders in a single place and to add the details when necessary. The presentation is accessible to general VLSI designer. Another goal of this study is to put together relevant materials for the preparation of further formal studies in computer arithmetic. The presentation is made as concise as possible.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=b53647a7ef5bc05b7064bb28e720bd93&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=b53647a7ef5bc05b7064bb28e720bd93&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.137</guid>
		</item>
		<item>
			<title>PrePrint: Predictive Temperature-Aware DVFS</title>
			<link>http://www.pheedcontent.com/click.phdo?i=266032ffbe740fe37dcd95b65cf80387</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.136</pheedo:origLink>
			<description>In this paper, we propose predictive temperature-aware DVFS (Dynamic Voltage and Frequency Scaling) using the performance counters that are already embedded in commercial microprocessors. By using the performance counters and simple regression analysis, we can predict the localized temperature and efficiently scale the voltage/frequency. When localized thermal problems that were not detected by thermal sensors are found after layout (or fabrication), the thermal problems can be avoided by the proposed software solution without delaying time-to-market. The evaluation results show that in a Linux-based laptop with the Intel Core2 Duo processor, DVFS using the performance counters performs comparable to DVFS using the thermal sensor.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=266032ffbe740fe37dcd95b65cf80387&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=266032ffbe740fe37dcd95b65cf80387&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.136</guid>
		</item>
		<item>
			<title>PrePrint: Model-Driven System Capacity Planning Under Workload Burstiness</title>
			<link>http://www.pheedcontent.com/click.phdo?i=8136b0d0bceb05ef93eae096aa708b96</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.135</pheedo:origLink>
			<description>In this paper, we define and study a new class of capacity planning models called MAP queueing networks. MAP queueing networks provide the first analytical methodology that can describe and predict accurately the performance of complex systems operating under bursty workloads, such as multi-tier architectures or storage arrays. MAP queueing networks address this limitation by describing computer systems as closed networks of servers whose service times are Markovian Arrival Processes (MAPs), a class of Markov-modulated point processes that can model general distributions and burstiness. We propose a methodology to solve MAP queueing networks by two state space transformations, which we call Linear Reduction (LR) and Quadratic Reduction (QR). These transformations dramatically decrease the number of states in the underlying Markov chain of the queueing network model. From these reduced state spaces, we obtain two classes of bounds on arbitrary performance indexes, e.g., throughput, response time, utilizations. Numerical experiments show that LR an QR bounds achieve a mean accuracy error of 2%. We also illustrate the high effectiveness of the LR and QR bounds in the performance analysis of a real multi-tier architecture subject to TPC-W workloads that are characterized as bursty. These results promote MAP queueing networks as a new robust class of capacity planning models.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=8136b0d0bceb05ef93eae096aa708b96&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=8136b0d0bceb05ef93eae096aa708b96&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.135</guid>
		</item>
		<item>
			<title>PrePrint: Improving Flash Wear-Leveling by Proactively Moving Static Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=9bc1af53dc3dbced09a85ee9842addb6</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.134</pheedo:origLink>
			<description>Motivated by the strong demand for flash memory with enhanced reliability, this work attempts to achieve improved flash memory endurance without substantially increasing overhead and without excessively modifying popular implementation designs such as the Flash Translation Layer protocol (FTL), NAND Flash Translation Layer protocol (NFTL) and Block-Level flash translation layer protocol (BL). A wear-leveling mechanism for moving data that are not updated is proposed to distribute wear leveling actions over the entire physical address space, so that static or rarely updated data can be proactively moved and memory-space requirements can be minimized. The properties of the mechanism are then explored with various implementation considerations. A series of experiments based on a realistic trace demonstrates the significantly improved endurance of FTL, NFTL, and BL with limited system overhead.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=9bc1af53dc3dbced09a85ee9842addb6&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=9bc1af53dc3dbced09a85ee9842addb6&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.134</guid>
		</item>
		<item>
			<title>PrePrint: Network-on-Chip Hardware Accelerators for Biological Sequence Alignment</title>
			<link>http://www.pheedcontent.com/click.phdo?i=92912df3d127c02bb7880a40edb27df1</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.133</pheedo:origLink>
			<description>The most pervasive compute operation carried out in almost all bioinformatics applications is pairwise sequence homology detection. Due to exponentially growing sequence databases, computing this operation at a large-scale is becoming expensive. An effective approach to speed up this operation is to integrate a very high number of processing elements in a single chip so that the massive scales of fine-grain parallelism inherent in several bioinformatics applications can be exploited efficiently. Network-on-Chip (NoC) is a very efficient method to achieve such large scale integration. In this work, we propose to bridge the gap between data generation and processing in bioinformatics applications by designing NoC architectures for the sequence alignment operation. While accelerators using other hardware architectures such as FPGA, General Purpose Graphics Processing Unit (GPU) and the Cell Broadband Engine (CBE) have been previously designed for sequence alignment, the NoC paradigm enables integration of a much larger number of processing elements on a single chip and also offers a higher degree of flexibility in placing them along the die to suit the underlying algorithm. The results show that our NoC-based implementations can provide above 10^2-10^3-fold speedup over other hardware accelerators and above 10^4-fold speedup over traditional CPU architectures.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=92912df3d127c02bb7880a40edb27df1&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=92912df3d127c02bb7880a40edb27df1&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.133</guid>
		</item>
		<item>
			<title>PrePrint: Conversion Algorithms and Implementations for Koblitz Curve Cryptography</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c754ceeac0731ee7f0ff7acd2568848c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.132</pheedo:origLink>
			<description>In this paper, we discuss conversions between integers and $\tau$-adic expansions and we provide efficient algorithms and hardware architectures for these conversions. The results have significance in elliptic curve cryptography using Koblitz curves, a family of elliptic curves offering faster computation than general elliptic curves. However, in order to enable these faster computations, scalars need to be reduced and represented using a special base-$\tau$ expansion. Hence, efficient conversion algorithms and implementations are necessary. Existing conversion algorithms require several complicated operations, such as multi-precision multiplications and computations with large rationals, resulting in slow and large implementations in hardware and microcontrollers with limited instruction sets. Our algorithms are designed to utilize only simple operations, such as additions and shifts, which are easily implementable on practically all platforms. We demonstrate the practicability of the new algorithms by implementing them on Altera Stratix~II FPGAs. The implementations considerably improve both computation speed and required area compared to existing solutions.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=c754ceeac0731ee7f0ff7acd2568848c&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=c754ceeac0731ee7f0ff7acd2568848c&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.132</guid>
		</item>
		<item>
			<title>PrePrint: New Region-Based Algorithms for Deriving Bounded Petri nets</title>
			<link>http://www.pheedcontent.com/click.phdo?i=8ab09b4dce417f0225ac9f761c95758d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.131</pheedo:origLink>
			<description>The theory of regions was introduced in the early nineties as a method to bridge state-based and event-based models. This paper tackles the problem of deriving a Petri net from a state-based model, using the theory of regions. Some of the restrictions required in the traditional approach are dropped in this paper, together with significant extensions that make the approach applicable in new scenarios. One of these scenarios is Process Mining, where accepting (discovering) additional behavior in the synthesized Petri net is sometimes valued. The algorithmic emphasis used in this paper contributes to the demystification of the theory of regions as been only a good theoretical exercise, opening the door for its application in the industrial domain.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=8ab09b4dce417f0225ac9f761c95758d&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=8ab09b4dce417f0225ac9f761c95758d&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.131</guid>
		</item>
		<item>
			<title>PrePrint: Design and Analysis of On-Chip Networks for Large Scale Cache Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=d560154ed330624a10bcbe29c1edfac7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.130</pheedo:origLink>
			<description>Switched networks have been adopted in on-chip communication for their scalability and efficient resource sharing. However, using a general network for a specific domain may result in unnecessary high cost and low performance when the interconnects are not optimized for the domain. Designing an optimal network for the specific domain is challenging because in-depth knowledge of interconnects and the application domain is required. Recently proposed Non-Uniform Cache Architectures (NUCAs) use wormhole-routed 2D mesh networks in L2 caches. We observe that in NUCAs network resources are underutilized with the considerable area cost (41% of cache) and the network delay is significantly large (63% of cache access time). Motivated by our observations, we investigate both router architecture and network topology for communication behaviors in large scale cache systems. We present Fast-LRU replacement, where cache replacement overlaps with data request delivery. Next, we propose a deadlock-free XYX routing algorithm in a mesh network and present a new halo network topology to reduce the required links. Finally, we introduce a single-cycle multicast router that needs small modification of the unicast router design. Simulation results show that our design improves the average IPC by 38% over the mesh design with Multicast Promotion replacement and uses 12% of the interconnection area of the mesh network.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=d560154ed330624a10bcbe29c1edfac7&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=d560154ed330624a10bcbe29c1edfac7&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.130</guid>
		</item>
		<item>
			<title>PrePrint: Heterogeneous Interconnects for Energy-Efficient Message Management in CMPs</title>
			<link>http://www.pheedcontent.com/click.phdo?i=fe15572aa62d831448f54dd6fb5d5c6a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.129</pheedo:origLink>
			<description>Previous studies have shown that the interconnection network of a Chip-Multiprocessor (CMP) has significant impact on both overall performance and energy consumption. Moreover, wires used in such interconnect can be designed with varying latency, bandwidth and power characteristics. In this work, we show how messages can be efficiently managed, from the point of view of both performance and energy, in tiled CMPs using a heterogeneous interconnect. Our proposal consists of two approaches. The first is "Reply Partitioning", a technique that splits replies with data into a short "Partial Reply" message that carries a sub-block of the cache line that includes the word requested by the processor plus an "Ordinary Reply" with the full cache line. This technique allows all coherence messages to be classified into two groups: critical and short, and non-critical and long. The second approach is the use of a heterogeneous interconnection network comprised of low-latency wires for critical messages and low-energy wires for non-critical ones. Detailed simulations of 8- and 16-core CMPs, show that our proposal obtains average savings of 7% in execution time and 30% in the Energy-Delay squared Product metric for the full CMP.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=fe15572aa62d831448f54dd6fb5d5c6a&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=fe15572aa62d831448f54dd6fb5d5c6a&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.129</guid>
		</item>
		<item>
			<title>PrePrint: A Unified Prediction Method for Predicting Program Behavior</title>
			<link>http://www.pheedcontent.com/click.phdo?i=56ff0eae9da66c2af2f2cead431c4f06</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.122</pheedo:origLink>
			<description>Dynamic management of computer resources is essential for adaptive computing. Adaptive computing systems rely on accurate and robust metric predictors to exploit runtime behavior of programs. In this study, we propose the Unified Prediction Method (UPM) that is system- and metric-independent for predicting computer metrics. Unlike ad-hoc predictors, UPM uses a parametric model and is entirely statistical and data-driven. The parameters of the model are estimated by minimizing an objective function. Choice of the objective function and the model type determines the form of the solution whether it is closed form or numerically determined through optimization. In this study two specific realizations of UPM are presented. The first realization uses mean squared error (MSE) objective function and the second realization uses accumulated squared error (ASE) objective function, in conjunction with autoregressive models. The former objective function leads to Linear Prediction and the latter leads to Predictive Least Square (PLS) prediction. The model parameters for these predictors can be estimated analytically. The prediction is optimal with respect to the chosen objective function. An extensive and rigorous series of prediction experiments for the instruction per cycle and L1 cache miss rate metrics demonstrate superior performance for the proposed predictors over the last value predictor and table based predictor on SPECCPU 2000 benchmarks.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=56ff0eae9da66c2af2f2cead431c4f06&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=56ff0eae9da66c2af2f2cead431c4f06&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.122</guid>
		</item>
		<item>
			<title>PrePrint: An Optimal Encoding to Represent a Single Set in an ROBDD</title>
			<link>http://www.pheedcontent.com/click.phdo?i=56bd3483bedb0d709caa48fed3254fd0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.121</pheedo:origLink>
			<description>The contribution is an optimal encoding for a set in a Reduced Ordered Binary Decision Diagram (ROBDD) when the number of elements in the set's domain is not a power of two. The contribution includes a proof that the proposed encoding produces an ROBDD with the minimum number of nodes and discusses related open problems that are not solved by the proposed encoding.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=56bd3483bedb0d709caa48fed3254fd0&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=56bd3483bedb0d709caa48fed3254fd0&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.121</guid>
		</item>
		<item>
			<title>PrePrint: Distributed Recovery from Network Partitioning in Movable Sensor/Actor Networks via Controlled Mobility</title>
			<link>http://www.pheedcontent.com/click.phdo?i=1cdbfb44f81b479abadb08da3492505f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.120</pheedo:origLink>
			<description>Mobility has been introduced to sensor networks through the deployment of movable nodes. In movable wireless networks, network connectivity among the nodes is a crucial factor in order to relay data to the sink node, exchange data for collaboration and perform data aggregation. However, such connectivity can be lost due to a failure of one or more nodes. Even a single node failure may partition the network and thus eventually reduce the quality and efficiency of the network operation. To handle this connectivity problem, we present PADRA to detect possible partitions and then restore the network connectivity through controlled relocation of movable nodes. The idea is to identify whether or not the failure of a node will cause partitioning in advance in a distributed manner. If a partitioning is to occur, PADRA designates a failure handler to initiate the connectivity restoration process. The overall goal in this process is to localize the scope of the recovery and minimize the overhead imposed on the nodes. We further extend PADRA to handle multiple node failures. The approach, namely, MDAPRA strives to provide a mutual exclusion mechanism in repositioning the nodes to restore connectivity. The effectiveness of the proposed approaches are validated through simulation experiments.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=1cdbfb44f81b479abadb08da3492505f&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=1cdbfb44f81b479abadb08da3492505f&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.120</guid>
		</item>
		<item>
			<title>PrePrint: Scheduling Concurrent Bag-of-Tasks Applications on Heterogeneous Platforms</title>
			<link>http://www.pheedcontent.com/click.phdo?i=87b99e5cc19ecabd4e4b53c25d02cdac</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.117</pheedo:origLink>
			<description>Scheduling problems are already difficult on traditional parallel machines, and they become extremely challenging on heterogeneous clusters. In this paper we deal with the problem of scheduling multiple applications, made of collections of independent and identical tasks, on a heterogeneous master-worker platform. The applications are submitted online, which means that there is no a priori (static) knowledge of the workload distribution at the beginning of the execution. The objective is to minimize the maximum stretch, i.e., the maximum ratio between the actual time an application has spent in the system and the time this application would have spent if executed alone. On the theoretical side, we design an optimal algorithm for the offline version of the problem (when all release dates and application characteristics are known beforehand). We also introduce a heuristic for the general case of online applications. On the practical side, we have conducted extensive simulations and MPI experiments, showing that we are able to deal with very large problem instances in a few seconds. Also, the solution that we compute totally outperforms classical heuristics from the literature, thereby fully assessing the usefulness of our approach.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=87b99e5cc19ecabd4e4b53c25d02cdac&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=87b99e5cc19ecabd4e4b53c25d02cdac&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.117</guid>
		</item>
		<item>
			<title>PrePrint: QoS Control for Pipelines of Tasks Using Multiple Resources</title>
			<link>http://www.pheedcontent.com/click.phdo?i=bba959f28750dfd30094f1f9e316fd21</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.116</pheedo:origLink>
			<description>We consider soft real-time applications organised as pipelines of tasks using resources of different type (communication, computation, storage). The applications are assumed to be periodically triggered and the different tasks communicate by unidirectional buffers. The problem we cope with is how to effectively share the resources so that some specified Quality of Service (QoS) requirements are met. The QoS considered here is tightly related to the end-to-end temporal behaviour of the application. To compensate for time-varying resource requirements, we advocate a distributed control approach whereby the scheduling parameters of each task are tuned depending on the temporal behaviour of the application measured by appropriate sensors. The use of real-time scheduling strategies enables a mathematically safe control design, in which the QoS requirements are translated into formal control goals, and formal proofs are provided on the ability of the controller to fulfil these goals. We also offer extensive simulations that validate the approach for multimedia applications.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=bba959f28750dfd30094f1f9e316fd21&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=bba959f28750dfd30094f1f9e316fd21&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.116</guid>
		</item>
		<item>
			<title>PrePrint: A Novel Weighted-Graph-Based Grouping Algorithm for Metadata Prefetching</title>
			<link>http://www.pheedcontent.com/click.phdo?i=2ac2a570891d6e331da91b87eedd9e6f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.115</pheedo:origLink>
			<description>Although data prefetching algorithms have been extensively studied for years, there is no counterpart research done for metadata access performance. Existing data prefetching algorithms, either lack of emphasis on group prefetching, or bearing a high level of computational complexity, do not work well with metadata prefetching cases. Therefore, an efficient, accurate and distributed metadata-oriented prefetching scheme is critical to leverage the overall performance in large distributed storage systems. In this paper, we present a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationship, to reap performance benefit from prefetching specifically for clustered metadata servers, an arrangement envisioned necessary for petabyte scale distributed storage systems. Extensive trace-driven simulations show that by adopting our new metadata prefetching algorithm, the miss rate for metadata accesses on the client site can be effectively reduced, while the average response time of metadata operations can be dramatically cut by up to 67%, compared with legacy LRU caching algorithm and existing state of the art prefetching algorithms.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=2ac2a570891d6e331da91b87eedd9e6f&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=2ac2a570891d6e331da91b87eedd9e6f&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.115</guid>
		</item>
		<item>
			<title>PrePrint: Comment on "Performability Analysis: A New Algorithm"</title>
			<link>http://www.pheedcontent.com/click.phdo?i=229c183c79bef627d118197014515bdb</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.114</pheedo:origLink>
			<description>Paper [1] describes an algorithm for computing the complementary distribution of the accumulated reward over an interval of time in a homogeneous Markov process with time-independent reward rates associated with states. The algorithm is stated to have, in the general case, a storage complexity O(mNM), where m+1 is the number of different reward rates, M is the number of states, and N is a nonnegative truncation parameter, and in a particular but frequent case , a storage complexity O([(m-1)C+N]M), where C is a nonnegative truncation parameter never larger and often much smaller than N. In this comment, we show that, in that particular case, small modifications of the algorithm when C&gt;0 lead to a storage complexity O(mCM) for C&amp;#x003E;1 and O(M) for C=1. We also show that in a second particular case, in which the truncation parameters are N and C', 0&amp;#8806;C'&amp;#8806;N, small modifications of the algorithm when C'&amp;#x003E;0 lead to a storage complexity O(mC'M) for C'&amp;#x003E;1 and O(M) for C'=1. Finally, we note that, in the first particular case, the storage complexity of the algorithm for C=0 is O(M) and, in the second particular case, the storage complexity for C'=0 is O(M).&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=229c183c79bef627d118197014515bdb&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=229c183c79bef627d118197014515bdb&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.114</guid>
		</item>
		<item>
			<title>PrePrint: Equivalence, Dominance and Similarity Relations Between Fault Pairs and a Fault Pair Collapsing Process for Fault Diagnosis</title>
			<link>http://www.pheedcontent.com/click.phdo?i=78f1dcd9fd7153560bf0218837150edd</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.112</pheedo:origLink>
			<description>Equivalence and dominance relations used earlier in fault diagnosis procedures are defined as relations between faults, similar to the relations used for fault collapsing. Since the basic entity of diagnostic fault simulation and test generation is a fault pair, and not a single fault, we introduce a framework where equivalence and dominance relations are defined for fault pairs. Using equivalence and dominance relations between fault pairs, we define a fault pair collapsing process, where fault pairs are removed from consideration under diagnostic fault simulation and test generation since they are guaranteed to be distinguished when other fault pairs are distinguished. Another concept, which was used earlier to enhance fault collapsing, is the level of similarity between faults. We extend this definition into a level of similarity between fault pairs and discuss its use for fault pair collapsing. The level of similarity encompasses equivalence and dominance relations between fault pairs, and extends them to allow additional fault pair collapsing.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=78f1dcd9fd7153560bf0218837150edd&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=78f1dcd9fd7153560bf0218837150edd&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.112</guid>
		</item>
		<item>
			<title>PrePrint: Communication-Aware Load Balancing for Parallel Applications on Clusters</title>
			<link>http://www.pheedcontent.com/click.phdo?i=747fe94b3738cf7746434d2bf4eedee9</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.108</pheedo:origLink>
			<description>Conventional load balancers have proven effective in increasing the utilization of CPU, memory, and disk I/O resources in a cluster. However, most of the existing load-balancing schemes ignore network resources, leaving an opportunity to improve the effective bandwidth of networks on clusters running parallel applications. For this reason, we propose a communication-aware load balancing technique that is capable of improving the performance of communication-intensive applications by increasing the effective utilization of networks in cluster environments. To facilitate the proposed load-balancing scheme, we introduce a behavior model for parallel applications with large requirements of network, CPU, memory, and disk I/O resources. Our load-balancing scheme can make full use of this model to quickly and accurately determine the load induced by a variety of parallel applications. Simulation results generated from a diverse set of both synthetic bulk synchronous and real parallel applications on a cluster show that our scheme significantly improves the performance, in terms of slowdown and turn-around time, over existing schemes by up to 206% (with an average of 74%) and 235% (with an average of 82%), respectively.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=747fe94b3738cf7746434d2bf4eedee9&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=747fe94b3738cf7746434d2bf4eedee9&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.108</guid>
		</item>
		<item>
			<title>PrePrint: Scalable Node Level Computation Kernels for Parallel Exact Inference</title>
			<link>http://www.pheedcontent.com/click.phdo?i=ae994cb2713b8f40a65074bca41a8b32</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.106</pheedo:origLink>
			<description>In this paper, we investigate data parallelism in exact inference with respect to arbitrary junction trees. Exact inference is a key problem in exploring probabilistic graphical models, where the computation complexity increases dramatically  with clique width and the number of states of random variables. We study potential table representation and scalable algorithms for node level primitives. Based on such node level primitives, we propose computation kernels for evidence collection  and evidence distribution. A data parallel algorithm for exact inference is presented using the proposed computation kernels. We analyze the scalability of node level primitives, computation kernels and the exact inference algorithm using the coarse  grained multicomputer (CGM) model. According to the analysis, we achieve O(N d_c w_c \prod^{w_c}_{j=1} r_{c, j} / P) local computation time and O(N) global communication rounds using P processors, 1 &amp;#x2264; P &amp;#x2264; max_c  \prod^{w_c}_{j=1} r_{c, j}, where N is the number of cliques in the junction tree; d_c is the clique degree; r_{c, j} is the number of states of the j-th random variable in clique C; w_c is the clique width; and w_s is the separator width. We  implemented the proposed algorithm on state-of-the-art clusters. Experimental results show that the proposed algorithm exhibits almost linear scalability over a wide range.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=ae994cb2713b8f40a65074bca41a8b32&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=ae994cb2713b8f40a65074bca41a8b32&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.106</guid>
		</item>
		<item>
			<title>PrePrint: Integrating Evolutionary Computation with Abstraction Refinement for Model Checking</title>
			<link>http://www.pheedcontent.com/click.phdo?i=0f447aabe1a5480147f9e63d67258ada</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.105</pheedo:origLink>
			<description>Model checking for large scale systems is extremely difficult due to the state explosion problem. Creating useful abstractions for model checking task is a challenging problem, often involving many iterations of refinement. In this paper we consider techniques for model checking in the counterexample guided abstraction refinement. The state separation problem is one popular approach in counterexample guided abstraction refinement, and it poses the main hurdle during the refinement process. To achieve effective minimization of the separation set, we present a novel probabilistic learning approach based on the sample learning technique, evolutionary algorithm and effective heuristics. We integrate it with the abstraction refinement framework in the VIS model checker. We include experimental results on model checking to compare our new approach to recently published techniques. The benchmark results show that our approach has overall speedup of more than 56% against previous techniques. Our work is the first successful integration of evolutionary algorithm and abstraction refinement for model checking.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=0f447aabe1a5480147f9e63d67258ada&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=0f447aabe1a5480147f9e63d67258ada&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.105</guid>
		</item>
		<item>
			<title>PrePrint: Tradeoffs between Latency, Complexity and Load Balancing with Multicast Algorithms</title>
			<link>http://www.pheedcontent.com/click.phdo?i=4e682976693b4cc228f97e3d22806ae4</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.104</pheedo:origLink>
			<description>The increasing number of collective communication based services with a mass interest and the parallel increasing demand for service quality are paving the way towards end-to-end QoS guarantees. Although many multicast algorithms in interconnection networks have been widely reported in the literature, most of them handle the multicast communication within limited performance metrics, i.e. either delay/latency or throughput. In contrast, this study investigates the multicast communication within a group of QoS constrains, namely, latency, jitter, throughput and additional traffic caused. In this paper, we present the Qualified Groups (QG) as a novel path based multicast algorithm for interconnection networks. To the best of our knowledge, the QG is the first multicast algorithm that considers the multicast latency at both the network and node levels across different traffic scenarios in interconnection networks. Our analysis shows that the proposed multicast algorithm exhibits superior performance characteristics over other well known path-based multicast algorithms under different operating conditions. In addition, our results show that the QG can significantly improve the parallelism of the muticast communication.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=4e682976693b4cc228f97e3d22806ae4&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=4e682976693b4cc228f97e3d22806ae4&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.104</guid>
		</item>
		<item>
			<title>PrePrint: Independent Spanning Trees on Multidimensional Torus Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=99a9d772eb40a5296a2c3263030b7192</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.98</pheedo:origLink>
			<description>Two spanning trees rooted at vertex r in a graph G are called independent spanning trees (ISTs) if for each vertex v in G, v &amp;#x2260; r, the paths from vertex v to vertex r in these two trees are internally distinct. If the connectivity of G is k, the IST problem is to construct k ISTs rooted at each vertex. The IST problem has found applications in fault-tolerant broadcasting, but it is still open for general graphs with connectivity greater than four. In this paper, we shall propose a very simple algorithm for solving the IST problem on multidimensional torus networks. In our algorithm, every vertex can determine its parent for a specific independent spanning tree only depending on its own label. Thus, our algorithm can also be implemented in parallel systems or distributed systems very easily.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=99a9d772eb40a5296a2c3263030b7192&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=99a9d772eb40a5296a2c3263030b7192&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.98</guid>
		</item>
		<item>
			<title>PrePrint: CSMT: Simultaneous Multithreading for Clustered VLIW Processors</title>
			<link>http://www.pheedcontent.com/click.phdo?i=2fdd25339141d18130ce1e753164a85c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.96</pheedo:origLink>
			<description>Simultaneous MultiThreading (SMT) is a well known technique that improves resource utilization by exploiting thread level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures, which is contrary to the VLIW philosophy of hardware simplicity. In this paper, we propose CSMT (Cluster-level Simultaneous MultiThreading) to allow some degree of SMT in clustered VLIW processors with low hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster as the assignment unit. To minimize cluster conflicts between threads, a very simple hardware-based cluster renaming mechanism is proposed. The hardware required to implement CSMT is cheap, realistic and practical for a clustered VLIW processor. An analysis of the hardware required to implement CSMT shows that it is quite scalable, with upto 8 threads easily supported at low hardware cost. The experimental results show that CSMT significantly improves performance when compared with other multithreading approaches suited for VLIW. For instance, with 4 threads CSMT shows an average speedup of 110% over a single-thread VLIW architecture and 40% over Interleaved MultiThreading (IMT). In some cases, speedup can be as high as 225% over single-thread architecture and 84% over IMT.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=2fdd25339141d18130ce1e753164a85c&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=2fdd25339141d18130ce1e753164a85c&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.96</guid>
		</item>
		<item>
			<title>PrePrint: A High-Performance Deadlock-Free Multicast Routing Algorithm for K-Ary N-Cubes</title>
			<link>http://www.pheedcontent.com/click.phdo?i=87e3f0181a533c878eb1c27b1f0e3301</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.90</pheedo:origLink>
			<description>Current multicast routing algorithms for multiprocessor systems are based on the reservation of transmission resources and buffer space for messages before transmission. This strategy causes cyclic dependencies among messages and, in turn, deadlock situations. This paper proposes and evaluates the performance of a new multicast routing algorithm, called Deadlock Free Multicast Routing (DFMR), which removes the root cause of deadlocks. DFMR prevents deadlocks by allowing nodes to send flits as soon as inter-node links are available for transmission. This is obtained by allowing any possible interleaving of flits from all messages on inter-node links. The resulting routing algorithm eliminates the need to implement deadlock avoidance, detection, and recovery mechanisms, but needs larger output buffers. In DFMR, message flits carry additional information overhead which is used to avoid deadlock. Simulation results show that this information overhead has a negligible impact on overall performance which is considerably greater than that of previous algorithms.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=87e3f0181a533c878eb1c27b1f0e3301&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=87e3f0181a533c878eb1c27b1f0e3301&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.90</guid>
		</item>
		<item>
			<title>PrePrint: A Game Theoretic Framework for Power Control in Wireless Sensor Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=d70042b63a656a183539b23d42ea2d4d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.82</pheedo:origLink>
			<description>In infrastructure-less sensor networks, efficient usage of energy is very critical because of the limited energy available to the sensor nodes. Among various phenomena that consume energy, radio communication is by far the most demanding. Energy loss can be minimized by effectively controlling the power at which the nodes transmit signals. In this paper, we apply game theory to solve the power control problem in a CDMA based distributed sensor network. We formulate a non-cooperative game under incomplete information and study the existence of Nash equilibrium. With the help of this equilibrium, we devise a distributed algorithm for optimal power control and prove that the system is power stable only if the nodes comply with certain transmit power thresholds. We show that even in a non-cooperative scenario, it is in the best interest of the nodes to comply with these thresholds. The power level at which a node should transmit, to maximize its utility, is evaluated. Moreover, we compare the utilities when the nodes are allowed to transmit with discrete and continuous power levels; the performance with discrete levels is upper-bounded by the continuous case. Numerical results demonstrate that the proposed algorithm achieves the best possible payoff/utility for the sensor nodes even by consuming less power.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=d70042b63a656a183539b23d42ea2d4d&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=d70042b63a656a183539b23d42ea2d4d&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.82</guid>
		</item>
		<item>
			<title>PrePrint: Thread Relocation: A Runtime Architecture for Tolerating Hard-Errors in Chip Multiprocessor</title>
			<link>http://www.pheedcontent.com/click.phdo?i=8c676ef7da175b2f57d3ed4378c35d45</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.76</pheedo:origLink>
			<description>As the semiconductor industry continues its relentless push for nano-CMOS technologies, device reliability and occurrence of hard-errors has emerged as a dominant concern in multi-cores. Although regular memory structures are protected against hard-errors using error correcting codes or spare rows and columns, many of the structures within the cores are left unprotected. Even if the location of hard-errors is known a priori, disabling faulty cores result in a substantial performance loss. Several proposed techniques use microarchitectural redundancy to allow a defective core to continue operation. These techniques are attractive, but limited due to either added cost of additional redundancy that offers no benefits to an error-free core, or limited coverage, due to the natural redundancy offered by the microarchitecture. We propose to exploit the inter-core redundancy in chip multiprocessors for hard-error tolerance. Our scheme combines hardware reconfiguration to ensure reduced functionality of cores, and a runtime layer of software (microvisor) to manage mapping of threads to cores. Microvisor observes the changing phase behavior of threads and initiates thread relocation to match the computational demands of threads to the capabilities of cores. Our results show that in the presence of degraded cores, microvisor mitigates performance losses to an average of 2%.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=8c676ef7da175b2f57d3ed4378c35d45&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=8c676ef7da175b2f57d3ed4378c35d45&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img alt=&quot;&quot; height=&quot;0&quot; width=&quot;0&quot; border=&quot;0&quot; style=&quot;display:none&quot; src=&quot;http://a.rfihub.com/eus.gif?eui=2225&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.76</guid>
		</item>
		<item>
			<title>PrePrint: Matrix Stripe Cache-Based Contiguity Transform for Fragmented Writes in RAID-5</title>
			<link>http://www.pheedo.com/click.phdo?i=a882a699a21f52a77d8e9da8847dfbbe</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2007.1058</pheedo:origLink>
			<description>Given that contiguous reads and writes between a cache and a disk outperform fragmented reads and writes, fragmented reads and writes are forcefully transformed into contiguous reads and writes via a proposed matrix stripe cachebased contiguity transform (MSC-CT) method, which employs a rule of consistency for data integrity at the block level, and a rule of performance that ensures no performance degradation. MSCCT performs for reads and writes, both of which are produced by write requests from a host, as a write request from a host employs reads for parity-update and writes to disks in a RAID- 5 array. MSC-CT is compatible with existing disk technologies. The proposed implementation in a Linux kernel delivers a peak throughput that is 3.2 times higher than a case without MSC-CT on representative workloads. The results demonstrate that MSCCT is extremely simple to implement, has low overhead, and is ideally suited for RAID controllers not only for random writes but also for sequential writes in various realistic scenarios.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=a882a699a21f52a77d8e9da8847dfbbe&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=a882a699a21f52a77d8e9da8847dfbbe&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2007.1058</guid>
		</item>
	</channel>
</rss>