<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="/css/rss20.xsl" type="text/xsl"?>
<rss xmlns:pheedo="http://www.pheedo.com/namespace/pheedo" version="2.0">
	<channel>
		<title>IEEE Transactions on Computers</title>
		<link>http://www.computer.org/tc</link>
		<description>The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers, brief contributions, and comments on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability;
g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.	</description>
		<language>en-us</language>
		<pubDate>Tue, 16 Dec 2008 14:42:42 GMT</pubDate>
		<image>
			<url>http://csdl.computer.org/common/images/logos/tc.gif</url>
			<title>IEEE Computer Society</title>
			<description>List of recently published journal articles</description>
			<link>http://www.computer.org/tc</link>
		</image>
		<item>
			<title>PrePrint: An Efficient Directed Localization Recursion Protocol for Wireless Sensor Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=44381b080df339f20471f01e9f83be6d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.221</pheedo:origLink>
			<description>The establishment of a localization system is an important task in wireless sensor networks. Due to the geographical correlation between sensed data, location information is commonly used to name the gathered data and address nodes and regions in data dissemination protocols. In general, to estimate its location, a node needs the position information of at least three reference points (neighbors that know their positions). In this work, we propose a different scheme in which only two reference points are required in order to estimate a position. To choose between the two possible solutions of an estimate, we use the known direction of the recursion. This approach leads to a recursive localization system that works with low density networks (increasing by 40% the number of nodes with estimates in some cases), reduces the position error by almost 30%, requires 37% less processor resources to estimate a position, uses fewer beacon nodes, and also indicates the node position error based on its distance to the recursion origin. No GPS-enabled node is required, since the recursion origin can be used as a relative coordinate system. The algorithm's evaluation is performed by comparing it with a similar localization system; also, experiments are made to evaluate the impact of both systems in geographic algorithms.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=44381b080df339f20471f01e9f83be6d&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=44381b080df339f20471f01e9f83be6d&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=44381b080df339f20471f01e9f83be6d&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.221</guid>
		</item>
		<item>
			<title>PrePrint: On the Multi-Hop Performance of Synchronization Mechanisms in High Propagation Delay Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=8d8bfe42ca7c6e6b507ff5c164747fef</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.220</pheedo:origLink>
			<description>We analyze the single and multi-hop performance of time synchronization mechanisms for challenging environments characterized by high propagation delays, low duty-cycle operation, and imprecise clocks, such as underwater acoustic sensor networks. We find that receiver-receiver based schemes are unsuitable for such environments, and therefore focus primarily on sender-receiver schemes. According to our analysis, a one-way dissemination approach provides good clock skew estimation and poor offset estimation while a two-way exchange approach provides accurate offset estimation but imprecise clock skew estimation. In average, using one-way scheme can result in significant cumulative propagation error over multiple hops, and using two-way can lead to high variance of propagation error. We develop and analyze a hybrid one-way dissemination/two-way exchange technique, and verify the performance of our hybrid scheme through trace-based experiments. The results suggest that this hybrid approach can provide bounded average error propagation in multi-hop settings and significantly lower variance of propagation error.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=8d8bfe42ca7c6e6b507ff5c164747fef&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=8d8bfe42ca7c6e6b507ff5c164747fef&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=8d8bfe42ca7c6e6b507ff5c164747fef&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.220</guid>
		</item>
		<item>
			<title>PrePrint: A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations</title>
			<link>http://www.pheedo.com/click.phdo?i=931e7d4d557b50d06f22e39e544060a2</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.219</pheedo:origLink>
			<description>This paper describes a new basis for the implementation of the shifter functional unit in microprocessors that can implement new advanced bit manipulations as well as standard shifter operations. Our design is based on the inverse butterfly and butterfly datapath circuits, rather than the barrel shifter or log-shifter designs currently used. We show how this new shifter basis can implement the standard shift and rotate operations, as well as more advanced extract, deposit and mix operations found in some processors. Furthermore, it can perform important new classes of even more advanced bit manipulation instructions like arbitrary bit permutations, bit scatter and bit gather instructions. Thus our new functional unit performs the functionality of three functional units &amp;#x2013; the basic shifter, the multimedia-mix unit and the advanced bit manipulation functional unit, while having a latency only slightly worse than that of the log-shifter.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=931e7d4d557b50d06f22e39e544060a2&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=931e7d4d557b50d06f22e39e544060a2&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=931e7d4d557b50d06f22e39e544060a2&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.219</guid>
		</item>
		<item>
			<title>PrePrint: Decimal Floating-Point Multiplication</title>
			<link>http://www.pheedo.com/click.phdo?i=6ac5c43fbb80620d3dd84e2eec7c0af0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.218</pheedo:origLink>
			<description>Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents the design of two decimal floating-point multipliers; one whose partial product accumulation strategy employs decimal carry-save addition, and one which employs binary carry-save addition. The multiplier based on decimal carry-save addition favors a non-pipelined, iterative implementation. The multiplier utilizing binary carry-save addition allows for an efficient pipelined implementation when latency and throughput are considered more important than area. Both designs comply with specifications for decimal multiplication given in the proposed revision of the IEEE 754 Standard for Floating-point Arithmetic (IEEE P754). The multipliers extend previously published decimal fixed-point multipliers by adding several features including exponent generation, sticky bit generation, shifting of the intermediate product, rounding, and exception detection and handling. Novel features of the multipliers include support for decimal floating-point numbers, on-the-fly generation of the sticky bit in the iterative design, early estimation of the shift amount, and efficient decimal rounding. Iterative and parallel decimal fixed-point and floating-point multipliers are compared in terms of their area, delay, latency, and throughput based on verified Verilog register transfer level models.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=6ac5c43fbb80620d3dd84e2eec7c0af0&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=6ac5c43fbb80620d3dd84e2eec7c0af0&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=6ac5c43fbb80620d3dd84e2eec7c0af0&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.218</guid>
		</item>
		<item>
			<title>PrePrint: High-Performance Hardware Architectures for Galois Counter Mode</title>
			<link>http://www.pheedo.com/click.phdo?i=eebcd47aba010be0e37eb24a06875beb</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.217</pheedo:origLink>
			<description>Various high-performance hardware architectures for GCM (Galois Counter Mode) in conjunction with various AES (Advanced Encryption Standard) circuits and multiplier-adders are proposed. A total of seventeen GCM-AES circuits were synthesized by using a 130-nm CMOS standard cell library, and the trade-offs between speed and hardware resources were evaluated. Our flexible architectures achieved a wide variety of performances from compact (2.56 Gbps with 34.5 Kgates) to high-speed (62.6 Gbps with 979.3 Kgates). All of our architectures support key sizes of 128, 192, and 256 bits, while only one previous approach does. Even with variable-length key support, our architecture also achieved the highest hardware efficiency (defined as throughput per gate) among the designs using the same generation of process technology.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=eebcd47aba010be0e37eb24a06875beb&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=eebcd47aba010be0e37eb24a06875beb&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=eebcd47aba010be0e37eb24a06875beb&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.217</guid>
		</item>
		<item>
			<title>PrePrint: Formally Verified Argument Reduction with a Fused-Multiply-Add</title>
			<link>http://www.pheedo.com/click.phdo?i=0895072205fef82dbef7abe9cfb941b4</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.216</pheedo:origLink>
			<description>Cody &amp;#x0026; Waite argument reduction technique works perfectly for reasonably large arguments but as the input grows there are no bits left to approximate the constant with enough accuracy. Under mild assumptions, we show that the result computed with a fused-multiply-add provides a fully accurate result for many possible values of the input with a constant almost accurate to the full working precision. We also present an algorithm for a fully accurate second reduction step to reach full double accuracy (all the significand bits of two numbers are accurate) even in the worst cases of argument reduction. Our work recalls the common algorithms and presents proofs of correctness. All the proofs are formally verified using the Coq automatic proof checker.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=0895072205fef82dbef7abe9cfb941b4&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=0895072205fef82dbef7abe9cfb941b4&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=0895072205fef82dbef7abe9cfb941b4&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.216</guid>
		</item>
		<item>
			<title>PrePrint: Accurate Floating Point Product and Exponentiation</title>
			<link>http://www.pheedo.com/click.phdo?i=c289a449ec27f1447e4e5c836ee56a7c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.215</pheedo:origLink>
			<description>Several different techniques and softwares intend to improve the accuracy of results computed in a fixed finite precision. Here we focus on a method to improve the accuracy of the product of floating point numbers. We show that the computed result is as accurate as if computed in twice the working precision. The algorithm is simple since it only requires addition, subtraction and multiplication of floating point numbers in the same working precision as the given data. Such an algorithm can be useful for example to compute the determinant of a triangular matrix and to evaluate a polynomial when represented by the root product form. It can also be used to compute the power of a floating point number.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=c289a449ec27f1447e4e5c836ee56a7c&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=c289a449ec27f1447e4e5c836ee56a7c&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=c289a449ec27f1447e4e5c836ee56a7c&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.215</guid>
		</item>
		<item>
			<title>PrePrint: A Process Algebraic View of Latency-Insensitive Systems</title>
			<link>http://www.pheedo.com/click.phdo?i=464228ac69e0fa5f3007ad0558a8191b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.214</pheedo:origLink>
			<description>Latency-insensitive (LI) systems are those which can function correctly in spite of delays along its connecting wires. This delay is assumed to be a multiple of the clock period. The paper presents a single-clock process algebraic model for such systems. It gives the definitions for LI computational blocks and LI connectors. Important properties for these are shown to be satisfied. Composition of such modules can be done by the parallel composition operator of the process algebra. Conditions are given to check for liveness and deadlock freedom of LI systems. Comparison of latency-equivalence between streams of events can be done using the model and this leads to a method of proving latency-equivalent modules. The paper is a step towards high-level specification and verification of such systems. The work can be extended to address more complex interconnections by modelling the underlying finite-state machines.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=464228ac69e0fa5f3007ad0558a8191b&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=464228ac69e0fa5f3007ad0558a8191b&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=464228ac69e0fa5f3007ad0558a8191b&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.214</guid>
		</item>
		<item>
			<title>IEEE Transactions on Computers - January 2009 (Vol. 58, No. 1)</title>
			<link>http://www.pheedo.com/click.phdo?i=b9b0cbeb664e1cc2fdb4b8c7691d9398</link>
			<pheedo:origLink>http://opac.ieeecomputersociety.org/opac?year=2009&amp;amp;volume=58&amp;amp;issue=01&amp;amp;acronym=tc</pheedo:origLink>
			<description>IEEE Transactions on Computers&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=b9b0cbeb664e1cc2fdb4b8c7691d9398&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=b9b0cbeb664e1cc2fdb4b8c7691d9398&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://www.computer.org/portal/site/tc/</guid>
		</item>
		<item>
			<title>PrePrint: Verified Real Number Calculations: A Library for Interval Arithmetic</title>
			<link>http://www.pheedo.com/click.phdo?i=b7a4e1845ec0234f5749306fe1c1e79c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.213</pheedo:origLink>
			<description>Real number calculations on elementary functions are remarkably difficult to handle in mechanical proofs. In this paper, we show how these calculations can be performed within a theorem prover or proof assistant in a convenient and highly automated as well as interactive way. First, we formally establish upper and lower bounds for elementary functions. Then, based on these bounds, we develop a rational interval arithmetic where real number calculations take place in an algebraic setting. In order to reduce the dependency effect of interval arithmetic, we integrate two techniques: interval splitting and Taylor series expansions. This pragmatic approach has been developed, and formally verified, in a theorem prover. The formal development also includes a set of customizable strategies to automate proofs involving explicit calculations over real numbers. Our ultimate goal is to provide guaranteed proofs of numerical properties with minimal human theorem-prover interaction.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=b7a4e1845ec0234f5749306fe1c1e79c&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=b7a4e1845ec0234f5749306fe1c1e79c&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=b7a4e1845ec0234f5749306fe1c1e79c&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.213</guid>
		</item>
		<item>
			<title>PrePrint: Power-up SRAM State as an Identifying Fingerprint and Source of True Random Numbers</title>
			<link>http://www.pheedo.com/click.phdo?i=bf3feca1c639a2a15b04f9a682a243b2</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.212</pheedo:origLink>
			<description>Passive applications create a need for low-cost security and privacy in potentially hostile environments, supported by primitives including identification and random number generation. Our measurements show that power-up of SRAM produces a physical fingerprint. We propose a system of Fingerprint Extraction and Random Numbers in SRAM (FERNS) that harvests static identity and randomness from existing volatile CMOS storage without requiring any dedicated circuitry. The identity results from manufacture-time physically random device threshold mismatch, and the random numbers result from run-time physically random noise. We use experimental data from high performance SRAM, and the WISP UHF RFID tag to validate the principles behind FERNS. We demonstrate that 8 byte fingerprints from an SRAM chip are sufficient for uniquely identifying circuits among a population of 5,120 and extrapolate that 16 to 24 bytes of SRAM would be sufficient for uniquely identifying every instance of the SRAM ever produced. Using a smaller population, we demonstrate similar identifying ability from the embedded SRAM microcontroller memory of the WISP. In addition to identification, we show that SRAM fingerprints capture noise, enabling true random number generation. We demonstrate that the initial states of a 256 byte SRAM can produce 128 bit true random numbers capable of passing the NIST approximate entropy test.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=bf3feca1c639a2a15b04f9a682a243b2&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=bf3feca1c639a2a15b04f9a682a243b2&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=bf3feca1c639a2a15b04f9a682a243b2&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.212</guid>
		</item>
		<item>
			<title>PrePrint: A Floating-Point Unit for 4D Vector Inner Product with Reduced Latency</title>
			<link>http://www.pheedo.com/click.phdo?i=8b39aa8a1000e2292b8b2c7ecb4a325f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.210</pheedo:origLink>
			<description>This paper presents the algorithm and implementation of a new high-performance functional unit for floating-point 4-dimensional vector inner-product (4D dot-product; DP4) which is most frequently performed in 3D graphics application. The proposed IEEE compliant DP4 unit computes Z=AB+CD+EF+GH in one path, and keeps the intermediate rounding by IEEE754 rounding-to-nearest-even. The intermediate rounding is merged with shift-alignment, and intermediate carry-propagated addition and normalization are omitted to reduce latency in the proposed architecture. The proposed DP4 unit is implemented with 0.18um CMOS technology and has 12.8ns critical path delay, which is reduced by 45.5% compared to a previous DP4 implementation using discrete multipliers and adders. The proposed DP4 unit also reduces the cycle time of 3D graphics applications by 12.4% on average compared to usual 3D graphics FPU based on 4-way multiply-add-fused units.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=8b39aa8a1000e2292b8b2c7ecb4a325f&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=8b39aa8a1000e2292b8b2c7ecb4a325f&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=8b39aa8a1000e2292b8b2c7ecb4a325f&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.210</guid>
		</item>
		<item>
			<title>PrePrint: Practical Deadlock-Free Fault-Tolerant Routing in Meshes Based on the Planar Network Fault Model</title>
			<link>http://www.pheedo.com/click.phdo?i=29963a439bec11af397b8a71e2ba633a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.211</pheedo:origLink>
			<description>The number of virtual channels required for deadlock-free routing is important for cost-effective and high-performance system design. The planar adaptive routing scheme is an effective deadlock avoidance technique using only 3 virtual channels for each physical channel in 3-dimensional or higher dimensional mesh networks with a very simple deadlock avoidance scheme. However, there exists one idle virtual channels for all physical channels along the first dimension, and two idle virtual channels for channels along the last dimension in a mesh network based on the planar adaptive routing algorithm. A new deadlock avoidance technique is proposed for 3-dimensional meshes using only two virtual channels by making full use of the idle channels. The deadlock-free adaptive routing scheme is then modified to a deadlock-free adaptive fault-tolerant routing scheme based on a planar network (PN) fault model. The proposed deadlock-free adaptive routing scheme is also extended to n-dimensional meshes still using two virtual channels. Sufficient simulation results are presented to demonstrate the effectiveness of the proposed algorithm.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=29963a439bec11af397b8a71e2ba633a&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=29963a439bec11af397b8a71e2ba633a&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=29963a439bec11af397b8a71e2ba633a&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.211</guid>
		</item>
		<item>
			<title>PrePrint: A Software Implementation of the IEEE 754R Decimal Floating-Point Arithmetic Using the Binary Encoding Format</title>
			<link>http://www.pheedo.com/click.phdo?i=7ac790c1beec74ffab5c16930a8a2125</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.209</pheedo:origLink>
			<description>The IEEE Standard 754-1985 for Binary Floating-Point Arithmetic was revised, and an important addition is the definition of decimal floating-point arithmetic. This is intended mainly to provide a robust, reliable framework for financial applications that are often subject to legal requirements concerning rounding and precision of the results, because the binary floating-point arithmetic may introduce small but unacceptable errors. Using binary floating-point calculations to emulate decimal calculations in order to correct this issue has led to the existence of numerous proprietary software packages, each with its own characteristics and capabilities. IEEE 754R decimal arithmetic should unify the ways decimal floating-point calculations are carried out on various platforms. New algorithms and properties are presented in this paper which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently. The focus is on rounding techniques for decimal values stored in binary format, but algorithms are outlined for the more important or interesting operations of addition, multiplication, division, including the case of nonhomogeneous operands, as well as conversions between binary and decimal floating-point formats. Performance results are included for a wider range of operations, showing promise that our approach is viable for applications that require decimal floating-point calculations.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=7ac790c1beec74ffab5c16930a8a2125&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=7ac790c1beec74ffab5c16930a8a2125&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=7ac790c1beec74ffab5c16930a8a2125&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.209</guid>
		</item>
		<item>
			<title>PrePrint: A Safe Stochastic Analysis with Relaxed Limitations on the Periodic Task Model</title>
			<link>http://www.pheedo.com/click.phdo?i=4dcd6728ac0b9af92381530936abbf53</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.208</pheedo:origLink>
			<description>This paper proposes a safe stochastic analysis for fixed-priority scheduling, which is applicable to a broader spectrum of periodic tasks than the ones analyzable by any of existing techniques. The proposed analysis can find a safe upper bound of deadline miss probability for periodic tasks with (1) arbitrary execution time distributions, (2) varying interrelease times with the period as the minimum, and (3) the maximum utilization factor Umax that can be greater than 1. One challenge for this is that the release times of tasks are not known a priori, thus we need to consider all possible phase combinations among jobs to find the worst case deadline miss probability, which is not tractable. To handle this difficulty, we first derive the worst case phase combination for harmonic task sets. Then, we present a safe way to transform a non-harmonic task set to a harmonic task set such that the deadline miss probabilities obtained with the worst case phase combination for the transformed harmonic task set are guaranteed to be worse than those for the original non-harmonic task set with all possible phase combinations. Through experiments, we show that the deadline miss probability computed as the safe upper bound by the proposed analysis is tight enough for practical uses.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=4dcd6728ac0b9af92381530936abbf53&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=4dcd6728ac0b9af92381530936abbf53&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=4dcd6728ac0b9af92381530936abbf53&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.208</guid>
		</item>
		<item>
			<title>PrePrint: Improved Polynomial Multiplication Formulae over $\F_2$ Using Chinese Remainder Theorem</title>
			<link>http://www.pheedo.com/click.phdo?i=feaaa23d09ad397665d3d2beb655771c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.207</pheedo:origLink>
			<description>Let $n$ and $\ell$ be positive integers and $f(x)$ be an irreducible polynomial over $\F_2$ such that $\ell deg(f(x))&lt;2n-1.$ We obtain an effective upper bound for the multiplication complexity of $n$-term polynomials modulo $f(x)^\ell.$ This upper bound allows a better selection of the moduli when Chinese Remainder Theorem is used for polynomial multiplication over $\F_2$. We give improved formulae to multiply polynomials of small degree over $\F_2$. In particular we improve the best known multiplication complexities over $\F_2$ in the literature in some cases.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=feaaa23d09ad397665d3d2beb655771c&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=feaaa23d09ad397665d3d2beb655771c&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=feaaa23d09ad397665d3d2beb655771c&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.207</guid>
		</item>
		<item>
			<title>PrePrint: Prediction-Based Micro-Scheduler: Toward Responsive Scheduling of General-Purpose Operating Systems</title>
			<link>http://www.pheedo.com/click.phdo?i=62fdbe55de8a341f923c987f0993efd3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.206</pheedo:origLink>
			<description>In this paper, we propose a novel scheduling technique to improve the responsiveness of a real-time process while maintaining relative execution rates of non-real-time processes, called the prediction-based micro-scheduler. It runs upon existing macro-scheduler and it conditionally rearranges the scheduling pattern generated by the macro-scheduler based on urgent interval prediction and lock hold time prediction. The rearrangement occurs if one process seeks to enter a long non-preemptible section and the operation is predicted to significantly disturb the future execution of a real-time process. We implemented the prototype on Linux 2.6.19. Experimental results show that the worst-case OS latency of a real-time process is reduced up to 34% of the original one while still maintaining relative execution rates of non-real-time processes. Moreover, the performance degradation caused by the micro-scheduler does not exceed 5%.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=62fdbe55de8a341f923c987f0993efd3&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=62fdbe55de8a341f923c987f0993efd3&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=62fdbe55de8a341f923c987f0993efd3&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.206</guid>
		</item>
		<item>
			<title>PrePrint: Exact Path Delay Fault Coverage Calculation of Partitioned Circuits</title>
			<link>http://www.pheedo.com/click.phdo?i=5a86366706065e7a0cae132aea52943c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.205</pheedo:origLink>
			<description>Exact Path Delay Fault (PDF) coverage calculation of large circuits with an exponential number of detectable PDFs requires exponential memory space. This often yields memory overflow in computations. One common approach to avoid memory overflow is to partition or virtually cut circuits into several subcircuits, and to perform coverage calculation at the partition-level circuit. Partitioning reduces the number of PDFs to be stored in memory exponentially at the expense of losing considerable coverage. This paper describes an algorithm to improve the reported coverage value by recovering the lost PDFs up to a user controlled degree (Delta), which is a function of the length of cut net sequences. The reported coverage monotonically increases as Delta increases: the exact coverage value is guaranteed when Delta is equal to the number of consecutive partitions on the longest path. Experimental results are provided for the ISCAS85 circuits that illustrate the trade off between the computation time, the number of partitions, the values of Delta, and the reported coverage. The results indicate that as Delta and the number of partitions increase, the run time does not increase after some point since the cost of set operations reduces as more partitions are created.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=5a86366706065e7a0cae132aea52943c&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=5a86366706065e7a0cae132aea52943c&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=5a86366706065e7a0cae132aea52943c&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.205</guid>
		</item>
		<item>
			<title>PrePrint: A Discrete Logarithm Number System for Integer Arithmetic Modulo 2^k: Algorithms and Lookup Structures</title>
			<link>http://www.pheedo.com/click.phdo?i=81bdac8eac2d7fc20c618fa65bd0106d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.204</pheedo:origLink>
			<description>We present a k-bit encoding of the k-bit binary integers based on a discrete logarithm representation. The representation supports a discrete logarithm number system (DLS) that allows integer multiplication to be reduced to addition and integer exponentiation to be reduced to multiplication. We introduce right-to-left bit serial conversion, deconversion and unified conversion/deconversion algorithms between binary and DLS. The conversion algorithms utilize O(k) additions, do not require use of a multiplier, and are applicable at least up to 128 bit integers. We illustrate the use of the representation in determining a novel and efficient integer power modulo 2^k operation|x^y| mod(2^k), and compare hardware performance with a current state-of-the-art method. Furthermore, we describe properties of the conversion mappings that allow compact table lookup structures to be employed for direct conversion-to and deconversion-from the DLS encoding. Our lookup architecture allows 16-bit conversion and deconversion mappings to be realized with table sizes of order 2-8 Kbytes, which is up to a 64x size reduction of the 128 Kbytes of an arbitrary 16-bits-in 16-bits-out function table. Performance and area results are given that demonstrate effectiveness of the table lookup architecture. The lookup methodology extends to other 16-bit integer functions such as multiplicative inverse and squaring operations.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=81bdac8eac2d7fc20c618fa65bd0106d&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=81bdac8eac2d7fc20c618fa65bd0106d&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=81bdac8eac2d7fc20c618fa65bd0106d&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.204</guid>
		</item>
		<item>
			<title>PrePrint: Low-Power Multiple-Precision Iterative Floating-Point Multiplier with SIMD Support</title>
			<link>http://www.pheedo.com/click.phdo?i=ea05e4c725bdabd4ae19364815ac008a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.203</pheedo:origLink>
			<description>The demand for improved SIMD floating-point performance on general-purpose x86-compatible microprocessors is rising. At the same time, there is a contrary demand in the low-power computing market for a reduction in power consumption. Along with this, there is the absolute necessity of backwards compatibility for x86-compatible microprocessors, which includes the support of x87 scientific floating-point instructions. The combined effect is that there is a need for low-power, low-cost floating-point units that are still capable of delivering good SIMD performance while maintaining full x86 functionality. This paper presents the design of an x86-compatible, IEEE-compliant floating-point multiplier that is specifically tailored to provide good SIMD performance in a low-cost, low-power solution while maintaining full x87 backwards compatibility. The floating-point multiplier efficiently supports multiple precisions using an iterative rectangular multiplier scheme. The floating-point multiplier can perform two parallel single-precision (SP) multiplies every cycle with a latency of two cycles, one double-precision (DP) multiply every two cycles with a latency of four cycles, or one extended-double-precision (EP) multiply every three cycles with a latency of five cycles. The iterative floating-point multiplier also supports division, square-root and transcendentals. Compared to a previous design with similar functionality, the proposed iterative floating-point multiplier has 60% less area and 59% less dynamic power dissipation.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=ea05e4c725bdabd4ae19364815ac008a&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=ea05e4c725bdabd4ae19364815ac008a&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=ea05e4c725bdabd4ae19364815ac008a&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.203</guid>
		</item>
		<item>
			<title>PrePrint: An Efficient Rounding Boundary Test for pow(x,y) in Double Precision</title>
			<link>http://www.pheedo.com/click.phdo?i=5f3c182014e708a7da9a8155b2d179e8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.202</pheedo:origLink>
			<description>The correct rounding of the function pow (power function x^y) is currently based on Ziv's iterative approximation process. In order to ensure its termination, cases when x^y falls on a rounding boundary must be filtered out. Such rounding boundaries are floating-point numbers and midpoints between two consecutive floating-point numbers. Detecting rounding boundaries for pow is a difficult problem. Previous approaches use repeated square root extraction followed by repeated square and multiply. This article presents a new rounding boundary test for pow in double precision which reduces this to a few comparisons with pre-computed constants. These constants are deduced from worst cases for the Table Maker's Dilemma, searched over a small subset of the input domain. This is a novel use of such worst-case bounds. The resulting algorithm has been designed for a fast-on-average correctly rounded implementation of pow, considering the scarcity of rounding boundary cases. It does not stall average computations for rounding boundary detection. The article includes its correctness proof and experimental results.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=5f3c182014e708a7da9a8155b2d179e8&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=5f3c182014e708a7da9a8155b2d179e8&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=5f3c182014e708a7da9a8155b2d179e8&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.202</guid>
		</item>
		<item>
			<title>PrePrint: Kahan's Algorithm for a Correct Discriminant Computation at Last Formally Proven</title>
			<link>http://www.pheedo.com/click.phdo?i=358690ea6a47e0a51bd120f1db9c9354</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.200</pheedo:origLink>
			<description>This article tackles Kahan's algorithm to compute accurately the discriminant. This is a known difficult problem and this algorithm leads to an error bounded by 2 ulps of the floating-point result. The proofs involved are long and tricky and even trickier than expected as the test involved may give a result different from the result of the same test without rounding. We give here the total demonstration of the validity of this algorithm and we provide sufficient conditions to guarantee neither overflow, nor underflow will jeopardize the result. The IEEE-754 double precision program is annotated using the Why platform and the proof obligations are done using the Coq automatic proof checker.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=358690ea6a47e0a51bd120f1db9c9354&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=358690ea6a47e0a51bd120f1db9c9354&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=358690ea6a47e0a51bd120f1db9c9354&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.200</guid>
		</item>
		<item>
			<title>PrePrint: Improved Computation of Square Roots in Specific Finite Fields</title>
			<link>http://www.pheedo.com/click.phdo?i=2bbc37504aee6b8bc56a03cfeef923e8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.201</pheedo:origLink>
			<description>In this paper we study exponentiation in the specific finite fields Fq with very special exponents such as they occur in algorithms for computing square roots. Here, q is a prime power, q = p^k, where k &amp;#x003E; 1 and k is odd. Our algorithmic approach improves the corresponding exponentiation resulted from better-rewritten exponent. To the best of our knowledge, it is the first major improvement to the Tonelli-Shanks algorithm, for example, the number of multiplications can be reduced to at least 60% on an average when p = 1 (mod 16). Several numerical examples are given that show the speed-up of the proposed methods.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=2bbc37504aee6b8bc56a03cfeef923e8&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=2bbc37504aee6b8bc56a03cfeef923e8&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=2bbc37504aee6b8bc56a03cfeef923e8&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.201</guid>
		</item>
		<item>
			<title>PrePrint: Provably Secure Steganography</title>
			<link>http://www.pheedo.com/click.phdo?i=6de7369fbce3bd183015e31ac2f4e95c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.199</pheedo:origLink>
			<description>Steganography is the problem of hiding secret messages in "innocent-looking," public communication so that the presence of the secret messages cannot be detected. This paper introduces a cryptographic formalization of steganographic security in terms of computational indistinguishability from a channel, an indexed family of probability distributions on cover messages. We use cryptographic and complexity-theoretic proof techniques to show that the existence of one-way functions and the ability to sample from the channel are necessary conditions for secure steganography. We then construct a steganographic protocol, based on rejection sampling from the channel, that is provably secure and has nearly optimal bandwidth under these conditions. This is the first known example of a general, provably secure steganographic protocol. We also give the first formalization of "robust" steganography, where an adversary attempts to remove any hidden messages without unduly disrupting the cover channel. We give a necessary condition on the amount of disruption the adversary is allowed in terms of a worst-case measure of mutual information. We give a construction similar to a random code that is provably secure, computationally efficient, and has nearly optimal bandwidth, assuming repeatable access to the channel distribution.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=6de7369fbce3bd183015e31ac2f4e95c&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=6de7369fbce3bd183015e31ac2f4e95c&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=6de7369fbce3bd183015e31ac2f4e95c&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.199</guid>
		</item>
		<item>
			<title>PrePrint: Shortest Path Trees Computation in Dynamic Graphs</title>
			<link>http://www.pheedo.com/click.phdo?i=cb89032c48a34b3135939fe0fe4b835f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.198</pheedo:origLink>
			<description>Let G=(V, E, w) be a simple digraph, in which all edge weights are non-negative real numbers. Let G' be obtained from G by an application of a set of edge weight updates to G. Let s " V, and let Ts and Ts"&#166; be shortest path trees (SPTs) rooted at s in G and G', respectively. The Dynamic Shortest Path (DSP) problem is to compute Ts' from Ts. The DSP problem finds applications in many areas, such as networking and databases, in which a graph is constantly updated and SPTs are re-computed in real-time. Existing work on this problem either focuses on a single edge weight change, or for multiple edge weight changes, some of them are incorrect or are not optimized. We correct and extend a few state-of-the-art dynamic SPT algorithms to handle multiple edge weight updates. We prove that these algorithms are correct. The complexities of these algorithms are also analyzed. Dynamic algorithms may not outperform static algorithms all the time. To evaluate the proposed algorithms, we compare them with the well-known static Dijkstra's algorithm. Extensive experiments are conducted on the Connecticut road system. The experimental results suggest the most appropriate algorithms to be used under different circumstances.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=cb89032c48a34b3135939fe0fe4b835f&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=cb89032c48a34b3135939fe0fe4b835f&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=cb89032c48a34b3135939fe0fe4b835f&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.198</guid>
		</item>
		<item>
			<title>PrePrint: Differentiating Services with Non-Congestive Queuing (NCQ)</title>
			<link>http://www.pheedo.com/click.phdo?i=2433d543efeb5e3015e693535910a2c8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.197</pheedo:origLink>
			<description>We discuss a new packet service paradigm, called 'Less Impact Better Service' (LIBS), which is realized through a novel queuing discipline, called 'Non-Congestive Queuing' (NCQ). NCQ prioritizes small packets when conditions permit, and utilizes service thresholds to confine the delay impact of prioritization on congestive applications. We show that LIBS and NCQ satisfy more users with diverse demands on delay and throughput. We obtained both analytical and simulation results, which are very promising. Diversity is simulated by using FTP, sensor and VoIP traffic.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=2433d543efeb5e3015e693535910a2c8&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=2433d543efeb5e3015e693535910a2c8&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=2433d543efeb5e3015e693535910a2c8&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.197</guid>
		</item>
		<item>
			<title>PrePrint: Enhancing Downlink Performance in Wireless Networks by Simultaneous Multiple Packet Transmission</title>
			<link>http://www.pheedo.com/click.phdo?i=36e06e14b1fe571c4800de435a3640fe</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.191</pheedo:origLink>
			<description>In this paper we consider using simultaneous Multiple Packet Transmission (MPT) to improve the downlink performance of wireless networks. With MPT, the sender can send two compatible packets simultaneously to two distinct receivers and can double the throughput in the ideal case. We formalize the problem of finding a schedule to send out buffered packets in minimum time as finding a maximum matching problem in a graph. Since maximum matching algorithms are relatively complex and may not meet the timing requirements of real-time applications, we give a fast approximation algorithm that is capable of finding a matching at least 3/4 of the size of a maximum matching in O(|E|) time where |E| is the number of edges in the graph. We also give analytical bounds for maximum allowable arrival rate which measures the speedup of the downlink after enhanced with MPT and our results show that the maximum arrival rate increases significantly even with a very small compatibility probability. We also use an approximate analytical model and simulations to study the average packet delay and our results show that packet delay can be greatly reduced even with a very small compatibility probability.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=36e06e14b1fe571c4800de435a3640fe&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=36e06e14b1fe571c4800de435a3640fe&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=36e06e14b1fe571c4800de435a3640fe&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.191</guid>
		</item>
		<item>
			<title>PrePrint: On Supporting High Quality 3D Geometry Multicasting over IEEE 802.11 Wireless Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=4d8f61ac9b4cd4d1fde5542f34f0b9a3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.190</pheedo:origLink>
			<description>With significant improvements in both wireless technologies and computational capabilities of mobile devices, it is now possible to exchange and render 3D graphics over wireless networks on mobile devices such as PDAs and laptops. In this paper, we consider a typical scenario where users holding mobile devices of different display resolutions and rendering capabilities request the same 3D object in an IEEE 802.11 wireless LAN. Several schemes are proposed. First, to support high quality 3D content multicasting, we analyze the characteristics of 3D data and choose the minimum data set for unicast in order to avoid excessive bandwidth consumption. Then, a transcoding algorithm is proposed to address the issue of multi-user diversity. In addition, to take advantage of the nature of broadcast in wireless medium and further mitigate the issue of serious resource usage due to large data size, we propose to broadcast certain less important refinement data. With the proposed hybrid unicast/broadcast transmission scheme and a packet overhearing mechanism, good scalability can be achieved. Finally, we schedule the unicast according to users' experienced link condition to handle user mobility. Simulation results show that the proposed schemes in combination can efficiently achieve the dual objectives of low transmission delay and small distortion.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=4d8f61ac9b4cd4d1fde5542f34f0b9a3&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=4d8f61ac9b4cd4d1fde5542f34f0b9a3&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=4d8f61ac9b4cd4d1fde5542f34f0b9a3&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.190</guid>
		</item>
		<item>
			<title>PrePrint: Modeling Detection Latency with Collaborative Mobile Sensing Architecture</title>
			<link>http://www.pheedo.com/click.phdo?i=777e1b0f3b6b43b9f86d75f341b70342</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.189</pheedo:origLink>
			<description>Detection latency, which is defined as the time from the target arrival to the time of the first detection, is an important metric for the performance of sensor networks carrying out target detection, especially when the target is malicious or hostile. It characterizes the efficiency of detecting the presence of a target in a region of interest. Traditionally, stationary sensor networks are used to perform such sensing tasks. Consequently, nearly all research literature for the target detection problem has focused on stationary sensor networks. This paper addresses the problem of detecting the presence/absence of a target using a mobile sensor network. An analytic method is proposed to model the detection latency based on a collaborative sensing architecture. Detection latency for different node mobilities are presented. The accuracy of the analytic model is verified by simulations. The paper also compares the performance of mobile and stationary sensor networks. The comparison shows if the target is present at the worst possible location in a given deployment, then detection latency of mobile sensor networks is considerably shorter as compared to that of stationary networks with the same number of nodes.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=777e1b0f3b6b43b9f86d75f341b70342&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=777e1b0f3b6b43b9f86d75f341b70342&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.189</guid>
		</item>
		<item>
			<title>PrePrint: Efficient Multidimensional Packet Classification with Fast Updates</title>
			<link>http://www.pheedo.com/click.phdo?i=b7f2972758425f6ab3728f109e5dc98d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.181</pheedo:origLink>
			<description>Packet classification has continued to be an important research topic for high-speed routers in recent years. In this paper, we propose a new packet classification scheme based on the binary range and prefix searches. The basic data structure of the proposed packet classification scheme for multi-dimensional rule tables is a hierarchical list of sorted ranges and prefixes that allows the binary search to be performed on the list at each level to find the best matched rule. We also propose a set of heuristics to further improve the performance of the proposed algorithm. We test our schemes by using rule tables of various sizes generated by ClassBench and compare them with the existing schemes, EGT, EGT-PC, and HyperCuts. The performance results show that in a test using a two-dimensional segmentation table, the proposed scheme not only performs better than the EGT, EGT-PC, and HyperCuts in classification speed and memory usage, but also achieves faster table update operations that are not supported in the existing schemes.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=b7f2972758425f6ab3728f109e5dc98d&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=b7f2972758425f6ab3728f109e5dc98d&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.181</guid>
		</item>
		<item>
			<title>PrePrint: A Homogeneous Architecture for Power Policy Integration in Operating Systems</title>
			<link>http://www.pheedo.com/click.phdo?i=350bb445033700134eba61619c68074f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.180</pheedo:origLink>
			<description>A significant volume of research has concentrated on operating system-directed power management. The primary focus of previous research has been the development of better policies. In this paper, we provide evidence that one policy may outperform another under different conditions. Hence, it is difficult, or even impossible, to design the "best" policy for all computers. We explain how to select the best policies at run-time without user or administrator intervention by using a software framework called the Homogeneous Architecture for Power Policy Integration (HAPPI). This architecture is portable across different platforms running Linux. HAPPI specifies common requirements for policies and provides an interface to simplify the implementation of policies in a commodity OS. Our approach allows these policies to be compared simultaneously to select the best policy among a set of distinct policies at run-time. Experimental results indicate that HAPPI achieves energy savings within 4 percent of the best individual policy for each device in several computing systems without a priori knowledge of workloads.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=350bb445033700134eba61619c68074f&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=350bb445033700134eba61619c68074f&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.180</guid>
		</item>
		<item>
			<title>PrePrint: Error Correcting Codes for Ternary Content Addressable Memories</title>
			<link>http://www.pheedo.com/click.phdo?i=ab963a35ed9cc1cdc85276e6404eabd1</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.179</pheedo:origLink>
			<description>As VLSI silicon technology continues its relentless advance and memory densities increase, the problem of soft errors&amp;#x2014;bit upsets caused by alpha particles or neutron hits&amp;#x2014;demands solutions. Error Correcting Codes (ECCs) are routinely used on Random Access Memories (RAMs) to increase soft error tolerance&amp;#x2014;codewords (error correcting code bits concatenated to the data) are written to and read from memory, and the read codeword is decoded to correct errors. Content addressable memories (CAMs) also demand error mitigation measures. The method employed for RAMs is also applicable to CAMs: the match-line sense amplifier is modified to function as a comparator [1], codewords are stored and searched for. We investigate the extension of this method to TCAMs. Ternary CAMs (TCAMs) cannot employ the efficient ECCs (known as linear block codes&amp;#x2014;LBCs) used with RAMs and CAMs. We develop the error correcting codes necessary to implement error-resilient TCAMs.We prove that the rate (ratio of data bits to total number of bits in the codeword) of the specialized ECCs necessary for TCAMs cannot exceed 1/t, where t is the number of bit errors the code can correct (in contrast, LBCs asymptotically have rate one); simple majority codes are the best. Keywords: ECCs, error, correcting, codes, TCAMs, LBCs.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=ab963a35ed9cc1cdc85276e6404eabd1&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=ab963a35ed9cc1cdc85276e6404eabd1&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.179</guid>
		</item>
		<item>
			<title>PrePrint: Memory-MISER: Improving Main Memory Energy Efficiency in Servers</title>
			<link>http://www.pheedo.com/click.phdo?i=138f557380dbc10bd60c63709a248f22</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.177</pheedo:origLink>
			<description>Main memory power in volume and mid-range servers is growing as a fraction of total system power. The resulting energy consumption increases system cost and the heat produced reduces reliability. Emergent memory technology will provide systems with the ability to dynamically turn-on (online) and turn-off (offline) memory devices at runtime. This technology, coupled with slack in memory demand, offers the potential for significant energy savings in servers. However, to gain general acceptance in the server community, power-aware techniques must maintain performance and scale to thousands of memory devices. We propose a Memory Management Infra-Structure for Energy Reduction (Memory MISER) that is transparent, performance-neutral, and scalable. Memory MISER provides: 1) a prototype Linux kernel that manages memory at device granularity, and 2) a userspace daemon that tracks systemic memory demand and implements energy- and performance-constrained device controller policies. Experiments on an 8-node cluster of servers show our Memory MISER conserves memory energy up to 56.8% with no performance degradation for scientific codes that utilized the entire cluster. For multi-user workloads, we achieved memory energy savings of up to 67.94% with no performance degradation. Normalizing to total system energy consumption, our power-aware memory approach reduced energy between 18.81% and 39.02%.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=138f557380dbc10bd60c63709a248f22&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=138f557380dbc10bd60c63709a248f22&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=138f557380dbc10bd60c63709a248f22&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.177</guid>
		</item>
		<item>
			<title>PrePrint: Fair Round Robin: A Low Complexity Packet Schduler with Proportional and Worst-Case Fairness</title>
			<link>http://www.pheedo.com/click.phdo?i=a64af2e6825a085d7d807920b7226657</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.176</pheedo:origLink>
			<description>Round robin based packet schedulers generally have a low complexity and provide long-term fairness. The main limitation of such schemes is that they do not support short-term fairness. In this paper, we propose a new low complexity round robin scheduler, called Fair Round Robin (FRR), that overcomes this limitation. FRR has similar complexity and long-term fairness properties as the stratified round robin scheduler, a recently proposed scheme that arguably provides the best quality-of-service properties among all existing round robin based low complexity packet schedulers. FRR offers better short-term fairness than stratified round robin and other existing round robin schedulers.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=a64af2e6825a085d7d807920b7226657&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=a64af2e6825a085d7d807920b7226657&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.176</guid>
		</item>
		<item>
			<title>PrePrint: Generalized Elastic Scheduling for Real-Time Tasks</title>
			<link>http://www.pheedo.com/click.phdo?i=2d6ac4445f903834815dcd2742d84158</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.175</pheedo:origLink>
			<description>The elastic task model is a powerful model for adapting periodic real-time systems in the presence of uncertainty. This work generalizes the existing elastic scheduling approach in several directions. First, it presents a general framework, which formulates a trade-off between task schedulability and a specific performance metric as an optimization problem. Such a framework allows real-time systems under overloads to graciously adapt by adjusting their performance level. Second, it is shown in this work that the well-known task compression algorithm in fact solves a quadratic programming problem that seeks to minimize the sum of the squared deviation of a task's utilization from initial desired utilization. This finding indicates that the task compression algorithm may be applied to efficiently solve other similar types of problems that often arise in real-time applications. In particular, an iterative approach is proposed to solve the period selection problem for real-time tasks with deadlines less than respective periods. Further, the framework is adapted to solve the deadline selection problem, which is useful in some control systems with fixed periods.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=2d6ac4445f903834815dcd2742d84158&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=2d6ac4445f903834815dcd2742d84158&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.175</guid>
		</item>
		<item>
			<title>PrePrint: Reducing Area Overhead for Error-Protecting Large L2/L3 Caches</title>
			<link>http://www.pheedo.com/click.phdo?i=e46080483dcc7c84be402cab9f37f086</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.174</pheedo:origLink>
			<description>Due to increasing concern about various errors, current processors adopt error protection mechanisms for their on-chip components. Especially, protecting caches in current processors incur as much as 12.5% area overhead due to error correcting codes. Considering large L2/L3 caches employed in current high-performance processors, the area overhead is very high and consumes a large number of on-chip transistors. As an attempt to reduce that overhead, this paper proposes an area-efficient error protection architecture for large L2/L3 caches. First, it selectively applies ECC (Error Correcting Code) to only dirty cache lines and other clean cache lines are protected by using simple parity check codes. Second, the dirty cache lines are periodically cleaned by exploiting the generational behavior of cache lines in order not to increase traffic to off-chip main memory. Experimental results show that the cleaning technique effectively reduces the number of dirty cache lines per cycle. The ECCs of the reduced dirty cache lines can be confined in a small ECC array or ECC cache. Our proposed error-protection architecture has been shown to reduce the area overhead of a 1MB L2 cache for error protection by 59% with less than 1% performance degradation, on the average, using SPEC2000 benchmarks running on a typical four-issue superscalar processor.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=e46080483dcc7c84be402cab9f37f086&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e46080483dcc7c84be402cab9f37f086&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.174</guid>
		</item>
		<item>
			<title>PrePrint: Correction to: Reduced Length Checking Sequences</title>
			<link>http://www.pheedo.com/click.phdo?i=ac6e9e8f074eaa0b5ac2e4a66f299824</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.173</pheedo:origLink>
			<description>This paper describes corrections to a previous paper, Reduced Length Checking Sequecnes, that appeared in IEEE Transactions on Computers in 2002 (51 9, pp.1111-1117).&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=ac6e9e8f074eaa0b5ac2e4a66f299824&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=ac6e9e8f074eaa0b5ac2e4a66f299824&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=ac6e9e8f074eaa0b5ac2e4a66f299824&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.173</guid>
		</item>
		<item>
			<title>PrePrint: On the Design of Fault-Tolerant Scheduling Strategies Using Primary-Backup Approach for Computational Grids with Low Replication Costs</title>
			<link>http://www.pheedo.com/click.phdo?i=d4d542305b046fd30b3c6e55c5063b64</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.172</pheedo:origLink>
			<description>Fault-tolerant scheduling is an imperative step for large-scale computational Grid systems, as often geographically distributed nodes co-operate to execute a task. By and large, primary-backup approach is a common methodology used for fault tolerance wherein each task has a primary copy and a backup copy on two different processors. In this paper, we identify two cases that may happen when scheduling dependent tasks with primary-backup approach. We derive two important constraints that must be satisfied. Further, we show that these two constraints play a crucial role in limiting the schedulability and overloading efficiency of backups of dependent tasks. We then propose two strategies to improve schedulability and overloading efficiency, respectively. We propose two algorithms (MRC-ECT and MCT-LRC), to schedule backups of independent jobs and dependent jobs, respectively. MRC-ECT is shown to guarantee an optimal backup schedule in terms of replication cost for an independent task, while MCT-LRC can schedule a backup of a dependent task with minimum completion time and less replication cost. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=d4d542305b046fd30b3c6e55c5063b64&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=d4d542305b046fd30b3c6e55c5063b64&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.172</guid>
		</item>
		<item>
			<title>PrePrint: On Spatial Orders and Location Codes</title>
			<link>http://www.pheedo.com/click.phdo?i=a71a9bd547f23d8f70bce166473850a6</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.171</pheedo:origLink>
			<description>Spatial orders such as the Morton (Z) order, Uorder, or X-order have applications in matrix manipulation, graphic rendering and data encryption. It is shown that these spatial orders are single examples of entire classes of spatial orders which can be defined in arbitrary numbers of dimensions and base values. Secondly, an algorithm is proposed which can be used to transform between these spatial orders and cartesian coordinates. It is shown that the efficiency of the algorithm improves with a larger base value. By choosing a base value that corresponds to the available memory page size, the computational effort required to perform operations such as matrix multiplication can be optimized.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=a71a9bd547f23d8f70bce166473850a6&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=a71a9bd547f23d8f70bce166473850a6&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.171</guid>
		</item>
		<item>
			<title>PrePrint: DIA: A Complexity-Effective Decoding Architecture</title>
			<link>http://www.pheedo.com/click.phdo?i=3fb4f5d02c4bba2440be4edc4939c67b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.170</pheedo:origLink>
			<description>Fast instruction decoding is a true challenge for the design of CISC microprocessors implementing variable length instructions. A well-known solution to overcome this problem is caching decoded instructions in a hardware buffer. Fetching already decoded instructions avoids the need for decoding them again, improving processor performance. However, introducing such special-purpose storage in the processor design involves an important increase in the fetch architecture complexity. In this paper, we propose a novel decoding architecture that reduces the fetch engine implementation cost. Instead of using a special-purpose hardware buffer, our proposal stores frequently decoded instructions in the memory hierarchy. The address where the decoded instructions are stored is kept in the branch prediction mechanism, enabling it to guide our decoding architecture. This makes it possible for the processor front-end to fetch already decoded instructions from memory instead of the original non-decoded instructions. Our results show that, using our decoding architecture, a state-of-the-art superscalar processor achieves competitive performance improvements, while requiring less chip area and energy consumption in the fetch architecture than a hardware code caching mechanism.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=3fb4f5d02c4bba2440be4edc4939c67b&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=3fb4f5d02c4bba2440be4edc4939c67b&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.170</guid>
		</item>
		<item>
			<title>PrePrint: Testing of SOCs with Hierarchical Cores: Common Fallacies, Test-Access Optimization, and Test Scheduling</title>
			<link>http://www.pheedo.com/click.phdo?i=9e18ce66d8da0fb8715cc1bcf6161366</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.169</pheedo:origLink>
			<description>Many system-on-chip (SOC) integrated circuits today contain hierarchical (parent) cores that have multiple levels of design hierarchy involving "child cores". Hierarchy imposes a number of constraints on the manner in which tests must be applied to parent cores and their child cores. However, most prior work on wrapper design, test access mechanism (TAM) optimization, and test scheduling are hierarchy-oblivious, i.e., these techniques treat all cores in an SOC at the same level of hierarchy. We first show that wrappers, TAMs and test schedules designed for non-hierarchical SOCs are not valid for SOCs with hierarchical cores. We next present two approaches for the efficient testing of SOC with hierarchical cores. In the first approach, an existing wrapper design is modified such that that all constraints imposed by the hierarchy are satisfied and full flexibility is provided for TAM optimization and test scheduling. The second approach is based on a hierarchy-aware wrapper architecture for parent cores that operates in two disjoint modes for the testing of parent and child cores. We show how an existing test-architecture design algorithm can be adapted for use with these two methods. Results for the ITC'02 SOC Test Benchmarks show that the first approach offers lower test application times while the second approach requires less area overhead.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=9e18ce66d8da0fb8715cc1bcf6161366&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=9e18ce66d8da0fb8715cc1bcf6161366&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=9e18ce66d8da0fb8715cc1bcf6161366&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.169</guid>
		</item>
		<item>
			<title>PrePrint: Distributed Loop Controller for Multi-threading in Uni-threaded ILP Architectures</title>
			<link>http://www.pheedo.com/click.phdo?i=5ba4d06734bf03380777b76496314ae3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.168</pheedo:origLink>
			<description>Reduced energy consumption is one of the most important design goals for embedded application domains like wireless communication, multimedia and biomedical applications. The instruction memory hierarchy has been proven to be one of the most power hungry parts of the system. This paper introduces an architectural enhancement for the instruction memory to reduce energy consumption and improve performance. The proposed distributed instruction memory organization requires minimal hardware overhead and supports the execution of multiple incompatible loops in parallel in a uni-processor system. We present different methods to implement the loop controller architecture, compare them, and show that distributing the instruction memory helps to reduce the interconnect cost as well. This architecture enhancement can reduce the energy consumed in the instruction memory hierarchy by 59% and improve the performance by 22% compared to hardware based enhanced SMT based architectures.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=5ba4d06734bf03380777b76496314ae3&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=5ba4d06734bf03380777b76496314ae3&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.168</guid>
		</item>
		<item>
			<title>PrePrint: A Response Time Bound in Fixed-Priority Scheduling with Arbitrary Deadlines</title>
			<link>http://www.pheedo.com/click.phdo?i=cc2a4ea03491f7d42621058da2f33086</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.167</pheedo:origLink>
			<description>Since worst-case response times must be determined repeatedly during the interactive design of real-time application systems, repeated exact computation of such response times would slow down the design process considerably. In this research, we identify three desirable properties of estimates of the exact response times: continuity with respect to system parameters; efficient computability; and approximability. We derive a technique possessing these properties for estimating the worst-case response time of sporadic task systems that are scheduled using fixed priorities upon a preemptive uniprocessor.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=cc2a4ea03491f7d42621058da2f33086&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=cc2a4ea03491f7d42621058da2f33086&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.167</guid>
		</item>
		<item>
			<title>PrePrint: A New Hierarchical Data Cache Architecture for iSCSI Storage Server</title>
			<link>http://www.pheedo.com/click.phdo?i=6056cf60ef3299b4ff6ce4fe63ca9398</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.166</pheedo:origLink>
			<description>With the emergence of data intensive applications, recent years have seen a fast growing volume of I/O traffic propagated through the local I/O interconnect bus. In this paper, we present a hierarchical Data Cache Architecture called DCA to effectively slash local interconnect traffic and thus boost the storage server performance. A popular iSCSI storage server architecture is chosen as an example. DCA is composed of a read cache in NIC card called NIC cache and a read/write unified cache in host memory called Helper cache. NIC cache services most portions of read requests without fetching data via PCI bus, while Helper cache 1) supplies some portions of read requests per partial NIC cache hit; 2) directs cache placement for NIC cache and 3) absorbs most transient writes locally. We develop a novel State Locality Aware cache Placement algorithm called SLAP to improve NIC cache hit ratio for mixed read and write workloads. To demonstrate the effectiveness of DCA, we develop a DCA prototype system and evaluate it with an open source iSCSI implementation under representative storage server workloads. Experimental results showed that DCA can boost iSCSI storage server throughput by up to 121% and reduce the PCI traffic by up to 74% compared with an iSCSI target without DCA.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=6056cf60ef3299b4ff6ce4fe63ca9398&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=6056cf60ef3299b4ff6ce4fe63ca9398&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.166</guid>
		</item>
		<item>
			<title>PrePrint: A Recursive Paradigm To Solve Boolean Relations</title>
			<link>http://www.pheedo.com/click.phdo?i=ca7382164e9af9adba966cf842933135</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.165</pheedo:origLink>
			<description>A Boolean relation can specify some types of flexibility of a combinational circuit that cannot be expressed with don't cares. Several problems in logic synthesis, such as Boolean decomposition or multi-level minimization, can be modeled with Boolean relations. However, solving Boolean relations is a computationally expensive task. This paper presents a novel recursive algorithm for solving Boolean relations. The algorithm has several features: efficiency, wide exploration of solutions and customizable cost function. The experimental results show the applicability of the method in logic minimization problems and tangible improvements with regard to previous heuristic approaches.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=ca7382164e9af9adba966cf842933135&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=ca7382164e9af9adba966cf842933135&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=ca7382164e9af9adba966cf842933135&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.165</guid>
		</item>
		<item>
			<title>PrePrint: A Highly Accurate Method for Assessing Reliability of Redundant Arrays of Inexpensive Disks (RAID)</title>
			<link>http://www.pheedo.com/click.phdo?i=e2f890f8272c43c289b1624ead21752e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.163</pheedo:origLink>
			<description>Abstract - The statistical bases for current models of RAID reliability are reviewed and a highly accurate alternative is provided and justified. This new model corrects statistical errors associated with the pervasive assumption that system (RAID group) times to failure follow a homogeneous Poisson process, and corrects errors associated with assuming the time-to-failure and time-to-restore distributions are exponentially distributed. Statistical justification for the new model uses theory for reliability of repairable systems. Four critical component distributions are developed from field data. These distributions are for times to catastrophic failure, reconstruction and restoration, read errors, and disk data scrubs. Model results have been verified and predict between 2 to 1,500 times as many double disk failures as estimates made using the mean time to data loss method. Model results are compared to system level field data for RAID group of 14 drives and show excellent correlation and greater accuracy than either MTTDL.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=e2f890f8272c43c289b1624ead21752e&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e2f890f8272c43c289b1624ead21752e&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.163</guid>
		</item>
		<item>
			<title>PrePrint: Coordinated En-Route Web Caching in Multi-Server Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=ca7601add34d8f6180fb428375bfb05f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.162</pheedo:origLink>
			<description>With the emergence of various advanced networks that comprise a group of geographically distributed servers, such as Content Delivery Networks (CDNs) and Peer-to-Peer (P2P) systems, coordinated en-route web caching in multi-server networks becomes increasingly attractive but remains of great challenge as solutions for single-server networks become invalid here. In this paper, we first establish mathematical formulation for this problem that takes into account all requests (to any server) that pass through the intermediate nodes on a response path and caches the requested object optimally among these nodes so that system's total benefit is maximized. Then we derive efficient dynamic programming based methods for finding optimal solutions to the problem for the unconstrained case and two QoS-constrained cases respectively. For each case, we present a caching scheme to illustrate application of the corresponding method. Finally we evaluate the proposed schemes on different performance metrics through extensive simulation experiments. The experiment results show that our proposed schemes can yield a steady performance improvement and achieve desired QoS in a multi-server network. To the best of our knowledge, solutions presented in this paper are the first for the problem of coordinated en-route web caching in multi-server networks&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=ca7601add34d8f6180fb428375bfb05f&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=ca7601add34d8f6180fb428375bfb05f&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.162</guid>
		</item>
		<item>
			<title>PrePrint: An Enhanced Universal NxN Fully Non-Blocking Quantum Switch</title>
			<link>http://www.pheedo.com/click.phdo?i=71ce644a5a719fd93ab3efde17bdcfe8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.161</pheedo:origLink>
			<description>This study develops a quantum switching device with fully non-blocking properties. Although previous studies have also presented quantum-based solutions for the blocking problem, the proposed schemes are characterized by an increased packet loss, a large number of quantum SWAP gates and an increased propagation delay time complexity. The current study overcomes these drawbacks by designing an NxN fully non-blocking quantum switch, in which the packet payload is passed through quantum SWAP gates while the packet header is passed through quantum control gates designed by applying a modified quantum Karnaugh mapping method. The allocation of quantum SWAP gates to the different layers within the switch is solved using a Perfect Matching in Complete Graph (PMiCG) algorithm. A symmetry-based heuristic method is proposed to reduce the time complexity of the search process for all the perfect matching pairs to a time complexity of O(N^2). The performance of the proposed quantum switch is compared with that of a quantum self-routing packet switch and a quantum switching / quantum merge sorting scheme, respectively, in terms of the hardware complexity, the propagation delay time complexity, the auxiliary qubit complexity and the packet loss probability.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/click.phdo?s=71ce644a5a719fd93ab3efde17bdcfe8&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=71ce644a5a719fd93ab3efde17bdcfe8&quot;/&gt;&lt;/a&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=71ce644a5a719fd93ab3efde17bdcfe8&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.161</guid>
		</item>
		<item>
			<title>PrePrint: Many-to-Many Disjoint Path Covers in the Presence of Faulty Elements</title>
			<link>http://www.pheedo.com/click.phdo?i=57fb42a9b7f64703c54bb2722464b026</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.160</pheedo:origLink>
			<description>A many-to-many k-disjoint path cover (k-DPC) of a graph G is a set of k disjoint paths joining k sources and k sinks in which each vertex of G is covered by a path. It is called a paired many-to-many disjoint path cover when each source should be joined to a specific sink, and it is called an unpaired many-to-many disjoint path cover when each source can be joined to an arbitrary sink. In this paper, we discuss about paired and unpaired many-to-many disjoint path covers including their relationships, application to strong hamiltonicity, and necessary conditions. And then, we give a construction scheme for paired many-to-many disjoint path covers in the graph H_0 + H_1 obtained from connecting two graphs H_0 and H_1 with |V(H_0)|=|V(H_1)| by |V(H_0)| pairwise nonadjacent edges joining vertices in H_0 and vertices in H_1, where H_0 = G_0 + G_1 and H_1 = G_2 + G_3 for some graphs G_j's. Using the construction, we show that every m-dimensional restricted HL-graph and recursive circulant G(2^m, 4) with f or less faulty elements have a paired k-DPC for any f and k &amp;#x2265; 2 with f+2k &amp;#x2264; m.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=57fb42a9b7f64703c54bb2722464b026&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=57fb42a9b7f64703c54bb2722464b026&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.160</guid>
		</item>
		<item>
			<title>PrePrint: Using Node Diagnosability to Determine t-diagnosability under the Comparison Diagnosis Model</title>
			<link>http://www.pheedo.com/click.phdo?i=d20b2d2ad7da8d502884d8351cd2d4de</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.158</pheedo:origLink>
			<description>Diagnosis is an essential subject for the reliability of a multiprocessor system. In this paper, we present a novel idea on system diagnosis called node diagnosability. The node diagnosability can be viewed as a local strategy toward system diagnosability. There is a strong relationship between the node diagnosability and the traditional diagnosability. For this local sense, we focus more on a single processor, and require only identifying the status of this particular processor correctly. Under the comparison diagnosis model, we propose a sufficient condition to determine the node diagnosability of a given processor. Furthermore, we propose an useful local structure called an extended star at a given processor to guarantee its node diagnosability, and provide an efficient algorithm to determine the faulty or fault-free status of each processor based on this structure. For a multiprocessor system with total number of processors $N$, the time complexity of our algorithm to diagnose a given processor is $O(\log N)$ and to diagnose all the faulty processors is $O(N\log N)$ under the comparison model, provided that there is an extended star structure at each processor, and that the time for looking up the testing result of a comparator in the syndrome table is constant.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=d20b2d2ad7da8d502884d8351cd2d4de&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=d20b2d2ad7da8d502884d8351cd2d4de&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.158</guid>
		</item>
	</channel>
</rss>