<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="/css/rss20.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:pheedo="http://www.pheedo.com/namespace/pheedo">
	<channel>
		<title>IEEE Transactions on Computers</title>
		<link>http://www.computer.org/tc</link>
		<description>The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers, brief contributions, and comments on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability;
g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.	</description>
		<language>en-us</language>
		<pubDate>Sat, 13 Mar 2010 11:00:01 GMT</pubDate>
		<image>
			<url>http://csdl.computer.org/common/images/logos/tc.gif</url>
			<title>IEEE Computer Society</title>
			<description>List of recently published journal articles</description>
			<link>http://www.computer.org/tc</link>
		</image>
		<item>
			<title>PrePrint: Architectural Leakage Power Minimization of Scratchpad Memories by Application-Driven Sub-Banking</title>
			<link>http://www.pheedcontent.com/click.phdo?i=679a30c7d5470d87cd549ff72bb65e1a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.43</pheedo:origLink>
			<description>Partitioning a memory into multiple blocks that can be independently accessed is a widely used technique to reduce its dynamic power. For embedded systems, its benefits can be even pushed further by properly matching the partition to the memory access patterns. When leakage energy comes into play, however, idle memory blocks must be put into a proper low-leakage sleep state to actually save energy when not accessed. In this case, the matching becomes an instance of the power management problem, because moving to and from this sleep state requires additional energy. In this work, we propose an effective solution to the problem of the leakage-aware partitioning of a memory into disjoint sub-blocks; In particular, we target scratchpad memories, which are commonly used in some embedded systems as a replacement for caches. We show that, although the solution space is extremely large and non-convex, it is possible to prove a non-trivial property that reduces the number of partition boundaries to be enumerated, therefore making exhaustive exploration feasible. Experiments on a different sets of embedded applications have shown that total energy savings larger than 60% on average can be obtained, with a marginal overhead in execution time, thanks to an effective implementation of the low-leakage sleep state.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=679a30c7d5470d87cd549ff72bb65e1a&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=679a30c7d5470d87cd549ff72bb65e1a&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.43</guid>
		</item>
		<item>
			<title>PrePrint: A Low Hardware Overhead Self-Diagnosis Technique Using Reed-Solomon Codes for Self-Repairing Chips</title>
			<link>http://www.pheedcontent.com/click.phdo?i=fb3fe6380cb8ab1ba7d7e93aa686f902</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.182</pheedo:origLink>
			<description>A self-diagnosis circuit that can be used for built-in self-repair is proposed. The circuit under diagnosis is assumed to be comprised of a large number of field repairable units (FRUs), which can be replaced with spares when they are found to be defective. Responses that are scanned out of scan chains are compressed by the group compactor, the space compression circuit and finally the time compression circuit. Both the space and the time compression circuit implement a Reed-Solomon code. Unlike prior work, in the proposed technique, responses of all FRUs are observed at the same time to reduce diagnosis time. We propose a novel space-compression circuit that reduces hardware overhead by exploiting the frequency difference of the scan shift clock and the system clock and by combining scan cells into groups of size r. When the size of constituent multiple-input signature-register (MISR) is m, the total number of signatures to be stored for the fault-free signature is 2lmB bits, where 1&amp;#x2264; B &amp;#x2264; m. The experimental results show that the proposed diagnosis circuit that can locate up to 4 defective FRUs in the same test session can be implemented with less than 1 % of hardware overhead for a large industrial design.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=fb3fe6380cb8ab1ba7d7e93aa686f902&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=fb3fe6380cb8ab1ba7d7e93aa686f902&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.182</guid>
		</item>
		<item>
			<title>PrePrint: Minimizing Eavesdropping Risk by Transmission Power Control in Multihop Wireless Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=8c57ffbe2d336cd19d65e9d9f1b0d125</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2007.1066</pheedo:origLink>
			<description>To defend against reconnaissance activity in adhoc wireless networks, we propose transmission power control as an effective mechanism for minimizing the eavesdropping risk. Our main contributions are as follows: First, we cast the w-th order eavesdropping risk as the maximum probability of packets being eavesdropped when there are w adversarial nodes in the network. Second, we derive the closed-form solution of the first order eavesdropping risk as a polynomial function of the normalized transmission radius. This derivation assumes a uniform distribution of user nodes. Then we generalize the model to allow arbitrary user nodes distribution and prove that the uniform user distribution minimizes the first order eavesdropping risk. This result plays an essential role in deriving analytical bounds for the eavesdropping risk given arbitrary user distributions. Our simulation results show that for a wide range of non-uniform traffic patterns, the difference of their eavesdropping risk values from the corresponding lower bounds is 3dB or less.&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=8c57ffbe2d336cd19d65e9d9f1b0d125&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=8c57ffbe2d336cd19d65e9d9f1b0d125&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2007.1066</guid>
		</item>
		<item>
			<title>PrePrint: A Counter Architecture for Online DVFS Profitability Estimation</title>
			<link>http://www.pheedcontent.com/click.phdo?i=e2d7ced57f807720e64da0a1bf66b9a7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.65</pheedo:origLink>
			<description>Dynamic voltage and frequency scaling (DVFS) is a well-known and effective technique for reducing power consumption in modern microprocessors. An important concern though when applying DVFS for online energy and power optimizations is to estimate its profitability in terms of performance and energy. Current DVFS profitability estimation approaches however lack accuracy or incur runtime performance and/or energy overhead. This paper proposes a counter architecture for online DVFS profitability estimation on superscalar out-of-order processors. The counter architecture teases apart the fraction of the execution time that is susceptible to clock frequency versus the fraction that is insusceptible to clock frequency. By doing so, the counter architecture can accurately estimate the performance and energy consumption at different V/f operating points from a single program execution. The DVFS counter architecture estimates performance, energy consumption and energy-delay-squared product (ED2P) within 0.2%, 0.5% and 0.8% on average, respectively, over a 4x frequency range. Further, the counter architecture incurs a limited hardware cost and is an enabler for online DVFS scheduling both at the intra-core as well as at the inter-core level in a multi-core processor.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=e2d7ced57f807720e64da0a1bf66b9a7&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=e2d7ced57f807720e64da0a1bf66b9a7&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.65</guid>
		</item>
		<item>
			<title>PrePrint: Reconfigurable Hardware Implementations of Tweakable Enciphering Schemes</title>
			<link>http://www.pheedcontent.com/click.phdo?i=95f875d38c5bd92a2779baba435044c2</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.64</pheedo:origLink>
			<description>Tweakable enciphering schemes are length preserving block cipher modes of operation that provide a strong pseudo-random permutation. It has been suggested that these schemes can be used as the main building blocks for achieving in-place disk encryption. In this paper we present optimized FPGA implementations of six tweakable enciphering schemes, namely, HCH, HCTR, XCB, EME, HEH and TET, using a 128-bit AES core as the underlying block cipher. We report performance timings of these modes when using both, pipelined and sequential AES structures. The universal polynomial hash function included in the specification of HCH, HCHfp (a variant of HCH), HCTR, XCB, TET and HEH was implemented using a Karatsuba-Ofman multiplier as the main building block. According to our place-and-route results on a Xilinx Virtex IV FPGA, our designs achieve a a throughput of 3.95 Gbps for HEH when using an encryption/decryption pipelined AES core, and a throughput of 5.71 Gbps for EME when using a encryption-only pipeline AES core. The performance results reported in this paper provide experimental evidence that hardware implementations of tweakable enciphering schemes can actually match and even outperform the data rates achieved by state-of-the-art disk controllers, thus showing that they might be used for achieving provably secure in-place hard disk encryption.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=95f875d38c5bd92a2779baba435044c2&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=95f875d38c5bd92a2779baba435044c2&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.64</guid>
		</item>
		<item>
			<title>PrePrint: Hydra: A Block-Mapped Parallel Flash Memory Solid-State Disk Architecture</title>
			<link>http://www.pheedcontent.com/click.phdo?i=57987c73c3a2042e48adbfbaa488ebe7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.63</pheedo:origLink>
			<description>Flash memory solid state disks (SSDs) are gaining popularity and replacing hard disk drives (HDDs) in mobile computing systems, such as ultra mobile PCs (UMPCs) and notebook PCs, because of their lower power consumption, faster random access, and higher shock resistance. In this paper, we describe a high-performance flash memory SSD architecture called Hydra that efficiently and effectively translates the inherent parallelism present in multiple flash memory chips into improved storage system performance. The Hydra SSD architecture uses an extensive bus-level and chip-level interleaving to exploit parallelism in multiple flash memory chips. It also has a prioritized structure consisting of a single foreground unit and multiple background units, each capable of executing a sequence of high-level flash memory operations without any software intervention. The foreground unit that has a priority over the background units is used to expedite the processing of read requests. Finally, the Hydra SSD architecture employs an aggressive write buffering mechanism not only for expediting the processing of write requests but also, more importantly, for effectively utilizing multiple flash memory chips. Performance evaluation based on an FPGA implementation of the Hydra SSD architecture shows that its performance is more than 80% better than the best of the HDDs and SSDs we evaluated.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=57987c73c3a2042e48adbfbaa488ebe7&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=57987c73c3a2042e48adbfbaa488ebe7&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.63</guid>
		</item>
		<item>
			<title>PrePrint: On the Problem of Evaluating the Performance of Multiprogrammed Workloads</title>
			<link>http://www.pheedcontent.com/click.phdo?i=12c510a6deb73cbb1f1e83d32ea7ca84</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.62</pheedo:origLink>
			<description>Multithreaded architectures are becoming more and more popular. In order to evaluate their behavior, several methodologies and metrics have been proposed. A methodology defines when the measurements of a given workload execution are taken. A metric combines those measurements to obtain a final evaluation result. However, since current evaluation methodologies do not provide representative measurements for these metrics, the analysis and evaluation of novel ideas could be either unfair or misleading. Given the potential impact of multithreaded architectures on current and future processor designs, it is crucial to develop an accurate evaluation methodology for them. This paper presents FAME, a new evaluation methodology aimed to fairly measure the performance of multithreaded processors executing multiprogrammed workloads. FAME reexecutes all programs in the workload until all of them are fairly represented in the final measurements taken. We compare FAME with previously used methodologies showing that it provides more accurate measurements, becoming an ideal evaluation methodology to analyze proposals for multithreaded architectures.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=12c510a6deb73cbb1f1e83d32ea7ca84&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=12c510a6deb73cbb1f1e83d32ea7ca84&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.62</guid>
		</item>
		<item>
			<title>PrePrint: Transparent Fault Tolerance of Device Drivers for Virtual Machines</title>
			<link>http://www.pheedcontent.com/click.phdo?i=f170dccd2866b7704c8a85e96fd7a928</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.61</pheedo:origLink>
			<description>In a consolidated server system using virtualization, physical device accesses from guest virtual machines (VMs) need to be coordinated. In this environment, a separate driver VM is usually assigned to this task to enhance reliability and to reuse existing device drivers. This driver VM needs to be highly reliable, since it handles all the I/O requests. This paper describes a mechanism to detect and recover the driver VM from faults to enhance the reliability of the whole system. The proposed mechanism is transparent in that guest VMs cannot recognize the fault and the driver VM can recover and continue its I/O operations. Our mechanism provides a progress monitoring-based fault detection that is isolated from fault contamination with low monitoring overhead. When a fault occurs, the system recovers by switching the faulted driver VM to another one. The recovery is performed without service disconnection or data loss and with negligible delay by fully exploiting the I/O structure of the virtualized system.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=f170dccd2866b7704c8a85e96fd7a928&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=f170dccd2866b7704c8a85e96fd7a928&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.61</guid>
		</item>
		<item>
			<title>PrePrint: Instruction-Level Impact Analysis of Low-Level Faults in a Modern Microprocessor Controller</title>
			<link>http://www.pheedcontent.com/click.phdo?i=02dbe0339e4c86e8b4caf948d3a58de0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.60</pheedo:origLink>
			<description>We investigate the correlation between low-level faults in the control logic of a modern microprocessor and their instruction-level impact on the execution of typical workload. Such information can prove immensely useful in accurately assessing and prioritizing faults with regards to their criticality, as well as commensurately allocating resources to enhance on-line testability and error/fault resilience through concurrent error detection/correction methods. To this end, we developed an extensive fault simulation infrastructure which allows injection of stuck-at faults and transient errors of arbitrary starting time and duration, as well as cost-effective simulation and classification of their repercussions into various instruction-level error types. As a test vehicle for our study, we employ a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. Extensive fault injection campaigns in control modules of this microprocessor facilitate valuable observations regarding the distribution of low-level faults into the instruction-level error types that they cause. Furthermore, experimentation with both Register Transfer (RT-) and Gate-Level faults, as well as with both stuck-at faults and transient errors, confirms the validity and corroborates the utility of these observations.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=02dbe0339e4c86e8b4caf948d3a58de0&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=02dbe0339e4c86e8b4caf948d3a58de0&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.60</guid>
		</item>
		<item>
			<title>PrePrint: Accurate Determination of Loop Iterations for Worst-Case Execution Time Analysis</title>
			<link>http://www.pheedcontent.com/click.phdo?i=ad338ab6ff066b9d44ff6dc2e8e9d2f8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.59</pheedo:origLink>
			<description>Determination of accurate estimates for the Worst-Case Execution Time of a program is essential for guaranteeing the correct temporal behaviour of any Real-Time System. Of particular importance is tightly bounding the number of iterations of loops in the program or excessive undue pessimism can result. This paper presents a novel approach to determining the number of iterations of a loop for such analysis. Program traces are collected and analysed allowing the number of loop executions to be parametrically determined safely and precisely under certain conditions. The approach is mathematically proven to be safe and its practicality is demonstrated on a series of benchmarks.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=ad338ab6ff066b9d44ff6dc2e8e9d2f8&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=ad338ab6ff066b9d44ff6dc2e8e9d2f8&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.59</guid>
		</item>
		<item>
			<title>PrePrint: Conditional Partial Order Graphs: Model, Synthesis and Application</title>
			<link>http://www.pheedcontent.com/click.phdo?i=ccf87be13cdb4de16ae1658462c57ce2</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.58</pheedo:origLink>
			<description>The paper introduces a new formal model for specification and synthesis of control paths in the context of asynchronous system design. The model, called Conditional Partial Order Graph (CPOG), captures concurrency and choice in a system&amp;#8217;s behaviour in a compact and efficient way. It has advantages over widely used interpreted Petri Nets and Finite State Machines for a class of systems which have many behavioural scenarios defined on the same set of actions, e.g. CPU microcontrollers. The CPOG model has potential applications in the area of microcontrol synthesis and brings new methods for modelling concurrency into the application domain of modern and future processor architectures. The paper gives the formal definition of the CPOG model, formulates and solves the problem of CPOG synthesis, and introduces various optimisation techniques. The presented ideas can be applied for CPU control synthesis as well as for synthesis of different kinds of event-coordination circuits often used in data coding and communication in digital systems, as demonstrated with several application examples.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=ccf87be13cdb4de16ae1658462c57ce2&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=ccf87be13cdb4de16ae1658462c57ce2&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.58</guid>
		</item>
		<item>
			<title>IEEE Transactions on Computers - April 2010 (Vol. 59, No. 4)</title>
			<link>http://www.pheedo.com/click.phdo?i=b9b0cbeb664e1cc2fdb4b8c7691d9398</link>
			<pheedo:origLink>http://opac.ieeecomputersociety.org/opac?year=2010&amp;amp;volume=59&amp;amp;issue=04&amp;amp;acronym=tc</pheedo:origLink>
			<description>IEEE Transactions on Computers&lt;br style=&quot;clear: both;&quot;/&gt;
&lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=b9b0cbeb664e1cc2fdb4b8c7691d9398&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=b9b0cbeb664e1cc2fdb4b8c7691d9398&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://www.computer.org/portal/site/tc/</guid>
		</item>
		<item>
			<title>PrePrint: Multicasting in Quantum Switching Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=7441c892a02b27dea862771337bc3c10</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.52</pheedo:origLink>
			<description>In this paper, we present a quantum multicasting network, called a generalized quantum connector (n-GQC) which can be used to multicast quantum information from n inputs to n outputs. This network is recursively constructed using n/2-GQCs and consists of O(n log^2 n) quantum gates. The key component of an n-GQC is another network, called an n-quantum concentrator (n-QC). This concentrator is also an n&#215;n quantum network, and can route arbitrary quantum states on any m of its inputs to its top m outputs, for any m, 1 &amp;#x2264; m &amp;#x2264; n. Its quantum gate complexity is O(n log n). The quantum gate-level depths of n-QC and n-GQC are O(log^2 n) and O(log^3 n), respectively. Both n-QC and n-GQC are based on the classical self-routing concentrators and generalized connection networks given by Lee and Oru&amp;#x00E7;[1]. While these networks work for multicasting classical packets, they cannot be used to multicast quantum packets as they employ balancer switches with both forward and backward propagations of packets. We introduce a quantum balancer switch that works using a forward propagation of packets only, thereby facilitating the n-QC an n-GQC designs presented in the paper.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=7441c892a02b27dea862771337bc3c10&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=7441c892a02b27dea862771337bc3c10&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.52</guid>
		</item>
		<item>
			<title>PrePrint: FPGA Designs with Optimized Logarithmic Arithmetic</title>
			<link>http://www.pheedcontent.com/click.phdo?i=1e3071a92877fec848f85f23d64f07ce</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.51</pheedo:origLink>
			<description>This paper proposes optimization of Logarithmic Number System (LNS) arithmetic units on FPGAs. First, we introduce a general polynomial approximation approach for LNS arithmetic function evaluation. Second, we develop a library generator that produces LNS arithmetic libraries containing +,-,*,/ operators as well as convertors between floating-point and LNS numbers. The generated libraries provide a simple-to-use address encoding mechanism for coefficient tables, on-chip block memory sharing to save coefficient storage, and bit-width minimization based on affine arithmetic. Third, to facilitate comparison between LNS and floating-point designs, we develop a two-level area cost estimation tool: the first level uses area models to acquire results in less than a second, with an average error around 5%; the second level performs an automated mapping of different designs onto FPGAs to acquire exact results. To evaluate accuracy, we develop a bit-accurate simulator based on truncation or rounding of 80-bit floating-point calculations. When compared with existing LNS designs, our generated units provide in most cases 6% to 37% reduction in area and 20% to 50% reduction in latency. We demonstrate our comparison tools on realistic applications. Our tool infrastructure enables us to fast prototype LNS designs, and effectively study LNS and its tradeoffs in size, accuracy and throughput when compared with floating-point designs.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=1e3071a92877fec848f85f23d64f07ce&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=1e3071a92877fec848f85f23d64f07ce&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.51</guid>
		</item>
		<item>
			<title>PrePrint: Complexity of Data Collection, Aggregation, and Selection for Wireless Sensor Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=4cc18eb8ffe1cfb95a939c9f97c265b3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.50</pheedo:origLink>
			<description>Processing the gathered information efficiently is a key functionality for wireless sensor networks. In this paper, we study the time complexity, message complexity, and energy cost complexity of various processing operations for a multi-hop wireless sensor network of n nodes. For most of the operations studied in this paper, we first present a lower-bound on the complexity for the optimal methods, and then we provide an (asymptotically matching) upper-bound on the complexity by presenting efficient distributed algorithms to solve these problems.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=4cc18eb8ffe1cfb95a939c9f97c265b3&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=4cc18eb8ffe1cfb95a939c9f97c265b3&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.50</guid>
		</item>
		<item>
			<title>PrePrint: A Graphics Parallel Memory Organization Exploiting Request Correlations</title>
			<link>http://www.pheedcontent.com/click.phdo?i=15e84de1e452b43635de60fcae148a5b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.48</pheedo:origLink>
			<description>Real-time graphics applications require memory organizations featuring parallel pixel access and low cost implementation. This work bases on a non-linear skew mapping scheme and exploits the correlation between consecutive requests for pixels to design an efficient parallel memory organization. The mapping achieves parallel access, of mn pixels in various shapes, to the memory organized with mn banks. The proposed design technique combines the mapping properties and the spatial correlations among pixel requests to eliminate conflicts by spending at most one extra cycle every mn consecutive parallel pixel accesses. Consequently, the technique ensures that any pixel pattern -among these commonly used in graphics- can be accessed in a single cycle from any image location. The address computations become straightforward as the numbers of the requested pixels and the banks -apart from equal- can be powers of 2.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=15e84de1e452b43635de60fcae148a5b&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=15e84de1e452b43635de60fcae148a5b&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.48</guid>
		</item>
		<item>
			<title>PrePrint: ProgressFace: An Algorithm to Improve Routing Efficiency of GPSR-Like Routing Protocols in Wireless Ad Hoc Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=239c37fbc25e98f7a44c35ca6cc2584e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.47</pheedo:origLink>
			<description>In GPSR-like routing, such as GPSR, GFG, GOAFR+, and GPVFR, perimeter forwarding is used to recover from a greedy forwarding failure by routing the packet to a progress node along the face boundary. The problem of perimeter forwarding is that many hops may be taken if the packet is forwarded in the wrong direction. We propose an algorithm, termed ProgressFace, that uses an additional traversal step to decide the direction of perimeter forwarding. A concave node sends a short packet to traverse the face boundary to identify the progress set, which consists of at most four nodes, such that, for any destination, at least one progress node is in the progress set or the neighbor set. Additionally, the hop distances of the nodes in the progress set along both directions are evaluated so that the shorter one is identified. The following packets encountering the concave node each are then sent along the corresponding direction toward the progress node in the progress set or the neighbor set. Simulations show that GPSR, GFG, GOAFR+, and GPVFR each conduct a shorter routing path, if augmented with the ProgressFace algorithm.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=239c37fbc25e98f7a44c35ca6cc2584e&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=239c37fbc25e98f7a44c35ca6cc2584e&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.47</guid>
		</item>
		<item>
			<title>PrePrint: Comments on "A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Field"</title>
			<link>http://www.pheedcontent.com/click.phdo?i=a06c90905255b5e023bed6a5cb102a4b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.46</pheedo:origLink>
			<description>In this comment, we show that the gate complexity of Karatsuba polynomial multipliers presented in the above paper can be reduced by exploring the common sub-expressions.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=a06c90905255b5e023bed6a5cb102a4b&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=a06c90905255b5e023bed6a5cb102a4b&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.46</guid>
		</item>
		<item>
			<title>PrePrint: A Karatsuba-Based Algorithm for Polynomial Multiplication in Chebyshev Form</title>
			<link>http://www.pheedcontent.com/click.phdo?i=45228c2175e65a94e0098131aa4e4f0d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.45</pheedo:origLink>
			<description>In this paper, we present a new method for multiplying polynomials in Chebyshev form. Our approach has two steps. First, the well-known Karatsuba's algorithm is applied to polynomials constructed by using Chebyshev coefficients. Then, from the obtained result, extra arithmetic operations are used to write the final result in Chebyshev form. The proposed algorithm has a quadratic computational complexity. We also compare our method to other approaches.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=45228c2175e65a94e0098131aa4e4f0d&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=45228c2175e65a94e0098131aa4e4f0d&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.45</guid>
		</item>
		<item>
			<title>PrePrint: Loop-Based Instruction Prefetching to Reduce the Worst-Case Execution Time</title>
			<link>http://www.pheedcontent.com/click.phdo?i=5e6f051e37fa90e64180a936f0c1e256</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.44</pheedo:origLink>
			<description>Estimating and optimizing worst-case execution time (WCET) is critical for hard real-time systems to ensure that different tasks can meet their respective deadlines. Recent work has shown that simple prefetching techniques such as the Next-N-Line prefetching can enhance both the average-case and worst-case performance; however, the improvement on the worst-case execution time is rather limited and inefficient. This paper studies a loop-based instruction prefetching approach, which can exploit the program control-flow information to intelligently prefetch instructions that are most likely needed. Our evaluation indicates that the loop-based instruction prefetching outperforms the Next-N-Line prefetching in both the worst-case and the average-case performance for real-time applications.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=5e6f051e37fa90e64180a936f0c1e256&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=5e6f051e37fa90e64180a936f0c1e256&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.44</guid>
		</item>
		<item>
			<title>PrePrint: A Hardware Accelerator for Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space</title>
			<link>http://www.pheedcontent.com/click.phdo?i=02461373fecc4d6c22a1ef00a177ea66</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.42</pheedo:origLink>
			<description>The astonishing accomplishments in the field of Genomics would not have been possible without the techniques, algorithms and tools developped in Bioinformatics. Biological sequence comparison is a very important operation in Bioinformatics since it is used to determine how similar two sequences are. As a result of this operation, one or more alignments are produced. DIALIGN is an exact algorithm that uses dynamic programming to obtain optimal biological sequence alignments in quadratic space and time. One effective way to accelerate DIALIGN is to design FPGA-based architectures to execute it. Nevertheless, the retrieval of the alignment in hardware requires modifications on the original algorithm since it executes in quadratic space. In this paper, we propose and evaluate two FPGA-based accelerators to execute DIALIGN in linear space: one to obtain the optimal DIALIGN score (DIALIGN-Score) and one to retrieve the DIALIGN alignment (DIALIGN-Alignment). In this paper, we also propose a linear space variant of the DIALIGN algorithm and design the DIALIGN-Alignment accelerator to implement it. The experimental results show that impressive speedups can be obtained with both accelerators. When comparing long biological sequences, the DIALIGN-Score accelelator achieved a speedup of 383.4 whereas the DIALIGN-Alignment accelerator reached a speedup of 141.38.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=02461373fecc4d6c22a1ef00a177ea66&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=02461373fecc4d6c22a1ef00a177ea66&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.42</guid>
		</item>
		<item>
			<title>PrePrint: Cache-Based Memory Copy Hardware Accelerator for Multi-Core Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c46e7107086dc3f17a2ab1cf44b0e2eb</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.41</pheedo:origLink>
			<description>In this paper, we present a new architecture of the cache-based memory copy hardware accelerator in a multi-core system supporting message passing. The accelerator is able to accelerate memory data movements, in particular memory copies. We perform an analytical analysis based on open-queuing theory to study the utilization of our accelerator in a multi-core system. In order to correctly model the system, we gather the necessary information by utilizing a full-system simulator. We present both the simulation results and the analytical analysis based on open-queuing theory. We demonstrate the advantages of our solution based on a full-system simulator utilizing several applications: the STREAM benchmark and the receiver-side of the TCP/IP stack. Our accelerator provides speedups from 2.96 to 4.61 for the receiver-side of the TCP/IP stack, reduces the number of instructions from 26% to 44% and achieves a higher cache hit rate. Utilizing the analytical analysis based on open-queuing theory, our accelerator is able to achieve a reduction in the average number of cycles executed per instruction up to 50% for one of the CPUs in the system.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=c46e7107086dc3f17a2ab1cf44b0e2eb&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=c46e7107086dc3f17a2ab1cf44b0e2eb&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.41</guid>
		</item>
		<item>
			<title>PrePrint: Authenticated Group Key Transfer Protocol Based on Secret Sharing</title>
			<link>http://www.pheedcontent.com/click.phdo?i=45a802bcbdaffbf6a3ebf25c308b7796</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.40</pheedo:origLink>
			<description>Key transfer protocols rely on a mutually trusted key generation center (KGC) to select session keys and transports session keys to all communication entities secretly. Most often, KGC encrypts session keys under another secret key shared with each entity during registration. In this paper, we propose an authenticated key transfer protocol based on secret sharing scheme that KGC can broadcast group key information to all group members at once and only authorized group members can recover the group key; but un-authorized users cannot recover the group key. The confidentiality of this transformation is information theoretically secure. We also provide authentication for transporting this group key. Goals and security threats of our proposed group key transfer protocol will be analyzed in detail.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=45a802bcbdaffbf6a3ebf25c308b7796&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=45a802bcbdaffbf6a3ebf25c308b7796&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.40</guid>
		</item>
		<item>
			<title>PrePrint: EvolvingSpace: A Data Centric Framework for Integrating Bioinformatics Applications</title>
			<link>http://www.pheedcontent.com/click.phdo?i=d6c6b5da98c5943d385ad99c5771af3a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.39</pheedo:origLink>
			<description>The paper presents EvolvingSpace, a data centric distributed system, which is intended to address the data and application integration problem in bioinformatics data centers. The system employs commodity PCs for data storage and computation. EvolvingSpace manages data in a decentralized manner, which is convenient for storing data annotations and can eliminate potential data-access bottlenecks. It indexes distributed data in multi-levels to facilitate the construction of complex workflows that consist of different types of applications. In addition, the paper gives a data locality aware and workflow aware scheduling algorithm (ES-Scheduling) to balance the data distribution and computing performance as well as throughput and workflow response time. We run extensive experiments using the system with real bioinformatics applications. Our results show that the system is efficient for running integrated bioinformatics applications and has good scalability.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=d6c6b5da98c5943d385ad99c5771af3a&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=d6c6b5da98c5943d385ad99c5771af3a&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.39</guid>
		</item>
		<item>
			<title>PrePrint: Priority Tries for IP Address Lookup</title>
			<link>http://www.pheedcontent.com/click.phdo?i=0d8cb4ebfea0bf7bb0211489efaf0834</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.38</pheedo:origLink>
			<description>High speed IP address lookup is essential to achieve wire-speed packet forwarding in Internet routers. The longest prefix matching for IP address lookup is more complex than exact matching because it involves dual dimensions: length and value. This paper presents a new formulation for IP address lookup problem using range representation of prefixes, and an efficient binary trie structure named a priority trie is proposed. In this range representation, prefixes are represented as ranges on a number line between 0 and 1 without expanding to the maximum length. The best match to a given input address is the smallest range that includes the input. The priority trie is based on the trie structure, with empty internal nodes in the trie replaced by the priority prefix which is the longest among those in the sub-trie rooted by the empty nodes. The search ends when an input matches a priority prefix, which significantly improves the search performance. Performance evaluation using real routing data shows that the proposed priority trie is very good in performance metrics such as lookup speed, memory size, update performance, and scalability.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=0d8cb4ebfea0bf7bb0211489efaf0834&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=0d8cb4ebfea0bf7bb0211489efaf0834&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.38</guid>
		</item>
		<item>
			<title>PrePrint: Improving Networked File System Performance Using a Locality-Aware Cooperative Cache Protocol</title>
			<link>http://www.pheedcontent.com/click.phdo?i=bf98085b7f9320f65a3a0eca7bc72914</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.37</pheedo:origLink>
			<description>Cooperative caching has been proposed to increase cache utilization by coordinating the shared usage of distributed caches, allowing clients that would more greatly benefit from larger caches to forward data objects to peer clients with relatively underutilized caches. To support such coordination, global cache utilization must be dynamically evaluated. This in turn relies on effective analysis of application data access patterns. Existing coordination protocols are demonstrably suboptimal in this respect, exhibiting inefficient memory utilization and undue interference among clients. We propose a locality-aware cooperative caching protocol, called LAC, that is based on analysis and manipulation of data-block reuse distance to effectively predict cache utilization and the probability of data reuse at each client. Using a dynamically adaptive synchronization technique, we keep local information consistently comparable and up to date across clients. The system is highly scalable in the sense that global coordination is achieved without centralized control. We have conducted thorough trace-driven simulation experiments to assess the performance differences between LAC and various existing protocols representative of the general class.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=bf98085b7f9320f65a3a0eca7bc72914&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=bf98085b7f9320f65a3a0eca7bc72914&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.37</guid>
		</item>
		<item>
			<title>PrePrint: Preventing Silent Data Corruptions from Propagating During Data Reconstruction</title>
			<link>http://www.pheedcontent.com/click.phdo?i=f25c458b13adfbc012e652a2f03618be</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.36</pheedo:origLink>
			<description>One recent technical challenge facing the designers of erasure-coded storage systems is how to prevent silent data corruptions from propagating during data reconstruction. This paper proposes a new technique of exploiting erasure-coded storage systems to cope with silent data corruptions during data reconstruction. To develop a data reconstruction method that can prevent silent data corruptions from propagating, we first define the consistency of a strip group and then study the impact of silent data corruptions on the consistency of strip groups. Based on the conclusions obtained from the study, an efficient adaptive data reconstruction method is developed for data reconstruction in the presence of silent data corruptions. A performance analysis of our new data reconstruction method is then made using a probabilistic method. Our results show that the overall performance impact of our data reconstruction method is negligible in practical systems. A comparison of techniques for coping with silent data corruptions in erasure-coded storage systems is also made. The comparison shows that the technique based on our data reconstruction method is a better choice to cope with silent data corruptions when periodic validation is used in an erasure-coded storage system.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=f25c458b13adfbc012e652a2f03618be&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=f25c458b13adfbc012e652a2f03618be&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.36</guid>
		</item>
		<item>
			<title>PrePrint: Statistical Approach to Networks-on-Chip</title>
			<link>http://www.pheedcontent.com/click.phdo?i=9826c45e74408d2ae9e2ca9c02d4f0d8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.35</pheedo:origLink>
			<description>Chip multiprocessors (CMPs) combine increasingly many general-purpose processor cores on a single chip. These cores run several tasks with unpredictable communication needs, resulting in uncertain and often-changing traffic patterns. This unpredictability leads network-on-chip (NoC) designers to plan for the worst-case traffic patterns, and significantly over-provision link capacities. In this paper, we provide NoC designers with an alternative statistical approach. We first present the traffic-load distribution plots (T-Plots), illustrating how much capacity over-provisioning is needed to service 90%, 99%, or 100% of all traffic patterns. We prove that in the general case, plotting T-Plots is #P-complete, and therefore extremely complex. We then show how to determine the exact mean and variance of the traffic load on any edge, and use these to provide Gaussian-based models for the T-Plots, as well as guaranteed performance bounds. We also explain how to practically approximate T-Plots using random-walk-based methods. Finally, we use T-Plots to reduce the network power consumption by providing an efficient capacity allocation algorithm with predictable performance guarantees.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=9826c45e74408d2ae9e2ca9c02d4f0d8&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=9826c45e74408d2ae9e2ca9c02d4f0d8&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.35</guid>
		</item>
		<item>
			<title>PrePrint: Adaptation of Reputation Management Systems to Dynamic Network Conditions in Ad hoc Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=3bc5499d43c2a91e7fa6e0ee37fdc13e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.34</pheedo:origLink>
			<description>Reputation management (RM) has been proposed as a cooperation enforcement solution in ad hoc networks. Typically, the functions of RM (evaluation, detection, and reaction) are carried out homogeneously across time and space. However, the dynamic nature of ad hoc networks causes node behavior to vary both spatially and temporally due to changes in local and network-wide conditions. When RM functions do not adapt to such changes, their effectiveness, measured in terms of accuracy (correct identification of node behavior) and promptness (timely identification of node misbehavior), may be compromised. We propose an adaptive reputation management system that realizes that changes in node behavior may be driven by changes in network conditions and that accommodates such change by adapting its operating parameters. We introduce a time-slotted approach to allow the evaluation function to quickly and accurately capture changes in node behavior. We show how the duration of an evaluation slot can adapt according to the network's activity to enhance the system accuracy and promptness. We then show how the detection function can utilize a Sequential Probability Ratio Test (SPRT) to distinguish between cooperative and misbehaving neighbors. We compare our proposed solution to a non-adaptive system, showing the ability of our system to achieve high accuracy and promptness in dynamic environments.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=3bc5499d43c2a91e7fa6e0ee37fdc13e&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=3bc5499d43c2a91e7fa6e0ee37fdc13e&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.34</guid>
		</item>
		<item>
			<title>PrePrint: Concurrent Structure-Independent Fault Detection Schemes for the Advanced Encryption Standard</title>
			<link>http://www.pheedcontent.com/click.phdo?i=290e530eebcfd08e440e44f3dd6a1d8e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.33</pheedo:origLink>
			<description>The Advanced Encryption Standard (AES) has been lately accepted as the symmetric cryptography standard for confidential data transmission. However, the natural and malicious injected faults reduce its reliability and may cause confidential information leakage. In this paper, we study concurrent fault detection schemes for reaching a reliable AES architecture. Specifically, we propose low-cost structure-independent fault detection schemes for the AES encryption and decryption. We have obtained new formulations for the fault detection of SubBytes and inverse SubBytes using the relation between the input and the output of the S-box and the inverse S-box. The proposed schemes are independent of the way the S-box and the inverse S-box are constructed. Therefore, they can be used for both the S-boxes and the inverse S-boxes using look-up tables and those utilizing logic gate implementations based on composite fields. Our simulation results show the error coverage of greater than 99% for the proposed schemes. Moreover, the proposed and the previously reported fault detection schemes have been implemented on the most recent Xilinx Virtex FPGAs. Their area and delay overheads have been compared and it is shown that the proposed schemes outperform the previously reported ones.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=290e530eebcfd08e440e44f3dd6a1d8e&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=290e530eebcfd08e440e44f3dd6a1d8e&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.33</guid>
		</item>
		<item>
			<title>PrePrint: Hardware Support for Secure Processing in Embedded Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=459a60e127b2b9a877e69c81669eaea3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.32</pheedo:origLink>
			<description>The inherent limitations of embedded systems make them particularly vulnerable to attacks. We have developed a hardware monitor that operates in parallel to an embedded processor and detects any attack that causes the embedded processor to deviate from its originally programmed behavior. We explore several different characteristics that can be used for monitoring and quantitative tradeoffs between these approaches. Our results show that our proposed hash-based monitoring pattern can detect attacks within one instruction cycle at lower memory requirements than traditional approaches that use control-flow information.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=459a60e127b2b9a877e69c81669eaea3&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=459a60e127b2b9a877e69c81669eaea3&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.32</guid>
		</item>
		<item>
			<title>PrePrint: A General Framework for Parameterized Schedulability Bound Analysis of Real-Time Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=f38106d90c639bf5d6520bfc2df5d37e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.31</pheedo:origLink>
			<description>In real-time systems, utilization-based schedulability test is a common approach to determine whether or not tasks can be admitted without violating deadline requirements. The test is extremely simple, since it only needs to compare the utilization of the tasks with a pre-determined bound. As such, utilization-based schedulability tests are suitable for on-line use. The challenge is how to derive a reasonable utilization bound for a given system. Most existing results are obtained on a case-by-case basis because of their analytical complexity. In this paper, we develop a flexible and unified representation framework of real-time systems (i.e., tasks and schedulers) based on network calculus techniques. Our representation framework, together with the proposed bound derivation method, leads to a general bound result, which is applicable to a large family of real-time systems.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=f38106d90c639bf5d6520bfc2df5d37e&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=f38106d90c639bf5d6520bfc2df5d37e&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.31</guid>
		</item>
		<item>
			<title>PrePrint: High Performance Robust Latches</title>
			<link>http://www.pheedcontent.com/click.phdo?i=981368a256cf3f2fbc2922366eaddbdd</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.24</pheedo:origLink>
			<description>First a new high performance robust latch (referred to as HiPeR latch) is presented, that is insensitive to transient faults (TFs) affecting its internal and output nodes by design, independently of the size of its transistors. Then, a modified version of the HiPeR latch (referred as HiPeR-CG) is proposed, that is suitable to be used together with clock gating. Both proposed latches are faster than the latches most recently presented in the literature, while providing better or comparable robustness to transient faults, at comparable or lower costs in terms of area and power, respectively. Therefore, thanks to the good tradeoffs in terms of performance, robustness and cost, our proposed latches are particularly suitable to be adopted on critical paths.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=981368a256cf3f2fbc2922366eaddbdd&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=981368a256cf3f2fbc2922366eaddbdd&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.24</guid>
		</item>
		<item>
			<title>PrePrint: Anonymous Multi-Receiver Identity-Based Encryption</title>
			<link>http://www.pheedcontent.com/click.phdo?i=d25f2d79030445b56d6045910e603b87</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.23</pheedo:origLink>
			<description>Recently, many multi-receiver identity-based encryption schemes have been proposed in the literature. However, none can protect the privacy of message receivers among these schemes. In this manuscript, we present an anonymous multi-receiver identity-based encryption scheme where we adopt Lagrange interpolating polynomial mechanisms to cope with the above problem. Our scheme makes it impossible for an attacker or any other message receiver to derive the identity of a message receiver such that the privacy of every receiver can be guaranteed. Furthermore, the proposed scheme is quite receiver efficient since each of the receivers merely needs to perform constant times (twice in fact) of pairing computation to decrypt the received ciphertext. We prove that our scheme is secure against adaptive chosen plaintext attacks and adaptive chosen ciphertext attacks. Finally, we also formally show that every receiver in the proposed scheme is anonymous to any other receiver.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=d25f2d79030445b56d6045910e603b87&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=d25f2d79030445b56d6045910e603b87&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.23</guid>
		</item>
		<item>
			<title>PrePrint: DACO: A High-Performance Disk Architecture Designed Specially for Large-Scale Erasure-Coded Storage Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=93ccfaa79fe562e0cddedbfe2e0c0952</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.22</pheedo:origLink>
			<description>Large-scale erasure-coded storage systems have a serious performance problem due to I/O congestion and disk media access congestion caused by read-modify-write operations involved in small-write operations. All the existing technologies based on the conventional disk can provide very limited performance improvement. This paper presents a new Disk Architecture with Composite Operation (DACO), whose disk media access interface consists of three kinds of operations: READ, WRITE, and Composite Operation (CO). The CO adopts a sector-based pipeline technology to implement block-level data modify operations and thus can replace the read-modify-write operations involved in small-write operations. When the DACO is adopted in a large-scale erasure-coded storage system with t fault tolerance, t I/Os and t disk media access operations can be reduced in each small-write operation, respectively. This alleviates both I/O congestion and disk media access congestion in nature and thus can remarkably improve the performance of large-scale erasure-coded storage systems. A simulation study shows that the DACO can provide significant performance improvement: reducing the average I/O response time by up to 31.16% even in the worst case where t=1. This paper also discusses the important implementation issues of the DACO and investigates the additional cost involved in the DACO.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=93ccfaa79fe562e0cddedbfe2e0c0952&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=93ccfaa79fe562e0cddedbfe2e0c0952&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.22</guid>
		</item>
		<item>
			<title>PrePrint: An Improved Algorithm for Construction of Decision Diagrams by Rearranging and Partitioning of the Input Cube Set</title>
			<link>http://www.pheedcontent.com/click.phdo?i=27eccd42230ffd3bfcbde8d03beb9cca</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.21</pheedo:origLink>
			<description>Decision diagrams (DDs) are a data structure that allows for compact representation of discrete functions. Efficient construction of DDs in terms of space and time is often considered problem. A particular problem is that during construction of DD a large number of temporary nodes are created. In this paper we address this problem when functions are specified in the PLA format. A common practice is to construct a DD by recursively processing all the cubes in PLA specification. The DD representing a subfunction defined by a single cube is merged with the DD for the subfunction defined by all the previously processed cubes. We proposed a method for reordering and partitioning the set of cubes in PLA specification that results in the reduction of both space and time complexities of construction of DDs. First, we arranged cubes by their suffices. Then we partition the set of cubes, construct DDs for subfunctions represented each partition separately and merge them into a final DD. The proposed method is applied on construction of DDs for the set of standard benchmark functions. The experiments show that the total number of created nodes is reduced on the average by 34.65%, while the construction time is decreased by 48.6%.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=27eccd42230ffd3bfcbde8d03beb9cca&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=27eccd42230ffd3bfcbde8d03beb9cca&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.21</guid>
		</item>
		<item>
			<title>PrePrint: Heterogenous Quorum-Based Wakeup Scheduling in Wireless Sensor Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=388ebb1dafcd5a4ceb8aec308b6b208f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.20</pheedo:origLink>
			<description>We present heterogenous quorum-based asynchronous wakeup scheduling schemes for wireless sensor networks. The schemes can ensure that two nodes that adopt different quorum systems as their wakeup schedules can hear each other at least once in bounded time intervals. We propose two such schemes: cyclic quorum system pair (cqs-pair) and grid quorum system pair (gqs-pair ). The cqs-pair which contains two cyclic quorum systems provides an optimal solution in terms of energy saving ratio for asynchronous wakeup scheduling. To quickly assemble a cqs-pair, we present a fast construction scheme which is based on the multiplier theorem and the (N; k; M; l)-difference pair defined by us. Regarding the gqs-pair, we prove that any two grid quorum systems will automatically form a gqspair. We further analyze the performance of both designs, in terms of average discovery delay, quorum ratio, and energy saving ratio. We show that our designs achieve better tradeoff between the average discovery delay and quorum ratio (and thus energy consumption) for different cycle lengthes. We implemented the proposed designs in a wireless sensor network platform of Telosb motes. Our implementation-based measurements further validate the analytically-established performance trade-off of our designs.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=388ebb1dafcd5a4ceb8aec308b6b208f&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=388ebb1dafcd5a4ceb8aec308b6b208f&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.20</guid>
		</item>
		<item>
			<title>PrePrint: Stochastic Contention Level Simulation for Single Chip Heterogeneous Multiprocessors</title>
			<link>http://www.pheedcontent.com/click.phdo?i=3c84263be7ffb9dcf2a73e060cfeb28a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.19</pheedo:origLink>
			<description>Single chip systems, featuring multiple heterogeneous processors and a variety of communication and memory architectures, have emerged to satisfy the demand for networking, handheld computing, and other custom devices. When simulated at cycle-accurate level, these system models are slow to build and execute, severely limiting the number of design iterations that can be considered. A key challenge in raising the simulation level above the clock cycle is an effective method for estimating contention for shared resources such as memories and busses. This paper introduces a new level of design called the Stochastic Contention Level (SCL). Instead of considering shared resource accesses at the clock cycle granularity, SCL simulations operate on blocks that are thousands to millions of clock cycles long, stochastically capturing contention for shared resources via sampled access attributes, while still retaining an event-based simulation framework. The SCL approach results in speedups of 40X over cycle-accurate simulation, with average simulation errors of less than 1% with 95% confidence intervals of about &amp;#x2213;3%, providing a unique combination of simulation capabilities, performance, and accuracy. This significant increase in simulation performance enables the system designers to explore more of the design space than possible with traditional simulation approaches.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=3c84263be7ffb9dcf2a73e060cfeb28a&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=3c84263be7ffb9dcf2a73e060cfeb28a&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.19</guid>
		</item>
		<item>
			<title>PrePrint: Automatic Refinement Checking of Pipelines with Out-of-Order Execution</title>
			<link>http://www.pheedcontent.com/click.phdo?i=976e684a25d90790fb09dc5d4488f7c9</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.18</pheedo:origLink>
			<description>We show how to automatically verify pipelined machines with out-of-order execution using refinement. Our notion of refinement is based on Well-Founded Equivalence Bisimulations. Proving refinement guarantees that a pipelined machine will preserve all safety and liveness properties of its instruction set architecture. Checking liveness&amp;#x2014;used to ensure that deadlock cannot occur, i.e., there is always forward progress&amp;#x2014;is essential for out-of-order machines as the control logic is involved and prone to deadlock defects. In previous work on out-of-order verification, liveness checking was either ignored or not automated. We develop two automatic methods based on refinement that check both safety and liveness of out-of-order pipelined machines. We use extensive experimentation based on 14 out-of-order machine models to study and compare these methods. We find overall that the cost of proving both safety and liveness is about 81% more than the cost of proving safety alone.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=976e684a25d90790fb09dc5d4488f7c9&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=976e684a25d90790fb09dc5d4488f7c9&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.18</guid>
		</item>
		<item>
			<title>PrePrint: Checking Completeness of Tests for Finite State Machines</title>
			<link>http://www.pheedcontent.com/click.phdo?i=866f4e4142087fd39fae7570345bae96</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.17</pheedo:origLink>
			<description>In testing from a Finite State Machine (FSM), the generation of test suites which guarantee full fault detection, known as complete test suites, has been a long-standing research topic. In this paper, we present conditions that are sufficient for a test suite to be complete. We demonstrate that the existing conditions are special cases of the proposed ones. An algorithm that checks whether a given test suite is complete is given. The experimental results show that the algorithm can be used for relatively large FSMs and test suites.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=866f4e4142087fd39fae7570345bae96&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=866f4e4142087fd39fae7570345bae96&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.17</guid>
		</item>
		<item>
			<title>PrePrint: ($t$, $k$)-Diagnosability for Regular Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=34a9fd33b7ee6076a8ea5606e773eba0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.16</pheedo:origLink>
			<description>System-level diagnosis aims to identify faulty processors in a multiprocessor system by means of analyzing the test results among the processors. ($t$, $k$)-diagnosis, which is one of the most important system-level diagnosis strategies, requires at least $k$ faulty processors identified in each iteration provided there are at most $t$; faultyprocessors, where $t$ &amp;#x2265; $k$. In this paper, a new ($t$, $k$)-diagnosis algorithm for regular networks is proposed. Experimental results show that the proposed algorithm has larger values of $t$ for tori and matching composition networks, in comparison with previous ($t$, $k$)-diagnosis algorithms.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=34a9fd33b7ee6076a8ea5606e773eba0&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=34a9fd33b7ee6076a8ea5606e773eba0&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.16</guid>
		</item>
		<item>
			<title>PrePrint: Distributed Virtual Bit-Slice Synchronizer: A Scalable Hardware Barrier Mechanism for n-Dimensional Meshes</title>
			<link>http://www.pheedcontent.com/click.phdo?i=fae97c3fae3ab843f6ad0d1bf036179b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.15</pheedo:origLink>
			<description>The work presents a distributed hardware-level barrier mechanism for n-dimentional mesh-connected MIMD computers, called Distributed Virtual Bit-Slice Synchronizer (DVBSS). The proposed mechanism is structured around an m-bit dedicated control network, whose topology is a directed mesh-embeddable graph, with an additional m-bit-wide wraparound connection. By using a specific virtualization scheme making it possible to have p virtual m-bit barrier networks superposed on a physical one, the DVBSS model allows to synchronize more than m barrier groups. To minimize synchronization latency, the DVBSS scheme uses a distributed circulating wave clocking (DCW-clocking) technique to switch between virtual barrier networks in a pipeline fashion. The DVBSS scheme is shown to be general, configurable, and MPI-compatible. Unlike proposed distributed hardware barriers, and hardware tree-based schemes, the DVBSS mechanism accepts dynamically defined (possibly overlapping) barrier groups of arbitrary size and shape, allowing noncontiguous group member allocations.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=fae97c3fae3ab843f6ad0d1bf036179b&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=fae97c3fae3ab843f6ad0d1bf036179b&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.15</guid>
		</item>
		<item>
			<title>PrePrint: A Hybrid Approach to NAND-Flash-Based Solid-State Disks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=dec25e5640834d1e6a6c526f5992064b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.14</pheedo:origLink>
			<description>Replacing power-hungry disks with NAND-flash-based solid-state disks (SSDs) is a recently emerging trend in flash-memory applications. One important SSD design issue is achieving a good balance between cost, performance, and lifetime. This study introduces a hybrid approach to large SSDs that combines MLC NAND flash and SLC NAND flash. Each of these flash architectures has its own drawbacks and benefits, and this study proposes that the two can complement each other. However, there are technical challenges pertaining to data placement, data migration, and wear leveling in heterogeneous NAND flash. The experimental results of our study show that combining 256 MB SLC flash with 20 GB MLC flash produces a hybrid SSD. This hybrid SSD is 1.8 times faster than a purely MLC-flash-based SSD in terms of average response time, and improves energy consumption by 46%. The proposed hybrid SSD costs only 4% more than a purely MLC-flash-based SSD. The extra cost of a hybrid SSD is very limited and rewarding.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=dec25e5640834d1e6a6c526f5992064b&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=dec25e5640834d1e6a6c526f5992064b&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.14</guid>
		</item>
		<item>
			<title>PrePrint: Garbage Collection for Flexible Hard Real-Time Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=ce0afb80bf4e930b992c6776fe648f50</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2010.13</pheedo:origLink>
			<description>Hard real-time systems always choose not to use garbage collection in order to avoid their unpredictable executions. Much effort has been expended trying to build predictable garbage collectors which can provide both temporal and spatial guarantees. Unfortunately, most existing work leads to systems that cannot easily achieve a balance between temporal and spatial performances. Moreover, the scheduling of garbage collectors has not been integrated into modern real-time scheduling frameworks, which makes the benefits provided by the advancement of scheduling techniques very difficult to obtain. This paper argues that the existing design criteria for real-time garbage collectors do not reflect the unique requirements of flexible hard real-time systems. As a part of our design criteria, a new performance indicator is proposed to describe the capability of a real-time garbage collector to achieve a better balance between temporal and spatial performances. A hybrid garbage collection algorithm is designed accordingly which also uses dual priority scheduling algorithm to reclaim spare capacity whilst guaranteeing deadlines. The algorithm has been implemented and evaluated in a real-time Java environment.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=ce0afb80bf4e930b992c6776fe648f50&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=ce0afb80bf4e930b992c6776fe648f50&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2010.13</guid>
		</item>
		<item>
			<title>PrePrint: Area-Time Efficient Implementation of the Elliptic Curve Method of Factoring in Reconfigurable Hardware for Application in the Number Field Sieve</title>
			<link>http://www.pheedcontent.com/click.phdo?i=56a64b93d8fdff33408825ff2174766b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.191</pheedo:origLink>
			<description>A novel portable hardware architecture of the Elliptic Curve Method of factoring, designed and optimized for application in the relation collection step of the Number Field Sieve, is described and analyzed. A comparison with an earlier proof-of-concept design by Pelzl, Simka, et al. has been performed, and a substantial improvement has been demonstrated in terms of both the execution time and the areatime product. The ECM architecture has been ported across five different families of FPGA devices in order to select the family with the best performance to cost ratio. A timing comparison with the highly optimized software implementation, GMP-ECM, has been performed. Our results indicate that low-cost families of FPGAs, such as Spartan-3 and Spartan-3E, offer at least an order of magnitude improvement over the same generation of microprocessors in terms of the performance to cost ratio, without the use of embedded FPGA resources, such as embedded multipliers.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=56a64b93d8fdff33408825ff2174766b&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=56a64b93d8fdff33408825ff2174766b&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.191</guid>
		</item>
		<item>
			<title>PrePrint: Scratchpad Memory Management Techniques for Code in Embedded Systems Without an MMU</title>
			<link>http://www.pheedcontent.com/click.phdo?i=cd006b0f17b8bd6d55d68cf973f9b2b5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.188</pheedo:origLink>
			<description>We propose a code scratchpad memory (SPM) management technique with demand paging for embedded systems that have no memory management unit. Based on profiling information, a post-pass optimizer analyzes and optimizes application binaries in a fully automated process. It classifies the code of the application including libraries into three classes based on a mixed integer linear programming formulation: External code is executed directly from the external memory. Pinned code is loaded into the SPM when the application starts and stays in the SPM. Paged code is loaded into/unloaded from the SPM on demand. We evaluate the proposed technique by running fourteen embedded applications both on a cycle-accurate ARM processor simulator and an ARM1136JF-S core. On the simulator, the reference case, a 4-way set-associative cache, is compared to a direct-mapped cache and an SPM of comparable die area. On average, we observe an improvement of 12% in runtime performance and a 21% reduction in energy consumption. On the ARM11 board, the reference case run on the 16-KB 4-way set-associative cache is compared to the demand paging solution on the 16-KB SPM, optionally supported by the cache. The measured results show both a runtime performance improvement and a reduction of the energy consumption by 23%, on average.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=cd006b0f17b8bd6d55d68cf973f9b2b5&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=cd006b0f17b8bd6d55d68cf973f9b2b5&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.188</guid>
		</item>
		<item>
			<title>PrePrint: Packet-Mode Emulation of Output-Queued Switches</title>
			<link>http://www.pheedcontent.com/click.phdo?i=0f7a4c52a001fac83c593a9a1a1d7e18</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.186</pheedo:origLink>
			<description>Most common network protocols transmit variable size packets, whereas contemporary switches still operate with fixed size cells, which are easier to transmit and buffer. This necessitates packet segmentation and reassembly modules, resulting in significant computation and communication overhead that might be too costly as switches become faster and bigger. It is therefore imperative to investigate an alternative mode of scheduling, in which packets are scheduled contiguously over the switch fabric. This paper investigates the cost of packet-mode scheduling for the combined-input-output-queued (CIOQ) switch architecture. We devise frame-based schedulers that allow a packet-mode CIOQ switch with small speedup to mimic an ideal output-queued switch, with bounded relative queuing delay. Our schedulers demonstrate a trade-off between the switch speedup and the relative queuing delay incurred while mimicking an output-queued switch. When the switch is allowed to incur high relative queuing delay, a speedup arbitrarily close to 2 suffices to mimic an ideal output-queued switch. This implies that packet-mode scheduling does not require higher speedup than a cell-based scheduler. Simulations confirm that packet-mode emulation with reasonable relative queuing delay can be achieved with moderate speedup. Furthermore, a simple and practical heuristic is shown by simulations to also provide effective packet-mode emulation.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=0f7a4c52a001fac83c593a9a1a1d7e18&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=0f7a4c52a001fac83c593a9a1a1d7e18&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.186</guid>
		</item>
		<item>
			<title>PrePrint: Network Applications on Simultaneous MultiThreading Processors</title>
			<link>http://www.pheedcontent.com/click.phdo?i=874e91a3ddffa6735309ad2401e83a70</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.185</pheedo:origLink>
			<description>As network applications become increasingly sophisticated and Internet traffic is getting heavier, future network processors must continue processing computation-intensive network applications at line rates. Most programmable network processors on the market today, such as the Intel IXP2800, target relatively low performance (from 100 Mbps to 10 Gbps). However, low cost edge routers will find it hard to cope with the forthcoming sophistication of network applications to be processed at those speeds. Hence, new architectures should be designed for the programmable network processors of the future. The goal of this paper is to evaluate the applicability and efficiency of Simultaneous MultiThreaded (SMT) as the base architecture of a network processor. Indeed, the SMT model inherently allows the multiple parallel threads which must be dealt with in network processor applications. In this paper, we investigate the architectural implications of network applications on the SMT architecture. We demonstrate that, when executed as independent threads, applications chosen from different network layers show an improved Instructions Per Cycle (IPC) and cache behavior when compared with the situation where the program executed comes from a single network application. Finally, a new architectural solution to cope with packet dependency is proposed and evaluated.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=874e91a3ddffa6735309ad2401e83a70&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=874e91a3ddffa6735309ad2401e83a70&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.185</guid>
		</item>
		<item>
			<title>PrePrint: Performability Analysis of Multi-State Computing Systems Using Multi-Valued Decision Diagrams</title>
			<link>http://www.pheedcontent.com/click.phdo?i=3a420eb43632634f35b4756074c77e90</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.184</pheedo:origLink>
			<description>A distinct characteristic of multi-state systems (MSS) is that the systems and/or their components may exhibit multiple performance levels (or states) varying from perfect operation to complete failure. MSS can model behaviors such as shared loads, performance degradation, imperfect fault coverage, standby redundancy, limited repair resources, and limited link capacities. The non-binary state property of MSS &amp; their components as well as dependencies existing among different states of the same component make the analysis of MSS difficult. This paper proposes efficient algorithms for analyzing MSS using multi-valued decision diagrams (MDD). Various reliability, availability, and performability measures based on state probabilities or failure frequencies are considered. The application and advantages of the proposed algorithms are demonstrated through two examples. Furthermore, experimental results on a set of benchmark examples are presented to illustrate the advantages of the proposed MDD-based method for the performability analysis of MSS, as compared to the existing methods.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=3a420eb43632634f35b4756074c77e90&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=3a420eb43632634f35b4756074c77e90&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.184</guid>
		</item>
		<item>
			<title>PrePrint: Low Complexity Cubing and Cube Root Computation over $\F_{3^m}$ in Polynomial Basis</title>
			<link>http://www.pheedcontent.com/click.phdo?i=589239a0563db388cc78765ff856d1f2</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2009.183</pheedo:origLink>
			<description>We present low complexity formulae for the computation of cubing and cube root over $\F_{3^m}$ constructed using special classes of irreducible trinomials, tetranomials and pentanomials. We show that for all those special classes of polynomials, field cubing and field cube root operation have the same computational complexity when implemented in hardware or software platforms. As one of the main applications of these two field arithmetic operations lies in pairing-based cryptography, we also give in this paper a selection of irreducible polynomials that lead to low cost field cubing and field cube root computations for supersingular elliptic curves defined over $\F_{3^m}$, where $m$ is a prime number in the pairing-based cryptographic range of interest, namely, $m &amp;#x2208; [47, 541]$.&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;br clear=&quot;both&quot; style=&quot;clear: both;&quot;/&gt;
&lt;a href=&quot;http://ads.pheedo.com/click.phdo?s=589239a0563db388cc78765ff856d1f2&amp;p=1&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://ads.pheedo.com/img.phdo?s=589239a0563db388cc78765ff856d1f2&amp;p=1&quot;/&gt;&lt;/a&gt;
&lt;!-- foo --&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2009.183</guid>
		</item>
	</channel>
</rss>