<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="/css/rss20.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:pheedo="http://www.pheedo.com/namespace/pheedo">
	<channel>
		<title>IEEE Computer Architecture Letters</title>
		<link>http://www.computer.org/cal</link>
		<description></description>
		<language>en-us</language>
		<pubDate>Fri, 6 Nov 2009 11:00:03 GMT</pubDate>
		<image>
			<url>http://csdl.computer.org/common/images/logos/cal.gif</url>
			<title>IEEE Computer Society</title>
			<description>List of recently published journal articles</description>
			<link>http://www.computer.org/cal</link>
		</image>
		<item>
			<title>PrePrint: Exploiting Locality to Improve Circuit-level Timing Speculation</title>
			<link>http://www.pheedcontent.com/click.phdo?i=1677b5c5b6f9935b15be049629e2e9fb</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.50</pheedo:origLink>
			<description>Circuit-level timing speculation has been proposed as a technique to reduce dependence on design margins, eliminating power and performance overheads. Recent work has proposed microarchitectural methods to dynamically detect and recover from timing errors in processor logic. That work has not evaluated or exploited the disparity of error rates at the level of static instructions. In this paper, we demonstrate pronounced locality in error rates at the level of static instructions. We propose timing error prediction to dynamically anticipate timing errors at the instruction level and reduce the costly recovery penalty. This allows us to achieve 43.6% power savings when compared to a baseline policy and incurs only a 6.9% performance penalty.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.50</guid>
		</item>
		<item>
			<title>PrePrint: PRR-PRR Dynamic Relocation</title>
			<link>http://www.pheedcontent.com/click.phdo?i=e1ec1ad431012fb1324755eb8419e903</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.49</pheedo:origLink>
			<description>Partial bitstream relocation (PBR) on FPGAs has been gaining attention in recent years as a potentially promising technique to scale parallelism of accelerator architectures at run time, enhance fault tolerance, etc. PBR techniques to date have focused on reading inactive bitstreams stored in memory, on-chip or off-chip, whose contents are generated for a specific partial reconfiguration region (PRR) and modified on demand for configuration into a PRR at a different location. As an alternative, we propose a PRR-PRR relocation technique to generate source and destination addresses, read the bitstream from an active PRR (source) in a non-intrusive manner, and write it to the destination PRR. We describe two options for realizing this on Xilinx Virtex 4 FPGAs: (a) a hardware-based accelerated relocation circuit (ARC) and (b) a software solution executed on MicroBlaze. A comparative performance analysis highlighting the speed-up obtained using ARC is presented. For real test cases, the performance of our implementations is compared to the estimated performance of two state-of-the-art methods.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.49</guid>
		</item>
		<item>
			<title>PrePrint: A process-variation aware technique for tile-based, massive multi-core processors</title>
			<link>http://www.pheedcontent.com/click.phdo?i=78a345475a3149cc3d2494fdb8e1f632</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.48</pheedo:origLink>
			<description>Process variations in advanced nodes introduce significant core-to-core performance differences in single-chip multi-core architectures. Isolating each core with its own frequency and voltage island helps improve the performance of the multi-core architecture by operating each core at the highest frequency possible rather than operating all the cores at the frequency of the slowest core. However, inter-core communication suffers from additional cross-clock-domain latencies that can offset the performance benefits. This work proposes the concept of a configurable, variable-size frequency and voltage domain and describes it in the context of a tile-based, massive multi-core architecture.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.48</guid>
		</item>
		<item>
			<title>PrePrint: Characterizing the Energy Consumption of Software Transactional Memory</title>
			<link>http://www.pheedcontent.com/click.phdo?i=aecf4c1bace1ac7dd95757620c8ec519</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.47</pheedo:origLink>
			<description>The well-known drawbacks imposed by lock-based synchronization have forced researchers to devise new alternatives for concurrent execution, of which transactional memory is a promising one. Extensive research has been carried out on Software Transactional Memory (STM), most of it concentrated on program performance, leaving unattended other metrics of great importance, such as energy consumption. This letter presents a thorough evaluation of energy consumption in a state-of-the-art STM. We show that energy and performance results do not always follow the same trend and, therefore, it might be appropriate to consider different strategies depending on the focus of the optimization. We also introduce a novel strategy based on dynamic voltage and frequency scaling for contention managers, revealing important energy and energy-delay product improvements in highly contended scenarios. This work is a first study towards a better understanding of the energy consumption behavior of STM systems, and could prompt STM designers to research new optimizations in this area, paving the way for an energy-aware transactional memory.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.47</guid>
		</item>
		<item>
			<title>PrePrint: Power Management of Datacenter Workloads Using Per-Core Power Gating</title>
			<link>http://www.pheedcontent.com/click.phdo?i=b109eaa7b639cb6e9f014e36b016233e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.46</pheedo:origLink>
			<description>While modern processors offer a wide spectrum of software-controlled power modes, most datacenters rely only on Dynamic Voltage and Frequency Scaling (DVFS, a.k.a. P-states) to achieve energy efficiency. This paper argues that, in the case of datacenter workloads, DVFS is not the only option for processor power management. We make the case for per-core power gating (PCPG) as an additional power management knob for multi-core processors. PCPG is the ability to cut the voltage supply to selected cores, thus reducing the leakage power of the gated cores to almost zero. Using a testbed based on a commercial 4-core chip and a set of real-world application traces from enterprise environments, we have evaluated the potential of PCPG. We show that PCPG can significantly reduce a processor's energy consumption (up to 40%) without significant performance overheads. When compared to DVFS, PCPG is highly effective, saving up to 30% more energy than DVFS. When DVFS and PCPG operate together, they can save up to almost 60%.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.46</guid>
		</item>
		<item>
			<title>PrePrint: Operand Registers and Explicit Operand Forwarding</title>
			<link>http://www.pheedcontent.com/click.phdo?i=18c3ded83e67f76dd29465c825ecf491</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.45</pheedo:origLink>
			<description>Operand register files are small, inexpensive register files that are integrated with function units in the execute stage of the pipeline, effectively extending the pipeline operand registers into register files. Explicit operand forwarding lets software opportunistically orchestrate the routing of operands through the forwarding network to avoid writing ephemeral values to registers. Both mechanisms let software capture short-term reuse and locality close to the function units, improving energy efficiency by allowing a significant fraction of operands to be delivered from inexpensive registers that are integrated with the function units. An evaluation shows that capturing operand bandwidth close to the function units allows operand registers to reduce the energy consumed in the register files and forwarding network of an embedded processor by 61%, and allows explicit forwarding to reduce the energy consumed by 26%.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.45</guid>
		</item>
		<item>
			<title>PrePrint: Accurate Functional-First Multicore Simulators</title>
			<link>http://www.pheedcontent.com/click.phdo?i=32d9e09b58e1e80dd82b4990c3be26ab</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.44</pheedo:origLink>
			<description>Fast and accurate simulation of multicore systems requires a parallelized simulator. This paper describes a novel method to build cycle-accurate-capable and parallelizable functional-first simulators of multicore targets.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/L-CA.2009.44</guid>
		</item>
		<item>
			<title>IEEE Computer Architecture Letters - January-June 2009 (Vol. 8, No. 1)</title>
			<link>http://www.computer.org/portal/site/cal/</link>
			<description>IEEE Computer Architecture Letters</description>
			<guid isPermaLink="true">http://www.computer.org/portal/site/cal/</guid>
		</item>
	</channel>
</rss>