<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Dark Silicon Redux: System Design Problem or Fundamental Law?</title>
	<atom:link href="http://www.embeddedinsights.com/channels/2011/02/01/dark-silicon-redux-system-design-problem-or-fundamental-law/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.embeddedinsights.com/channels/2011/02/01/dark-silicon-redux-system-design-problem-or-fundamental-law/</link>
	<description>Shedding Light on the Hidden World of Embedded Systems</description>
	<lastBuildDate>Mon, 28 Jul 2014 16:18:37 -0400</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Leigh</title>
		<link>http://www.embeddedinsights.com/channels/2011/02/01/dark-silicon-redux-system-design-problem-or-fundamental-law/#comment-5292</link>
		<dc:creator>Leigh</dc:creator>
		<pubDate>Wed, 02 Feb 2011 17:58:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.embeddedinsights.com/channels/?p=433#comment-5292</guid>
		<description>One existing approach to the &#039;dark silicon&#039; problem is similar in spirit to the UCSD/MIT approach. Employ hundreds of simpler, lighter-weight cores that are far more energy-efficient than typical SMP/Von Neumann/shared/virtual-memory cores such as x86 CPUs. Then arrange them on silicon in a MIMD, hierarchical mesh of interconnect so they can still express highly complex programs such as H.264 video compression. Decompose application architectures so that each core performs a single task. Each core would have efficient, dedicated, distributed memories instead of constantly churning virtual/shared memory (shared SDRAM access is still allowed for larger data chunks such as video frame buffers). To ease programmability and software design/debug, have the MIMD interconnect enforce task/core &#039;encapsulation&#039; in the object-oriented sense, implemented in silicon circuits; this encapsulation is crucial to practical programmability and debug of complex software systems on hundreds or thousands of cores. For an example of this architecture, see Nethra&#039;s &#039;Ambric-Architecture&#039; chips with over 300 cores per chip. These massively-parallel-processor-array (MPPA) chips are deployed in a 13,000-core X-ray processing system delivering 40 TeraOPS and 2,000 GMACS of compute in under 500 W. This 40-chip system is housed in an ATCA chassis. If the same chassis were stuffed with 40 Intel CPUs or Nvidia GPUs of equivalent compute, it would melt into a smoking puddle of aluminum and burnt plastic from over 6 kW of power dissipation (in the GPU case). See www.nethra.us.com/products_am2045_overview.php for a description of the current-generation chip. The next generation of this type of architecture can push toward 1K cores and enormous compute on one die without a &#039;dark silicon&#039; problem.
The key to practical use is programmability, which is addressed by the programming model, the encapsulation enforced by the self-synchronizing interconnect, and mature design/debug tools.</description>
		<content:encoded><![CDATA[<p>One existing approach to the &#8216;dark silicon&#8217; problem is similar in spirit to the UCSD/MIT approach. Employ hundreds of simpler, lighter-weight cores that are far more energy-efficient than typical SMP/Von Neumann/shared/virtual-memory cores such as x86 CPUs. Then arrange them on silicon in a MIMD, hierarchical mesh of interconnect so they can still express highly complex programs such as H.264 video compression. Decompose application architectures so that each core performs a single task. Each core would have efficient, dedicated, distributed memories instead of constantly churning virtual/shared memory (shared SDRAM access is still allowed for larger data chunks such as video frame buffers). To ease programmability and software design/debug, have the MIMD interconnect enforce task/core &#8216;encapsulation&#8217; in the object-oriented sense, implemented in silicon circuits; this encapsulation is crucial to practical programmability and debug of complex software systems on hundreds or thousands of cores. For an example of this architecture, see Nethra&#8217;s &#8216;Ambric-Architecture&#8217; chips with over 300 cores per chip. These massively-parallel-processor-array (MPPA) chips are deployed in a 13,000-core X-ray processing system delivering 40 TeraOPS and 2,000 GMACS of compute in under 500 W. This 40-chip system is housed in an ATCA chassis. If the same chassis were stuffed with 40 Intel CPUs or Nvidia GPUs of equivalent compute, it would melt into a smoking puddle of aluminum and burnt plastic from over 6 kW of power dissipation (in the GPU case). See <a href="http://www.nethra.us.com/products_am2045_overview.php" rel="nofollow">http://www.nethra.us.com/products_am2045_overview.php</a> for a description of the current-generation chip. The next generation of this type of architecture can push toward 1K cores and enormous compute on one die without a &#8216;dark silicon&#8217; problem.
The key to practical use is programmability, which is addressed by the programming model, the encapsulation enforced by the self-synchronizing interconnect, and mature design/debug tools.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
