Very Large Scale Integration (VLSI): 10/01/2012

Tuesday 30 October 2012

IBM's New Chip Tech.

IBM has put the chip industry on notice by inventing a new technology that would replace silicon with a new material, carbon nanotubes.

IBM has found a new way to put what seems like an impossibly large number of transistors into an insanely small area, the width of only a few atoms. That's 10,000 times thinner than a strand of human hair and less than half the size of the leading silicon technology.

Or as IBM explains:

Carbon nanotubes are single atomic sheets of carbon rolled up into a tube. The carbon nanotube forms the core of a transistor device that will work in a fashion similar to the current silicon transistor, but will be better performing. They could be used to replace the transistors in chips that power our data-crunching servers, high performing computers and ultra fast smart phones.

Inventing the tech is one thing, being able to manufacture it at scale is another. And that's the real breakthrough IBM announced. It has put more than 10,000 of these "nano-sized tubes of carbon" onto single chip using a standard fabricating method.

It will still be years, maybe even a decade, before carbon nanotubes would really replace silicon-based chips in our servers and our smartphones. But this breakthrough is important because the chip industry is reaching a point where it physically can't squeeze much more processing power onto existing forms of chips. Some have predicted that we'll soon reach an end to Moore's Law which tries to double the density of chips on a wafer every two years.

Chip transistors are already super tiny—or nanoscale.

This is what a nanotube looks like under a microscope.

Earlier this year Intel dumped $4.1 billion into two new techniques to help the chip industry continue to get more powerful at smaller scales. These two new technologies are not the same as what IBM is working on.

IBM's carbon-based method may represent a whole new beginning for Moore's Law, the industry maxim that chips keep getting cheaper, more powerful, and smaller.

Wednesday 24 October 2012

Combinational Loop in Design

Combinational loops are logical structures that contain no synchronous feedback element. This kind of loops cause stability and reliability problemas, as we will see in this article, violating the synchronous principles by making feedback with no register in the loop.

WHY? HOW? IS GENERATED A COMBINATIONAL LOOP?

Basically, a combinational loop es implemented in hardware (gates) when in the written VHDL code describing combinational logic a signal that is in the left side of an assignment statement (that is, to the left of the <= symbol) it also is on the expression at the right side of the signal assignment statement (right of <=). For example the following lines of code generate a combinational loop, as long as they are written in a combinational process or in a concurrent signal assignment statement.

1 acc <= acc + data;
2
3 Z <= Z nand B;
4
5 cnt <= cnt + 1;

However, it's important to point out that if these same statements are written in a clocked process, each of them will generate the respective sequential logic. This is due to the fact that the signal assignment statement in clocked process will generate a register for the assigned signal, therefore the loop will be registered in this case, therefore no combinational loop is generated.

HARDWARE

The following figure shows a diagram of a combinational loop.

As it is shown in the figure, the combinational logic output is fedbacked to the same combinational logic without any register in the loop. The logic between the first input an the last output can be made up of one or several levels of combinational logic. It can also have different signals coming in and coming out of that piece of logic, but at least one of the signal is going back (feedback) to the first logic level, as it can be seen in the following figure.

This kind of logic circuit usually is not desired, no wanted to be implemented. Hence, when the synthesis tool finds out about this combinational loop generates a warning message.

Here is an example of VHDL code that generate a combinational loop when is implemented.

1 library ieee;
2 use ieee.std_logic_1164.all;
3
4 entity lazo_comb is
5 port(
6     a: in std_logic;
7     z: out std_logic);
8 end lazo_comb;
9
10 architecture beh of lazo_comb is
11
12 signal y: std_logic;
13
14 begin
15     z <= y;
16
17 process(a,y)
18 begin
19     y <= y nand a;
20 end process;
21
22 end beh;

The synthesis tool, Synplify in this case, generates the following warning regarding the combinational loop.

The warning message "found combinational loop at 'y'" means that the signal 'y' is feed-backed to the input of the combinational logic without any register in the loop. This loop can be easily found when seeing the RTL view of the synthesized system, as it can be seen in the following figure.

SIMULATION

The simulation of the system (very simple system) is shown in the following figure.

The ModelSim windows details a lot of information that deserve a detailed analysis. First of all, the top window plots the waveforms of the signals from the described system, whose main expression is in the line 24 of the middle window. The bottom window, Transcript window, generates an error message, saying that the limit of iterations has been reached at 50ns and no stable value has been gotten. In other words, this mean that the system has began to oscillated and remained oscillating. The maximum number of iterations is configurable in ModelSim (Simulate->Runtime Options); as it's in most simulators. By default this value is set to 5000. Another important piece of information can be found in the bottom of the waveform window. There you can read that the number of Delta reached 5000, which is exactly the number maximum of iterations set in the runtime options, and even after that amount of deltas the system is not stable.

Why this simple logic is oscillating?
Well, analyzing the true table of the nand gate, while one of the input is stuck at '0', the output will be always '1'. That is happening in the simulation shown above. Whereas, when the input (signal a in the simulation) tries to change to '1', due to the fact the other input is still at '1' the output change to '0', then since the feedback input is '0', the output should go to '1', then that '1' is going back with the other input at '1', the output will go to '0' again, and so on...This is what is called an "unstable combinational loop". This kind of loop should NEVER be used in a real design.

Other point to bring out on this example is the importance of simulating a system. Assuming that we configured the FPGA without any simulation, based on the fact that the synthesis tool just gave us a 'warning'), we'd see an no stable output, spending some time (maybe a lot of time) trying to find out why the output is not stable. Conversely, by doing the simulation the problem would appear at first shot.

CODE STYLE

In designs with a large, very large, amount of code lines it is very easy to make mistakes and generate a combinational loop with no intention (as it can be seen in the example above). So, follow certain order when writting the code, trying to maintain a certain flow of data. Also, take a close look at the warnings generated by the synthesis tool.

In case you deliberately want to implement a combinational loop, write a detailed description of the reason for doing that, and also write a comment in the constraint file. The reason for this last point is due to the fact that the Static Timing Analysis tool (STA) usually increase the minimum period of the system when it founds a combinational loop. Therefore, in this case you should tell to the STA tool to 'ignore' that particular path. The syntax for ignoring a path is 'set_false_path' for the Quartus (Altera) software, and for the ISE (Xilinx) you should use TIG with its resepctivs syntax in both cases.

Get free daily email updates!

Intel's Haswell chips coming into your PC in first half of next year

Laptops and desktops with Intel's next-generation Core processor, code-named Haswell, will be available in the first half of next year, Intel CEO Paul Otellini said during a financial conference call on Tuesday.

The Haswell chip will succeed current Core processors code-named Ivy Bridge,which became widely available in April. Intel has said that Haswell will deliver twice the performance of Ivy Bridge, and in some cases will double the battery life of ultrabooks, which are a new category of thin and light laptops with battery life of roughly six to eight hours.

Intel shed some light on Haswell at its Intel Developer Forum trade show in September, saying its power consumption had been cut to the point where the chips could be used in tablets. Haswell chips will draw a minimum of 10 watts of power, while Ivy Bridge's lowest power draw is 17 watts. Intel has splintered future Haswell chips into two families: 10-watt chips for ultrabooks that double as tablets, and 15-watt and 17-watt chips designed for other ultrabooks and laptops.

Haswell will be "qualified for sale" in the first half of 2013, said Stacy Smith, chief financial officer at Intel, during the conference call. Chips go through a qualification process internally and externally, after which Intel can put the chip into production.

The Haswell chip could provide a spark to the ultrabook segment, which has stagnated in a slumping PC market. Worldwide PC shipments dropped between 8 percent and 9 percent during the third quarter, according to research firms IDC and Gartner. They said ultrabook sales were lower than expected due to high prices and soft demand for consumer products.

Many ultrabook models with Ivy Bridge processors are expected to ship in the coming weeks with the launch of Windows 8, which is Microsoft's first touch-centric OS. Otellini said more than 140 Core-based ultrabooks will be in the market, of which 40 will have touch capabilities. A few models -- between five and eight -- will be convertible ultrabooks that can also function as tablets. A majority of the ultrabooks will have prices either at or above US$699, with a few models perhaps priced lower, Otellini said.

The new graphics processor in Haswell will support 4K graphics, allowing for a resolution of 4096 by 3072 pixels. Ultrabooks with Haswell will also include wireless charging, NFC capabilities, voice interaction and more security features.

Otellini said Intel can't tell how the segment will perform in the coming quarter. A number of factors needed to be considered including Microsoft's Windows 8 and the launch of new ultrabooks, he said. Intel reported a profit and revenue decline in the third fiscal quarter of 2012.

"We saw a softening in the consumer segments" in the third fiscal quarter, Otellini said. "The surprise there was China, which was strong, [but] turned weak on us."

Tablets have changed the way people use computers, and Microsoft is bringing touch to mainstream PCs for the first time with Windows 8, Otellini said. PCs with Windows 8 are expected to ship later this month, and it's hard to predict what the response will be until people go out and play with the devices and the OS, Otellini said.

"I see the computing market in a period of transition," with an opportunity for breakthroughs in research and creativity, Otellini said. New usage models for laptops are emerging with detachable touchscreens, voice recognition and other features, and Intel is trying to tap into those opportunities, Otellini said.

The company has a history of overcoming slumps through research and innovation, Otellini said.

Get free daily email updates!

Sunday 14 October 2012

Cyclic Redundancy Check - CRC

CRC Example

Error detection is an important part of communication systems when there is a chance of data getting corrupted. Whether it’s a piece of stored code or a data transmission, you can add a piece of redundant information to validate the data and protect it against corruption. Cyclic redundancy checking is a robust error-checking algorithm, which is commonly used to detect errors either in data transmission or data storage. In this multipart article we explain a few basic principles.

Modulo two arithmetic is simple single-bit binary arithmetic with all carries or borrows ignored. Each digit is considered independently. This article talks about how modulo two addition is equivalent to modulo two subtraction, and can be performed using an exclusive OR operation followed by a brief on Polynomial division where remainder forms the CRC checksum.

For example, we can add two binary numbers X and Y as follows:

10101001 (X) + 00111010 (Y) = 10010011 (Z)

From this example the modulo two addition is equivalent to an exclusive OR operation. What is less obvious is that modulo two subtraction gives the same results as an addition.

From the previous example let’s add X and Z:
10101001 (X) + 10010011 (Z) = 00111010 (Y)

In our previous example we have seen how X + Y = Z therefore Y = Z – X, but the example above shows that Z+X = Y also, hence modulo two addition is equivalent to modulo two subtraction, and can be performed using an exclusive OR operation.

In integer division dividing A by B will result in a quotient Q, and a remainder R. Polynomial division is similar except that when A and B are polynomials, the remainder is a polynomial, whose degree is less than B.

The key point here is that any change to the polynomial A causes a change to the remainder R. This behavior forms the basis of the cyclic redundancy checking.

If we consider a polynomial, whose coefficients are zeros and ones (modulo two), this polynomial can be easily represented by its coefficients as binary powers of two.

In terms of cyclic redundancy calculations, the polynomial A would be the binary message string or data and polynomial B would be the generator polynomial. The remainder R would be the cyclic redundancy checksum. If the data changed or became corrupt, then a different remainder would be calculated.

Although the algorithm for cyclic redundancy calculations looks complicated, it only involves shifting and exclusive OR operations. Using modulo two arithmetic, division is just a shift operation and subtraction is an exclusive OR operation.

Cyclic redundancy calculations can therefore be efficiently implemented in hardware, using a shift register modified with XOR gates. The shift register should have the same number of bits as the degree of the generator polynomial and an XOR gate at each bit, where the generator polynomial coefficient is one.

Augmentation is a technique used to produce a null CRC result, while preserving both the original data and the CRC checksum. In communication systems using cyclic redundancy checking, it would be desirable to obtain a null CRC result for each transmission, as the simplified verification will help to speed up the data handling.

Traditionally, a null CRC result is generated by adding the cyclic redundancy checksum to the data, and calculating the CRC on the new data. While this simplifies the verification, it has the unfortunate side effect of changing the data. Any node receiving the data+CRC result will be able to verify that no corruption has occurred, but will be unable to extract the original data, because the checksum is not known. This can be overcome by transmitting the checksum along with the modified data, but any data-handling advantage gained in the verification process is offset by the additional steps needed to recover the original data.

Augmentation allows the data to be transmitted along with its checksum, and still obtain a null CRC result. As explained before when obtain a null CRC result, the data changes, when the checksum is added. Augmentation avoids this by shifting the data left or augmenting it with a number of zeros, equivalent to the degree of the generator polynomial. When the CRC result for the shifted data is added, both the original data and the checksum are preserved.

In this example, our generator polynomial (x3 + x2 + 1 or 1101) is of degree 3, so the data (0xD6B5) is shifted to the left by three places or augmented by three zeros.
0xD6B5 = 1101011010110101 becomes 0x6B5A8 = 1101011010110101000.

Note that the original data is still present within the augmented data.

0x6B5A8 = 1101011010110101000
Data = D6B5 Augmentation = 000

Calculating the CRC result for the augmented data (0x6B5A8) using our generator polynomial (1101), gives a remainder of 101 (degree 2). If we add this to the augmented data, we get:

0x6B5A8 + 0b101 = 1101011010110101000 + 101
= 1101011010110101101
= 0x6B5AD

As discussed before, calculating the cyclic redundancy checksum for 0x6B5AD will result in a null checksum, simplifying the verification. What is less apparent is that the original data is still preserved intact.

0x6B5AD = 1101011010110101101
Data = D6B5 CRC = 101

The degree of the remainder or cyclic redundancy checksum is always less than the degree of the generator polynomial. By augmenting the data with a number of zeros equivalent to the degree of the generator polynomial, we ensure that the addition of the checksum does not affect the augmented data.

In any communications system using cyclic redundancy checking, the same generator polynomial will be used by both transmitting and receiving nodes to generate checksums and verify data. As the receiving node knows the degree of the generator polynomial, it is a simple task for it to verify the transmission by calculating the checksum and testing for zero, and then extract the data by discarding the last three bits.

Thus augmentation preserves the data, while allowing a null cyclic redundancy checksum for faster verification and data handling.

Get free daily email updates!

Saturday 13 October 2012

VHDL- Delta Delay

VHDL allows the designer to describe systems at various levels of abstraction. As such, timing and delay information may not always be included in a VHDL description.

A delta (or delta cycle) is essentially an infinitesimal, but quantized, unit of time. The delta delay mechanism is used to provide a minimum delay in a signal assignment statement so that the simulation cycle described earlier can operate correctly when signal assignment statements do not include explicitly specified delays. That is:

1) all active processes can execute in the same simulation cycle

2) each active process will suspend at wait statement

3) when all processes are suspended simulation is advanced the minimum time necessary so that some signals can take on their new values

4) processes then determine if the new signal values satisfy the conditions to proceed from the wait statement at which they are suspended

Get free daily email updates!

VHDL-Inertial Delay

The keyword INERTIAL may be used in the signal assignment statement to specify an inertial delay, or it may be left out because inertial delay is used by default in VHDL signal assignment statements which contain “after” clauses.

If the optional REJECT construct is not used, the specified delay is then used as both the ‘inertia’ (i.e. minimum input pulse width requirement) and the propagation delay for the signal. Note that in the example above, pulses on Input narrower than 10ns are not observed on Output.

Get free daily email updates!

VHDL-Transport Delay

The keyword TRANSPORT must be used to specify a transport delay.

Transport delay is the simplest in that when it is specified, any change in an input signal value may result in a new value being assigned to the output signal after the specified propagation delay.

Note that no restrictions are specified on input pulse widths. In this example, Output will be an inverted copy of Input delayed by the 10ns propagation delay regardless of the pulse widths seen on Input .

Get free daily email updates!

Thursday 11 October 2012

VHDL Timing Model

The VHDL timing model controls the stimulus and response sequence of signals in a VHDL model. At the start of a simulation, signals with default values are assigned those values. In the first execution of the simulation cycle, all processes are executed until they reach their first wait statement. These process executions will include signal assignment statements that assign new signal values after prescribed delays.

After all the processes are suspended at their respective wait statements, the simulator will advance simulation time just enough so that the first pending signal assignments can be made (e.g. 1 ns, 3 ns, 1 delta cycle).

After the relevant signals assume their new values, all processes examine their wait conditions to determine if they can proceed. Processes that can proceed will then execute concurrently again until they all reach their respective subsequent wait conditions.

This cycle continues until the simulation termination conditions are met or until all processes are suspended indefinitely because no new signal assignments are scheduled to unsuspend any waiting processes.

There are several types of delay in VHDL, and understanding of how delay works in a process is key to writing and understanding VHDL.

It bears repeating that any signal assignment in VHDL is actually a scheduling for a future value to be placed on that signal. When a signal assignment statement is executed, the signal maintains its original value until the time for the scheduled update to the new value. Any signal assignment statement will incur a delay of one of the three types listed below.

Get free daily email updates!

Wednesday 10 October 2012

Digital Logic in Analog Block - How To Test It?

Analog IP blocks these days have increasing amounts of digital control logic. With very small amounts of digital logic, it's possible to just draw gates on the schematic and run targeted tests that will hopefully catch any errors. But when you have several thousand digital gates, a new approach is needed, as I discovered in a recent discussion.

"2,000 gates is probably a good transition point where people switch from manually inserting gates in a schematic to synthesis," said Bob Melchiorre, director of field operations for digital implementation at Cadence. "Beyond 2,000 gates you have to start thinking about how you're going to test it, because you cannot guarantee that simulation or targeted testing will catch all the issues. Around 10,000 gates it starts getting completely unbearable, and you cannot write enough targeted tests to find all the faults in your digital logic."

Click here to read more ...

Get free daily email updates!

Sunday 7 October 2012

DDR4 SDRAM Standards published by JEDEC

The PC industry hasn't seen an updated memory spec in a while, and it was long past due. That upgrade came last week, as the memory standards group JEDEC revealed that it had published a spec for DDR4 SDRAM, defining "features, functionalities, AC and DC characteristics, packages and ball/signal assignments," that builds on the DDR3 spec, first published in 2007. The DDR4 spec applies to SDRAM devices from 2 GB through 16 GB for x4, x8 and x16 buses. Here's a look at some of the particulars.

“The new standard will enable next generation systems to achieve greater performance, significantly increased packaging density and improved reliability, with lower power consumption,” Macri said.

Double Data Rate

First and foremost, DDR4 memory doubles the maximum transfer rate of DDR3. The new spec supports a per-pin data rate of up to 3.2 giga transfers per second (GT/s), twice that of its predecessor's eventual maximum of 1.6 GT/s (the ceiling was raised over time). And, DDR4's max could likewise go higher, as necessary, to accommodate faster components and bus speeds. So far, the only processor roadmap we've seen in support of DDR4 has been Intel's, with its Haswell server processor slated for 2014; consumer-platform support isn't expected until sometime in 2015.

Meanwhile, JEDEC member company Samsung announced in July that it had begun sampling the "industry's first" 16-GB DDR4 RDIMMs, and that it will also offer a 32-GB module; and Samsung, Micron and other companies already offer smaller-denomination DIMMs that comply with the spec.

Lower Power

The DDR4 spec defines memory that operates on 1.2V, compared with DDR3's 1.5V and 1.35V low-voltage spec. According to Samsung, its DDR4 RDIMMs consume about 40 percent less power than DDR3 memory modules operating at 1.35V. We're not sure what math they used to arrive at that finding, but in a world increasingly mindful of power consumption and rising energy costs, 1.2V is better than 1.35V.

More, Wider Memory

While DDR3 supported DIMM sizes between 512 MB and 8 GB in as many as eight banks, DDR4 quadruples memory top-end by doubling the module maximum to 16 GB (with a 2-GB minimum) in as many as 16 banks. That's math we can handle. What's more, DDR4 can arrange memory banks into as many as four groups, providing faster burst access to memory and separate read, write, activation and refresh operations for each group.

Incidentally, memory speeds of DDR4 will start at 1,600MHz and balloon to 3,200MHz. DDR3 mobiles are available mostly at frequencies between 800MHz and 1,600MHz, although the spec supports 1,866MHz and 2,133MHz memory, according to a comparison chart published by memory maker Micron.

DOWNLOAD

Get free daily email updates!

Very Large Scale Integration (VLSI)

Featured post

Top 5 books to refer for a VHDL beginner

Tuesday 30 October 2012

IBM's New Chip Tech.

Wednesday 24 October 2012

Combinational Loop in Design

HARDWARE

SIMULATION

CODE STYLE

Intel's Haswell chips coming into your PC in first half of next year

Sunday 14 October 2012

Cyclic Redundancy Check - CRC

Saturday 13 October 2012

VHDL- Delta Delay

VHDL-Inertial Delay

VHDL-Transport Delay

Thursday 11 October 2012

VHDL Timing Model

Wednesday 10 October 2012

Digital Logic in Analog Block - How To Test It?

Sunday 7 October 2012

DDR4 SDRAM Standards published by JEDEC

Double Data Rate

Lower Power

More, Wider Memory

Followers