Today we feel to revise what we know about cache memory. A
cache is a memory device that improves performance of the processor by
transparently storing data such that future requests for that data can be
served faster. The data that is stored within a cache might be values that have
been computed earlier or duplicates of original values that are stored
elsewhere.
Access
to cache can result in either one of the following: cache
miss or cache hit.Cache hit means
that the requested data is contained in the cache and cache miss means data is
not found there in cache.On cache hit processor takes data from cache itself
for processing.On cache miss the data is fetched from the original memory
location.Cache memories are volatile and small in storage size.Since the
storage size is small the address decoding takes less time and hence caches are
faster then normal physical memories(RAM's) in computers.
As
I said the data is stored transparently in cache.This means that the user who
is requesting data from the cache need not know whether data is stored in cache
or system memory.It is handled by the processor.The word cache means "conceal" in
French.
A simple
cache contains three fields.
1. An index which
is local to the cache.
2. A tag which
is the index with reference to the main memory.This will let the processor know
the location in main memory where an exact copy of data is stored.
3. Data, which
is actual data needed by the processor.
When processor needs some data from the memory it first checks in cache.It sees
all the tag fields in the cache to see whether same data is available in
cache.If the tag is found then the corresponding data is taken.Otherwise a
cache miss error is asserted and the main memory is accessed.Also the cache
memory is updated with the recent memory access.This is called cache update on
cache miss.
During a cache update if the cache is full, then it has to delete a row.This is
decided on a cache replacement algorithm.Some algorithms are:
1. LRU - Least recently used data is
replaced.
2. MRU - Most recently used data is replaced.
3. Random replacement - Simple, used in ARM processors.
4. Belady's Algorithm - discards the data which may not be
used for the longest time in future.Not perfectly implementable in practice.
The
average memory access time of a cache enabled system can be calculated using
the hit and miss ratio of a cache.
Average memory access time =
(Time_cache * Hit_counts ) + ( (Time_cache + Time_mm) * Miss_counts)
where,
Time_cache
and Time_mm is the time needed to access a location for cache and main memory
respectively.
Hit_counts and Miss_counts are the hit and miss probabilities.
Hit_counts and Miss_counts are the hit and miss probabilities.
There
are two types cache writing: write back(copy-back)
and write
through.
When the
data at a particular memory location is updated then this data must be written
back to cache.If the data is updated only in the cache then it is called write
back.If the updating of data happens both in cache and main memory then it is
called write through.Write through keeps the cache and memory synchronized.In
the write back operation since the cache data is not same as the main memory
data it is marked as "dirty" data.These dirty data will be written back
into main memory when the particular data is cleared from the cache.If a miss
happens in a write-back cache it may sometimes require two memory accesses to
service : one to first write the dirty location to memory and then another to
read the new location from memory.
The main memory locations may be altered without proper updating in cache by
peripherals using DMA or by a multi core processor.This results in a out of
date data in cache.These type of data is called "stale"
data.To solve these stale data problems we have to use cache coherence
protocols between the cache managers to keep the data consistent.
All
caches are CAM(content accessible memory).And for efficiency we have to scan all the memory contents
in one cycle.This requires parallel hardware.Also higher the memory size the
more is the memory access time.
Let us see now,how a cache is made.Say we have a 32 bit main memory in our
system and the cache chip size is 4 Kb.Also say each line in cache stores 32
bytes so that there are totally 128 lines.Each line in cache have two
fields. Address(4 bytes) and Data.The address is further divided into two
fields- Tag(27 bits) and offset(5 bits for indexing a particular byte among the
data).Remember that the tag contain the MSB 27 bits of the address here.These
kind of caches are called Fully associative caches.Since the tag is 27 bits(relatively long) it takes more time
to read data from Fully associative cache.Also more hardware circuit is
required for parallel reading of tags from the CAM type cache.So they are expensive
but more efficient.
Fully associative cache |
Another
type of cache architecture is known as direct mapped cache.In
this the address is divided into three fields named tag(20 bits),index(7 bits
used as an index the 128 lines in cache) and offset(5 bits).The problem with
this type of cache is that the cache is less efficient since the main memory
cannot be copied to any line in cache as in fully associative cache.This is
because the addresses with the same index will be mapped to the same line in
cache.But the cache access time is less here.In certain situations you may get
a cache miss for almost every access.So they are cheap
but less efficient.
Direct mapped Cache |
Another
type of cache is called set associative cache which has the advantages of both direct mapped and
fully associative caches.These are again subdivided based on the number of bits
in the index field.
2. 2 way set associative cache - In this type of cache we have two group of lines,each containing 64 lines.The cache has the same number of fields as direct mapped cache but tag has 21 bits and index has 6 bits here.
2. 4 way set associative cache - Here we have 4 groups each contains 32 lines.index has 5 bits and tag has 22 bits.
2. 2 way set associative cache - In this type of cache we have two group of lines,each containing 64 lines.The cache has the same number of fields as direct mapped cache but tag has 21 bits and index has 6 bits here.
2. 4 way set associative cache - Here we have 4 groups each contains 32 lines.index has 5 bits and tag has 22 bits.
2-way and 4-way set associative caches |