# tactiq.io free youtube transcript

# How does Computer Memory Work? 💻

# https://www.youtube.com/watch/7J7X7aZvMXQ

 

00:00:00.300 Have you ever wondered what’s happening inside your computer when you load a program or video

00:00:05.220 game?  Well, millions of operations are happening, but perhaps the most common is simply copying

00:00:12.480 data from a solid-state drive or SSD into dynamic random-access memory or DRAM.  An SSD stores all

00:00:22.440 the programs and data for long-term storage, but when your computer wants to use that data,

00:00:27.840 it has to first move the appropriate files into DRAM, which takes time,

00:00:33.120 hence the loading bar.  Because your CPU works only with data after it’s been moved to DRAM,

00:00:39.600 it’s also called working memory or main memory.  The reason your desktop uses both SSDs and

00:00:47.100 DRAM is that solid-state drives permanently store data in massive 3D arrays composed of a

00:00:54.300 trillion or so memory cells, yielding terabytes of storage, whereas DRAM temporarily stores data in

00:01:02.040 2D arrays composed of billions of tiny capacitor memory cells, yielding gigabytes of working memory.

00:01:09.480 Accessing any section of cells in the massive SSD array and reading or writing data takes

00:01:16.200 about 50 microseconds, whereas reading or writing from any DRAM capacitor memory

00:01:22.020 cell takes about 17 nanoseconds, which is 3,000 times faster.  For comparison, a supersonic jet

00:01:30.540 going at Mach 3 is around 3,000 times faster than a moving tortoise.  So, the speed of

00:01:36.780 17-nanosecond DRAM versus 50-microsecond SSD is like comparing a supersonic jet to a tortoise.
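As a quick back-of-the-envelope check on that figure (an editorial sketch, not from the video itself), the ratio of the two access times works out to roughly 3,000:

```python
# Rough arithmetic check: SSD access (~50 us) vs. DRAM access (~17 ns).
ssd_access_s = 50e-6    # ~50 microseconds per SSD access
dram_access_s = 17e-9   # ~17 nanoseconds per DRAM access

ratio = ssd_access_s / dram_access_s
print(f"DRAM is roughly {ratio:.0f}x faster")  # ~2941, i.e. about 3,000x
```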

00:01:45.420 However, speed is just one factor.  DRAM is limited to a 2D array and temporarily stores

00:01:52.620 one bit per memory cell. For example, this stick of DRAM with 8 chips holds 16 gigabytes of data,

00:02:00.420 whereas a solid-state drive of a smaller size can hold 2 terabytes of data, more

00:02:06.960 than 100 times that of DRAM.  Additionally, DRAM requires power to continuously store

00:02:13.620 and refresh the data held in its capacitors.  Therefore, computers use both SSDs and DRAM and,

00:02:21.900 by spending a few seconds of loading time to copy data from the SSD to the DRAM,

00:02:27.360 and then prefetching, which is the process of moving data before it’s needed, your computer can

00:02:34.020 store terabytes of data on the SSD and then access the data from programs that were preemptively

00:02:40.680 copied into the DRAM in a few nanoseconds. For example, many video games have a loading

00:02:46.980 time to start up the game itself, and then a separate loading time to load a save file.

00:02:52.260 During the process of loading a save file, all the 3D models, textures, and the environment of

00:02:59.220 your game state are moved from the SSD into DRAM so any of it can be accessed in a few nanoseconds,

00:03:06.480 which is why video games have DRAM capacity requirements.  Just imagine, without DRAM,

00:03:12.540 playing a game would be 3,000 times slower.  We covered solid-state drives in other videos,

00:03:19.140 so in this video, we’re going to take a deep dive into this 16-gigabyte stick of DRAM.  First,

00:03:26.160 we’ll see exactly how the CPU communicates and moves data from an SSD to DRAM.  Then

00:03:32.880 we’ll open up a DRAM microchip and see how billions of memory cells are organized into

00:03:38.400 banks and how data is written to and read from groups of memory cells.  In the process, we’ll

00:03:45.300 dive into the nanoscopic structures inside individual memory cells and see how each

00:03:50.940 capacitor physically stores 1 bit of data.  Finally, we’ll explore some breakthroughs and

00:03:56.880 optimizations such as the burst buffer and folded DRAM layouts that enable DRAM to move data around

00:04:04.440 at incredible speeds. A few quick notes.  First, you can find similar DRAM chips inside

00:04:11.400 GPUs, smartphones, and many other devices, but with different optimizations.  As examples,

00:04:18.420 GPU DRAM, or VRAM, located all around the GPU chip, has a larger bandwidth and can

00:04:26.040 read and write simultaneously, but operates at a lower frequency, and DRAM in your smartphone

00:04:32.280 is stacked on top of the CPU and is optimized for smaller packaging and lower power consumption.

00:04:39.360 Second, this video is sponsored by Crucial.  Although they gave me this

00:04:44.280 stick of DRAM to model and use in the video, the content was independently

00:04:49.080 researched and not influenced by them.  Third, there are faster memory structures

00:04:54.120 in your CPU called cache memory and even faster registers.  All these types of memory create a

00:05:01.140 memory hierarchy, with the main trade-off being speed versus capacity while keeping

00:05:06.660 prices affordable to consumers and optimizing the size of each microchip for manufacturing.

00:05:12.480 Fourth, you can see how much of your DRAM is being utilized by

00:05:17.220 each program by opening your computer’s resource monitor and clicking on memory.

00:05:22.440 Fifth, there are different generations of DRAM, and we’ll explore DDR5.  Many of the key concepts

00:05:29.460 that we explain apply to prior generations, although the numbers may be different.

00:05:34.020 Sixth, 17 nanoseconds is incredibly fast!  Electricity travels at around 1 foot per

00:05:41.820 nanosecond, and 17 nanoseconds is about the time it takes for light to travel across a room.

00:05:47.460 Finally, this video is rather long as it covers a lot of what there is to know about DRAM.  We

00:05:54.840 recommend watching it first at 1.25 times speed, and then a second time at

00:06:00.960 1.5 times speed to fully comprehend this complex technology.  Stick around because this

00:06:06.900 is going to be an incredibly detailed video.  To start, a stick of DRAM is also called a Dual

00:06:14.100 Inline Memory Module or DIMM, and there are 8 DRAM chips on this particular DIMM.  On the

00:06:21.780 motherboard, there are 4 DRAM slots, and when plugged in, the DRAM is directly connected to

00:06:28.500 the CPU via 2 memory channels that run through the motherboard.  Note that the left two DRAM

00:06:34.320 slots share these memory channels, and the right two share a separate channel.  Let’s move to

00:06:40.380 look inside the CPU at the processor.  Along with numerous cores and many other elements,

00:06:46.320 we find the memory controller, which manages and communicates with the DRAM.  There’s also

00:06:52.080 a separate section for communicating with SSDs plugged into the M.2 slots and with SSDs and

00:06:58.800 hard drives plugged into SATA connectors.  Using these sections, along with data mapping tables,

00:07:04.800 the CPU manages the flow of data from the SSD to DRAM, as well as from DRAM

00:07:10.920 to cache memory for processing by the cores.  Let’s move back to see the memory channels.

00:07:16.380 For DDR5, each memory channel is divided into two parts, Channel A and Channel B. These two memory

00:07:25.020 channels A and B independently transfer 32 bits at a time using 32 data wires.  Using 21 additional

00:07:35.100 wires, each memory channel carries an address specifying where to read or write data and, using

00:07:42.300 7 control signal wires, commands are relayed.  The addresses and commands are sent to and shared

00:07:49.200 by all 4 chips on the memory channel, which work in parallel.  However, the 32-bit data

00:07:55.800 lines are divided among the chips, and thus each chip only reads or writes 8 bits at a time.

00:08:01.800 Additionally, power for DRAM is supplied by the motherboard and

00:08:07.200 managed by these chips on the stick itself.  Next, let’s open and look inside one of these

00:08:13.140 DRAM microchips.  Inside the exterior packaging, we find an interconnection matrix that connects

00:08:20.400 the ball grid array at the bottom with the die, which is the main part of this microchip.  This 2

00:08:26.460 gigabyte DRAM die is organized into 8 bank groups composed of 4 banks each, totaling 32 banks.

00:08:34.799 Within each bank is a massive array, 65,536 memory cells tall by 8,192 cells across, essentially rows

00:08:47.100 and columns in a grid, with tens of thousands of wires and supporting circuitry running outside

00:08:53.340 each bank.  Instead of looking at this die, we’re going to transition to a functional diagram,

00:08:59.040 and then reorganize the banks and bank groups.  In order to access 17 billion memory cells,

00:09:06.240 we need a 31-bit address.  3 bits are used to select the appropriate bank group, then 2 bits

00:09:13.560 to select the bank.  Next, 16 bits of the address are used to determine the exact row out of 65

00:09:21.840 thousand.  Because this chip reads or writes 8 bits at a time, the 8,192 columns are grouped by

00:09:30.420 8 memory cells, all read or written at a time, or ‘by 8’, and thus only 10 bits are needed for

00:09:37.740 the column address.  One optimization is that this 31-bit address is separated into two parts

00:09:44.220 and sent using only 21 wires.  First, the bank group, bank, and row address are sent, and then

00:09:52.260 after that the column address.  Next, we’ll look inside these physical memory cells, but first,

00:09:58.200 let’s briefly talk about how these structures are manufactured, as well as this video’s sponsor.

00:10:03.840 This incredibly complicated die, also called an integrated circuit,

00:10:09.540 is manufactured on 300-millimeter silicon wafers, 2500ish dies at a time.  On each die are billions

00:10:18.360 of nanoscopic memory cells that are fabricated using dozens of tools and hundreds of steps in

00:10:24.420 a semiconductor fabrication plant or fab.  This one was made by Micron, which manufactures around

00:10:31.020 a quarter of the world’s DRAM, including both Nvidia’s and AMD’s VRAM in their GPUs.  Micron also

00:10:39.600 has its own product line of DRAM and SSDs under the brand Crucial which, as mentioned earlier,

00:10:46.080 is the sponsor of this video.  In addition to DRAM, Micron is one of the world’s leading

00:10:51.600 suppliers of solid-state drives such as this Crucial P5 Plus M.2 NVMe SSD.  By installing your

00:11:00.840 operating system and video games on a Crucial NVMe solid-state drive, you’ll be sure to have

00:11:07.260 incredibly fast loading times and smooth gameplay, and if you do video editing, make sure all those

00:11:14.160 files are on a fast SSD like this one as well.  This is because loading speed is

00:11:20.160 predominantly limited by the speed of the SSD or hard drive where the files are stored.

00:11:26.340 For example, this hard drive can only transfer data at around 150 megabytes a second, whereas

00:11:34.080 this Crucial NVMe SSD can transfer data at a rate of up to 6,600 megabytes a second, which,

00:11:42.720 for comparison, is the speed of a moving tortoise versus a galloping horse.  By using a Crucial NVMe

00:11:50.580 SSD, loading a video game that requires gigabytes of DRAM is reduced from a minute or more down to

00:11:58.560 a couple seconds.  Check out the Crucial NVMe SSDs using the link in the description below.

00:12:09.300 Let’s get back to the details of how DRAM works and zoom in to explore a single memory cell

00:12:15.780 situated in a massive array. This memory cell is called a 1T1C cell and is a few dozen nanometers

00:12:24.240 in size.  It has two parts: a capacitor to store one bit of data in the form of electrical charges

00:12:31.080 or electrons, and a transistor to access and read or write data.  The capacitor is shaped like a

00:12:37.740 deep trench dug into silicon and is composed of two conductive surfaces separated by a dielectric

00:12:44.400 insulator or barrier just a few atoms thick, which stops the flow of electrons but allows electric

00:12:50.940 fields to pass through.  If this capacitor is charged up with electrons to 1 volt,

00:12:56.760 it’s a binary 1, and if no charges are present and it’s at 0 volts, it’s a binary 0, and thus

00:13:04.740 this cell only holds one bit of data.  Designs of capacitors are constantly evolving, but in

00:13:11.400 this trench capacitor, the depth of the silicon is utilized to allow for larger capacitive storage,

00:13:17.400 while taking up as little area as possible.  Next, let’s look at the access transistor and

00:13:24.300 add in two wires.  The wordline wire connects to the gate of the transistor, while the bitline wire

00:13:31.260 connects to the other side of the transistor’s channel.  Applying a voltage to the wordline

00:13:36.480 turns on the transistor, and, while it’s on, electrons can flow through the channel, thus

00:13:42.000 connecting the capacitor to the bitline.  This allows us to access and charge up the capacitor

00:13:47.520 to write a 1, or discharge the capacitor to write a 0.  Additionally, we can read the stored value

00:13:54.840 in the capacitor by measuring the amount of charge.  However, when the wordline is off,

00:14:00.000 the transistor is turned off, and the capacitor is isolated from the bitline, thus saving the

00:14:05.700 data or charge that was previously written.  Note that because this transistor is incredibly small,

00:14:12.120 only a few dozen nanometers wide, electrons slowly leak across the channel, and thus over time the

00:14:19.800 capacitor needs to be refreshed to recharge the leaked electrons. We’ll cover exactly how

00:14:25.860 refreshing memory cells works a little later.  As mentioned earlier, this 1T1C memory cell is

00:14:33.540 one of 17 billion inside this single die and is organized into massive arrays called banks.  So,

00:14:41.340 let’s build a small array for illustrative purposes.  In our array, each of the wordlines

00:14:47.640 is connected in rows, and then the bitlines are connected in columns.  Wordlines and bitlines

00:14:53.820 are on different vertical layers so one can cross over the other, and they never touch.

00:15:00.540 Let’s simplify the visual and use symbols for the capacitors and the transistors.  Just as before,

00:15:06.720 the wordlines connect to each transistor’s control gate in rows, and then all the bitlines in columns

00:15:13.980 connect to the channel opposite each capacitor. As a result, when a wordline is active,

00:15:19.980 all the capacitors in only that row are connected to their corresponding bitlines,

00:15:24.780 thereby activating all the memory cells in that row.  At any given time, only one wordline is

00:15:31.920 active because, if more than one wordline were active, then multiple capacitors in a column

00:15:37.620 would be connected to the bitline, and the data storage functionalities of these capacitors would

00:15:42.960 interfere with one another, making them useless.  As mentioned earlier, within a single bank there

00:15:48.900 are 65,536 rows and 8,192 columns, and the 31-bit address is used to activate a group of just 8

00:15:59.760 memory cells.  The first 5 bits select the bank, and the next 16 bits are sent to a row decoder

00:16:06.420 to activate a single row.  For example, this binary number turns on wordline row 27,524,

00:16:15.120 thus turning on all transistors in that row and connecting the 8,192 capacitors to their bitlines,

00:16:23.640 while at the same time the other 65 thousandish wordlines are all off.

00:16:29.340 Here’s the logic diagram for a simple decoder.  The remaining 10 bits of the address are sent

00:16:35.640 to the column multiplexer.  This multiplexer takes in the 8,192 bitlines on the top and,

00:16:42.540 depending on the 10-bit address, connects a specific group of 8 bitlines to the 8 input

00:16:48.600 and output IO wires at the bottom.  For example, if the 10-bit address were this, then only

00:16:55.860 bitlines 4,784 through 4,791 would be connected to the IO wires, and the rest of the 8000ish

00:17:05.579 bitlines would be connected to nothing.  Here’s the logic diagram for a simple multiplexer.
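The address breakdown described above can be sketched in a few lines of Python. The field widths (3 + 2 + 16 + 10 = 31 bits) come from the video; the exact placement of each field within a real chip’s address is an assumption here:

```python
# Sketch of splitting a 31-bit DDR5 address into its fields.
# Bit layout (an assumption): [bank group: 3][bank: 2][row: 16][column: 10]
def decode_address(addr: int):
    column     = addr & 0x3FF           # low 10 bits: one of 1,024 column groups
    row        = (addr >> 10) & 0xFFFF  # next 16 bits: one of 65,536 rows
    bank       = (addr >> 26) & 0x3     # 2 bits: one of 4 banks per group
    bank_group = (addr >> 28) & 0x7     # 3 bits: one of 8 bank groups
    # Each column address selects a group of 8 bitlines ("by 8").
    first_bitline = column * 8
    return bank_group, bank, row, (first_bitline, first_bitline + 7)

# The transcript's example: column address 598 selects bitlines 4,784-4,791.
print(decode_address(598))  # (0, 0, 0, (4784, 4791))
```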

00:17:11.460 We now have the means of accessing any memory cell in this massive array; however,

00:17:16.680 to understand the three basic operations, reading, writing, and refreshing, let’s add

00:17:22.859 two elements to our layout: a sense amplifier at the bottom of each bitline, and a read and

00:17:28.800 write driver outside of the column multiplexer.  Let’s look at reading from a group of memory

00:17:34.440 cells.  First, the read command and 31-bit address are sent from the CPU to the DRAM.  The first 5

00:17:42.360 bits select a specific bank. The next step is to turn off all the wordlines in that bank,

00:17:48.300 thereby isolating all the capacitors, and then precharge all 8000ish bitlines to .5 volts.  Next,

00:17:57.240 the 16-bit row address turns on a row, and all the capacitors in that row are connected to their

00:18:03.360 bitlines.  If an individual capacitor holds a 1 and is charged to 1 volt, then some charge flows

00:18:10.980 from the capacitor onto the .5-volt bitline, and the voltage on the bitline increases.  The sense

00:18:17.640 amplifier then detects this slight change or perturbation of voltage on the bitline,

00:18:22.320 amplifies the change, and pushes the voltage on the bitline all the way up to 1 volt. However,

00:18:28.980 if a 0 is stored in the capacitor, charge flows from the bitline into the capacitor,

00:18:35.100 and the .5-volt bitline decreases in voltage.  The sense amplifier then sees this change,

00:18:41.700 amplifies it, and drives the bitline voltage down to 0 volts or ground.  The sense amplifier is

00:18:49.080 necessary because the capacitor is so small and the bitline is rather long, and thus the

00:18:54.600 capacitor needs an additional component to sense and amplify whatever value is stored.

00:19:00.420 Now, all 8000ish bitlines are driven to 1 volt or 0 volts corresponding to the stored

00:19:08.100 charge in the capacitors of the activated row, and this row is now considered open.

00:19:12.900 Next, the column select multiplexer uses the 10-bit column address to connect the

00:19:19.020 corresponding 8 bitlines to the read driver, which then sends these 8 values

00:19:24.420 and voltages over the 8 data wires to the CPU.  Writing data to these memory cells is similar

00:19:31.380 to reading, however with a few key differences.  First, the write command, address, and 8 bits to

00:19:39.060 be written are sent to the DRAM chip.  Next, just like before, the bank is selected, the capacitors

00:19:46.860 are isolated, and the bitlines are precharged to .5 volts.  Then, using a 16-bit address,

00:19:54.300 a single row is activated, the capacitors perturb the bitlines, and the sense amplifiers sense this

00:20:01.440 and drive the bitlines to a 1 or 0, thus opening the row.  Next, the column address goes to the

00:20:09.180 multiplexer, but, this time, because a write command was sent, the multiplexer connects the

00:20:15.180 specific 8 bitlines to the write driver, which contains the 8 bits that the CPU had sent along

00:20:20.820 the data wires and requested to write.  These write drivers are much stronger than the sense

00:20:26.160 amplifiers, and thus they override whatever voltage was previously on the bitline, and drive each of

00:20:32.160 the 8 bitlines to 1 volt for a 1 to be written, or 0 volts for a 0.  This new bitline voltage

00:20:39.780 overrides the previously stored charges or values in each of the 8 capacitors in the open row,

00:20:45.780 thereby writing 8 bits of data to the memory cells corresponding to the 31-bit address.
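The read and write sequences above can be mimicked with a toy model: activating a row latches its cells into the sense amplifiers, and reads or writes then go through a column offset. Sizes are shrunk and all electrical details (precharge voltages, timing) are ignored; this is an illustrative sketch, not a real memory controller:

```python
# Toy model of a DRAM bank: activate (sense) a row, then read or write
# through a column offset. 8 rows x 16 columns instead of 65,536 x 8,192.
class ToyBank:
    def __init__(self, rows=8, cols=16):
        self.cells = [[0] * cols for _ in range(rows)]
        self.open_row = None  # index of the currently open (sensed) row

    def activate(self, row):
        # Close any open row, "precharge", then sense the new row's cells.
        self.open_row = row
        self.bitlines = list(self.cells[row])  # sense amps latch the row

    def read(self, col, n=8):
        # A row hit: only the column selection changes.
        return self.bitlines[col:col + n]

    def write(self, col, bits):
        # Write drivers override the bitlines and the open row's cells.
        self.bitlines[col:col + len(bits)] = bits
        self.cells[self.open_row][col:col + len(bits)] = bits

bank = ToyBank()
bank.activate(3)
bank.write(0, [1, 0, 1, 1, 0, 0, 1, 0])
bank.activate(5)          # row miss: a different row is opened
bank.activate(3)          # re-open row 3; its data was retained
print(bank.read(0))       # [1, 0, 1, 1, 0, 0, 1, 0]
```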

00:20:52.500 Three quick notes.  First, as a reminder, writing and reading happen concurrently on all 4

00:20:58.440 chips in the shared memory channel, using the same 31-bit address and command wires,

00:21:03.540 but with different data wires for each chip.  Second, with DDR5, for a binary 1 the voltage

00:21:11.220 is actually 1.1 volts, for DDR4 it’s 1.2 volts, and prior generations had even higher voltages,

00:21:19.860 with the bitline precharge voltages being half of these voltages.  However, for DDR5,

00:21:26.460 when writing or refreshing, a higher voltage of around 1.4 volts is applied and stored in each

00:21:33.060 capacitor for a binary 1, because charge leaks out over time. However, for simplicity, we’re

00:21:39.900 going to stick with 1 and 0.  Third, the number of bank groups, banks, bitlines, and wordlines

00:21:46.860 varies widely between different generations and capacities but is always a power of 2.

00:21:53.400 Let’s move on and discuss the third operation, which is refreshing the memory cells in a bank.

00:21:59.220 As mentioned earlier, the transistors used to isolate the capacitors are incredibly small,

00:22:05.040 and thus charges leak across the channel.  The refresh operation is rather simple and is a

00:22:11.160 sequence of closing all the rows, precharging the bitlines to .5 volts, and opening a row.

00:22:17.400 To refresh, just as before, the capacitors perturb the bitlines, and then the sense amplifiers drive

00:22:24.240 the bitlines and capacitors of the open row fully up to 1 volt or down to 0 volts depending on the

00:22:31.740 stored value of the capacitor, thereby refilling the leaked charge.  This process of row closing,

00:22:38.395 precharging, opening, and sense amplifying happens row after row, taking 50 nanoseconds for each row,

00:22:46.080 until all 65 thousandish rows are refreshed, taking a total of 3 milliseconds or so to

00:22:53.400 complete.  The refresh operation occurs once every 64 milliseconds for each bank,

00:22:58.620 because that’s statistically below the worst-case time it takes for a memory

00:23:03.360 cell to leak enough charge to make a stored 1 turn into a 0, thus resulting in a loss of data.
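Those refresh figures check out arithmetically. A quick sketch using the numbers quoted above (real chips refresh batches of rows per refresh command, which this ignores):

```python
# Refresh arithmetic: 65,536 rows at ~50 ns per row, repeated every 64 ms.
rows = 65_536
t_row = 50e-9               # ~50 ns to refresh one row
refresh_interval = 64e-3    # each bank is fully refreshed every 64 ms

t_full_refresh = rows * t_row               # ~3.3 ms for the whole bank
overhead = t_full_refresh / refresh_interval
print(f"{t_full_refresh * 1e3:.1f} ms per pass, "
      f"{overhead:.1%} of each bank's time spent refreshing")
```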

00:23:12.120 Let’s take a step back and consider the incredible amount of data that is moved

00:23:17.220 through DRAM memory cells. These banks of memory cells handle up to 4,800 million

00:23:24.600 requests to read and write data every second, while refreshing every memory cell in each

00:23:30.900 bank row by row around 16 times a second. That’s a staggering amount of data movement

00:23:37.440 and illustrates the true strength of computers. Yes, they do simple things like comparisons,

00:23:44.100 arithmetic, and moving data around, but at a rate of billions of times a second.

00:23:50.700 Now, you might wonder why computers need to do so much data movement. Well,

00:23:55.920 take this video game for example. You have obvious calculations like the movement of your character

00:24:01.980 and the horse. But then there are individual grasses, trees, rocks, and animals whose

00:24:07.680 positions and geometries are stored in DRAM. And then environmental effects such as the lighting

00:24:14.040 and shadows change the colors and textures of the scene in order to create a realistic world.

00:24:21.420 Next, we’re going to explore breakthroughs and optimizations that allow DRAM to be incredibly

00:24:28.260 fast. But, before we get into all those details, we would greatly appreciate it

00:24:33.660 if you could take a second to hit that like button, subscribe if you haven’t already,

00:24:37.980 and type up a quick comment below, as it helps get this video out to others.  Also, we have a Patreon

00:24:45.060 and would appreciate any support.  This is our longest and most detailed video by far, and we’re

00:24:51.600 planning more videos that get into the inner details of how computers work.  We can’t do it

00:24:57.240 without your help, so thank you for watching and doing these three quick things. It helps a ton.

00:25:07.740 The first complex topic which we’ll explore is why there are 32 banks, as well as what the

00:25:14.340 parameters on the packaging of DRAM are.  After that, we’ll explore burst buffers,

00:25:19.800 sub-arrays, and folded DRAM architecture, and what’s inside the sense amplifier.

00:25:30.000 row within a bank requires all these steps and this process takes time.

00:25:34.440 However, if a row were already open, we could read or write to any section of

00:25:39.600 8 memory cells using only the 10-bit column address and the column select

00:25:44.580 multiplexer.  When the CPU sends a read or write command to a row that’s already open,

00:25:51.000 it’s called a row hit or page hit, and this can happen over and over.  With a row hit,

00:25:56.580 we skip all the steps required to open a row and just use the 10-bit column address to multiplex a

00:26:03.120 different set of 8 columns or bitlines, connecting them to the read or write driver, thereby saving

00:26:09.000 a considerable amount of time.  A row miss is when the next address is for a different row,

00:26:14.580 which requires the DRAM to close and isolate the currently open row, and then open the new row.

00:26:20.580 On a package of DRAM there are typically 4 numbers specifying timing parameters regarding row hits,

00:26:27.464 precharging, and row misses.  The first number refers to the time it takes between sending an

00:26:33.060 address with a row open, thus a row hit, to receiving the data stored in those columns.

00:26:38.880 The next number is the time it takes to open a row if all the capacitors are isolated and the

00:26:44.640 bitlines are precharged.  Then the next number is the time it takes to precharge the bitlines

00:26:49.980 before opening a row, and the last number is the time it takes between a row activation and

00:26:55.620 the following precharge.  Note that these numbers are measured in clock cycles.
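To turn those clock-cycle numbers into nanoseconds, divide by the clock frequency. The module speed and CAS latency below are hypothetical example values, not read off the stick in the video:

```python
# Converting a DRAM timing number (given in clock cycles) to nanoseconds.
# DDR transfers data twice per clock, so a DDR5-4800 module's I/O clock
# runs at 4800 / 2 = 2400 MHz. CL 40 is a hypothetical first number.
data_rate_mts = 4800                    # million transfers per second
clock_hz = data_rate_mts / 2 * 1e6      # 2.4 GHz I/O clock

cas_latency_cycles = 40                 # hypothetical row-hit latency in cycles
cas_latency_ns = cas_latency_cycles / clock_hz * 1e9
print(f"{cas_latency_ns:.1f} ns")       # ~16.7 ns, near the ~17 ns quoted earlier
```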

00:27:00.240 Row hits are also the reason why the address is sent in two sections: first the bank selection and

00:27:07.500 row address, called RAS, and then the column address, called CAS. If the first part, the bank selection

00:27:14.640 and row address, matches a currently open row, then it’s a row hit, and all the DRAM needs is the

00:27:20.880 column address and the new command, and then the multiplexer simply moves around the open row.

00:27:26.040 Because of the time saving in accessing an open row, the CPU memory controller, programs,

00:27:32.220 and compilers are optimized for increasing the number of subsequent row hits. The opposite,

00:27:38.460 called thrashing, is when a program jumps around from one row to a different row over and over,

00:27:44.400 and is obviously incredibly inefficient both in terms of energy and time.

00:27:49.860 Additionally, DDR5 DRAM has 32 banks for this reason.  Each bank’s rows, columns,

00:27:57.180 sense amplifiers, and row decoders operate independently of one another, and thus multiple

00:28:03.360 rows from different banks can be open all at the same time, increasing the likelihood of a row hit

00:28:09.660 and reducing the average time it takes for the CPU to access data.  Furthermore, by having multiple

00:28:16.200 bank groups, the CPU can refresh one bank in each bank group at a time while using the other three,

00:28:22.800 thus reducing the impact of refreshing. A question you may have had earlier is why

00:28:28.620 banks are significantly taller than they are wide. Well, by combining all the banks together,

00:28:34.560 one next to the other, you can think of this chip as actually being 65 thousand rows tall by 262

00:28:44.280 thousand columns wide. And, by adding 31 equally spaced divisions between the columns, thus

00:28:50.760 creating banks, we allow for much more flexibility and efficiency in reading, writing, and refreshing.

00:28:58.380 Also, note that on the DRAM packaging are its capacity in gigabytes, the number of

00:29:04.800 millions of data transfers per second, which is two times the clock frequency, and the peak

00:29:10.560 data transfer rate in megabytes per second.  The next design optimization we’ll explore
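The packaging numbers relate to each other as follows. The clock frequency is an example value, and a 64-bit-wide DIMM (two 32-bit sub-channels) is assumed:

```python
# Relating the numbers printed on DRAM packaging:
# transfers/s = 2 x clock frequency (double data rate), and a DDR5 DIMM
# moves 64 bits (two 32-bit sub-channels) per transfer.
clock_mhz = 2400                            # example I/O clock
transfers_per_s = 2 * clock_mhz * 1e6       # 4800 million transfers/s
bytes_per_transfer = 64 // 8                # 64 data wires -> 8 bytes
peak_mb_s = transfers_per_s * bytes_per_transfer / 1e6
print(f"{peak_mb_s:.0f} MB/s peak transfer rate")
```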

00:29:16.620 is the burst buffer and burst length.  Let’s add a 128-bit read and write temporary storage location,

00:29:23.760 called a burst buffer, to our functional diagram.  Instead of 8 wires coming out of the multiplexer,

00:29:30.600 we’re going to have 128 wires that connect to these 128-bit buffer locations.  Next,

00:29:38.340 the 10-bit column address is broken into two parts: 6 bits are used for the multiplexer,

00:29:44.340 and 4 bits are for the burst buffer. Let’s explore a read command.  With

00:29:49.560 our burst buffer in place, 128 memory cells and bitlines are connected to the burst buffer using

00:29:56.760 the 6 column bits, thereby temporarily loading, or caching, 128 values into the burst buffer.

00:30:04.140 Using the 4 bits for the buffer, 8 quickly accessed data locations in the burst buffer

00:30:10.140 are connected to the read drivers, and the data is sent to the CPU.  By cycling through these 4 bits,

00:30:16.800 all 16 sets of 8 bits are read out, and thus the burst length is 16.  After that, a new set of 128

00:30:25.320 bitlines and values are connected and loaded into the burst buffer.  There’s also a write

00:30:31.080 burst buffer, which operates in a similar way.  The benefit of this design is that 16 sets of

00:30:37.260 8 bits per microchip, totaling 1024 bits, can be accessed and read or written extremely quickly,

00:30:45.060 as long as the data is all next to one another, but at the same time we still

00:30:49.800 have the granularity and ability to access any set of 8 bits if our data requests jump around.
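The 6-and-4-bit split of the column address can be sketched as follows. Which bits go where is an assumption, chosen so the mapping agrees with the earlier column-598 example:

```python
# Splitting the 10-bit column address once the burst buffer is added:
# the high 6 bits pick which 128-bit group of bitlines loads into the
# buffer, and the low 4 bits cycle through its sixteen 8-bit slots.
def burst_decode(col_addr_10bit: int):
    group = col_addr_10bit >> 4     # 6 bits: one of 64 groups of 128 bitlines
    slot = col_addr_10bit & 0xF     # 4 bits: one of 16 eight-bit slots
    first_bitline = group * 128 + slot * 8
    return group, slot, first_bitline

# Column address 598 lands in group 37, slot 6 -> bitline 4,784,
# consistent with the earlier example.
print(burst_decode(598))  # (37, 6, 4784)
```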

00:30:56.400 The next design optimization addresses the fact that this bank of 65,536 rows by 8,192 columns is rather massive

00:31:07.740 and results in extremely long wordlines and bitlines, especially when compared to the size of

00:31:14.280 each trench capacitor memory cell.  Therefore, the massive array is broken up into smaller

00:31:20.400 blocks of 1,024 by 1,024 cells, with intermediate sense amplifiers below each subarray,

00:31:27.840 subdivided wordlines, and a hierarchical row decoding scheme.  By subdividing the bitlines,

00:31:34.800 the distance and amount of wire that each tiny capacitor is connected to, as it perturbs the

00:31:40.860 bitline to the sense amplifier, is reduced, and thus the capacitor doesn’t have to be as big.  By

00:31:47.160 subdividing the wordlines, the capacitive load from eight thousandish transistor gates and channels is

00:31:53.280 decreased, and thus the time it takes to turn on all the access transistors in a row is decreased.

00:31:59.880 The final topic we’re going to talk about is the most complicated.  Remember how we had

00:32:05.160 a sense amplifier connected to the bottom of each bitline?  Well, this optimization has two

00:32:10.500 bitlines per column going to each sense amplifier, and alternating rows of memory cells connected to

00:32:17.040 the left and right bitlines, thus doubling the number of bitlines.  When one row is active,

00:32:22.500 half of the bitlines are active while the other half are passive, and vice versa when the next row

00:32:28.020 is active.  Moving down to see inside the sense amplifier, we find a cross-coupled inverter.  How

00:32:34.680 does this work?  Well, when the active bitline is a 1, the passive bitline will be driven by this

00:32:41.040 cross-coupled inverter to the opposite value of 0, and when the active is a 0, the passive

00:32:46.680 becomes a 1.  Note that the inverted passive bitline isn’t connected to any memory cells,

00:32:52.200 and thus it doesn’t mess up any stored data.  The cross-coupled inverter makes it such that these

00:32:58.680 two bitlines are always going to be opposite one another, and they’re called a differential

00:33:03.840 pair.  There are three benefits to this design.  First, during the precharge step, we want to bring

00:33:09.600 all the bitlines to .5 volts and, by having a differential pair of active and passive bitlines,

00:33:15.960 the easiest solution is to disconnect the cross-coupled inverters and open a channel between the

00:33:22.260 two using a transistor.  The charge easily flows from the 1 bitline to the 0, and they

00:33:28.860 both average out and settle at .5 volts.  The other two benefits are noise immunity

00:33:34.320 and a reduction in parasitic capacitance of the bitline.  These benefits are related to the fact

00:33:39.720 that by creating two oppositely charged electric wires, with electric fields going from one to

00:33:45.540 the other, we reduce the amount of electric fields emitted in stray directions and relatedly increase

00:33:51.960 the ability of the sense amplifier to amplify one bitline to 1 volt and the other to 0 volts.

00:33:58.680 One final note is that when discussing DRAM, one major topic is the timing of addresses,

00:34:04.680 command signals, and data, and the related acronyms DDR, or double data rate, and SDRAM,

00:34:12.179 or synchronous DRAM.  These topics were omitted from this video because it would have taken an

00:34:17.760 additional 15 minutes to properly explore them.  That’s

00:34:24.900 pretty much it for the DRAM, and we are grateful you made it this far into the video.  We believe

00:34:30.600 the future will require a strong emphasis on engineering education, and we’re thankful to all

00:34:36.780 our Patreon and YouTube Membership sponsors for supporting this dream.  If you want to

00:34:41.760 support us on YouTube Memberships or Patreon, you can find the links in the description.

00:34:47.100 A huge thanks goes to Nathan, Peter, and Jacob, who are doctoral students at the Florida

00:34:53.219 Institute for Cybersecurity Research, for helping to research and review this video’s content!  They

00:35:00.000 do foundational research on finding the weak points in device security and whether hardware

00:35:05.220 is compromised.  If you want to learn more about the FICS graduate program or their work, check out

00:35:11.460 the website using the link in the description.  This is Branch Education, and we create 3D

00:35:18.480 animations that dive deep into the technology that drives our modern world.  Watch another Branch

00:35:24.360 video by clicking one of these cards or click here to subscribe.  Thanks for watching to the end!