# tactiq.io free youtube transcript

# How does Computer Memory Work? 💻

# https://www.youtube.com/watch/7J7X7aZvMXQ

 

00:00:00.300 Have you ever wondered what’s happening inside your computer when you load a program or video

00:00:05.220 game?  Well, millions of operations are happening, but perhaps the most common is simply copying

00:00:12.480 data from a solid-state drive or SSD into dynamic random-access memory or DRAM.  An SSD stores all

00:00:22.440 the programs and data for long-term storage, but when your computer wants to use that data,

00:00:27.840 it has to first move the appropriate files into DRAM, which takes time,

00:00:33.120 hence the loading bar.  Because your CPU works only with data after it’s been moved to DRAM,

00:00:39.600 it’s also called working memory or main memory.  The reason your desktop uses both SSDs and

00:00:47.100 DRAM is that solid-state drives permanently store data in massive 3D arrays composed of a

00:00:54.300 trillion or so memory cells, yielding terabytes of storage, whereas DRAM temporarily stores data in

00:01:02.040 2D arrays composed of billions of tiny capacitor memory cells, yielding gigabytes of working memory.

00:01:09.480 Accessing any section of cells in the massive SSD array and reading or writing data takes

00:01:16.200 about 50 microseconds, whereas reading or writing from any DRAM capacitor memory

00:01:22.020 cell takes about 17 nanoseconds, which is 3,000 times faster.  For comparison, a supersonic jet

00:01:30.540 going at Mach 3 is around 3,000 times faster than a moving tortoise.  So, the speed of

00:01:36.780 17-nanosecond DRAM versus 50-microsecond SSD is like comparing a supersonic jet to a tortoise.
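As a quick back-of-the-envelope check on that figure (an editorial sketch, not from the video itself), the ratio of the two access times works out to roughly 3,000:

```python
# Rough arithmetic check: SSD access (~50 us) vs. DRAM access (~17 ns).
ssd_access_s = 50e-6    # ~50 microseconds per SSD access
dram_access_s = 17e-9   # ~17 nanoseconds per DRAM access

ratio = ssd_access_s / dram_access_s
print(f"DRAM is roughly {ratio:.0f}x faster")  # ~2941, i.e. about 3,000x
```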

00:01:45.420 However, speed is just one factor.  DRAM is limited to a 2D array and temporarily stores

00:01:52.620 one bit per memory cell. For example, this stick of DRAM with 8 chips holds 16 gigabytes of data,

00:02:00.420 whereas a solid-state drive of a smaller size can hold 2 terabytes of data, more

00:02:06.960 than 100 times that of DRAM.  Additionally, DRAM requires power to continuously store

00:02:13.620 and refresh the data held in its capacitors.  Therefore, computers use both SSDs and DRAM and,

00:02:21.900 by spending a few seconds of loading time to copy data from the SSD to the DRAM,

00:02:27.360 and then prefetching, which is the process of moving data before it’s needed, your computer can

00:02:34.020 store terabytes of data on the SSD and then access the data from programs that were preemptively

00:02:40.680 copied into the DRAM in a few nanoseconds. For example, many video games have a loading

00:02:46.980 time to start up the game itself, and then a separate loading time to load a save file.

00:02:52.260 During the process of loading a save file, all the 3D models, textures, and the environment of

00:02:59.220 your game state are moved from the SSD into DRAM so any of it can be accessed in a few nanoseconds,

00:03:06.480 which is why video games have DRAM capacity requirements.  Just imagine, without DRAM,

00:03:12.540 playing a game would be 3,000 times slower.  We covered solid-state drives in other videos,

00:03:19.140 so in this video, we’re going to take a deep dive into this 16-gigabyte stick of DRAM.  First,

00:03:26.160 we’ll see exactly how the CPU communicates and moves data from an SSD to DRAM.  Then

00:03:32.880 we’ll open up a DRAM microchip and see how billions of memory cells are organized into

00:03:38.400 banks and how data is written to and read from groups of memory cells.  In the process, we’ll

00:03:45.300 dive into the nanoscopic structures inside individual memory cells and see how each

00:03:50.940 capacitor physically stores 1 bit of data.  Finally, we’ll explore some breakthroughs and

00:03:56.880 optimizations such as the burst buffer and folded DRAM layouts that enable DRAM to move data around

00:04:04.440 at incredible speeds. A few quick notes.  First, you can find similar DRAM chips inside

00:04:11.400 GPUs, smartphones, and many other devices, but with different optimizations.  As examples,

00:04:18.420 GPU DRAM, or VRAM, located all around the GPU chip, has a larger bandwidth and can

00:04:26.040 read and write simultaneously, but operates at a lower frequency, and DRAM in your smartphone

00:04:32.280 is stacked on top of the CPU and is optimized for smaller packaging and lower power consumption.

00:04:39.360 Second, this video is sponsored by Crucial.  Although they gave me this

00:04:44.280 stick of DRAM to model and use in the video, the content was independently

00:04:49.080 researched and not influenced by them.  Third, there are faster memory structures

00:04:54.120 in your CPU called cache memory and even faster registers.  All these types of memory create a

00:05:01.140 memory hierarchy, with the main trade-off being speed versus capacity while keeping

00:05:06.660 prices affordable to consumers and optimizing the size of each microchip for manufacturing.

00:05:12.480 Fourth, you can see how much of your DRAM is being utilized by

00:05:17.220 each program by opening your computer’s resource monitor and clicking on memory.

00:05:22.440 Fifth, there are different generations of DRAM, and we’ll explore DDR5.  Many of the key concepts

00:05:29.460 that we explain apply to prior generations, although the numbers may be different.

00:05:34.020 Sixth, 17 nanoseconds is incredibly fast!  Electricity travels at around 1 foot per

00:05:41.820 nanosecond, and 17 nanoseconds is about the time it takes for light to travel across a room.

00:05:47.460 Finally, this video is rather long as it covers a lot of what there is to know about DRAM.  We

00:05:54.840 recommend watching it first at 1.25 times speed, and then a second time at

00:06:00.960 1.5 times speed to fully comprehend this complex technology.  Stick around because this

00:06:06.900 is going to be an incredibly detailed video.  To start, a stick of DRAM is also called a Dual

00:06:14.100 Inline Memory Module or DIMM, and there are 8 DRAM chips on this particular DIMM.  On the

00:06:21.780 motherboard, there are 4 DRAM slots, and when plugged in, the DRAM is directly connected to

00:06:28.500 the CPU via 2 memory channels that run through the motherboard.  Note that the left two DRAM

00:06:34.320 slots share these memory channels, and the right two share a separate channel.  Let’s move to

00:06:40.380 look inside the CPU at the processor.  Along with numerous cores and many other elements,

00:06:46.320 we find the memory controller, which manages and communicates with the DRAM.  There’s also

00:06:52.080 a separate section for communicating with SSDs plugged into the M.2 slots and with SSDs and

00:06:58.800 hard drives plugged into SATA connectors.  Using these sections, along with data mapping tables,

00:07:04.800 the CPU manages the flow of data from the SSD to DRAM, as well as from DRAM

00:07:10.920 to cache memory for processing by the cores.  Let’s move back to see the memory channels.

00:07:16.380 For DDR5, each memory channel is divided into two parts, Channel A and Channel B. These two memory

00:07:25.020 channels A and B independently transfer 32 bits at a time using 32 data wires.  Using 21 additional

00:07:35.100 wires, each memory channel carries an address specifying where to read or write data and, using

00:07:42.300 7 control signal wires, commands are relayed.  The addresses and commands are sent to and shared

00:07:49.200 by all 4 chips on the memory channel, which work in parallel.  However, the 32-bit data

00:07:55.800 lines are divided among the chips, and thus each chip only reads or writes 8 bits at a time.

00:08:01.800 Additionally, power for DRAM is supplied by the motherboard and

00:08:07.200 managed by these chips on the stick itself.  Next, let’s open and look inside one of these

00:08:13.140 DRAM microchips.  Inside the exterior packaging, we find an interconnection matrix that connects

00:08:20.400 the ball grid array at the bottom with the die, which is the main part of this microchip.  This 2

00:08:26.460 gigabyte DRAM die is organized into 8 bank groups composed of 4 banks each, totaling 32 banks.

00:08:34.799 Within each bank is a massive array, 65,536 memory cells tall by 8,192 cells across, essentially rows

00:08:47.100 and columns in a grid, with tens of thousands of wires and supporting circuitry running outside

00:08:53.340 each bank.  Instead of looking at this die, we’re going to transition to a functional diagram,

00:08:59.040 and then reorganize the banks and bank groups.  In order to access 17 billion memory cells,

00:09:06.240 we need a 31-bit address.  3 bits are used to select the appropriate bank group, then 2 bits

00:09:13.560 to select the bank.  Next, 16 bits of the address are used to determine the exact row out of 65

00:09:21.840 thousand.  Because this chip reads or writes 8 bits at a time, the 8,192 columns are grouped by

00:09:30.420 8 memory cells, all read or written at a time, or ‘by 8’, and thus only 10 bits are needed for

00:09:37.740 the column address.  One optimization is that this 31-bit address is separated into two parts

00:09:44.220 and sent using only 21 wires.  First, the bank group, bank, and row address are sent, and then

00:09:52.260 after that the column address.  Next, we’ll look inside these physical memory cells, but first,

00:09:58.200 let’s briefly talk about how these structures are manufactured, as well as this video’s sponsor.

00:10:03.840 This incredibly complicated die, also called an integrated circuit,

00:10:09.540 is manufactured on 300-millimeter silicon wafers, 2500ish dies at a time.  On each die are billions

00:10:18.360 of nanoscopic memory cells that are fabricated using dozens of tools and hundreds of steps in

00:10:24.420 a semiconductor fabrication plant or fab.  This one was made by Micron, which manufactures around

00:10:31.020 a quarter of the world’s DRAM, including both Nvidia’s and AMD’s VRAM in their GPUs.  Micron also

00:10:39.600 has its own product line of DRAM and SSDs under the brand Crucial which, as mentioned earlier,

00:10:46.080 is the sponsor of this video.  In addition to DRAM, Micron is one of the world’s leading

00:10:51.600 suppliers of solid-state drives such as this Crucial P5 Plus M.2 NVMe SSD.  By installing your

00:11:00.840 operating system and video games on a Crucial NVMe solid-state drive, you’ll be sure to have

00:11:07.260 incredibly fast loading times and smooth gameplay, and if you do video editing, make sure all those

00:11:14.160 files are on a fast SSD like this one as well.  This is because loading speed is

00:11:20.160 predominantly limited by the speed of the SSD or hard drive where the files are stored.

00:11:26.340 For example, this hard drive can only transfer data at around 150 megabytes a second, whereas

00:11:34.080 this Crucial NVMe SSD can transfer data at a rate of up to 6,600 megabytes a second, which,

00:11:42.720 for comparison, is the speed of a moving tortoise versus a galloping horse.  By using a Crucial NVMe

00:11:50.580 SSD, loading a video game that requires gigabytes of DRAM is reduced from a minute or more down to

00:11:58.560 a couple seconds.  Check out the Crucial NVMe SSDs using the link in the description below.

00:12:09.300 Let’s get back to the details of how DRAM works and zoom in to explore a single memory cell

00:12:15.780 situated in a massive array. This memory cell is called a 1T1C cell and is a few dozen nanometers

00:12:24.240 in size.  It has two parts: a capacitor to store one bit of data in the form of electrical charges

00:12:31.080 or electrons, and a transistor to access and read or write data.  The capacitor is shaped like a

00:12:37.740 deep trench dug into silicon and is composed of two conductive surfaces separated by a dielectric

00:12:44.400 insulator or barrier just a few atoms thick, which stops the flow of electrons but allows electric

00:12:50.940 fields to pass through.  If this capacitor is charged up with electrons to 1 volt,

00:12:56.760 it’s a binary 1, and if no charges are present and it’s at 0 volts, it’s a binary 0, and thus

00:13:04.740 this cell only holds one bit of data.  Designs of capacitors are constantly evolving, but in

00:13:11.400 this trench capacitor, the depth of the silicon is utilized to allow for larger capacitive storage,

00:13:17.400 while taking up as little area as possible.  Next, let’s look at the access transistor and

00:13:24.300 add in two wires.  The wordline wire connects to the gate of the transistor, while the bitline wire

00:13:31.260 connects to the other side of the transistor’s channel.  Applying a voltage to the wordline

00:13:36.480 turns on the transistor, and, while it’s on, electrons can flow through the channel, thus

00:13:42.000 connecting the capacitor to the bitline.  This allows us to access and charge up the capacitor

00:13:47.520 to write a 1, or discharge the capacitor to write a 0.  Additionally, we can read the stored value

00:13:54.840 in the capacitor by measuring the amount of charge.  However, when the wordline is off,

00:14:00.000 the transistor is turned off, and the capacitor is isolated from the bitline, thus saving the

00:14:05.700 data or charge that was previously written.  Note that because this transistor is incredibly small,

00:14:12.120 only a few dozen nanometers wide, electrons slowly leak across the channel, and thus over time the

00:14:19.800 capacitor needs to be refreshed to recharge the leaked electrons. We’ll cover exactly how

00:14:25.860 refreshing memory cells works a little later.  As mentioned earlier, this 1T1C memory cell is

00:14:33.540 one of 17 billion inside this single die and is organized into massive arrays called banks.  So,

00:14:41.340 let’s build a small array for illustrative purposes.  In our array, each of the wordlines

00:14:47.640 is connected in rows, and then the bitlines are connected in columns.  Wordlines and bitlines

00:14:53.820 are on different vertical layers so one can cross over the other, and they never touch.

00:15:00.540 Let’s simplify the visual and use symbols for the capacitors and the transistors.  Just as before,

00:15:06.720 the wordlines connect to each transistor’s control gate in rows, and then all the bitlines in columns

00:15:13.980 connect to the channel opposite each capacitor. As a result, when a wordline is active,

00:15:19.980 all the capacitors in only that row are connected to their corresponding bitlines,

00:15:24.780 thereby activating all the memory cells in that row.  At any given time, only one wordline is

00:15:31.920 active because, if more than one wordline were active, then multiple capacitors in a column

00:15:37.620 would be connected to the bitline, and the data storage functionalities of these capacitors would

00:15:42.960 interfere with one another, making them useless.  As mentioned earlier, within a single bank there

00:15:48.900 are 65,536 rows and 8,192 columns, and the 31-bit address is used to activate a group of just 8

00:15:59.760 memory cells.  The first 5 bits select the bank, and the next 16 bits are sent to a row decoder

00:16:06.420 to activate a single row.  For example, this binary number turns on wordline row 27,524,

00:16:15.120 thus turning on all transistors in that row and connecting the 8,192 capacitors to their bitlines,

00:16:23.640 while at the same time the other 65 thousandish wordlines are all off.

00:16:29.340 Here’s the logic diagram for a simple decoder.  The remaining 10 bits of the address are sent

00:16:35.640 to the column multiplexer.  This multiplexer takes in the 8,192 bitlines on the top and,

00:16:42.540 depending on the 10-bit address, connects a specific group of 8 bitlines to the 8 input

00:16:48.600 and output IO wires at the bottom.  For example, if the 10-bit address were this, then only

00:16:55.860 bitlines 4,784 through 4,791 would be connected to the IO wires, and the rest of the 8000ish

00:17:05.579 bitlines would be connected to nothing.  Here’s the logic diagram for a simple multiplexer.
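The address breakdown described above can be sketched in a few lines of Python. The field widths (3 + 2 + 16 + 10 = 31 bits) come from the video; the exact placement of each field within a real chip’s address is an assumption here:

```python
# Sketch of splitting a 31-bit DDR5 address into its fields.
# Bit layout (an assumption): [bank group: 3][bank: 2][row: 16][column: 10]
def decode_address(addr: int):
    column     = addr & 0x3FF           # low 10 bits: one of 1,024 column groups
    row        = (addr >> 10) & 0xFFFF  # next 16 bits: one of 65,536 rows
    bank       = (addr >> 26) & 0x3     # 2 bits: one of 4 banks per group
    bank_group = (addr >> 28) & 0x7     # 3 bits: one of 8 bank groups
    # Each column address selects a group of 8 bitlines ("by 8").
    first_bitline = column * 8
    return bank_group, bank, row, (first_bitline, first_bitline + 7)

# The transcript's example: column address 598 selects bitlines 4,784-4,791.
print(decode_address(598))  # (0, 0, 0, (4784, 4791))
```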

00:17:11.460 We now have the means of accessing any memory cell in this massive array; however,

00:17:16.680 to understand the three basic operations, reading, writing, and refreshing, let’s add

00:17:22.859 two elements to our layout: a sense amplifier at the bottom of each bitline, and a read and

00:17:28.800 write driver outside of the column multiplexer.  Let’s look at reading from a group of memory

00:17:34.440 cells.  First, the read command and 31-bit address are sent from the CPU to the DRAM.  The first 5

00:17:42.360 bits select a specific bank. The next step is to turn off all the wordlines in that bank,

00:17:48.300 thereby isolating all the capacitors, and then precharge all 8000ish bitlines to .5 volts.  Next,

00:17:57.240 the 16-bit row address turns on a row, and all the capacitors in that row are connected to their

00:18:03.360 bitlines.  If an individual capacitor holds a 1 and is charged to 1 volt, then some charge flows

00:18:10.980 from the capacitor onto the .5-volt bitline, and the voltage on the bitline increases.  The sense

00:18:17.640 amplifier then detects this slight change or perturbation of voltage on the bitline,

00:18:22.320 amplifies the change, and pushes the voltage on the bitline all the way up to 1 volt. However,

00:18:28.980 if a 0 is stored in the capacitor, charge flows from the bitline into the capacitor,

00:18:35.100 and the .5-volt bitline decreases in voltage.  The sense amplifier then sees this change,

00:18:41.700 amplifies it, and drives the bitline voltage down to 0 volts or ground.  The sense amplifier is

00:18:49.080 necessary because the capacitor is so small and the bitline is rather long, and thus the

00:18:54.600 capacitor needs an additional component to sense and amplify whatever value is stored.

00:19:00.420 Now, all 8000ish bitlines are driven to 1 volt or 0 volts corresponding to the stored

00:19:08.100 charge in the capacitors of the activated row, and this row is now considered open.

00:19:12.900 Next, the column select multiplexer uses the 10-bit column address to connect the

00:19:19.020 corresponding 8 bitlines to the read driver, which then sends these 8 values

00:19:24.420 and voltages over the 8 data wires to the CPU.  Writing data to these memory cells is similar

00:19:31.380 to reading, however with a few key differences.  First, the write command, address, and 8 bits to

00:19:39.060 be written are sent to the DRAM chip.  Next, just like before, the bank is selected, the capacitors

00:19:46.860 are isolated, and the bitlines are precharged to .5 volts.  Then, using a 16-bit address,

00:19:54.300 a single row is activated, the capacitors perturb the bitlines, and the sense amplifiers sense this

00:20:01.440 and drive the bitlines to a 1 or 0, thus opening the row.  Next, the column address goes to the

00:20:09.180 multiplexer, but, this time, because a write command was sent, the multiplexer connects the

00:20:15.180 specific 8 bitlines to the write driver, which contains the 8 bits that the CPU had sent along

00:20:20.820 the data wires and requested to write.  These write drivers are much stronger than the sense

00:20:26.160 amplifiers, and thus they override whatever voltage was previously on the bitline, and drive each of

00:20:32.160 the 8 bitlines to 1 volt for a 1 to be written, or 0 volts for a 0.  This new bitline voltage

00:20:39.780 overrides the previously stored charges or values in each of the 8 capacitors in the open row,

00:20:45.780 thereby writing 8 bits of data to the memory cells corresponding to the 31-bit address.
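The read and write sequences above can be mimicked with a toy model: activating a row latches its cells into the sense amplifiers, and reads or writes then go through a column offset. Sizes are shrunk and all electrical details (precharge voltages, timing) are ignored; this is an illustrative sketch, not a real memory controller:

```python
# Toy model of a DRAM bank: activate (sense) a row, then read or write
# through a column offset. 8 rows x 16 columns instead of 65,536 x 8,192.
class ToyBank:
    def __init__(self, rows=8, cols=16):
        self.cells = [[0] * cols for _ in range(rows)]
        self.open_row = None  # index of the currently open (sensed) row

    def activate(self, row):
        # Close any open row, "precharge", then sense the new row's cells.
        self.open_row = row
        self.bitlines = list(self.cells[row])  # sense amps latch the row

    def read(self, col, n=8):
        # A row hit: only the column selection changes.
        return self.bitlines[col:col + n]

    def write(self, col, bits):
        # Write drivers override the bitlines and the open row's cells.
        self.bitlines[col:col + len(bits)] = bits
        self.cells[self.open_row][col:col + len(bits)] = bits

bank = ToyBank()
bank.activate(3)
bank.write(0, [1, 0, 1, 1, 0, 0, 1, 0])
bank.activate(5)          # row miss: a different row is opened
bank.activate(3)          # re-open row 3; its data was retained
print(bank.read(0))       # [1, 0, 1, 1, 0, 0, 1, 0]
```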

00:20:52.500 Three quick notes.  First, as a reminder, writing and reading happen concurrently on all 4

00:20:58.440 chips in the shared memory channel, using the same 31-bit address and command wires,

00:21:03.540 but with different data wires for each chip.  Second, with DDR5, for a binary 1 the voltage

00:21:11.220 is actually 1.1 volts, for DDR4 it’s 1.2 volts, and prior generations had even higher voltages,

00:21:19.860 with the bitline precharge voltages being half of these voltages.  However, for DDR5,

00:21:26.460 when writing or refreshing, a higher voltage of around 1.4 volts is applied and stored in each

00:21:33.060 capacitor for a binary 1, because charge leaks out over time. However, for simplicity, we’re

00:21:39.900 going to stick with 1 and 0.  Third, the number of bank groups, banks, bitlines, and wordlines

00:21:46.860 varies widely between different generations and capacities but is always a power of 2.

00:21:53.400 Let’s move on and discuss the third operation, which is refreshing the memory cells in a bank.

00:21:59.220 As mentioned earlier, the transistors used to isolate the capacitors are incredibly small,

00:22:05.040 and thus charges leak across the channel.  The refresh operation is rather simple and is a

00:22:11.160 sequence of closing all the rows, precharging the bitlines to .5 volts, and opening a row.

00:22:17.400 To refresh, just as before, the capacitors perturb the bitlines, and then the sense amplifiers drive

00:22:24.240 the bitlines and capacitors of the open row fully up to 1 volt or down to 0 volts depending on the

00:22:31.740 stored value of the capacitor, thereby refilling the leaked charge.  This process of row closing,

00:22:38.395 precharging, opening, and sense amplifying happens row after row, taking 50 nanoseconds for each row,

00:22:46.080 until all 65 thousandish rows are refreshed, taking a total of 3 milliseconds or so to

00:22:53.400 complete.  The refresh operation occurs once every 64 milliseconds for each bank,

00:22:58.620 because that’s statistically below the worst-case time it takes for a memory

00:23:03.360 cell to leak enough charge to make a stored 1 turn into a 0, thus resulting in a loss of data.
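Those refresh figures check out arithmetically. A quick sketch using the numbers quoted above (real chips refresh batches of rows per refresh command, which this ignores):

```python
# Refresh arithmetic: 65,536 rows at ~50 ns per row, repeated every 64 ms.
rows = 65_536
t_row = 50e-9               # ~50 ns to refresh one row
refresh_interval = 64e-3    # each bank is fully refreshed every 64 ms

t_full_refresh = rows * t_row               # ~3.3 ms for the whole bank
overhead = t_full_refresh / refresh_interval
print(f"{t_full_refresh * 1e3:.1f} ms per pass, "
      f"{overhead:.1%} of each bank's time spent refreshing")
```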

00:23:12.120 Let’s take a step back and consider the incredible amount of data that is moved

00:23:17.220 through DRAM memory cells. These banks of memory cells handle up to 4,800 million

00:23:24.600 requests to read and write data every second, while refreshing every memory cell in each

00:23:30.900 bank row by row around 16 times a second. That’s a staggering amount of data movement

00:23:37.440 and illustrates the true strength of computers. Yes, they do simple things like comparisons,

00:23:44.100 arithmetic, and moving data around, but at a rate of billions of times a second.

00:23:50.700 Now, you might wonder why computers need to do so much data movement. Well,

00:23:55.920 take this video game for example. You have obvious calculations like the movement of your character

00:24:01.980 and the horse. But then there are individual grasses, trees, rocks, and animals whose

00:24:07.680 positions and geometries are stored in DRAM. And then environmental effects such as the lighting

00:24:14.040 and shadows change the colors and textures of the scene in order to create a realistic world.

00:24:21.420 Next, we’re going to explore breakthroughs and optimizations that allow DRAM to be incredibly

00:24:28.260 fast. But, before we get into all those details, we would greatly appreciate it

00:24:33.660 if you could take a second to hit that like button, subscribe if you haven’t already,

00:24:37.980 and type up a quick comment below, as it helps get this video out to others.  Also, we have a Patreon

00:24:45.060 and would appreciate any support.  This is our longest and most detailed video by far, and we’re

00:24:51.600 planning more videos that get into the inner details of how computers work.  We can’t do it

00:24:57.240 without your help, so thank you for watching and doing these three quick things. It helps a ton.

00:25:07.740 The first complex topic which we’ll explore is why there are 32 banks, as well as what the

00:25:14.340 parameters on the packaging of DRAM are.  After that, we’ll explore burst buffers,

00:25:19.800 sub-arrays, and folded DRAM architecture, and what’s inside the sense amplifier.

00:25:30.000 row within a bank requires all these steps and this process takes time.

00:25:34.440 However, if a row were already open, we could read or write to any section of

00:25:39.600 8 memory cells using only the 10-bit column address and the column select

00:25:44.580 multiplexer.  When the CPU sends a read or write command to a row that’s already open,

00:25:51.000 it’s called a row hit or page hit, and this can happen over and over.  With a row hit,

00:25:56.580 we skip all the steps required to open a row and just use the 10-bit column address to multiplex a

00:26:03.120 different set of 8 columns or bitlines, connecting them to the read or write driver, thereby saving

00:26:09.000 a considerable amount of time.  A row miss is when the next address is for a different row,

00:26:14.580 which requires the DRAM to close and isolate the currently open row, and then open the new row.

00:26:20.580 On a package of DRAM there are typically 4 numbers specifying timing parameters regarding row hits,

00:26:27.464 precharging, and row misses.  The first number refers to the time it takes between sending an

00:26:33.060 address with a row open, thus a row hit, to receiving the data stored in those columns.

00:26:38.880 The next number is the time it takes to open a row if all the capacitors are isolated and the

00:26:44.640 bitlines are precharged.  Then the next number is the time it takes to precharge the bitlines

00:26:49.980 before opening a row, and the last number is the time it takes between a row activation and

00:26:55.620 the following precharge.  Note that these numbers are measured in clock cycles.
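To turn those clock-cycle numbers into nanoseconds, divide by the clock frequency. The module speed and CAS latency below are hypothetical example values, not read off the stick in the video:

```python
# Converting a DRAM timing number (given in clock cycles) to nanoseconds.
# DDR transfers data twice per clock, so a DDR5-4800 module's I/O clock
# runs at 4800 / 2 = 2400 MHz. CL 40 is a hypothetical first number.
data_rate_mts = 4800                    # million transfers per second
clock_hz = data_rate_mts / 2 * 1e6      # 2.4 GHz I/O clock

cas_latency_cycles = 40                 # hypothetical row-hit latency in cycles
cas_latency_ns = cas_latency_cycles / clock_hz * 1e9
print(f"{cas_latency_ns:.1f} ns")       # ~16.7 ns, near the ~17 ns quoted earlier
```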

00:27:00.240 Row hits are also the reason why the address is sent in two sections: first the bank selection and

00:27:07.500 row address, called RAS, and then the column address, called CAS. If the first part, the bank selection

00:27:14.640 and row address, matches a currently open row, then it’s a row hit, and all the DRAM needs is the

00:27:20.880 column address and the new command, and then the multiplexer simply moves around the open row.

00:27:26.040 Because of the time saving in accessing an open row, the CPU memory controller, programs,

00:27:32.220 and compilers are optimized for increasing the number of subsequent row hits. The opposite,

00:27:38.460 called thrashing, is when a program jumps around from one row to a different row over and over,

00:27:44.400 and is obviously incredibly inefficient both in terms of energy and time.

00:27:49.860 Additionally, DDR5 DRAM has 32 banks for this reason.  Each bank’s rows, columns,

00:27:57.180 sense amplifiers, and row decoders operate independently of one another, and thus multiple

00:28:03.360 rows from different banks can be open all at the same time, increasing the likelihood of a row hit

00:28:09.660 and reducing the average time it takes for the CPU to access data.  Furthermore, by having multiple

00:28:16.200 bank groups, the CPU can refresh one bank in each bank group at a time while using the other three,

00:28:22.800 thus reducing the impact of refreshing. A question you may have had earlier is why

00:28:28.620 banks are significantly taller than they are wide. Well, by combining all the banks together,

00:28:34.560 one next to the other, you can think of this chip as actually being 65 thousand rows tall by 262

00:28:44.280 thousand columns wide. And, by adding 31 equally spaced divisions between the columns, thus

00:28:50.760 creating banks, we allow for much more flexibility and efficiency in reading, writing, and refreshing.

00:28:58.380 Also, note that on the DRAM packaging are its capacity in gigabytes, the number of

00:29:04.800 millions of data transfers per second, which is two times the clock frequency, and the peak

00:29:10.560 data transfer rate in megabytes per second.  The next design optimization we’ll explore
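The packaging numbers relate to each other as follows. The clock frequency is an example value, and a 64-bit-wide DIMM (two 32-bit sub-channels) is assumed:

```python
# Relating the numbers printed on DRAM packaging:
# transfers/s = 2 x clock frequency (double data rate), and a DDR5 DIMM
# moves 64 bits (two 32-bit sub-channels) per transfer.
clock_mhz = 2400                            # example I/O clock
transfers_per_s = 2 * clock_mhz * 1e6       # 4800 million transfers/s
bytes_per_transfer = 64 // 8                # 64 data wires -> 8 bytes
peak_mb_s = transfers_per_s * bytes_per_transfer / 1e6
print(f"{peak_mb_s:.0f} MB/s peak transfer rate")
```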

00:29:16.620 is the burst buffer and burst length.  Let’s add a 128-bit read and write temporary storage location,

00:29:23.760 called a burst buffer, to our functional diagram.  Instead of 8 wires coming out of the multiplexer,

00:29:30.600 we’re going to have 128 wires that connect to these 128-bit buffer locations.  Next,

00:29:38.340 the 10-bit column address is broken into two parts: 6 bits are used for the multiplexer,

00:29:44.340 and 4 bits are for the burst buffer. Let’s explore a read command.  With

00:29:49.560 our burst buffer in place, 128 memory cells and bitlines are connected to the burst buffer using

00:29:56.760 the 6 column bits, thereby temporarily loading, or caching, 128 values into the burst buffer.

00:30:04.140 Using the 4 bits for the buffer, 8 quickly accessed data locations in the burst buffer

00:30:10.140 are connected to the read drivers, and the data is sent to the CPU.  By cycling through these 4 bits,

00:30:16.800 all 16 sets of 8 bits are read out, and thus the burst length is 16.  After that, a new set of 128

00:30:25.320 bitlines and values are connected and loaded into the burst buffer.  There’s also a write

00:30:31.080 burst buffer, which operates in a similar way.  The benefit of this design is that 16 sets of

00:30:37.260 8 bits per microchip, totaling 1024 bits, can be accessed and read or written extremely quickly,

00:30:45.060 as long as the data is all next to one another, but at the same time we still

00:30:49.800 have the granularity and ability to access any set of 8 bits if our data requests jump around.
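The 6-and-4-bit split of the column address can be sketched as follows. Which bits go where is an assumption, chosen so the mapping agrees with the earlier column-598 example:

```python
# Splitting the 10-bit column address once the burst buffer is added:
# the high 6 bits pick which 128-bit group of bitlines loads into the
# buffer, and the low 4 bits cycle through its sixteen 8-bit slots.
def burst_decode(col_addr_10bit: int):
    group = col_addr_10bit >> 4     # 6 bits: one of 64 groups of 128 bitlines
    slot = col_addr_10bit & 0xF     # 4 bits: one of 16 eight-bit slots
    first_bitline = group * 128 + slot * 8
    return group, slot, first_bitline

# Column address 598 lands in group 37, slot 6 -> bitline 4,784,
# consistent with the earlier example.
print(burst_decode(598))  # (37, 6, 4784)
```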

00:30:56.400 The next design optimization addresses the fact that this bank of 65,536 rows by 8,192 columns is rather massive

00:31:07.740 and results in extremely long wordlines and bitlines, especially when compared to the size of

00:31:14.280 each trench capacitor memory cell.  Therefore, the massive array is broken up into smaller

00:31:20.400 blocks of 1,024 by 1,024 cells, with intermediate sense amplifiers below each subarray,

00:31:27.840 subdivided wordlines, and a hierarchical row decoding scheme.  By subdividing the bitlines,

00:31:34.800 the distance and amount of wire that each tiny capacitor is connected to, as it perturbs the

00:31:40.860 bitline to the sense amplifier, is reduced, and thus the capacitor doesn’t have to be as big.  By

00:31:47.160 subdividing the wordlines, the capacitive load from eight thousandish transistor gates and channels is

00:31:53.280 decreased, and thus the time it takes to turn on all the access transistors in a row is decreased.

00:31:59.880 The final topic we’re going to talk about is the most complicated.  Remember how we had

00:32:05.160 a sense amplifier connected to the bottom of each bitline?  Well, this optimization has two

00:32:10.500 bitlines per column going to each sense amplifier, and alternating rows of memory cells connected to

00:32:17.040 the left and right bitlines, thus doubling the number of bitlines.  When one row is active,

00:32:22.500 half of the bitlines are active while the other half are passive, and vice versa when the next row

00:32:28.020 is active.  Moving down to see inside the sense amplifier, we find a cross-coupled inverter.  How

00:32:34.680 does this work?  Well, when the active bitline is a 1, the passive bitline will be driven by this

00:32:41.040 cross-coupled inverter to the opposite value of 0, and when the active is a 0, the passive

00:32:46.680 becomes a 1.  Note that the inverted passive bitline isn’t connected to any memory cells,

00:32:52.200 and thus it doesn’t mess up any stored data.  The cross-coupled inverter makes it such that these

00:32:58.680 two bitlines are always going to be opposite one another, and they’re called a differential

00:33:03.840 pair.  There are three benefits to this design.  First, during the precharge step, we want to bring

00:33:09.600 all the bitlines to .5 volts and, by having a differential pair of active and passive bitlines,

00:33:15.960 the easiest solution is to disconnect the cross-coupled inverters and open a channel between the

00:33:22.260 two using a transistor.  The charge easily flows from the 1 bitline to the 0, and they

00:33:28.860 both average out and settle at .5 volts.  The other two benefits are noise immunity

00:33:34.320 and a reduction in parasitic capacitance of the bitline.  These benefits are related to the fact

00:33:39.720 that by creating two oppositely charged electric wires, with electric fields going from one to

00:33:45.540 the other, we reduce the amount of electric fields emitted in stray directions and relatedly increase

00:33:51.960 the ability of the sense amplifier to amplify one bitline to 1 volt and the other to 0 volts.

00:33:58.680 One final note is that when discussing DRAM, one major topic is the timing of addresses,

00:34:04.680 command signals, and data, and the related acronyms DDR, or double data rate, and SDRAM,

00:34:12.179 or synchronous DRAM.  These topics were omitted from this video because it would have taken an

00:34:17.760 additional 15 minutes to properly explore them.  That’s

00:34:24.900 pretty much it for the DRAM, and we are grateful you made it this far into the video.  We believe

00:34:30.600 the future will require a strong emphasis on engineering education, and we’re thankful to all

00:34:36.780 our Patreon and YouTube Membership sponsors for supporting this dream.  If you want to

00:34:41.760 support us on YouTube Memberships or Patreon, you can find the links in the description.

00:34:47.100 A huge thanks goes to Nathan, Peter, and Jacob, who are doctoral students at the Florida

00:34:53.219 Institute for Cybersecurity Research, for helping to research and review this video’s content!  They

00:35:00.000 do foundational research on finding the weak points in device security and whether hardware

00:35:05.220 is compromised.  If you want to learn more about the FICS graduate program or their work, check out

00:35:11.460 the website using the link in the description.  This is Branch Education, and we create 3D

00:35:18.480 animations that dive deep into the technology that drives our modern world.  Watch another Branch

00:35:24.360 video by clicking one of these cards or click here to subscribe.  Thanks for watching to the end!