What every programmer should know about memory, Part 1

We did not have to look at the text for long to realize that it would be of interest to many LWN readers. Memory usage is often the determining factor in how software performs, but good information on how to avoid memory bottlenecks is hard to find. This series of articles should change that situation. The original document prints out at over 100 pages.

We will be splitting it into about seven segments, each run a week or two after its predecessor. Once the entire series is out, Ulrich will be releasing the full text. Reformatting the text from the original LaTeX has been a bit of a challenge, but the results, hopefully, will be good. Hyperlinked cross-references and bibliography references will not be possible until the full series is published.

Many thanks to Ulrich for allowing LWN to publish this material; we hope that it will lead to more memory-efficient software across our systems in the near future.

In the early days of computing, the various components of a system, such as the CPU, memory, mass storage, and network interfaces, were developed together and, as a result, were quite balanced in their performance. For example, the memory and network interfaces were not much faster than the CPU at providing data.

This situation changed once the basic structure of computers stabilized and hardware developers concentrated on optimizing individual subsystems. Suddenly the performance of some components of the computer fell significantly behind and bottlenecks developed.

This was especially true for mass storage and memory subsystems which, for cost reasons, improved more slowly than other components. The slowness of mass storage has mostly been dealt with using software techniques: operating systems keep the most often used, and most likely to be used, data in main memory, which can be accessed at a rate orders of magnitude faster than the hard disk. Cache storage was added to the storage devices themselves, which requires no changes in the operating system to increase performance.
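
The effect of this caching is easy to observe. The following small C program (a minimal sketch, not code from Ulrich's paper; the file name is just a placeholder for any large, not recently read file) reads the same file twice and times both passes. The second pass is typically served from main memory rather than the disk and finishes much faster.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    /* Read the whole file once and return the elapsed wall-clock time. */
    static double read_once(const char *path)
    {
        char buf[1 << 16];
        struct timespec start, end;
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); exit(1); }

        clock_gettime(CLOCK_MONOTONIC, &start);
        while (read(fd, buf, sizeof(buf)) > 0)
            ;                                   /* just pull the data in */
        clock_gettime(CLOCK_MONOTONIC, &end);
        close(fd);

        return (end.tv_sec - start.tv_sec) +
               (end.tv_nsec - start.tv_nsec) / 1e9;
    }

    int main(int argc, char *argv[])
    {
        /* "somefile" is a placeholder; pass any sufficiently large file. */
        const char *path = argc > 1 ? argv[1] : "somefile";
        printf("first read:  %.3f s\n", read_once(path));  /* may go to disk */
        printf("second read: %.3f s\n", read_once(path));  /* likely from the page cache */
        return 0;
    }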

Unlike storage subsystems, removing the main memory as a bottleneck has proven much more difficult and almost all solutions require changes to the hardware. Today these changes mainly come in the following forms:

- RAM hardware design (speed and parallelism)
- Memory controller designs
- CPU caches
- Direct memory access (DMA) for devices

For the most part, this document will deal with CPU caches and some effects of memory controller design. In the process of exploring these topics, we will explore DMA and bring it into the larger picture. This is a prerequisite to understanding the problems and the limitations of efficiently using memory subsystems.

We will also learn, in some detail, about the different types of RAM and illustrate why these differences still exist. This document is in no way all-inclusive and final.

It is limited to commodity hardware and further limited to a subset of that hardware. Also, many topics will be discussed in just enough detail for the goals of this paper. For such topics, readers are recommended to find more detailed documentation. When it comes to operating-system-specific details and solutions, the text exclusively describes Linux. At no time will it contain any information about other OSes.

The author has no interest in discussing the implications for other OSes. One last comment before the start. The technology discussed here exists in many, many variations in the real world and this paper only addresses the most common, mainstream versions.

It is rare that absolute statements can be made about this technology, thus the qualifiers. This document is mostly for software developers; it does not go into enough technical detail about the hardware to be useful for hardware-oriented readers. But before we can go into the practical information for developers, a lot of groundwork must be laid. To that end, the second section describes random-access memory (RAM) in technical detail. Appropriate back references to that section are added in places where its content is required, so that the anxious reader can skip most of it at first.

The third section goes into a lot of details of CPU cache behavior. Graphs have been used to keep the text from being as dry as it would otherwise be. This content is essential for an understanding of the rest of the document.

Section 4 briefly describes how virtual memory is implemented; this is also required groundwork for the rest. Section 5 goes into detail about Non-Uniform Memory Access (NUMA) systems. Section 6 is the central section of this paper. The very impatient reader could start with this section and, if necessary, go back to the earlier sections to freshen up the knowledge of the underlying technology.

Section 7 introduces tools which can help the programmer do a better job. Even with a complete understanding of the technology it is far from obvious where in a non-trivial software project the problems are. Some tools are necessary. In section 8 we finally give an outlook on technology which can be expected in the near future or which might simply be good to have.

The author intends to update this document over time; this includes updates made necessary by advances in technology but also corrections of mistakes. Readers willing to report problems are encouraged to send email. Markus Armbruster provided a lot of valuable input on problems and omissions in the text.

It is important to understand commodity hardware because specialized hardware is in retreat. Scaling these days is most often achieved horizontally instead of vertically, meaning today it is more cost-effective to use many smaller, connected commodity computers instead of a few really large and exceptionally fast, but expensive, systems.

This is the case because fast and inexpensive network hardware is widely available. There are still situations where the large specialized systems have their place and these systems still provide a business opportunity, but the overall market is dwarfed by the commodity hardware market.

Bigger machines will be supported, but the quad socket, quad CPU core case is currently thought to be the sweet spot and most optimizations are targeted for such machines. Large differences exist in the structure of commodity computers. Note that these technical details tend to change rapidly, so the reader is advised to take the date of this writing into account.

Over the years, personal computers and smaller servers standardized on a chipset with two parts: the Northbridge and the Southbridge; this structure is shown in Figure 2. The Northbridge contains, among other things, the memory controller, and its implementation determines the type of RAM chips used for the computer. To reach all other system devices, the Northbridge must communicate with the Southbridge.

Older systems had AGP slots which were attached to the Northbridge. This was done for performance reasons related to insufficiently fast connections between the Northbridge and Southbridge. Such a system structure has a number of noteworthy consequences:

- All data communication from one CPU to another must travel over the same bus used to communicate with the Northbridge.
- All communication with RAM must pass through the Northbridge.
- The RAM has only a single port. (Multi-port RAM is not used in commodity hardware; it can be found in specialized hardware such as network routers, which depend on utmost speed.)

A couple of bottlenecks are immediately apparent in this design. One such bottleneck involves access to RAM for devices.

In the earliest days of the PC, all communication with devices on either bridge had to pass through the CPU, negatively impacting overall system performance. To work around this problem some devices became capable of direct memory access (DMA), which allows them, with the help of the Northbridge, to store and receive data in RAM directly, without involving the CPU.

Today all high-performance devices attached to any of the buses can utilize DMA. While DMA greatly reduces the work the CPU has to do, it also creates contention for the Northbridge's bandwidth, since DMA requests compete with RAM access from the CPUs. This problem, therefore, must be taken into account. A second bottleneck involves the bus from the Northbridge to the RAM. The exact details of the bus depend on the memory types deployed.

On older systems there is only one bus to all the RAM chips, so parallel access is not possible. More recent RAM types require two separate buses (or channels), which doubles the available bandwidth; the Northbridge interleaves memory access across the channels. With limited bandwidth available, it is important to schedule memory access in ways that minimize delays. As we will see, processors are much faster and must wait to access memory, despite the use of CPU caches. If multiple hyper-threads, cores, or processors access memory at the same time, the wait times for memory access are even longer.
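
This contention can be seen directly by running several threads that each stream through their own private buffer: no data is shared, yet the threads still compete for the memory channels, and one pass over a buffer takes longer as threads are added. The following is only a sketch (the buffer size and thread handling are arbitrary choices, not taken from the paper); build it with gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define BUF_BYTES (128UL * 1024 * 1024)     /* 128 MiB per thread */
    #define MAX_THREADS 64

    /* Each thread makes one pass over its own private buffer. */
    static void *stream(void *arg)
    {
        volatile long *buf = arg;
        long sum = 0;
        for (size_t i = 0; i < BUF_BYTES / sizeof(long); ++i)
            sum += buf[i];
        return (void *)sum;
    }

    int main(int argc, char *argv[])
    {
        int nthreads = argc > 1 ? atoi(argv[1]) : 4;
        pthread_t tid[MAX_THREADS];
        void *buf[MAX_THREADS];
        struct timespec t0, t1;

        if (nthreads < 1 || nthreads > MAX_THREADS)
            return 1;

        for (int i = 0; i < nthreads; ++i)
            buf[i] = calloc(1, BUF_BYTES);      /* no data shared between threads */

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < nthreads; ++i)
            pthread_create(&tid[i], NULL, stream, buf[i]);
        for (int i = 0; i < nthreads; ++i)
            pthread_join(tid[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("%d thread(s): %.3f s\n", nthreads,
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

        for (int i = 0; i < nthreads; ++i)
            free(buf[i]);
        return 0;
    }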

This is also true for DMA operations. There is more to accessing memory than concurrency, however. Access patterns themselves also greatly influence the performance of the memory subsystem, especially with multiple memory channels. Refer to Section 2.
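
How much the access pattern alone matters can be demonstrated with a few lines of C. The sketch below (the matrix size and timing method are arbitrary choices made for the illustration, not taken from the paper) sums the same matrix twice, once row by row (consecutive addresses) and once column by column (large strides). The strided pass is typically several times slower even though exactly the same data is read.

    #include <stdio.h>
    #include <time.h>

    #define N 2048                              /* 2048 x 2048 ints, about 16 MiB */

    static int m[N][N];                         /* larger than typical CPU caches */

    static double elapsed(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        struct timespec t0, t1, t2;
        long sum = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; ++i)             /* row by row: consecutive addresses */
            for (int j = 0; j < N; ++j)
                sum += m[i][j];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        for (int j = 0; j < N; ++j)             /* column by column: stride of N ints */
            for (int i = 0; i < N; ++i)
                sum += m[i][j];
        clock_gettime(CLOCK_MONOTONIC, &t2);

        printf("row-major:    %.3f s\n", elapsed(t0, t1));
        printf("column-major: %.3f s\n", elapsed(t1, t2));
        return (int)(sum & 1);                  /* keep the loops from being optimized out */
    }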

On some more expensive systems, the Northbridge does not actually contain the memory controller. Instead, the Northbridge can be connected to a number of external memory controllers (in the following example, four of them).

The advantage of such a design is that more than one memory bus exists, increasing the total available bandwidth. This design also supports more memory. Concurrent memory access patterns reduce delays by simultaneously accessing different memory banks. This is especially true when multiple processors are directly connected to the Northbridge, as in Figure 2. For such a design, the primary limitation is the internal bandwidth of the Northbridge, which is phenomenal for this architecture (from Intel). Another way to increase memory bandwidth is to integrate memory controllers into the CPUs and attach memory directly to each processor; this architecture was made popular by SMP systems based on AMD's Opteron processor. Intel will have support for the Common System Interface (CSI) starting with the Nehalem processors; this is basically the same approach: an integrated memory controller with the possibility of local memory for each processor.

On a quad-CPU machine the memory bandwidth is quadrupled without the need for a complicated Northbridge with enormous bandwidth. Having a memory controller integrated into the CPU has some additional advantages; we will not dig deeper into this technology here.

There are disadvantages to this architecture, too. First of all, because the machine still has to make all the memory of the system accessible to all processors, the memory is no longer uniform (hence the name NUMA, Non-Uniform Memory Architecture, for such an architecture). Local memory (memory attached to a processor) can be accessed at the usual speed, but memory attached to another processor must be reached over the interconnect between the processors, which is slower. This has huge implications for the programmer, which we will discuss in the remainder of this paper.
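
On Linux, the NUMA topology is exposed to programs through libnuma. The sketch below (assuming libnuma and its header are installed; it is not code from the paper) pins the current thread to node 0 and allocates one buffer on that node and one on the highest-numbered node; on a multi-socket NUMA machine, accesses to the second buffer have to cross the processor interconnect.

    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }

        int last = numa_max_node();             /* highest memory node number */
        size_t size = 64UL * 1024 * 1024;

        numa_run_on_node(0);                    /* execute on the CPUs of node 0 */
        void *local  = numa_alloc_onnode(size, 0);    /* memory attached to node 0 */
        void *remote = numa_alloc_onnode(size, last); /* memory on another node, if any */
        if (local == NULL || remote == NULL)
            return 1;

        /* Touching the pages is what actually places them on a node; writes to
         * the remote buffer travel over the processor interconnect. */
        memset(local, 0, size);
        memset(remote, 0, size);

        printf("memory nodes 0..%d: local and remote buffers allocated\n", last);

        numa_free(local, size);
        numa_free(remote, size);
        return 0;
    }

On a machine with only a single memory node both allocations end up in the same place and the distinction disappears; what this locality means for software is the subject of section 5 of the full paper.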
