What happens when a computer program runs?

Accepted answer
Score: 164

It really depends on the system, but modern OSes with virtual memory tend to load their process images and allocate memory something like this:

|  stack  |  function-local variables, return addresses, return values, etc.
|         |  often grows downward, commonly accessed via "push" and "pop" (but can be
|         |  accessed randomly, as well; disassemble a program to see)
| shared  |  mapped shared libraries (C libraries, math libs, etc.)
|  libs   |
|  hole   |  unused memory allocated between the heap and stack "chunks", spans the
|         |  difference between your max and min memory, minus the other totals
|  heap   |  dynamic, random-access storage, allocated with 'malloc' and the like.
|   bss   |  Uninitialized global variables; must be in read-write memory area
|  data   |  data segment, for globals and static variables that are initialized
|         |  (can further be split up into read-only and read-write areas, with
|         |  read-only areas being stored elsewhere in ROM on some systems)
|  text   |  program code, this is the actual executable code that is running.

This is the general process address space on many common virtual-memory systems. The "hole" is the size of your total memory, minus the space taken up by all the other areas; this gives a large amount of space for the heap to grow into. This is also "virtual", meaning it maps to your actual memory through a translation table, and may be actually stored at any location in actual memory. It is done this way to protect one process from accessing another process's memory, and to make each process think it's running on a complete system.
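To make the "translation table" idea concrete, here is a minimal sketch in C. It assumes a toy single-level table with 4 KiB pages and only 8 virtual pages; the names page_table, translate and NOT_MAPPED are made up for illustration, and real MMUs use multi-level tables and do this in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u        /* assumed page size: 4 KiB */
#define NUM_PAGES  8u           /* toy address space: 8 virtual pages */
#define NOT_MAPPED UINT32_MAX   /* marker for a virtual page with no frame */

/* Toy translation table: index = virtual page number,
   value = physical frame number (or NOT_MAPPED). */
typedef struct {
    uint32_t frame[NUM_PAGES];
} page_table;

/* Translate a virtual address into a physical one.
   Returns NOT_MAPPED when the page has no frame (a real CPU would fault). */
uint32_t translate(const page_table *pt, uint32_t vaddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;   /* which virtual page */
    uint32_t offset = vaddr % PAGE_SIZE;   /* position inside that page */

    assert(pt != NULL);
    if (page >= NUM_PAGES || pt->frame[page] == NOT_MAPPED)
        return NOT_MAPPED;
    return pt->frame[page] * PAGE_SIZE + offset;
}
```

With virtual page 1 mapped to physical frame 5, translate() turns virtual address 0x1234 into physical address 0x5234. Two processes can use the same virtual address and still land in different physical frames, because each one has its own table.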

Note that the positions of, e.g., the stack and heap may be in a different order on some systems (see Billy O'Neal's answer below for more details on Win32).

Other systems can be very different. DOS, for instance, ran in real mode, and its memory allocation when running programs looked very different:

+-----------+ top of memory
| extended  | above the high memory area, and up to your total memory; needed drivers to
|           | be able to access it.
+-----------+ 0x110000
|  high     | just over 1MB->1MB+64KB, used by 286s and above.
+-----------+ 0x100000
|  upper    | upper memory area, from 640kb->1MB, had mapped memory for video devices, the
|           | DOS "transient" area, etc. some was often free, and could be used for drivers
+-----------+ 0xA0000
| USER PROC | user process address space, from the end of DOS up to 640KB
|command.com| DOS command interpreter
|    DOS    | DOS permanent area, kept as small as possible, provided routines for display,
|  kernel   | *basic* hardware access, etc.
+-----------+ 0x600
| BIOS data | BIOS data area, contained simple hardware descriptions, etc.
+-----------+ 0x400
| interrupt | the interrupt vector table, starting from 0 and going to 1k, contained 
|  vector   | the addresses of routines called when interrupts occurred.  e.g.
|  table    | interrupt 0x21 checked the address at 0x21*4 and far-jumped to that 
|           | location to service the interrupt.
+-----------+ 0x0

You can see that DOS allowed direct access to the operating system memory, with no protection, which meant that user-space programs could generally directly access or overwrite anything they liked.
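The address arithmetic in the DOS map can be sketched in a few lines of C. The function names here are hypothetical; the arithmetic itself (4 bytes per interrupt vector, and real-mode linear address = segment * 16 + offset) is standard 8086 behavior.

```c
#include <stdint.h>

/* Each interrupt vector is a 4-byte far pointer (offset, then segment),
   so the entry for vector n starts at linear address n * 4. */
uint32_t ivt_entry_address(uint8_t vector)
{
    return (uint32_t)vector * 4u;
}

/* Real-mode addressing: shift the segment left by 4 bits and add the offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

So the handler address for interrupt 0x21 sits at 0x84, and segment 0xA000 with offset 0 is the 0xA0000 boundary in the diagram. The largest pair, FFFF:FFFF, yields 0x10FFEF, which is why the high memory area reaches just under 64 KB past the 1 MB mark.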

In the process address space, however, the programs tended to look similar, only they were described as code segment, data segment, heap, stack segment, etc., and it was mapped a little differently. But most of the general areas were still there.

Upon loading the program and necessary shared libs into memory, and distributing the parts of the program into the right areas, the OS begins executing your process at its entry point (in C, startup code that ends up calling main), and your program takes over from there, making system calls as necessary when it needs them.

Different systems (embedded, whatever) may have very different architectures, such as stackless systems, Harvard architecture systems (with code and data being kept in separate physical memory), systems which actually keep the BSS in read-only memory (initially set by the programmer), etc. But this is the general gist.

You said:

I also know that a computer program uses two kinds of memory: stack and heap, which are also part of the primary memory of the computer.

"Stack" and "heap" are just abstract concepts, rather than (necessarily) physically distinct "kinds" of memory.

A stack is merely a last-in, first-out data structure. In the x86 architecture, it can actually be addressed randomly by using an offset from the stack or frame pointer, but the most common instructions are PUSH and POP to add and remove items from it, respectively. It is commonly used for function-local variables (so-called "automatic storage"), function arguments, return addresses, etc. (more below)
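As a rough illustration of "last-in, first-out, but still randomly addressable", here is a toy stack in C. It is only a sketch: the type and function names are invented, it holds plain ints, and it grows upward in an array for simplicity, whereas the hardware stack on x86 grows downward in memory.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 16

typedef struct {
    int    slots[STACK_CAPACITY];
    size_t sp;   /* index of the next free slot (toy "stack pointer") */
} int_stack;

void stack_init(int_stack *s) { s->sp = 0; }

/* PUSH: store a value and advance the stack pointer. */
void stack_push(int_stack *s, int value)
{
    assert(s->sp < STACK_CAPACITY);   /* refuse to overflow */
    s->slots[s->sp++] = value;
}

/* POP: step the stack pointer back and return the value that was on top. */
int stack_pop(int_stack *s)
{
    assert(s->sp > 0);                /* refuse to underflow */
    return s->slots[--s->sp];
}

/* Random access: read the item `depth` slots below the top (0 = top)
   without popping anything, like an offset from the stack pointer. */
int stack_peek(const int_stack *s, size_t depth)
{
    assert(depth < s->sp);
    return s->slots[s->sp - 1 - depth];
}
```

After pushing 10, 20 and 30, stack_peek(&s, 2) reads 10 without disturbing anything, while stack_pop() hands the values back in the order 30, 20, 10.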

A "heap" is just a nickname for a chunk of memory that can be allocated on demand, and is addressed randomly (meaning, you can access any location in it directly). It is commonly used for data structures that you allocate at runtime (in C++, using new and delete, and malloc and friends in C, etc).
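For comparison, a small sketch of on-demand heap allocation in C. The helper make_squares is hypothetical; malloc and free are the standard library calls.

```c
#include <stdlib.h>

/* Allocate an array of n ints on the heap and fill it with 0, 1, 4, 9, ...
   The size is only known at runtime, which is the typical reason to use the heap.
   Returns NULL if the allocation fails; the caller must free() the result. */
int *make_squares(size_t n)
{
    int *squares = malloc(n * sizeof *squares);
    if (squares == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        squares[i] = (int)(i * i);
    return squares;
}
```

The returned block outlives the function that created it (unlike a local variable on the stack), and any element can be read or written directly by index until the caller releases it with free().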

The stack and heap, on the x86 architecture, both physically reside in your system memory (RAM), and are mapped through virtual memory allocation into the process address space as described above.

The registers (still on x86), physically reside inside the processor (as opposed to RAM), and are loaded by the processor, from the TEXT area (and can also be loaded from elsewhere in memory or other places depending on the CPU instructions that are actually executed). They are essentially just very small, very fast on-chip memory locations that are used for a number of different purposes.

Register layout is highly dependent on the architecture (in fact, registers, the instruction set, and memory layout/design, are exactly what is meant by "architecture"), and so I won't expand upon it, but recommend you take an assembly language course to understand them better.

Your question:

At what point is the stack used for the execution of the instructions? Instructions go from the RAM, to the stack, to the registers?

The stack (in systems/languages that have and use them) is most often used like this:

int mul( int x, int y ) {
    return x * y;       // this stores the result of MULtiplying the two variables 
                        // from the stack into the return value address previously 
                        // allocated, then issues a RET, which resets the stack frame
                        // based on the arg list, and returns to the address set by
                        // the CALLer.
}

int main() {
    int x = 2, y = 3;   // these variables are stored on the stack
    mul( x, y );        // this pushes y onto the stack, then x, then a return address,
                        // allocates space on the stack for a return value, 
                        // then issues an assembly CALL instruction.
    return 0;
}

Write a simple program like this, and then compile it to assembly (gcc -S foo.c if you have access to GCC), and take a look. The assembly is pretty easy to follow. You can see that the stack is used for function local variables, and for calling functions, storing their arguments and return values. This is also why when you do something like:

f( g( h( i ) ) ); 

All of these get called in turn. It's literally building up a stack of function calls and their arguments, executing them, and then popping them off as it winds back down (or up ;). However, as mentioned above, the stack (on x86) actually resides in your process memory space (in virtual memory), and so it can be manipulated directly; it's not a separate step during execution (or at least is orthogonal to the process).
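If you want to watch the order for yourself, here is a minimal sketch that logs each call. The bodies of f, g and h and the call_log bookkeeping are made up for illustration; only the nesting f( g( h( i ) ) ) comes from the text above.

```c
#include <stddef.h>

#define LOG_CAPACITY 8

char   call_log[LOG_CAPACITY];   /* names of functions in the order they ran */
size_t call_count = 0;

static void record(char name)
{
    if (call_count < LOG_CAPACITY)
        call_log[call_count++] = name;
}

int h(int i) { record('h'); return i + 1; }
int g(int i) { record('g'); return i * 2; }
int f(int i) { record('f'); return i - 3; }

/* The argument of each call must be evaluated before the call itself,
   so the innermost function runs first. */
int nested(int i)
{
    call_count = 0;
    return f(g(h(i)));
}
```

Calling nested(4) leaves "h", "g", "f" in call_log, in that order, and returns 7: h yields 5, g doubles it to 10, and f subtracts 3.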

FYI, the above is the C calling convention, also used by C++. Other languages/systems may push arguments onto the stack in a different order, and some languages/platforms don't even use stacks, and go about it in different ways.

Also note, these aren't actual lines of C code executing. The compiler has converted them into machine language instructions in your executable. They are then (generally) copied from the TEXT area into the CPU pipeline, then into the CPU registers, and executed from there. [This was incorrect. See Ben Voigt's correction below.]

Score: 62

Sdaz has gotten a remarkable number of upvotes in a very short time, but sadly is perpetuating a misconception about how instructions move through the CPU.

The question asked:

Instructions go from the RAM, to the stack, to the registers?

Sdaz said:

Also note, these aren't actual lines of C code executing. The compiler has converted them into machine language instructions in your executable. They are then (generally) copied from the TEXT area into the CPU pipeline, then into the CPU registers, and executed from there.

But this is wrong. Except for the special case of self-modifying code, instructions never enter the datapath. And they are not, cannot be, executed from the datapath.

The x86 CPU registers are:

  • General registers EAX EBX ECX EDX

  • Segment registers CS DS ES FS GS SS

  • Index and pointers ESI EDI EBP EIP ESP

  • Indicator EFLAGS

There are also some floating-point and SIMD registers, but for the purposes of this discussion we'll classify those as part of the coprocessor and not the CPU. The memory-management unit inside the CPU also has some registers of its own; we'll again treat that as a separate processing unit.

None of these registers are used for executable code. EIP contains the address of the executing instruction, not the instruction itself.

Instructions go through a completely different path in the CPU from data (Harvard architecture). All current machines are Harvard architecture inside the CPU. Most these days are also Harvard architecture in the cache. x86 machines (your common desktop machine) are Von Neumann architecture in the main memory, meaning data and code are intermingled in RAM. That's beside the point, since we're talking about what happens inside the CPU.

The classic sequence taught in computer architecture is fetch-decode-execute. The memory controller looks up the instruction stored at the address EIP. The bits of the instruction go through some combinational logic to create all the control signals for the different multiplexers in the processor. And after some cycles, the arithmetic logic unit arrives at a result, which is clocked into the destination. Then the next instruction is fetched.
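A software caricature of that fetch-decode-execute loop, as a sketch in C. Everything here is invented for illustration (a four-instruction toy machine, not x86): the point is only that the instruction pointer holds an address, the instructions live in their own array, and the data registers only ever hold operands and results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum opcode { OP_LOADI, OP_ADD, OP_MUL, OP_HALT };

typedef struct {
    uint8_t op;    /* which operation to perform */
    uint8_t dst;   /* destination register index */
    uint8_t a;     /* first source register, or the immediate for OP_LOADI */
    uint8_t b;     /* second source register */
} instruction;

#define NUM_REGS 4

typedef struct {
    int32_t regs[NUM_REGS];   /* data registers: operands and results only */
    size_t  ip;               /* instruction pointer: an address, not an instruction */
} cpu;

void run(cpu *c, const instruction *code, size_t code_len)
{
    c->ip = 0;
    while (c->ip < code_len) {
        instruction ins = code[c->ip++];            /* fetch, then advance ip */
        assert(ins.dst < NUM_REGS);
        switch (ins.op) {                           /* decode */
        case OP_LOADI:                              /* execute */
            c->regs[ins.dst] = ins.a;
            break;
        case OP_ADD:
            assert(ins.a < NUM_REGS && ins.b < NUM_REGS);
            c->regs[ins.dst] = c->regs[ins.a] + c->regs[ins.b];
            break;
        case OP_MUL:
            assert(ins.a < NUM_REGS && ins.b < NUM_REGS);
            c->regs[ins.dst] = c->regs[ins.a] * c->regs[ins.b];
            break;
        default:                                    /* OP_HALT or unknown: stop */
            return;
        }
    }
}
```

Running the program "load 2 into r0, load 3 into r1, r2 = r0 * r1, halt" leaves 6 in regs[2]. At no point is an instruction copied into regs; the switch statement plays the role of the decode logic that turns instruction bits into control signals.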

On a modern processor, things work a little differently. Each incoming instruction is translated into a whole series of microcode instructions. This enables pipelining, because the resources used by the first microinstruction aren't needed later, so they can begin working on the first microinstruction from the next instruction.

To top it off, terminology is slightly confused because register is an electrical engineering term for a collection of D-flipflops. And instructions (or especially microinstructions) may very well be stored temporarily in such a collection of D-flipflops. But this is not what is meant when a computer scientist or software engineer or run-of-the-mill developer uses the term register. They mean the datapath registers as listed above, and these are not used for transporting code.

The names and number of datapath registers vary for other CPU architectures, such as ARM, MIPS, Alpha, PowerPC, but all of them execute instructions without passing them through the ALU.

Score: 17

The exact layout of the memory while a process is executing is completely dependent on the platform which you're using. Consider the following test program:

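#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int stackValue = 0;
    int *addressOnStack = &stackValue;
    int *addressOnHeap = malloc(sizeof(int));
    if (addressOnHeap == NULL)
        return 1;
    /* Compare as integers; ordering unrelated pointers with > is undefined in C. */
    if ((uintptr_t)addressOnStack > (uintptr_t)addressOnHeap)
        puts("The stack is above the heap.");
    else
        puts("The heap is above the stack.");
    free(addressOnHeap);
    return 0;
}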

On Windows NT (and its children), this program is going to generally produce:

The heap is above the stack.

On POSIX boxes, it's going to say:

The stack is above the heap.

The UNIX memory model is quite well explained here by @Sdaz MacSkibbons, so I won't reiterate that here. But that is not the only memory model. The reason POSIX requires this model is the sbrk system call. Basically, on a POSIX box, to get more memory, a process merely tells the kernel to move the divider between the "hole" and the "heap" further into the "hole" region. There is no way to return memory to the operating system, and the operating system itself does not manage your heap. Your C runtime library has to provide that (via malloc).
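The "move the divider" idea can be simulated without touching the real system call. The sketch below is an assumption-laden toy: a fixed 1 KB array stands in for the heap plus the hole, and toy_sbrk is an invented name. The real sbrk takes a signed increment and reports failure as (void *) -1 rather than NULL.

```c
#include <stddef.h>

#define ARENA_SIZE 1024

static unsigned char arena[ARENA_SIZE];   /* stands in for "heap" + "hole" */
static size_t brk_offset = 0;             /* the divider between heap and hole */

/* Toy analogue of sbrk: push the divider `increment` bytes into the hole
   and return the old divider, which is the start of the newly usable region.
   Returns NULL when the hole is too small. */
void *toy_sbrk(size_t increment)
{
    if (increment > ARENA_SIZE - brk_offset)
        return NULL;
    void *old_break = &arena[brk_offset];
    brk_offset += increment;
    return old_break;
}
```

Two successive requests come back as adjacent regions, because all the "kernel" does is advance one boundary. Everything else (tracking which bytes below the divider are free, reusing them, splitting them up) is left to the C runtime's malloc.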

This also has implications for the kind of code actually used in POSIX binaries. POSIX boxes (almost universally) use the ELF file format. In this format, the operating system is responsible for communications between libraries in different ELF files. Therefore, all the libraries use position-independent code (that is, the code itself can be loaded into different memory addresses and still operate), and all calls between libraries are passed through a lookup table to find out where control needs to jump for cross-library function calls. This adds some overhead and can be exploited if one of the libraries changes the lookup table.
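The lookup-table indirection can be pictured with ordinary function pointers. This is only an analogy under stated assumptions: the table, slot names and functions below are hypothetical, and a real ELF loader fills in its tables from symbol information rather than from a C initializer.

```c
typedef int (*binary_fn)(int, int);

int add_impl(int a, int b) { return a + b; }
int mul_impl(int a, int b) { return a * b; }

enum { SLOT_ADD, SLOT_MUL, SLOT_COUNT };

/* Writable table of targets; the calling code never hard-codes an address. */
binary_fn lookup_table[SLOT_COUNT] = { add_impl, mul_impl };

/* Every cross-library call pays for one extra load from the table. */
int call_via_table(int slot, int a, int b)
{
    return lookup_table[slot](a, b);
}
```

call_via_table(SLOT_MUL, 6, 7) normally reaches mul_impl and returns 42. If anything overwrites lookup_table[SLOT_MUL] with the address of add_impl, the same call silently returns 13, which is the kind of redirection the paragraph above warns about.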

Windows' memory model is different because the kind of code it uses is different. Windows uses the PE file format, which leaves the code in position-dependent format. That is, the code depends on where exactly in virtual memory the code is loaded. There is a flag in the PE spec which tells the OS where exactly in memory the library or executable would like to be mapped when your program runs. If a program or library cannot be loaded at its preferred address, the Windows loader must rebase the library/executable (basically, it rewrites the position-dependent code to point at the new positions), which doesn't require lookup tables and cannot be exploited because there's no lookup table to overwrite. Unfortunately, this requires a very complicated implementation in the Windows loader, and does have considerable startup time overhead if an image needs to be rebased. Large commercial software packages often modify their libraries to start purposely at different addresses to avoid rebasing; Windows itself does this with its own libraries (e.g. ntdll.dll, kernel32.dll, psapi.dll, etc. all have different start addresses by default).
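Rebasing boils down to adding one delta to every stored absolute address. Here is a minimal sketch, assuming a toy "image" made of 32-bit words and a hypothetical list of word indices that need fixing; the real PE relocation format is more involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Patch every word listed in reloc_indices so that addresses computed for
   preferred_base become valid for actual_base. Other words are left alone. */
void rebase_image(uint32_t *image,
                  const size_t *reloc_indices, size_t reloc_count,
                  uint32_t preferred_base, uint32_t actual_base)
{
    /* Unsigned arithmetic wraps, so this also works when moving downward. */
    uint32_t delta = actual_base - preferred_base;

    for (size_t i = 0; i < reloc_count; i++)
        image[reloc_indices[i]] += delta;
}
```

For an image built for base 0x00400000 but loaded at 0x10000000, a stored address 0x00401000 becomes 0x10001000. The loader has to touch every such word before the program starts, which is where the startup cost comes from.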

On Windows, virtual memory is obtained from the system via a call to VirtualAlloc, and it is returned to the system via VirtualFree (Okay, technically VirtualAlloc farms out to NtAllocateVirtualMemory, but that's an implementation detail) (Contrast this to POSIX, where memory cannot be reclaimed). This process is slow (and IIRC, requires that you allocate in physical page sized chunks; typically 4 KB or more). Windows also provides its own heap functions (HeapAlloc, HeapFree, etc.) as part of a library known as RtlHeap, which is included as a part of Windows itself, upon which the C runtime (that is, malloc and friends) is typically implemented.

Windows also has quite a few legacy memory allocation APIs from the days when it had to deal with old 80386s, and these functions are now built on top of RtlHeap. For more information about the various APIs that control memory management in Windows, see this MSDN article: http://msdn.microsoft.com/en-us/library/ms810627 .

Note also that this means on Windows a single process can (and usually does) have more than one heap. (Typically, each shared library creates its own heap.)

(Most of this information comes from "Secure Coding in C and C++" by Robert Seacord)

Score: 5

The stack

In the x86 architecture the CPU executes operations with registers. The stack is only used for convenience reasons. You can save the content of your registers to the stack before calling a subroutine or a system function and then load them back to continue your operation where you left off. (You could do it manually without the stack, but it is a frequently used function so it has CPU support.) But you can do pretty much anything without the stack in a PC.

For example, an integer multiplication:


Multiplies the AX register with the BX register. (The result will be in DX and AX, DX containing the higher bits.)

Stack-based machines (like the Java VM) use the stack for their basic operations. The above multiplication:


This pops two values from the top of the stack and multiplies them, then pushes the result back to the stack. The stack is essential for this kind of machine.

Some higher-level programming languages (like C and Pascal) use this latter method for passing parameters to functions: the parameters are pushed to the stack in left to right order and popped by the function body and the return values are pushed back. (This is a choice that the compiler manufacturers make and kind of abuses the way the x86 uses the stack.)

The heap

The heap is another concept that exists only in the realm of the compilers. It takes the pain of handling the memory behind your variables away, but it is not a function of the CPU or the OS; it is just a choice of housekeeping for the memory block which is given out by the OS. You could do this manually if you want.

Accessing system resources

The operating system has a public interface through which you can access its functions. In DOS, parameters are passed in registers of the CPU. Windows uses the stack for passing parameters to OS functions (the Windows API).
