There were various reasons for setting out to understand more about the internals of linking and loading. What's under the hood when I hit the build icon in the IDE? How do I create a plugin system in C++ where plugins are built with different compilers, with C++ ABI compatibility, and so on?
I had studied linkers and loaders in an academic course (System Software by Leland L. Beck). But that was a long time ago, it was far too detailed, and it discussed the topic with an abstract machine. I could not go back to that.
I found some other good resources:
[1] Linkers and Loaders by John Levine (buy it, it's worth it)
[2] Linkers by Ian Taylor
[3] Linkers and Loaders by Sandeep Grover
[4] dsohowto by Ulrich Drepper
Although they were all good, none of them were as concise as I wanted. Sometimes I got lost in long descriptions and varying terminologies. So, the aim of this post is to capture my understanding and hopefully be of help to anyone with the same goal.
- - -
Let me assume a modern OS where each process has its own address space and, for the most part, a Linux platform using the ELF file format.
As you might know, the following kinds of files appear during the transition from the source code of a language to the point where it is ready to be run by the OS.
- Object files
- files with the .o extension -> the result of compiling an individual source file or translation unit
- Libraries
- Static libraries
- Conceptually, it is a collection (archive) of object files with a master symbol table. These are meant for statically linking with a program interested in their routines. When statically linked, the library's object files are included in the output executable, so these libraries need not be deployed on the target machine.
These libraries can have unresolved references; the program using the library must provide the necessary dependent libraries to the linker to resolve them.
Also called archives, they have the .a extension on Linux and .lib on Windows or Symbian.
- Shared libraries or shared objects
- Shared objects are executables, in the sense that they can be loaded into memory and run, unlike static libraries. They are loaded dynamically. They are of two kinds:
- Statically aware
- An executable might statically (at build time) depend on certain routines and data in shared libraries. In such a case, the linker needs to be provided with the shared objects which resolve them. When the executable is started, the shared objects are automatically loaded into memory and dynamically linked. They have the .so extension on Linux, .dll on Windows.
- Statically unaware
- A program might depend on routines and data in SOs only at runtime. In such a case, it explicitly calls system APIs to load and use them. They too have the .so extension on Linux, .dll on Windows.
- Executable
- a.out on Linux or .exe on Windows.
*.c => *.o => a.out
Let's look at the contents of an object file.
An object file has various parts such as .text, .data, .bss, .symtab etc., generated by the compiler/assembler. These are called 'sections' in ELF file format terminology. An ELF object file is thus a concatenation of these sections, with a header plus some bookkeeping to be able to traverse them.
Since the compiler has no idea where the program will eventually run in memory, it assumes an address space starting at 0 when placing text, data etc. So, for example, the first instruction starts at address 0x100 in the object file. Now, if the code refers to a global variable 'p', its instruction code might refer to the address 0x2000, if that is the start of the data segment. But such references would be incorrect when the object code is placed somewhere else in memory. So instead, the compiler just places 0 and delegates the task of filling in the correct address to a later stage, by leaving a 'relocation entry'. This entry notes (a) the address 'A' in the .text section, in the middle of the instruction code, where the address of 'p' has to be updated, and (b) the symbol 'p' whose address needs to be placed at 'A' (this is just an index into the symbol table). Let's call this a 'program relocation entry', as we are going to see the other type later.
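A file as small as the following is enough to produce such an entry; a minimal sketch of the mytest.c used in the commands below (names are illustrative):

```c
/* mytest.c - the reference to the global 'p' inside get_p() leaves a
 * relocation entry in mytest.o: the compiler emits a placeholder
 * address and notes where it must be patched and against which symbol. */
int p = 42;                /* placed in the .data section */

int get_p(void)
{
    return p;              /* this load of 'p' needs a relocation */
}
```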
You can see this for yourself by creating a .o file and analysing the contents using objdump
gcc -c mytest.c
objdump -dhsx mytest.o
Relocation entries can be viewed using
readelf -r mytest.o
Let's look at the contents of an executable file. On Linux, the same ELF format is used for executables.
gcc *.o -o myprog.out
Only linking is done here, by the 'program linker'. The linker scans all the object files and their sections and merges them: all .text sections from the object files are brought together, all .data sections are brought together, and likewise the relocation entries, symbol tables and so on. The linker then recalculates the collective target address space and lays out the merged sections accordingly (needless to say, any references across object files are resolved here).
Since modern OSes provide a separate address space for each process, the linker can lay out the merged sections at a fixed location, for instance 0x8000000 (hypothetical). The symbol tables and relocation entries are updated accordingly in the linker's in-memory bookkeeping tables. Then, symbol references in the instruction code are patched with the correct addresses from the updated symbol table.
The resultant .text section and other read-only sections put together form a segment. Each segment is just a collection of contiguous sections with similar properties (e.g., read-only, executable).
While sections are used by the linker for symbol resolution and relocation, segments are used by the program loader. The loader copies the segments from the executable file on disk into memory pages, starting at the desired logical address (0x8000000). If a segment ends in the middle of a page, the next segment starts at the next page.
- - -
*.c => *.o => mylib.so
The process of creating an SO is almost the same as that of creating an executable. A few differences:
- SOs do not have a fixed load address, since a fixed address might already be used up by another SO. (So, invariably, the address space starts at 0.)
- That means SOs have to be relocated again on loading. Let's call this 'dynamic relocation'. The program linker creates 'dynamic relocation entries' for the dynamic linker. The semantics of the entries stay the same: add the address at which the SO is loaded to the address identified by the dynamic relocation entry.
- So, on loading, a fixed offset (the load address) has to be added to all the addresses. But this is mitigated by using PIC, as we will see.
- The relocation entries and symbol tables are retained in the SO file, for dynamic linking.
- The use of position independent code (PIC).
gcc main.c -L/home/user/lib -lmytest // -L specifies the library search folder; -lmytest links against libmytest.so
There is also an 'interpreter' section in the program which identifies the dynamic linker that links the shared objects at runtime.
When the program is loaded, the loader finds that there are dependencies and launches the interpreter. The dynamic linker then looks up the needed libraries and loads them. If an SO is not found, the program terminates. If found, the linker scans its reloc and symtab sections and loads them into its data structures. It then loads the SO at an available logical address and performs relocations based on the dynamic relocation entries.
Unlike program relocation, which is done after compilation on a build machine, dynamic relocation has a performance impact directly proportional to the number of references to global, static and extern data and functions. A technique called position independent code (PIC) adds an indirection to such references, limiting the cost to one relocation per variable or one relocation per function, rather than one relocation per use of a variable or per function call. The indirection also helps in supporting a feature called 'function interposition', where a function in the executable can override the one in the shared library. Such indirections are localized in a table. Separate tables exist for data and functions, called the Global Offset Table (GOT) and the Procedure Linkage Table (PLT) respectively.
So, loading a variable from memory is equivalent to
addr = GOT[offset]
variable = *addr
Now, the table itself is placed at a fixed offset from the code, so that the code can find it. In other words, the offset from the PC to the GOT (or PLT) is fixed when a shared object is built.
While the GOT has a straightforward design, the PLT is somewhat complicated. You can refer to [1] or [2]. In brief, the first call to any function is arranged to be a call into a dynamic linker routine via the GOT. That routine then finds the correct address of the procedure and modifies the GOT such that subsequent calls land directly in the correct routine.
The scheme on Windows/Symbian for shared libraries is similar, except that Windows uses import libraries for linking with the program, rather than the DLL itself. The data/procedure symbols that are to be 'exported' by the DLL are placed in an ordinal table with indices. The import library has this binding information, and it statically resolves those symbols using stub code during program linking. At runtime, the DLL is found and loaded (like SOs are), and the jump tables from the import library take care of binding each call to the routine in the DLL.
- Statically unaware shared objects:
In this variant, an executable program is only loosely dependent on shared libraries. That is, it does not directly refer to data or routines in the shared libraries. Instead, it uses system APIs (like dlopen()) to load SO libraries on demand and look up routines for use. Obviously, the developer would know the nature of the routines beforehand and use mechanisms like function pointers or virtual interface classes to interact with the data/routines in the SO. This kind of shared object is mainly used for making plugins in plug-and-play frameworks.
- - -
-- Reduced memory footprint from using SOs
Shared libraries save a lot of disk space when compared to static libraries. But there is still some redundancy when SOs are loaded into memory: various applications using an SO might result in multiple copies of it in memory. The solution devised involves separating the read-only parts from the rest. With this, the system can load the RO parts only once while making a new copy of the writable (W) parts for each process.
Since the addresses in RO are bound to data in W, the entire SO has to be placed as one block in the logical address space (LAS). So, for each process, paging is arranged such that the RO and W segments are contiguous only in the LAS, while physically they may be apart. When paging is set up for a new process, the existing RO physical pages are simply mapped into the LAS of that process.
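On Linux this is visible in /proc: the same library shows up as several mappings with different permissions (names and addresses vary per system):

```shell
# r-xp mappings are the shared read-only/executable pages;
# rw-p mappings are the per-process private writable copies.
grep libc /proc/self/maps
```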