What is it?
How does it work?
Really, what is it!?
To answer all of that, we’ll have to dig deep and get our hands dirty…
When we build an application in Xcode, a lot of things happen at the same time. One of them is converting all the source code into an executable. This executable contains the machine code that will run on the CPU: the ARM processor on an iOS device, or the Intel processor on a Mac.
The format of this executable is called Mach-O.
Well, this was easy and fun, Goodbye!
Unless you want to stick around to know about the internals 😈
Every Mach-O file begins with a header structure that defines the structure of the file. It also contains information about file type and target architecture (armv7, armv7s, i386, etc).
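To make the header a little less abstract, here is a minimal sketch that decodes the eight 32-bit fields of a 64-bit header. The constants follow Apple's mach-o/loader.h; the `parse_header` helper and the hand-built `sample` bytes are made up for illustration, and the sketch assumes a thin (non-fat), little-endian file.

```python
import struct

# Constants as defined in <mach-o/loader.h> and <mach/machine.h>.
MH_MAGIC_64 = 0xFEEDFACF      # 64-bit Mach-O
MH_EXECUTE = 0x2              # file type: main executable
MH_PIE = 0x200000             # flag: position-independent, eligible for ASLR
CPU_ARCH_ABI64 = 0x01000000
CPU_TYPE_NAMES = {
    7: "i386",
    7 | CPU_ARCH_ABI64: "x86_64",
    12: "arm",
    12 | CPU_ARCH_ABI64: "arm64",
}

# struct mach_header_64: magic, cputype, cpusubtype, filetype,
# ncmds, sizeofcmds, flags, reserved (32 bytes in total).
HEADER_64 = struct.Struct("<IiiIIIII")


def parse_header(data):
    """Decode the mach_header_64 at the start of `data` (hypothetical helper)."""
    magic, cputype, _cpusubtype, filetype, ncmds, sizeofcmds, flags, _reserved = (
        HEADER_64.unpack_from(data, 0)
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O file")
    return {
        "cpu": CPU_TYPE_NAMES.get(cputype, hex(cputype)),
        "filetype": filetype,
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "pie": bool(flags & MH_PIE),
    }


# A hand-built header standing in for the first 32 bytes of a real binary.
sample = HEADER_64.pack(MH_MAGIC_64, 12 | CPU_ARCH_ABI64, 0, MH_EXECUTE, 0, 0, MH_PIE, 0)
header = parse_header(sample)
```

On a real binary you would pass in the first 32 bytes of the file instead of `sample`.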
Just below the header structure are a bunch of load commands, which describe the layout and linking characteristics of the file. Among other things, load commands can specify the initial layout of the file in virtual memory (we’ll come back to this).
The addresses recorded in the load commands are actually offsets from the memory address where your Mach-O is loaded. This is done because the starting memory address is randomised every time your app launches, using a nifty technique called Address Space Layout Randomisation or, as we lovingly call it, ASLR.
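Every load command starts with the same two fields, a command type and a size, so walking them is just a matter of hopping from one to the next. Here is a rough sketch, assuming a thin 64-bit little-endian file; the command numbers and struct layouts follow mach-o/loader.h, while `iter_load_commands`, `describe` and the tiny hand-built `image` are hypothetical.

```python
import struct

LC_SEGMENT_64 = 0x19          # maps a segment into virtual memory
LC_MAIN = 0x80000028          # records the file offset of main()
HEADER_64_SIZE = 32           # load commands start right after the header

# struct segment_command_64 (72 bytes) and struct entry_point_command (24 bytes).
SEGMENT_64 = struct.Struct("<II16sQQQQiiII")
ENTRY_POINT = struct.Struct("<IIQQ")


def iter_load_commands(data):
    """Yield (cmd, raw_bytes) for each load command after the header."""
    ncmds = struct.unpack_from("<I", data, 16)[0]   # ncmds lives at offset 16
    offset = HEADER_64_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        yield cmd, data[offset:offset + cmdsize]
        offset += cmdsize                            # cmdsize covers the whole command


def describe(data):
    """Return one human-readable line per load command."""
    lines = []
    for cmd, body in iter_load_commands(data):
        if cmd == LC_SEGMENT_64:
            fields = SEGMENT_64.unpack_from(body)
            name = fields[2].rstrip(b"\0").decode()
            lines.append(f"segment {name} vmaddr={fields[3]:#x} vmsize={fields[4]:#x}")
        elif cmd == LC_MAIN:
            _cmd, _size, entryoff, _stack = ENTRY_POINT.unpack_from(body)
            lines.append(f"main entryoff={entryoff:#x}")
        else:
            lines.append(f"command {cmd:#x}")
    return lines


# Hand-built image: a header announcing two commands, then the commands themselves.
image = (
    struct.pack("<IiiIIIII", 0xFEEDFACF, 0x0100000C, 0, 2, 2, 72 + 24, 0, 0)
    + SEGMENT_64.pack(LC_SEGMENT_64, 72, b"__PAGEZERO", 0, 0x100000000, 0, 0, 0, 0, 0, 0)
    + ENTRY_POINT.pack(LC_MAIN, 24, 0x3F00, 0)
)
summary = describe(image)
```

The addresses and offsets in `image` are invented; a real binary carries many more commands than these two.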
What this means is that when your app process starts, you do not know beforehand which address it will start from.
Let’s imagine the implications. Assume you have a global variable which occupies some memory address in your RAM. Since you don’t know where your process started from, you cannot possibly determine the memory address of this global variable ahead of time!
As you might have guessed, this is done for security purposes. Otherwise, it would become very easy to hack the binary if everything had the same address on every launch!
After the load commands come the segments. The first segment of an executable file is __PAGEZERO. It has no data inside, so it takes up no space in the file. It covers the lowest addresses of the process and is mapped with no access permissions, so that NULL pointer dereferences get caught. You might have faced an EXC_BAD_ACCESS crash; a classic cause is that something in your code tried to access data from here, which is not allowed.
As an aside, this segment can be a good place to hide malicious code 😉
Next is the __TEXT segment, which holds the executable code and other read-only data. And, since the __TEXT segment is read-only, there are no changes that need to be saved back to the disk. If the kernel needs to free up memory, it will simply discard the __TEXT pages and re-read them from disk when needed.
This is why iOS and OS X can cache their dynamic libraries so aggressively.
The __DATA segment contains writable data (e.g. globals, static variables, etc.), and because it is writable, the __DATA segment of a framework or other shared library is logically copied for each process linking with the library.
If you have any experience with Swift, you must be familiar with copy-on-write, which essentially means: do not create a copy until the thing being referenced is edited. Similarly, the __DATA segment isn’t really copied up front. Its pages stay shared until some process modifies one, and that process then receives its own private copy of the page.
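You can watch copy-on-write happen from user space with a private file mapping. The sketch below is only an analogy for what the kernel does with __DATA pages, not dyld's actual code: it maps a throwaway temp file with Python's `mmap.ACCESS_COPY`, writes to the mapping, and shows that the file on disk is untouched. The `demo_copy_on_write` name and the file contents are made up.

```python
import mmap
import os
import tempfile


def demo_copy_on_write():
    """Write through a private mapping; return (bytes in memory, bytes on disk)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"shared data page")
        path = f.name
    try:
        with open(path, "r+b") as f:
            # ACCESS_COPY gives a private copy-on-write mapping: writes land in
            # this process's own copy of the page and never reach the file.
            private = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
            private[0:6] = b"SHARED"
            in_memory = bytes(private[:])
            private.close()
        with open(path, "rb") as f:
            on_disk = f.read()
    finally:
        os.unlink(path)
    return in_memory, on_disk


in_memory, on_disk = demo_copy_on_write()
```

The process that wrote sees its own modified page, while everyone else (here, the file itself) still sees the original.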
The __LINKEDIT segment contains raw data for the linker (link editor), like symbol and string tables, compressed dynamic linking info, code signing info, and the indirect symbol table, all of which occupy regions as specified by the load commands.
So, now that we have an understanding of the individual segments, let’s try to look at the bigger picture and see how it all fits together.
Let me tell you a secret! 🤫
Well, it’s not really a secret
When you launch your app by tapping the app icon, the kernel doesn’t jump straight into your code. Instead, it launches dyld, the dynamic linker!
I know, right! This guy is a big deal around here.
The kernel loads dyld at a random address in your process’s address space, and dyld itself has its own __TEXT segment, __DATA segment… well, you get the idea.
Next, dyld reads the Mach-O header and load commands to find out about the dependent dylibs. It then finds each library file on the file system and parses it.
This process is done recursively, because a dylib A can depend on a dylib B, which can depend on a dylib C. So dyld has to resolve this whole graph of dependencies and finally memory-map the segments of all these dylibs into your process’s address space.
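The recursive walk is easy to picture as a depth-first traversal of the dependency graph. This is a toy model rather than dyld's real algorithm: the `load_order` function and the image names in `dependencies` are invented, and a real loader would read each dylib's load commands instead of looking things up in a dictionary.

```python
def load_order(root, dependencies):
    """Return every image reachable from `root`, once each, dependencies first."""
    ordered = []
    seen = set()

    def visit(image):
        if image in seen:          # already handled (also guards against cycles)
            return
        seen.add(image)
        for dep in dependencies.get(image, []):
            visit(dep)             # resolve what this image needs first
        ordered.append(image)

    visit(root)
    return ordered


# Hypothetical graph: the app links A and B, and both of them link C.
dependencies = {
    "MyApp": ["libA.dylib", "libB.dylib"],
    "libA.dylib": ["libC.dylib"],
    "libB.dylib": ["libC.dylib"],
    "libC.dylib": [],
}
order = load_order("MyApp", dependencies)
# order == ["libC.dylib", "libA.dylib", "libB.dylib", "MyApp"]
```

Note that libC.dylib shows up only once even though two libraries need it, and that the app itself comes last.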
Now, remember we talked about ASLR and how you cannot know in advance which address will be assigned to the variables in your app. This is something that dyld has to fix up, using the two techniques below: rebasing and binding.
Rebasing comes first. The __LINKEDIT segment contains the locations of all the pointers that need to be shifted. dyld goes through all these pointers and shifts them based on your application’s actual start address.
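In other words, the shift is plain arithmetic: the slide is the difference between where the image actually landed and where it expected to be, and every listed pointer gets that slide added to it. Here is a small sketch with invented addresses; `rebase`, the `data_words` list standing in for pointer-sized slots in __DATA, and the slot indexes are all assumptions for illustration.

```python
def rebase(data_words, rebase_slots, slide):
    """Return a copy of `data_words` with `slide` added to each listed slot."""
    rebased = list(data_words)
    for slot in rebase_slots:
        rebased[slot] += slide     # only internal pointers are shifted
    return rebased


PREFERRED_BASE = 0x100000000       # address the image was linked for
ACTUAL_BASE = 0x104A3C000          # made-up address chosen at launch
slide = ACTUAL_BASE - PREFERRED_BASE

# Slots 0 and 2 hold pointers into the image; slot 1 is an ordinary integer.
data_words = [0x100003F00, 42, 0x100004010]
rebased = rebase(data_words, rebase_slots=[0, 2], slide=slide)
```

The plain integer in slot 1 is left alone, which is exactly why the binary has to carry a list of pointer locations in the first place.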
Notice that to do this, dyld has to read and write the data pages, which makes those pages dirty and triggers copy-on-write. This is why rebasing is expensive in terms of I/O.
Binding comes next. Once dyld loads the dependent libraries, it needs to search their symbol tables and find the implementations of the symbols your binary imports. So, there’s actually a string named _NSLog inside your binary that is unresolved, and what dyld will do is look up the symbol tables and fill in the addresses of these functions from the dependent libraries. This is computationally expensive.
Then the Objective-C runtime needs setting up. All Objc class definitions need to be registered. Why? Because you can construct an Objc class from a string by calling the NSClassFromString(_:) method. So, dyld has to build this table before the app can launch.
Categories are also added to method lists. What this means is, if you have created a category over UIView and added a bunch of new functions, those new functions will be added to the method list of UIView.
Finally, the initialisers run. It also ensures that Objc +load methods are called at this point, and that C++ static initialisers are run. This happens in a bottom-up fashion, so basically the dependent libraries will be initialised first.
Whew!! After all of this is done, finally, your main() will execute.
And this is the story behind the elusive Mach-O file.
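One last look back at the binding step, since it is the easiest one to picture as code: for each unresolved name such as _NSLog, find a loaded library that exports it and record the address. This is a deliberately simplified sketch. The `bind` function, the library names and the addresses are invented, and it searches every library in turn, whereas real binaries usually record which library each symbol is expected to come from.

```python
def bind(imports, loaded_images):
    """Map each imported symbol name to the address exported by a loaded image."""
    bound = {}
    for symbol in imports:
        for exports in loaded_images.values():
            if symbol in exports:
                bound[symbol] = exports[symbol]
                break
        else:
            # No loaded image exports this name.
            raise LookupError(f"symbol not found: {symbol}")
    return bound


# Hypothetical export tables for two already-loaded libraries.
loaded_images = {
    "Foundation": {"_NSLog": 0x18F2A1000},
    "libSystem": {"_malloc": 0x18E001200},
}
bound = bind(["_NSLog", "_malloc"], loaded_images)
```

All those string lookups are a big part of why binding costs more CPU time than rebasing.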