Have you ever wondered what to do when Firebase Crashlytics, the very tool you rely on to report and diagnose crashes, runs into a problem of its own? It might look like an impasse, but don't worry: we will do some detective work in this post. I came across a deadlock in Firebase Crashlytics' urgent mode. After some deep digging, I found an unexpected yet effective solution, drawing inspiration from an unlikely place: XCTest's "expectation" implementation.
Let's jump back to the issue at hand. I noticed that my app was taking an unusually long time to launch. To dig into this, I used lldb to pause the app and walked through the stack. It didn't take long to spot the culprit: Firebase Crashlytics was blocking the launch process.
The function regenerateInstallIDIfNeededWithBlock had appeared on the main thread. This was odd: if you set a symbolic breakpoint, you'll see that regenerateInstallIDIfNeededWithBlock is normally invoked from a background thread, not the main thread. This shift was a clear red flag that the expected process flow was off.
Now, let's unravel this deadlock. A closer look at the call stack shows that regenerateInstallID is called from prepareAndSubmitReport, which is in turn called from processExistingActiveReportPath.
- (void)processExistingActiveReportPath:(NSString *)path
                    dataCollectionToken:(FIRCLSDataCollectionToken *)dataCollectionToken
                               asUrgent:(BOOL)urgent {
  FIRCLSInternalReport *report = [FIRCLSInternalReport reportWithPath:path];
  if (![report hasAnyEvents]) {
    // call is scheduled to the background queue
    [self.operationQueue addOperationWithBlock:^{
      [self.fileManager removeItemAtPath:path];
    }];
    return;
  }
  if (urgent && [dataCollectionToken isValid]) {
    // called from the main thread
    [self.reportUploader prepareAndSubmitReport:report
                            dataCollectionToken:dataCollectionToken
                                       asUrgent:urgent
                                 withProcessing:YES];
    return;
  }
  // ... (rest of the method omitted)
}
regenerateInstallID blocks while waiting for a semaphore to be signaled, which should happen when [self.installations installationIDWithCompletion:] completes. The implementation of regenerateInstallID looks like this (simplified for brevity):
- (void)regenerateInstallID {
  dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
  // This runs Completion async, so wait a reasonable amount of time for it to finish.
  [self.installations
      installationIDWithCompletion:^(void) {
        dispatch_semaphore_signal(semaphore);
      }];
  intptr_t result = dispatch_semaphore_wait(
      semaphore, dispatch_time(DISPATCH_TIME_NOW, FIRCLSInstallationsWaitTime));
}
To figure out why the completion does not fire, I dug down the chain of calls behind installationIDWithCompletion and did not find any path that could skip the completion.
The real issue revealed itself when I noticed that the completion is wrapped in a FBLPromise.then {} block. By default, this block is dispatched asynchronously on the main queue, as shown here:
@implementation FBLPromise (ThenAdditions)
- (FBLPromise *)then:(FBLPromiseThenWorkBlock)work {
  // Where defaultDispatchQueue is gFBLPromiseDefaultDispatchQueue by default
  return [self onQueue:FBLPromise.defaultDispatchQueue then:work];
}
@end

static dispatch_queue_t gFBLPromiseDefaultDispatchQueue;
+ (void)initialize {
  if (self == [FBLPromise class]) {
    gFBLPromiseDefaultDispatchQueue = dispatch_get_main_queue();
  }
}
So, the deadlock boils down to this: the main thread is blocked on a semaphore, waiting for the completion handler to signal it, but the completion handler is itself stuck in the main queue, waiting for the main thread to become free and run the block submitted with dispatch_async. This circular dependency was stalling our app launch until the semaphore wait timed out.
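The same circular wait can be reproduced in a few lines of Swift, outside Firebase entirely. This is only a minimal sketch of the pattern, not Firebase's code: the function name waitForMainQueueCompletion is hypothetical, and the sketch assumes it is called from the main thread (as top-level code in a script is).

```swift
import Dispatch
import Foundation

// Hypothetical helper that mimics the problematic pattern:
// block the calling thread on a semaphore that is only signaled
// by a block submitted to the main queue.
func waitForMainQueueCompletion(timeout: TimeInterval) -> DispatchTimeoutResult {
    let semaphore = DispatchSemaphore(value: 0)

    // Stand-in for the asynchronous completion: it is queued on the main queue,
    // so it cannot run while the main thread is blocked below.
    DispatchQueue.main.async {
        semaphore.signal()
    }

    // When called from the main thread, nothing drains the main queue
    // during this wait, so the wait can only end by timing out.
    return semaphore.wait(timeout: .now() + timeout)
}

// Top-level code runs on the main thread.
let outcome = waitForMainQueueCompletion(timeout: 0.5)
print(outcome == .timedOut ? "timed out" : "signaled")
```

Run from the main thread, the wait should end by timing out rather than by the signal, which is the stall described above in miniature. Called from a background thread in an app whose main queue is being drained, the same function would normally return as soon as the block runs.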
So, what options are we left with?
If only we could execute an async callback on the main thread while simultaneously waiting on it... Sound familiar? Well, it should! We have exactly this capability in XCTest via waitForExpectations.
Here's an example:
// This test will pass
func testExample() throws {
    let testExpectation = expectation(description: "")
    DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
        testExpectation.fulfill()
    }
    assert(Thread.isMainThread == true)
    waitForExpectations(timeout: .infinity)
}
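XCTest manages this by running the run loop in short slices instead of blocking the thread. The core of its wait looks roughly like this:
func primitiveWait(using runLoop: RunLoop, duration timeout: TimeInterval) {
    let timeIntervalToRun = min(0.1, timeout)
    runLoop.run(mode: .default, before: Date(timeIntervalSinceNow: timeIntervalToRun))
}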
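To see how such short run-loop slices add up to a complete wait with a timeout, here is a minimal Swift sketch of the technique in isolation. It is an illustration under assumptions, not XCTest's or Firebase's implementation: CompletionFlag and waitOnMainThread are hypothetical names, and the code assumes it runs on the main thread.

```swift
import Foundation

// Hypothetical helper type: a flag the completion handler flips.
// Marked @unchecked Sendable because it is only touched on the main thread here.
final class CompletionFlag: @unchecked Sendable {
    var isSet = false
}

// Hypothetical helper: waits on the main thread without blocking it.
// The run loop is run in slices of at most 0.1 s, so blocks queued on the
// main queue still get executed while we wait.
// Returns true if the flag was set before the timeout elapsed.
func waitOnMainThread(for flag: CompletionFlag, timeout: TimeInterval) -> Bool {
    precondition(Thread.isMainThread, "This sketch assumes the main thread")
    let deadline = Date(timeIntervalSinceNow: timeout)
    while !flag.isSet && Date() < deadline {
        let sliceEnd = min(deadline, Date(timeIntervalSinceNow: 0.1))
        _ = RunLoop.current.run(mode: .default, before: sliceEnd)
    }
    return flag.isSet
}

// Usage: the completion is delivered on the main queue, as in the deadlock case,
// but this time the main thread keeps servicing its queue while waiting.
let flag = CompletionFlag()
DispatchQueue.main.asyncAfter(deadline: .now() + 0.2) {
    flag.isSet = true
}
let finished = waitOnMainThread(for: flag, timeout: 2.0)
print(finished ? "completed" : "timed out")
```

The design choice is the loop itself: a semaphore parks the thread, whereas running the run loop lets the thread keep processing main-queue work, including the very completion it is waiting for.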
Applying the same idea to regenerateInstallID, the fix I proposed looks like this:
- (void)regenerateInstallID {
  dispatch_semaphore_t semaphore = nil;
  // Set by the completion handler; __block makes it writable from inside the block.
  __block BOOL completed = NO;
  BOOL isMainThread = NSThread.isMainThread;
  if (!isMainThread) {
    semaphore = dispatch_semaphore_create(0);
  }
  [self.installations
      installationIDWithCompletion:^(void) {
        NSAssert(NSThread.isMainThread, @"We expect to get a completion on the main thread");
        completed = YES;
        if (!isMainThread) {
          dispatch_semaphore_signal(semaphore);
        }
      }];
  intptr_t result = 0;
  if (isMainThread) {
    // On the main thread, spin the run loop so the completion can be delivered.
    NSDate *deadline =
        [NSDate dateWithTimeIntervalSinceNow:FIRCLSInstallationsWaitTime / NSEC_PER_SEC];
    while (!completed) {
      NSDate *now = [[NSDate alloc] init];
      if ([now timeIntervalSinceDate:deadline] > 0) {
        break;
      }
      [[NSRunLoop mainRunLoop] runMode:NSDefaultRunLoopMode beforeDate:deadline];
    }
    if (!completed) {
      result = -1;
    }
  } else {  // not on the main thread: blocking on the semaphore is safe
    result = dispatch_semaphore_wait(semaphore,
                                     dispatch_time(DISPATCH_TIME_NOW, FIRCLSInstallationsWaitTime));
  }
}
Although the proposed solution worked, the maintainers of the Firebase SDK found a simpler one: calling regenerateInstallID was not required at all. The most straightforward fix turned out to be the most effective, sidestepping the need for more complex solutions. It is a good reminder to keep refining our solutions with simplicity and efficiency in mind.