Debugging Concurrent Code Using Logs - Papertrail Blog

The revolution will be verbosely {,b}logged

Debugging Concurrent Code Using Logs

Posted by Henry Rivera on

Working with concurrent code can be a real pain, because it can be difficult to track execution in multiple parts of the code base. Even the trusty debugger can get difficult to use with this kind of code. I’ll show you how logs make it easy to see the application behavior and identify problems. This is especially true when your code is running in the wild on a remote server and you are trying to diagnose problems.

Logging is one of the first tools in a developer’s kit for fixing timing and deadlock issues. When you debug concurrent code, the debugger may appear to jump around as different parts of the code are executed. This is true for both multithreaded and asynchronous code. A log file allows you to quickly see the behavior of your application without slowly stepping through tasks in different parts of the code base. Let’s run through a famous example so you can see exactly what I mean.

Finding Bugs in the Dining Philosophers’ Meals

When it comes to concurrent code, there is a very old computer science challenge known as the Dining Philosophers Problem. In the problem, there are multiple philosophers who can either think or eat. The challenge is that there are a limited number of forks and in order to eat a philosopher must have both a right and left fork. If there are bugs, it could result in one or more of the philosopher’s starving to death! In a real-world application, this means your code stops executing, potentially leading to lost data and poor customer experience.

One of the solutions to address contention is to add a limit on how long a thinker can hold a single fork. In a production application, this is a good place to log an error. If a retry limit is being reached, it can indicate that things are not running smoothly.

for (var i = 1; i > 0; i++) {
	if (t.IsCancellationRequested) {
		break;
	}

	// ... Eating Logic ...

	if (++retryCount == 10) {
		Logger.LogError($"Thinker {Number} just starved to death");
		ReleaseForks();
		retryCount = 0;
	}

	// ...
}

Another common pain point is when threads are taking too long to do their jobs. In the exercise, eating times are randomized for illustration but it’s another good spot to log a warning.

// Eat!
if (ok) {
	var start = DateTime.Now;
	Digest(i); // This is where our hypothetical thread would do it's work.
	var duration = (int)(DateTime.Now - start).TotalMilliseconds;

	if (duration > ThresholdError) {
		throw new Exception($"Thinker {Number} ate for {duration}ms, max is {_durationMax}");
	} else if (duration > ThresholdSlow) {
		Logger.LogWarn($"Slow thread, Thinker {Number} is pigging out! Took {duration}ms");
	} else {
		Logger.LogDebug($"Thinker {Number} ate for {duration}ms");
	}

	ReleaseForks();
} else {
	ConflictCount++;
}

Using Logs to Debug Timeouts

In a production system, long execution times can also effectively starve other tasks trying to access a shared resource. This could be a connection timeout to a remote service or database. When looking at the logs, we can see how this affects the other threads that are running. We would then be able to look at what’s happening in our Digest method and apply a fix for our timeout scenario.

Apr 12 14:55:14 HungryThinker: WARN: Slow thread, Thinker 1 is pigging out! Took 755ms
Apr 12 14:55:18 HungryThinker: ERROR: Exception Caught! Thinker 1 ate for 1200ms, max is 800
Apr 12 14:55:24 HungryThinker: ERROR: Thinker 0 just starved to death!
Apr 12 14:55:24 HungryThinker: ERROR: Thinker 2 just starved to death!

Centralized Logging

When you are developing locally, simple log files do the job. Once you’re running in production, and especially on more than one server, it can be hard to track thread execution since it may occur on any number of servers and in any number of log files. The solution is to centralize your logs in a single place so you see the behavior of your entire application all at once.

SolarWinds® Papertrail is a great tool for managing not only the logging output from your applications but all of your servers and other logs as well. It is very easy to set up and offers several guides depending on your programming language or framework. Since our example is a C# app, I followed the instructions for log4net in their documentation.

Their live tail view shows log messages as they come in, which is great to monitor a system as it’s running. Papertrail also supports a search syntax that is quite expressive, so you can filter on errors or warnings, for example.

Papertrail – Live Tail View

Using our warning filter, we can configure an alert that might give you enough time to address issues proactively. You can use short frequencies for messages requiring immediate attention, or longer timeframes that you could use for a sort of digest containing places in your application that need some attention.

Papertrail – Alert Configuration.

Conclusion

It’s difficult to predict how your application is going to behave in production, especially when it’s using concurrent code. Data in production can flow very differently than it does in your development or staging environment. When you are running multiple servers, keeping track of what your applications (and threads) are doing can be a nightmare.

Using centralized logging with Papertrail is a great solution to keeping abreast of what’s happening in production and alerting you before small fires become raging infernos. Their alerts feature supports a long list of integrations with other services including Slack, SolarWinds AppOptics, Zapier, and more.