Riimu.net


Xdebug will skew your performance


Xdebug is an absolutely invaluable tool when it comes to PHP programming. For portability reasons, I don't usually like to rely too much on extensions that are not bundled with PHP, but Xdebug is the one extension I always include in my development environment.

Some of Xdebug's smaller convenience features, like the changes to var_dump() output, can be annoying at times (though all of them can be configured). However, the extension provides three crucial features for developing an application larger than a few thousand lines:

  • Remote debugger
  • Code coverage
  • Profiler

These great features come at a rather high cost, though. Xdebug also imposes significant overhead, which not only reduces performance but does so unpredictably.

These are not the optimizations you are looking for

Whenever there's an argument about which code is the fastest way to solve a problem, people usually rely on naive performance tests. Unfortunately, one of the questions I always have to ask first is: did you run it with or without Xdebug? If the slowdown caused by Xdebug were uniform, this would matter less, but in practice a piece of code can perform wildly differently in a production environment than in a development environment.

Let us take a look at a simple example. Imagine you have a multi-level array of data that you want to output as JSON. However, you know the array contains DateTime instances, and you want to convert those to Unix timestamps instead. Off the top of my head, I can come up with two different solutions.

The first solution would be to make a simple recursive function like so:

function recurseArray(array $array): array
{
    foreach ($array as $key => $value) {
        if (\is_array($value)) {
            $array[$key] = recurseArray($value);
        } elseif ($value instanceof DateTimeInterface) {
            $array[$key] = $value->getTimestamp();
        }
    }

    return $array;
}

In this case, PHP also provides a convenient function, array_walk_recursive(), which could do the job as well:

function walkArray(array $array): array
{
    array_walk_recursive($array, function (& $value) {
        if ($value instanceof DateTimeInterface) {
            $value = $value->getTimestamp();
        }
    });

    return $array;
}

However, I can't quickly tell which of these functions is faster. On the one hand, walkArray() has a lot of overhead because it calls the callback for every leaf value, but on the other, it delegates the array recursion to PHP internals.
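Before timing the two versions against each other, it is worth confirming that they actually produce the same result. The following sketch repeats both functions so it runs on its own and compares their output on a small sample shaped like the test data; the fixed timestamp is an arbitrary choice to keep the output deterministic.

```php
<?php
// Both functions from the text, repeated so this snippet is self-contained.
function recurseArray(array $array): array
{
    foreach ($array as $key => $value) {
        if (\is_array($value)) {
            $array[$key] = recurseArray($value);
        } elseif ($value instanceof DateTimeInterface) {
            $array[$key] = $value->getTimestamp();
        }
    }

    return $array;
}

function walkArray(array $array): array
{
    // array_walk_recursive() only descends into arrays; objects reach the callback.
    array_walk_recursive($array, function (&$value) {
        if ($value instanceof DateTimeInterface) {
            $value = $value->getTimestamp();
        }
    });

    return $array;
}

// A fixed date (Unix timestamp 1000000000) keeps the comparison deterministic.
$date = new DateTimeImmutable('@1000000000');
$sample = [[['foo']], 2, $date, [[[[$date, [1]]]]]];

// Same output is a precondition for comparing speed at all.
var_dump(recurseArray($sample) === walkArray($sample)); // bool(true)
```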

To test this, one would typically do a quick and naive speed test like the following:

$times = 10;
$testData = array_fill(0, 10000, [[['foo']], 2, new DateTime(), [[[[new DateTime(), [1]]]]]]);

$timer = microtime(true);

for ($i = 0; $i < $times; $i++) {
    $result = walkArray($testData);
}

echo 'Walk:    ' . round((microtime(true) - $timer) * 1000) . "ms\n";

$timer = microtime(true);

for ($i = 0; $i < $times; $i++) {
    $result = recurseArray($testData);
}

echo 'Recurse: ' . round((microtime(true) - $timer) * 1000) . "ms\n";

Running the code gives the following results:

Walk:    1284ms
Recurse: 1285ms

Looks like they're about the same based on this speed test. But wait, did I run this code with Xdebug enabled or not? A quick command line call reveals that I did, in fact, have Xdebug enabled:

$ php -v
PHP 7.2.14 (cli) (built: Jan 12 2019 05:21:04) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.2.14, Copyright (c) 1999-2018, by Zend Technologies
    with Xdebug v2.6.1, Copyright (c) 2002-2018, by Derick Rethans

So, let's disable the extension and try running the speed test again. This time, we get the following results:

Walk:    550ms
Recurse: 284ms

Now, it looks like the recursive version is actually roughly twice as fast.

While this is not the best example, it does illustrate that PHP's performance profile can be completely different depending on whether Xdebug is enabled.

You should not, however, stop at a simple speed test like the one above. It can give you a good idea of what might be faster, but running the code on production data may yield different results. The previous example is based on a real-world scenario in which array_walk_recursive() actually ended up being more efficient in production.
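One way to make the "with or without Xdebug?" question impossible to forget is to have the benchmark report it. This is a minimal sketch, not a substitute for measuring with production data: the hypothetical benchmark() helper repeats a callable, reports the median time (a choice made here to dampen one-off hiccups), and records extension_loaded('xdebug') alongside the result.

```php
<?php
// Hypothetical helper: time a callable several times and report the median,
// together with whether the Xdebug extension is loaded in this process.
function benchmark(callable $fn, int $runs = 10): array
{
    $runs = max(1, $runs);
    $samples = [];

    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $samples[] = (microtime(true) - $start) * 1000; // milliseconds
    }

    // The median is less sensitive to one-off hiccups than the mean.
    sort($samples);
    $middle = intdiv($runs, 2);
    $median = $runs % 2 === 1
        ? $samples[$middle]
        : ($samples[$middle - 1] + $samples[$middle]) / 2;

    return [
        'median_ms' => $median,
        'runs' => $runs,
        'xdebug' => extension_loaded('xdebug'),
    ];
}

$result = benchmark(function () {
    $sum = 0;
    for ($i = 0; $i < 100000; $i++) {
        $sum += $i;
    }
}, 5);

printf(
    "median: %.2fms over %d runs (Xdebug %s)\n",
    $result['median_ms'],
    $result['runs'],
    $result['xdebug'] ? 'loaded' : 'not loaded'
);
```

With the functions from the example, you would pass something like `function () use ($testData) { walkArray($testData); }` and only compare numbers whose xdebug flag matches.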

Dealing with performance woes

Despite the performance problems it causes, Xdebug still provides great features, as I said at the beginning. There are several ways to deal with the problems, depending on your use case and setup.

Simply don't enable Xdebug

The simplest solution is to just not enable the extension. Note that setting xdebug.default_enable = 0 in your config is not enough: all it does is disable stack traces on errors. The performance issues are caused simply by loading the extension, since it needs to hook into various places in the PHP core.

When you actually need one of Xdebug's features, load the extension only on those occasions. There are a couple of handy solutions to help with this.

Using an Xdebug toggler script

If you're using PHP from Homebrew on macOS, one easy way to toggle Xdebug on and off is the xdebug-toggle script.

When Xdebug has been set up via PECL, this script effectively just renames ext-xdebug.ini in your PHP installation's conf.d directory to enable or disable the extension from the command line. It can also conveniently restart Apache at the same time.
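The rename trick can be sketched in a few lines of PHP. This is a hypothetical re-implementation of the idea, not the actual xdebug-toggle script: it assumes the conf.d path is passed in and that a disabled file gets a .disabled suffix. It relies on PHP only reading files ending in .ini from its scan directory. A running web server or PHP-FPM still needs a restart to pick up the change.

```php
<?php
// Hypothetical rename-based toggle (not the real xdebug-toggle script).
// Returns true if Xdebug is enabled after the call, false if disabled.
function toggleXdebug(string $confDir): bool
{
    clearstatcache();
    $enabled = $confDir . '/ext-xdebug.ini';
    $disabled = $enabled . '.disabled'; // suffix is an assumption

    if (is_file($enabled)) {
        rename($enabled, $disabled); // no longer *.ini, so PHP skips it
        return false;
    }

    if (is_file($disabled)) {
        rename($disabled, $enabled);
        return true;
    }

    throw new RuntimeException("No Xdebug ini file found in $confDir");
}

// Demo against a throwaway directory instead of a real conf.d.
$confDir = sys_get_temp_dir() . '/xdebug-toggle-demo-' . getmypid();
if (!is_dir($confDir)) {
    mkdir($confDir);
}
file_put_contents($confDir . '/ext-xdebug.ini', "zend_extension=\"xdebug.so\"\n");

$afterFirst = toggleXdebug($confDir);  // false: renamed to *.disabled
$afterSecond = toggleXdebug($confDir); // true: renamed back
var_dump($afterFirst, $afterSecond);
```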

Setting up the extension in PhpStorm

If you're using PhpStorm, you can also add a separate path to the Xdebug extension in your PHP CLI interpreter configuration. This lets PhpStorm load the extension from the command line only when you run scripts via "Debug" or run PHPUnit with code coverage.

Running separate containers with Xdebug

If you happen to have a dockerized PHP development setup, you could alternatively set up a second container with Xdebug enabled. This lets you use step-by-step remote debugging without having to restart your server. Nginx can be configured to route each request to the appropriate container depending on whether you want remote debugging enabled.
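The routing decision itself is simple. The sketch below expresses it in PHP purely for illustration; in practice the rule would live in the Nginx configuration. It assumes Xdebug 2's session triggers (the XDEBUG_SESSION cookie and the XDEBUG_SESSION_START query parameter), and the upstream names php and php-xdebug are hypothetical.

```php
<?php
// Illustrative routing rule: debug requests go to the Xdebug container,
// everything else to the fast container without Xdebug.
function chooseUpstream(array $cookies, array $query): string
{
    // Xdebug 2 starts a session when either trigger is present.
    $wantsDebug = isset($cookies['XDEBUG_SESSION'])
        || isset($query['XDEBUG_SESSION_START']);

    return $wantsDebug ? 'php-xdebug' : 'php'; // hypothetical upstream names
}

echo chooseUpstream([], []), "\n";                              // php
echo chooseUpstream(['XDEBUG_SESSION' => 'PHPSTORM'], []), "\n"; // php-xdebug
```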

Juan Treminio has a good write-up on this kind of setup in his article Developing at Full Speed with Xdebug.

Avoid opening remote debugging sessions on each request

If any of the aforementioned solutions seem like too much work to set up, I would at least recommend using some kind of tool to manage remote debugging sessions from the browser.

If you enable the xdebug.remote_autostart setting, Xdebug will always start a remote debugging session, and that will needlessly hurt the speed of your application. Managing a debugging session cookie manually isn't very handy either.

A browser add-on like xdebug-helper can be a great boon for enabling and disabling debugging sessions. That particular extension is available for both Chrome and Firefox.

Whitelisting for code coverage

When you want to generate code coverage, there aren't many alternatives to Xdebug. PHPUnit also supports code coverage when you run your tests with the phpdbg -qrr command, but that tends to come with its own set of problems (like instability, bugs, and differences in coverage metrics).

Recently, however, a new feature has been added to Xdebug which allows setting up whitelists for code coverage gathering with xdebug_set_filter(). This makes it much faster to run tests with code coverage, as the metrics are only gathered for specific files.
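A minimal sketch of such a whitelist, assuming your application code lives under src/ and that this runs as early as possible, before the tests start. The helper name applyCoverageWhitelist() is hypothetical; it does nothing when Xdebug's filter API is not available.

```php
<?php
// Hypothetical helper: limit Xdebug's code coverage collection to the given
// paths. Returns false (and does nothing) when the filter API is unavailable.
function applyCoverageWhitelist(array $paths): bool
{
    if (!function_exists('xdebug_set_filter')) {
        return false; // Xdebug is not loaded, or is older than 2.6
    }

    // Xdebug 2.6+ names the mode XDEBUG_PATH_WHITELIST; Xdebug 3 renamed it
    // to XDEBUG_PATH_INCLUDE. constant() avoids referencing an undefined name.
    if (defined('XDEBUG_PATH_INCLUDE')) {
        $mode = constant('XDEBUG_PATH_INCLUDE');
    } elseif (defined('XDEBUG_PATH_WHITELIST')) {
        $mode = constant('XDEBUG_PATH_WHITELIST');
    } else {
        return false;
    }

    xdebug_set_filter(constant('XDEBUG_FILTER_CODE_COVERAGE'), $mode, $paths);

    return true;
}

// Assumed layout: application code lives under src/ next to this file.
$applied = applyCoverageWhitelist([__DIR__ . '/src/']);
echo $applied ? "Coverage whitelist applied\n" : "Xdebug filter API not available\n";
```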

If you're interested, you can read more about this solution in Faster Code Coverage by Sebastian Bergmann and Sebastian Heuer.

Profile with care

Given the performance impact, you might quickly conclude that this makes Xdebug's profiler rather useless. However, it can still provide quite a bit of valuable information if you use it with its weaknesses in mind.

The biggest mistake you can make with Xdebug is to rely solely on its metrics for speed improvements. If you only optimize for what the profiler reports, you tend to lean towards micro-optimizations that may not give any real-world benefit.

However, the profiler can still provide you with useful insight about which parts are actually taking more time than you expected.

In real-world scenarios, I've often found that the profiler tends to highlight two different kinds of problems:

  • Code that takes much more time than expected because it is called too often
  • Code that does something completely unintended and unnecessary

Especially if you are unfamiliar with a codebase, a profiler can help you locate potential bottlenecks. Things like a function being called 100,000 times or a script making hundreds of database queries are quite easy to spot with a profiler.

As I discussed in my previous blog post, many performance issues tend to be structural in nature and a profiler is a good tool to find those kinds of issues.

Knowing your tools is important

Xdebug is an invaluable tool for any proficient PHP developer, but it can still easily lead you astray if you don't fully understand it.

The way I see it:

  • The first step is learning that Xdebug is a great tool
  • The second step is understanding that it has flaws
  • The third step is understanding how to use it effectively while fully aware of its flaws

Debug and test your performance responsibly.


The Inefficient Architecture


Have you ever heard the adage "premature optimization is the root of all evil"? Chances are you have, and you've probably also been taught to optimize only actual bottlenecks and to leave optimization until the end.

In "Structured Programming with go to Statements", the paper that originated the phrase, Donald Knuth notes that programmers waste enormous amounts of time worrying about the speed of noncritical parts of their programs. He adds that while we should worry about the performance of the critical parts, our intuitive guesses about which parts are critical tend to be wrong. Thus, we should rely on actual measurement tools to make these judgments.

Sometimes this leads to the extreme position that there is no point in considering the performance impact of anything until you have actual data on real-world performance. This, however, may end up postponing all optimization until it is too late.

Time to ship is a woeful optimization goal

Not too long ago, I watched an insightful talk by Konstantin Kudryashov on Min-maxing Software Costs. To me, one of the most important takeaways was that we tend to optimize the cost of introduction, that is, how quickly we can write new code and ship applications, because that cost is the easiest to optimize and the easiest to measure.

Because of that, we tend to build frameworks and layers that provide convenience by hiding the actual implementations and making many operations implicit or lazy. The problem is that we often end up setting traps for ourselves: either we don't fully understand the underlying mechanics, or we simply forget about them because the frameworks make everything too transparent.

The N+1 problem is the first sign of things to come

A typical problem caused by the convenience provided by numerous different frameworks is the N+1 query problem. If you are unfamiliar with this particular issue, let me give an example:

Let's imagine we have users and each belongs to one organization. We want to print each user and their organization. With a typical ORM/DBAL implementation, we could have code that looks something like this:

$users = $userRepository->findAllUsers();

foreach ($users as $user) {
    echo $user->getName() . ', ' . $user->getOrganization()->getName();
}

The problem with the previous example is that while findAllUsers() fetches all the users from the database with a single query, it doesn't fetch the organizations related to each user. The call to getOrganization() is actually a lazy loading function that queries the database for the user's organization. In practice, we end up making one additional query for each user, so the total number of database queries is 1 + N (where N is the number of users).
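To make the 1 + N count concrete, here is a toy sketch with an in-memory "database" that counts queries. Every class and method in it is hypothetical; a real ORM's lazy loading is more elaborate, but the query pattern is the same.

```php
<?php
// Toy in-memory "database" that counts queries. Every class and method here
// is a hypothetical stand-in for an ORM/DBAL, not a real library API.
final class Database
{
    public $queryCount = 0;

    private $users = [
        ['name' => 'Alice', 'organizationId' => 1],
        ['name' => 'Bob', 'organizationId' => 2],
        ['name' => 'Carol', 'organizationId' => 1],
    ];

    private $organizations = [1 => 'Acme', 2 => 'Globex'];

    public function fetchUsers(): array
    {
        $this->queryCount++; // SELECT * FROM users
        return $this->users;
    }

    public function fetchOrganizationName(int $id): string
    {
        $this->queryCount++; // SELECT name FROM organizations WHERE id = ?
        return $this->organizations[$id];
    }
}

final class Organization
{
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

final class User
{
    private $db;
    private $row;
    private $organization;

    public function __construct(Database $db, array $row)
    {
        $this->db = $db;
        $this->row = $row;
    }

    public function getName(): string
    {
        return $this->row['name'];
    }

    // Looks like a plain getter, but the first call per user runs a query.
    public function getOrganization(): Organization
    {
        if ($this->organization === null) {
            $name = $this->db->fetchOrganizationName($this->row['organizationId']);
            $this->organization = new Organization($name);
        }

        return $this->organization;
    }
}

$db = new Database();
$users = array_map(function (array $row) use ($db) {
    return new User($db, $row);
}, $db->fetchUsers());

foreach ($users as $user) {
    echo $user->getName() . ', ' . $user->getOrganization()->getName() . "\n";
}

echo "Queries: {$db->queryCount}\n"; // 1 + N = 4 for three users
```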

The appropriate solution would be to eagerly load the organizations for all the users. This could be done either with a simple JOIN query or with a second query that fetches all the needed organizations by their ids. The optimal solution depends on the number of users per organization and your preferred framework. In this particular case, however, you may also just want to fetch the name fields specifically, without fetching entire entities.
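The second-query variant can be sketched with the same kind of toy in-memory data: collect the distinct organization ids first, then fetch them all with one query (the equivalent of WHERE id IN (...)). The class and method names are hypothetical.

```php
<?php
// Toy in-memory tables; the class and method names are hypothetical.
final class EagerDatabase
{
    public $queryCount = 0;

    private $users = [
        ['name' => 'Alice', 'organizationId' => 1],
        ['name' => 'Bob', 'organizationId' => 2],
        ['name' => 'Carol', 'organizationId' => 1],
    ];

    private $organizations = [1 => 'Acme', 2 => 'Globex', 3 => 'Initech'];

    public function fetchUsers(): array
    {
        $this->queryCount++; // SELECT * FROM users
        return $this->users;
    }

    // One query for many ids: SELECT id, name FROM organizations WHERE id IN (...)
    public function fetchOrganizationNames(array $ids): array
    {
        $this->queryCount++;
        return array_intersect_key($this->organizations, array_flip($ids));
    }
}

$db = new EagerDatabase();
$users = $db->fetchUsers();

// Collect the distinct organization ids first, then load them all at once.
$ids = array_values(array_unique(array_column($users, 'organizationId')));
$organizationNames = $db->fetchOrganizationNames($ids);

foreach ($users as $user) {
    echo $user['name'] . ', ' . $organizationNames[$user['organizationId']] . "\n";
}

echo "Queries: {$db->queryCount}\n"; // 2, regardless of the number of users
```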

Convenience is the enabler of bad performance

In my honest opinion, the above example shouldn't even be possible. The code should throw an exception because the organizations were never loaded for the users in the first place. However, because schedule pressure encourages us to optimize how quickly we can write code, frameworks provide functionality that automatically handles these relationships for us.

You might be inclined to think you would easily catch issues like the above, but real-world examples don't tend to be quite as straightforward. You might have code in another place that fetches database entities, and then somewhere else 20 layers deep, you have another piece of code that uses relations in an unexpected way which triggers additional queries.

The N+1 problem is merely the simplest case demonstrating how convenience features create performance problems in applications. The core of the problem lies in the fact that passing data between the different layers of an application is quite difficult. We like to hide it behind abstraction layers to make it simpler to reason about, but at the same time, we stop thinking about what is actually happening behind the scenes.

I've worked on optimizing several legacy systems that had massive bottlenecks due to how application data storage was accessed. In one application, for example, there was a function equivalent to Storage::readValue($storageName, $key), which read a single value from the store. Each call had to open and close the external store, yet rather than caching the values, each piece of code simply called the static function separately, because that was more convenient than figuring out how the data should be passed between different parts of the application.
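A sketch of the kind of fix that would have helped, assuming a stand-in for the legacy static function: wrap the store in a small per-request cache object that is passed explicitly to the code that needs it. Both classes below are hypothetical illustrations, not the original system's code.

```php
<?php
// Hypothetical stand-in for the legacy static API: every call opens and
// closes the external store.
final class Storage
{
    public static $opens = 0;

    public static function readValue(string $storageName, string $key): string
    {
        self::$opens++; // simulates opening and closing the external store
        return "$storageName:$key";
    }
}

// Per-request cache: each (storage, key) pair is read from the store at most
// once. The instance is meant to be passed explicitly to the code that needs it.
final class CachedStorage
{
    private $values = [];

    public function read(string $storageName, string $key): string
    {
        $cacheKey = $storageName . "\0" . $key;

        if (!array_key_exists($cacheKey, $this->values)) {
            $this->values[$cacheKey] = Storage::readValue($storageName, $key);
        }

        return $this->values[$cacheKey];
    }
}

$storage = new CachedStorage();

for ($i = 0; $i < 100; $i++) {
    $storage->read('settings', 'locale');
}

echo Storage::$opens, "\n"; // 1
```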

Be Explicit, Be Performant

Unfortunately, there is no silver bullet here. Software architecture is really, really hard to get right. My own general philosophy, however, is to be as explicit as possible in code. When designing frameworks or APIs for libraries, we should do our best to prevent users from shooting themselves in the foot.

In particular, if a user is doing something that is potentially costly, like accessing IO, we should force the user to be as explicit about it as possible. It should not be possible to just accidentally query a database when accessing an entity relationship. Force the user to think, even for a second, about what they're about to do. This doesn't necessarily mean that APIs should be hard to use. Rather, they should be predictable.

For example, in the code demonstrated earlier, the $user->getOrganization() call should not be possible if the organization has not been preloaded for the user. Instead, we should force the user to call something akin to $userRepository->loadOrganization($user).
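A minimal sketch of that explicit style, with hypothetical User, Organization, and UserRepository classes: the getter never touches the database and fails loudly if the relation has not been loaded, while loading is a visible repository call.

```php
<?php
// Hypothetical classes illustrating an explicit API: the getter never touches
// the database, and loading a relation is a visible repository call.
final class Organization
{
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

final class User
{
    private $name;
    private $organizationId;
    private $organization;

    public function __construct(string $name, int $organizationId)
    {
        $this->name = $name;
        $this->organizationId = $organizationId;
    }

    public function getOrganizationId(): int
    {
        return $this->organizationId;
    }

    // Fails loudly instead of silently querying the database.
    public function getOrganization(): Organization
    {
        if ($this->organization === null) {
            throw new LogicException("Organization not loaded for user {$this->name}");
        }

        return $this->organization;
    }

    // Meant to be called by the repository only.
    public function setOrganization(Organization $organization): void
    {
        $this->organization = $organization;
    }
}

final class UserRepository
{
    public $queryCount = 0;
    private $organizations = [1 => 'Acme', 2 => 'Globex']; // stand-in for a table

    public function loadOrganization(User $user): void
    {
        $this->queryCount++; // the I/O happens here, visibly, at the call site
        $name = $this->organizations[$user->getOrganizationId()];
        $user->setOrganization(new Organization($name));
    }
}

$userRepository = new UserRepository();
$user = new User('Alice', 1);

try {
    $user->getOrganization();
} catch (LogicException $e) {
    echo $e->getMessage(), "\n"; // Organization not loaded for user Alice
}

$userRepository->loadOrganization($user);
echo $user->getOrganization()->getName(), "\n"; // Acme
```

Loading organizations one user at a time would still make one query per user; a batched variant taking an array of users could combine this style with the eager loading described earlier.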

When you have lazy accessors to database relations, the code becomes surprisingly unpredictable. Simple getters turn into database queries, which makes it frustratingly easy to forget, especially for newer developers, that these getters can have a massive performance impact.

Revisiting the topic of premature optimization, these kinds of performance problems are usually created in the early stages of a project, because little thought is put into how performant different kinds of application structures end up being. Once you have a web application that, for example, queries the database for the same piece of data a couple hundred times in a single request, that kind of structural issue can be really hard to fix.

I do generally agree that optimization decisions, especially the ones about software architecture, should be based on informed opinions. There is a real danger in getting lost in meaningless micro-optimizations and focusing on the wrong things. However, if your application is inefficient by design, scaling up may become an insurmountable challenge.