Possible solutions were thrown around: do we duplicate our code in PHP? Use an artificial DOM? What about PhantomJS? Duplicating code would be a monumental effort and a continued burden when writing new features. Initial tests of fake/artificial DOMs proved unreliable. A small prototype Node.js web server that hooked into PhantomJS proved promising. Node.js’ async model would be perfect for handling things that wait for I/O like rendering webpages. We came up with the project name ‘The Phantom Renderer’ soon after.
I spent a few days whipping up a prototype proxy server that worked like so:
- Node.js web server accepts a url in the querystring
- Send that URL to a newly-spawned PhantomJS process that listens on stdin
- PhantomJS fetches the page; we wait 500ms after the last HTTP request is sent, then grab the rendered content via the `page.content` property
- Send content back to Node.js
- Send content back to search bot
We thought we had a fairly simple and working solution.
Problem 1: When is a webpage ‘complete’?
In our prototype app we assumed that a webpage was ‘finished’ 500ms after the last HTTP request had begun. As you can probably already guess, this is incredibly naive. Our site loads dozens of images, scripts and stylesheets (not to mention lots of analytics code). Some load instantly, some take > 500ms to return content. What happens if a request completely fails? If the page is redirected (301, 302 or even via JS/meta tag)? 404s? We had to handle all those cases appropriately and gracefully.
At first, we had many pages that looked like this after ‘rendering’:
Obviously, this wasn’t going to work.
Through a lot of manual testing and QA we eventually came to a solution where we track each and every HTTP request PhantomJS makes and watch every step of the transaction (start, progress, end, failed). Only once every single request has completed (or failed, etc.) do we start ‘waiting’. We give the page 500ms to either start making more requests or finish adding content to the DOM. After that timeout we assume the page is done.
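The bookkeeping behind that heuristic can be sketched in plain JavaScript. This is illustrative, not the actual Phantom Renderer code: in the real thing the `started`/`finished` calls would be wired up to PhantomJS's `onResourceRequested`, `onResourceReceived` (at stage `'end'`) and `onResourceError` callbacks.

```javascript
// Track in-flight requests; only when the pending counter hits zero
// do we arm a quiet-period timer, and any new request disarms it.
function RequestTracker(quietMs, onDone) {
    this.pending = 0;
    this.quietMs = quietMs;   // e.g. 500ms quiet period
    this.onDone  = onDone;
    this.timer   = null;
}

// Called when a new request begins (onResourceRequested).
RequestTracker.prototype.started = function () {
    this.pending += 1;
    if (this.timer) {          // New activity cancels the pending 'done'
        clearTimeout(this.timer);
        this.timer = null;
    }
};

// Called when a request ends, one way or another: completed, failed,
// redirected, or aborted.
RequestTracker.prototype.finished = function () {
    this.pending = Math.max(0, this.pending - 1);
    if (this.pending === 0) {
        var self = this;
        this.timer = setTimeout(function () {
            self.onDone();     // Quiet for quietMs: assume the page is done
        }, this.quietMs);
    }
};
```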
Much better! But we weren’t out of the woods yet…
Problem 2: PhantomJS and Node.js Bugs
Getting PhantomJS to render pages correctly during testing was a lot of work, but dealing with PhantomJS bugs made us tear our hair out on occasion. When you are dealing with > 500 requests/second you uncover sporadic, random bugs that most people don’t. We also use a large percentage of the PhantomJS API, which makes us more likely to hit bugs or undocumented behavior. And since we were new to PhantomJS, there was plenty of user error 🙂
Some of these fun bugs and problems we dealt with were:
- If PhantomJS got in a redirect loop it would hog all CPU and rapidly fill up memory until it crashed itself or the server it was on
- Random ECONNRESET errors from child processes upon termination
- Small percentage of PhantomJS processes simply not returning
- PhantomJS’ onResourceRequested and onResourceReceived returning different URLs for the same resource due to url encoding. This causes problems if you are tracking requests.
- Expecting PhantomJS processes to terminate cleanly is a mistake. Instead, tell the process to exit, then kill it. Double tap!
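The URL-encoding mismatch deserves a quick sketch, since it silently breaks request tracking. This is an illustrative workaround, not the actual Phantom Renderer code: normalize every URL before using it as a key, so the differently-encoded variants reported by `onResourceRequested` and `onResourceReceived` collapse to the same entry.

```javascript
// Decode then re-encode, so 'http://x/a b' and 'http://x/a%20b'
// produce the same tracking key.
function normalizeUrl(rawUrl) {
    try {
        return encodeURI(decodeURIComponent(rawUrl));
    } catch (e) {
        return rawUrl;   // Malformed escapes: fall back to the raw URL
    }
}
```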
Problem 3: Scaling PhantomJS and Node.js
Since this was a brand new project and we knew rendering web pages was CPU intensive, we spent a lot of time running benchmarks (and learning how to benchmark).
Our testing infrastructure consisted of a test Phantom Renderer box and a separate server running http_load that was used to send varying amounts of traffic. We created a list of 600 public gallery urls from our most popular customer sites and repeatedly slammed our test server with varying load to determine the best combination of processes, CPU and RAM.
It’s also important to document the raw number of requests/sec alongside response time. A server isn’t very useful if it can handle hundreds or thousands of requests/sec but takes far too long to complete them.
When performance testing we learned a few things:
- Don’t test against your normal QA/test environment. This will make your QA and dev teams unhappy.
- Do make sure that any dependent services can also handle the additional load/traffic!
- Do use workloads and data that are as close to production as possible.
- Do repeat your tests multiple times to allow for services to ‘warm up’.
- Do test multiple configurations (number of processes, max connections, etc) on the same hardware.
- Do write down all your results and any extra data.
- Do test for long periods of time (hours at least). You’ll probably uncover issues that won’t occur during a short performance test.
We also had a few problems scaling PhantomJS once it was in production and running for long periods of time:
- Setting PhantomJS’ cache size too large, causing all 64 PhantomJS processes to slam the disk with reads and writes whenever the cache filled up and items needed to be evicted.
- Running too many PhantomJS instances, filling up RAM over a period of a few hours and causing processes to be killed.
- Node.js’ Cluster module on Ubuntu not load balancing equally between processes, causing server CPU to be underutilized (fix is to put HAProxy in front of Node.js)
- Setting the connection limit on our HAProxy servers too high, overloading the servers behind them.
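An HAProxy setup like the one described above can be sketched in a few lines of config. Everything here is illustrative (backend names, ports, and the `maxconn` value are assumptions, not our production config): `balance leastconn` sends each request to the least-busy Node.js process, which sidesteps the uneven balancing we saw from Node's cluster module, and per-server `maxconn` caps connections so the renderers can't be overloaded.

```
frontend phantom_renderer
    bind *:80
    default_backend node_procs

backend node_procs
    balance leastconn
    server node1 127.0.0.1:8080 maxconn 64 check
    server node2 127.0.0.1:8081 maxconn 64 check
    server node3 127.0.0.1:8082 maxconn 64 check
    server node4 127.0.0.1:8083 maxconn 64 check
```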
We also spent some time optimizing PhantomJS to load pages quickly by turning off image loading, allowing it to use a small disk cache and keeping the PhantomJS processes alive instead of respawning them for every request. We also spawn a separate Node.js process for each processor core, allowing for massive parallelization.
The importance of logs
During testing and tuning Phantom Renderer, we developed one strong habit: log everything. When we first started the project, we had no logging whatsoever. Debugging issues was easy at first while the codebase was small, but once it grew in size and complexity, debugging became much more difficult. When Phantom Renderer was being tested it was difficult to determine the cause of bugs and errors (or even what PhantomJS and Node.js were doing).
About midway through the project we started using Winston, a great logging utility for Node.js. With Winston in place we added logging to every single step of the render process in PhantomJS and of the HTTP process in Node.js. We also used Winston’s log levels to allow for different levels of logging for debugging and production. Combining that with Splunk gave us deep insights into how specific requests were handled and how often certain errors were occurring in production. If you’re starting a new project, logging should be a required piece of it.
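Winston gives you leveled logging out of the box; here is a stdlib-only sketch of the idea (names and messages are illustrative): every step logs at a level, and a single threshold decides what reaches the output in debugging versus production.

```javascript
// Lower number = more severe. A threshold filters out anything below it.
var LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };

function Logger(threshold) {
    this.threshold = LEVELS[threshold];
    this.lines = [];   // Collected here; Winston would write to transports.
}

Logger.prototype.log = function (level, message) {
    if (LEVELS[level] <= this.threshold) {
        this.lines.push(level + ': ' + message);
    }
};

// Production: info and up. For debugging, switch the threshold to 'debug'.
var log = new Logger('info');
log.log('debug', 'phantom stdout chunk received');  // filtered out
log.log('info',  'render complete in 812ms');       // kept
```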
The future of The Phantom Renderer
We’re hoping to open source The Phantom Renderer sometime in the near future. Hopefully it will be useful for web apps that have a mix of different frontend and backend technologies. Let us know if it’s something your team or company is interested in using!
We’ll be posting more in-depth posts about our experience with PhantomJS and Node.js. Stay tuned!
Logs photo by Aapo Haapanen from Tampere, Finland (Logs) [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons
SmugMug users have uploaded millions of awesome photos, and one of the things we work hard on is making it easy and fun for people to discover them. SmugMug Search is an important part of this, since it allows anyone on the web to search among all public photos on SmugMug. It also helps drive traffic to our Pros, many of whom make a living selling prints of their work.
Naturally we want Search to be fast, intuitive, and beautiful. But more importantly, we want to showcase our users’ gorgeous photos. When people search for photos on SmugMug, they want to see photos, not a bunch of pagination links and other user interface clutter. So, a few months ago, we launched a redesigned Search page that does just that.
We put those big gorgeous photos front and center, and we got rid of the ugly pagination links in favor of infinite scrolling—as you scroll down, more results are loaded automatically. This looks great and works well, but keeping the interface fast and responsive presented a lot of challenges, especially on older browsers or slower computers.
Behind the scenes, the search results are loaded into a YUI Model List via XHR and rendered using YUI Views. As the user scrolls down, more results are loaded automatically, appended to the list, and rendered.
As the user scrolled further and further down and more results were loaded, memory usage skyrocketed. With potentially thousands of results on the page at once and a Model and View instance for each, users without lots of RAM sometimes saw things grind to a halt as the browser was forced to rely on virtual memory. This clearly wasn’t a good experience.
Here are some of the things we did to speed stuff up:
Instead of creating a View instance for each image tile, we now use a single master View instance for the entire list of results. As new results are added to the page, the master result view is responsible for rendering those new results without re-rendering any of the existing results on the page. Now, even when there are thousands of images on the page, there’s just a single view managing them all.
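The core of the single-master-view idea can be shown with a DOM-free sketch (illustrative, not the actual YUI view code): the view remembers how many results it has already rendered and only generates markup for the ones added since, leaving existing tiles untouched.

```javascript
function ResultsView() {
    this.rendered = 0;   // How many results are already on the page
}

// Returns markup for just the new results; the real view would append
// this fragment to the container node instead of re-rendering the list.
ResultsView.prototype.renderNew = function (results) {
    var html = '';
    for (var i = this.rendered; i < results.length; i++) {
        html += '<li class="tile">' + results[i] + '</li>';
    }
    this.rendered = results.length;
    return html;
};
```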
We wrote a new ScrollInfo plugin for YUI’s Node component, which provides a highly efficient, throttled wrapper around the browser’s native `scroll` event. Since the `scroll` event can fire hundreds of times per second, throttling ensures that our event handlers only run, say, once every 50 or 100 milliseconds rather than on every single event. This puts less of a burden on the browser and ensures that more system resources are available to render content and keep the page feeling responsive as the user scrolls. This plugin isn’t yet available as part of YUI, but we’ve sent a pull request and we hope they’ll accept it.
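The throttling idea behind the plugin boils down to a few lines (a sketch, not the ScrollInfo source): no matter how fast scroll events arrive, the wrapped handler runs at most once per interval.

```javascript
// Wrap fn so it runs at most once every intervalMs milliseconds;
// calls arriving inside the window are simply dropped.
function throttle(fn, intervalMs) {
    var last = 0;
    return function () {
        var now = Date.now();
        if (now - last >= intervalMs) {
            last = now;
            fn.apply(this, arguments);
        }
    };
}

// e.g. node.on('scroll', throttle(updateScrollInfo, 50));
```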
Our profiling revealed several memory leaks in YUI’s event system, which we were able to work around to improve memory usage even above and beyond our other improvements. The YUI team is aware of these issues. Some have already been fixed in YUI 3.6.0, and others will be addressed in an upcoming release.
Naturally we also took the opportunity to make lots of other minor improvements and fix lots of little bugs, but those weren’t directly related to the performance effort.
Benchmarks & Pretty Graphs
Here are some pretty graphs demonstrating the effect of the performance improvements we made. Results were gathered using Google Chrome’s profiling tools on a Mac Pro (2.8 GHz Quad-Core) running OS X 10.8.
We also created a jsPerf test for LazyModelList to demonstrate how much faster it is than ModelList.
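The reason LazyModelList wins can be shown with a simplified sketch (illustrative; see YUI's documentation for the real API): instead of instantiating a full Model per search result, it stores cheap plain objects and only promotes one to a model when something actually needs model features.

```javascript
function LazyList(items) {
    this.items  = items.slice();   // Plain objects: cheap to create
    this.models = {};              // Promoted models, cached by index
}

// Stand-in for reviving a plain object into a full Model on demand.
LazyList.prototype.revive = function (i) {
    if (!this.models[i]) {
        this.models[i] = { attrs: this.items[i], isModel: true };
    }
    return this.models[i];
};
```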
That Ain’t All
It took some work, but now we’ve got a gorgeous search page that performs well even in older browsers and on slower machines. We’re pretty happy with it, and we hope you are too. But that’s not all! We’ve still got plenty more improvements planned, so keep your eyes peeled.
— Ryan Grove and Brian Strong, SmugMug Sorcerers