Detecting broken links for Selenium and webdriver on Node.js

Yesterday I wrote a blog post about a small Node "application" I'm writing that controls a web browser, and I got as far as being able to drive the browser. Today I'm taking it one step further.

For my application I want to be able to check the pages I visit for errors, such as broken links.

I start out by trying to see if I can get Selenium to do two things for me:

  1. Give me the HTTP status code for the page it's currently showing.
  2. Inform me when the page is fully loaded.

After some research, though, I find out that Selenium can't help me with HTTP status codes, and probably never will.

The blog post above mentions possible solutions, one being to set up a proxy and run your tests through it. This turns out to be a great idea for me, because it lets me inspect all the traffic the browser generates while my application drives it.

In turn, I can wait for all Ajax calls on a page to complete before proceeding, with the added benefit that the proxy can inform me of status codes such as 404 and 500.

Setting up a proxy

I set to work setting up a proxy that will allow me to wait for the page to load.

I previously used node-http-proxy to set up a proxy for our Brunch configuration at work. The library is pretty straightforward to use and, for our use case, nearly perfect.

To set up a proxy we simply ask the library to create one and specify a port for it to listen on.

  var httpProxy = require('http-proxy');

  var proxy = httpProxy.createProxyServer({
    target: options.url,          // the host under test
    changeOrigin: true,
  }).listen(options.localProxy);  // local port the proxy listens on

The target is the host I wish to proxy. I set changeOrigin so that the Host header of the proxied request is rewritten to match the target. I don't really want to tell the server I'm going through my local proxy, because that would prevent me from seeing issues with, for example, CORS.

Another gotcha here is to let the proxy cover the entire host of the page under test, without a path in the target. If the target includes a path and the page fetches resources from other paths or relative to the host root, the proxy will prevent those resources from being loaded.

Waiting for page loads

With that proxy in place, all I have to do now is listen for requests as they start and finish.

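  var requests = [];

  // proxyReq fires when a request is sent to the target; remember it as active.
  const requestStarted = (proxyReq, req, res) => requests.push(req);
  // proxyRes fires when the target responds; remove that specific request
  // (pop() would drop whichever request happened to be added last).
  const requestFinished = (proxyRes, req, res) => {
    const index = requests.indexOf(req);
    if (index !== -1) requests.splice(index, 1);
  };

  proxy.on('proxyRes', requestFinished); // important to listen to finished before started.
  proxy.on('proxyReq', requestStarted);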

As you can see, I keep a list of all currently active requests, because that is what I need to tell whether the page is "done loading".
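One way to keep this bookkeeping robust is to track each request object individually, so that responses arriving out of order remove the right entry and failed requests do not linger. The sketch below is only an illustration, not part of my actual setup: createRequestTracker and its method names are hypothetical, and it assumes the handlers receive their arguments in the order node-http-proxy documents for its proxyReq, proxyRes and error events.

```javascript
// Hypothetical helper: tracks in-flight requests by identity instead of by position.
function createRequestTracker() {
  const pending = new Set();

  return {
    // Attach to anything with an on(event, handler) method, such as the proxy.
    attach(emitter) {
      // proxyReq handlers receive (proxyReq, req, res, options).
      emitter.on('proxyReq', (proxyReq, req) => pending.add(req));
      // proxyRes handlers receive (proxyRes, req, res).
      emitter.on('proxyRes', (proxyRes, req) => pending.delete(req));
      // error handlers receive (err, req, res); a failed request never gets a response.
      emitter.on('error', (err, req) => pending.delete(req));
    },
    // Number of requests still waiting for a response.
    count: () => pending.size,
    // URLs of the requests still waiting, handy for logging.
    urls: () => Array.from(pending, (req) => req.url),
  };
}
```

Usage would be along the lines of `const tracker = createRequestTracker(); tracker.attach(proxy);`, after which `tracker.count()` says how many requests are still open.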

  const check_requests = (callback) => {
    return () => {
      if(requests.length == 0){
        console.log('done waiting for page load');
        callback();
      } else {
        console.log('waiting for ', requests.map(r=>r.url));
        setTimeout(check_requests(callback), 500);
      }
    }
  }

  const wait_for_load = () => {
    if(requests.length == 0){
      return Promise.resolve(1);
    } else {
      console.log('waiting for page load');
      return new Promise((resolve,reject)=>{
        setTimeout(check_requests(resolve), 500);
      });
    }
  }

  proxy.wait_for_load = wait_for_load;

The above code checks the array of active requests to see if we have any loading going on. Once the array is empty it completes a promise allowing us to continue once we are done loading. If no loading is going on, we just return a completed promise.
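A polling wait like this never settles if a request stays in the list, so a bounded variant can be useful in tests. The following is a sketch only: waitUntilIdle, getPendingCount, and the default interval and timeout values are assumptions made for illustration.

```javascript
// Hypothetical bounded wait: resolves once getPendingCount() returns 0,
// rejects if that has not happened within timeoutMs.
function waitUntilIdle(getPendingCount, { intervalMs = 500, timeoutMs = 10000 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;

    const poll = () => {
      if (getPendingCount() === 0) {
        resolve();
      } else if (Date.now() >= deadline) {
        reject(new Error(`still waiting on ${getPendingCount()} request(s) after ${timeoutMs} ms`));
      } else {
        setTimeout(poll, intervalMs);
      }
    };

    poll(); // check immediately so an idle page does not wait a full interval
  });
}
```

With the array of active requests from earlier, a call could look like `waitUntilIdle(() => requests.length).then(do_next_browser_logic)`, with a `.catch` to report the timeout.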

Checking for 404 and 500 status codes

The proxy's event handler for finished requests can easily be modified to check for 404 and 500 status codes.

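  const requestFinished = (proxyRes, req, res) => {
    // proxyRes is the response from the target, so it carries the real status code.
    if (proxyRes.statusCode != 404 && proxyRes.statusCode != 500) {
      const index = requests.indexOf(req);
      if (index !== -1) requests.splice(index, 1);
    }
  }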

The above code won't remove requests that ended in a 404 or 500 from the requests array, which effectively blocks the wait-for-page-load function.
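An alternative to blocking is to record the failing responses and let the test inspect them once the page has loaded. This is a sketch rather than the approach I use above; createBrokenLinkLog and the choice to treat every status of 400 or higher as broken are assumptions.

```javascript
// Hypothetical helper: remembers every response whose status signals an error.
function createBrokenLinkLog(isBroken = (status) => status >= 400) {
  const broken = [];

  return {
    // Takes the first two arguments of a node-http-proxy 'proxyRes' handler.
    onResponse(proxyRes, req) {
      if (isBroken(proxyRes.statusCode)) {
        broken.push({ url: req.url, status: proxyRes.statusCode });
      }
    },
    // Returns a copy, so callers cannot mutate the log.
    entries: () => broken.slice(),
    // Empties the log, for example before visiting the next page.
    clear: () => { broken.length = 0; },
  };
}
```

It could be wired up with `proxy.on('proxyRes', log.onResponse)`, and a test would then fail if `log.entries()` is not empty after the page has finished loading.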

Combining yesterday's code with today's

I thought I'd show the setup from yesterday's blog post together with today's.

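// start_proxy is an assumed name for a function that wraps the proxy setup above and returns the proxy
install().then(start_selenium).then(() => Promise.all([start_proxy(), create_driver()])).then((driver_and_proxy)=>{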
  const proxy = driver_and_proxy[0];
  const driver = driver_and_proxy[1];
  // you now have a browser up and running linked to the driver
  // and a proxy to check the traffic generated by the browser
});

The above code assumes you change your URLs so that the browser points at the proxy, and let the proxy forward the requests to the page under test.
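Pointing the browser at the proxy amounts to swapping the scheme, host and port of each URL while keeping the path and query. Here is a small sketch using Node's built-in URL class; toProxyUrl and the port 8080 in the usage note are made-up names and values, and the sketch assumes the proxy listens on plain HTTP on localhost.

```javascript
// Hypothetical helper: rewrites a URL of the site under test so it goes through the local proxy.
function toProxyUrl(originalUrl, proxyPort, proxyHost = 'localhost') {
  const url = new URL(originalUrl);
  url.protocol = 'http:';        // assumption: the local proxy listens on plain HTTP
  url.hostname = proxyHost;
  url.port = String(proxyPort);
  return url.toString();         // path, query and fragment are left untouched
}
```

For example, `toProxyUrl('https://example.com/shop/cart?id=3', 8080)` gives `http://localhost:8080/shop/cart?id=3`.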

With the wait_for_load function I added above, my tests can wait for loading to complete like this:

proxy.wait_for_load().then(do_next_browser_logic);