Making errors easy to fix

As we start developing services to be hosted in cloud-like systems, the way we manage those services changes. What happens when an error occurs on a remote system? Can you log on to the system to collect all the information you need to correct the error in your code? How will you even notice when an error occurs?

On a cloud-based system you often won’t be able to log on to the remote server. Accessing the error logs might not be possible. You might not even notice an error that occurs on a remote server, because no monitoring has been set up.

Previously, when an error occurred in the hosting environment, we had to work out which server to log on to and then dig through the different servers until we found the error in a log somewhere. This process could take minutes or hours, adding to the time it took to fix the problem before any debugging had even begun.

I recently added an endpoint to the RemoteX Applications REST service that gives you access, directly through the API, to all errors that have occurred in the service. The goal is for that list to be empty, and the endpoint makes it very easy to find out what went wrong on the remote server.

In fact, the endpoint was added in an effort to identify a service bug that occurred only occasionally in our staging environment. I needed a log. The inspiration came from Google AppEngine, where the Application Dashboard shows you which paths cause errors in the hosted application and how often.

Now all we have to do is log into the API and ask it: what went wrong? It will tell us whether anything has gone wrong, on which path, and which HTTP verb was used. It will even give us a stack trace. Some might say that you don’t want to display stack traces, since they might give hints about how your application is designed. But in this case the benefits far outweigh the drawbacks. Remember, the goal in the end is to have an empty error list.
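To make the idea concrete, here is a minimal sketch in Python of how a service could record the path, HTTP verb and stack trace of each failure and expose them as a list. It is not the RemoteX implementation: the names ErrorEntry, ErrorLog and handle, and the field names, are assumptions made up for this illustration.

```python
import traceback
from dataclasses import dataclass, asdict
from typing import Callable, List


@dataclass
class ErrorEntry:
    """One recorded failure; the field names are illustrative assumptions."""
    path: str         # request path that failed
    verb: str         # HTTP verb used for the request
    message: str      # short description of the exception
    stack_trace: str  # full formatted traceback


class ErrorLog:
    """Hypothetical in-memory store behind an error-listing endpoint."""

    def __init__(self) -> None:
        self._entries: List[ErrorEntry] = []

    def record(self, path: str, verb: str, exc: BaseException) -> None:
        # Format the traceback attached to the exception into one string.
        trace = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        self._entries.append(ErrorEntry(path, verb, str(exc), trace))

    def as_response(self) -> List[dict]:
        """What the endpoint would return; an empty list means all is well."""
        return [asdict(entry) for entry in self._entries]


def handle(log: ErrorLog, path: str, verb: str,
           handler: Callable[[], None]) -> int:
    """Run a request handler and record any exception it raises."""
    try:
        handler()
        return 200
    except Exception as exc:
        log.record(path, verb, exc)
        return 500
```

A real service would wrap its request pipeline in something like handle and serve as_response() as JSON behind authentication, since the stack traces are only meant for the people fixing the errors.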

The endpoint is implemented in such a way that we can ask the API, “has anything gone wrong since we last asked?” This allows us to build a monitoring service that asks our installations every once in a while, “are you ok?”
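The "since we last asked" question can be sketched as a cursor that the monitor remembers between polls. The sketch below assumes each error carries an increasing sequence number; the post does not say whether the real endpoint uses a number, a timestamp or something else, and ErrorFeed, Monitor and fetch_since are hypothetical names.

```python
from typing import Callable, Dict, List


class ErrorFeed:
    """Hypothetical server side: each error gets an increasing sequence number."""

    def __init__(self) -> None:
        self._errors: List[Dict] = []

    def add(self, path: str, verb: str, message: str) -> None:
        self._errors.append({
            "seq": len(self._errors) + 1,
            "path": path,
            "verb": verb,
            "message": message,
        })

    def since(self, last_seq: int) -> List[Dict]:
        """Return only the errors recorded after the given sequence number."""
        return [e for e in self._errors if e["seq"] > last_seq]


class Monitor:
    """Polls one installation and remembers how far it has read."""

    def __init__(self, fetch_since: Callable[[int], List[Dict]]) -> None:
        # fetch_since stands in for the HTTP call to the error endpoint.
        self._fetch_since = fetch_since
        self._last_seq = 0

    def check(self) -> List[Dict]:
        """Ask "are you ok?"; an empty list means nothing new went wrong."""
        new_errors = self._fetch_since(self._last_seq)
        if new_errors:
            self._last_seq = max(e["seq"] for e in new_errors)
        return new_errors
```

Passing the fetch function in keeps the monitor independent of the transport, so the same check() works against an in-memory feed in tests and against an HTTP client in production.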

Other solutions

There are other solutions to this problem that I quite like as well. We have EQATEC Analytics, which we at RemoteX use for our client applications. And there is Exceptioneer, which we have used for certain web services in the past.

They both provide you with a way to collect errors and analyze them. Exceptioneer sends you an email when a new type of error occurs, which is nice for monitoring purposes. However, as far as I can tell, neither of them lets you collect the raw data from a web service, something that could be quite useful if you want to combine the data with data from other services, or just create mash-up dashboards.

All this, of course, is meant to shorten the feedback loop should an error occur. It’s very useful during testing as well, since the applications will report any issues that occurred during testing, helping to narrow down the root cause. With this in place you can focus more on code correction and delivery.