A Two-Stage Payment Process
A client had a two-stage payment process. The initial payment was a deposit with the full payment usually being taken manually afterwards. To avoid this manual overhead they paid for development of an automated second payment process. The automated payment process would run daily and take any payments due on that date.
Problem: Is it even running?
When the project was handed over to a new development team, they reviewed the code. The error handling for the second payment process was set up correctly and the code looked good, but the line that enabled the process to run was commented out, so the process wasn't running. It turned out that the process had never been enabled in production and hadn't run even once.
How was this not noticed? The most likely scenario is that the original team tested the process extensively before deployment and then forgot to enable it for production. Because the process wasn't supposed to run until several weeks after the initial payment, there was no immediate effect. By the time it was supposed to take over the handling of second payments it had been forgotten, and the client continued to process them manually for months.
Solution: Does the process complete?
The main cause of this problem is a reliance on error monitoring over completion monitoring. The code was written, tested, approved and deployed, and there were no issues. The schedule was set up, and logging and error handling were in place. When no alerts were received it seemed to indicate that everything was fine, but in fact it only indicated that no errors had been reported. Nothing was monitoring whether the process even started (or finished) correctly.
With Process Warden the monitoring is inverted: rather than detecting errors, we detect successes. To put it another way, we monitor for an expected success (in this case a daily indication that the process has run) and trigger an alert if we don't receive it. With this approach, receiving no alerts from Process Warden means that the expected success was reported on schedule.
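The inverted check described above can be sketched in a few lines of Python. This is only an illustration of the general idea and not Process Warden's actual API: the class name CompletionMonitor, its methods and the max_silence parameter are all hypothetical, chosen for this example.

```python
from datetime import datetime, timedelta
from typing import Optional


class CompletionMonitor:
    """Hypothetical sketch: alert when an expected success is missing."""

    def __init__(self, watching_since: datetime, max_silence: timedelta) -> None:
        # watching_since gives a reference point for a process that has
        # never reported, e.g. one that was never enabled in production.
        self.watching_since = watching_since
        self.max_silence = max_silence
        self.last_success: Optional[datetime] = None

    def record_success(self, when: datetime) -> None:
        """Called by the monitored process after it completes successfully."""
        self.last_success = when

    def needs_alert(self, now: datetime) -> bool:
        """True when no success has been reported within max_silence."""
        reference = self.last_success or self.watching_since
        return now - reference > self.max_silence
```

For a daily job, a max_silence slightly longer than 24 hours (25 hours, say) leaves some slack for normal variation in run time. Note that the monitored process never reports errors here; it only reports success, and silence is what raises the alert.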
Result: The process is completing correctly every day
There was no issue with the code and as soon as the process was actually started it worked perfectly. Process Warden checks every day that it has run to completion and finished successfully - if it does not complete for any reason then it will trigger an alert.
This means that the following conditions are caught above and beyond simple errors:
- Hanging
- Failure to start (e.g. server off or process disabled)
- Incorrect completion
- Not running often enough
- Taking too long to complete
And on top of that there's no extra overhead for monitoring. You still simply wait for an alert, but in this case you can virtually forget the process exists, since you'll hear about it whenever it fails to complete. If everything's quiet, the process is completing on schedule.
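As a rough illustration of how the conditions listed above could be told apart, the following Python sketch classifies the most recent run of a process. It assumes the process reports when it starts and when it finishes, which the text above does not specify. The names RunReport and diagnose and the max_gap and max_duration thresholds are hypothetical and are not taken from Process Warden.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RunReport:
    """Hypothetical record of the most recent run of a process."""
    started: datetime
    finished: Optional[datetime] = None
    ok: bool = False


def diagnose(last_run: Optional[RunReport], now: datetime,
             max_gap: timedelta, max_duration: timedelta) -> Optional[str]:
    """Return a description of the problem, or None if all looks well."""
    if last_run is None:
        return "failed to start"
    if last_run.finished is None:
        # Started but never finished: hanging once it exceeds max_duration.
        if now - last_run.started > max_duration:
            return "hanging"
        return None  # still running within its allowed time
    if now - last_run.started > max_gap:
        return "not running often enough"
    if last_run.finished - last_run.started > max_duration:
        return "took too long to complete"
    if not last_run.ok:
        return "incorrect completion"
    return None
```

None of these checks depends on the process raising an error. Each one is derived from the success reports that did or did not arrive.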