Heuristic for Solving NP Planning Problem

Continue from last post about NP-Problem. Today, I am going to study a planning puzzle which is a type of NP-problem.

In fact, planning problems are NP-Hard (Scary).

Disclaimer: This post is for self-study and for research purposes only. I reuse some of the content from third party website (the Bus-Driver-Planning Puzzle). If it caused any unintended troubles to whatever party, please leave comment below and I will remove the post.

The Bus-Driver-Planning Puzzle

This is a real life puzzle according to the creator. It looks something like this:

  • There are three bus lines: Line 1, Line 2, Line 3
  • You must allocate two drivers per line per day. One for the morning shift and one for the late shift.
  • A driver can only do one shift per day; early or late.
  • There are eleven bus drivers: A, B, C, D, E, F, G, H, I, J, and K. Some are only qualified to drive on Line 1, 2 or 3, while some are qualified to drive on a combination of two of those lines. Find the qualifications under each driver name in the Shift Planning form.
  • Every driver is entitled to days off. Days off are indicated with dark gray nodes in the Shift Planning form. You can’t assign shifts on days off.
  • Some drivers prefer to take particular days of the week off. This is indicated by light gray nodes in the Shift Planning form. These are preferences, rather than rules.
  • Drivers may prefer to work an early or late shift on particular days. The yellow nodes in Shift Planning form indicate shift preferences.
  • Most drivers dislike working too many late shifts. Therefore, the late shifts should be distributed as fairly as possible among the drivers. You should aim to assign exactly 4 late shifts to each driver over the two-week planning period.
  • According to an agreement with the Transport Workers Union, a driver must not be assigned a late shift followed by an early shift the next day.
  • According to a second agreement with the Transport Workers Union, a driver can only work no more than three consecutive late shifts.

Planning Puzzle

An Example of the Planning Puzzle

You can earn and lose points in the following way:

Respect a single shift preference +3
Respect a single driver day-off preference +4
For each driver allocated a long rest (3 or more consecutive days off) +5
For each unassigned shift -20
For each late shift followed immediately by an early shift in a single driver’s schedule -30
For every after the third consecutive late shift assigned to a single driver -10
For every late shift assigned that is not equal to 4 -8

Here is the problem…

Given the above puzzle, how would you approach it? In fact, the above puzzle is intended to be solved with optimum solution (a score of 162) within an hour MANUALLY. In other words, you need to assign the tasks to the driver one by one to fill in the colorful puzzle in the example above (see image above).

Knowing that this problem is NP-hard, which means there is almost an infinite number of feasible solutions in the search space, finding the optimum solution MANUALLY is nearly impossible. Hitting the jackpot/lottery has higher chance than finding the perfect solution.

However, there are people who scored a full 162 scores for it. Is it luck? Possible. From time to time, there are people who hit the multi-millions jackpot. Are they skillful? Not sure. Are they lucky? Sure yes!

The only way to know whether those who scored 162 scores are really skillful is to ask them to solve several puzzles in a row and see if they can still maintain full scores for every puzzle. However, this still cannot rule out the possibility of luck since in life, there are people who got strike by lightning more than anyone else.

How to approach the NP problem

Having a near-optimum solution for a NP-hard problem is good enough. This avoids the need to exhaust all the possible solutions to find for the optimal one which might take up the whole universe time.

With the above knowledge, we can devise a heuristic solver to tackle the problem. Heuristic is an experience-based, quick and dirty way to come up with a near-optimal solution. With a little bit of luck, sometimes, it can also achieve an optimal solution.

Heuristic for the NP Planning Problem

Firstly, we need a model to represent the problem. The puzzle’s input (the constraints of the puzzle like the drivers’ preferences and the resources like which driver drives which line, etc) and output (the assignments of each driver) need to be encoded in a way that a computer can handle (The actual implementation is not exposed to allow imagination for the readers).

Then we need an evaluation function that can evaluation the quality of the solution. It allows us to know whether solution A is better than solution B. This function is obtained via the information from the table with points above.

Thirdly, our goal is to maximize the score. The most straight forward way to achieve this is to pick a task from a pool of available tasks and test-assign it to all the drivers for every day and evaluation the assignment to find the maximum score before assigning the task to a specific driver on a specific day. Then continue until no more task can be assigned.

With this approach, we will usually arrive at a local maximum and stuck there. To overcome this, we need to inject certain randomness to the solver by randomly unassign certain tasks from certain drivers and rerun the previous step to reassign the tasks back to drivers with the highest score. And do this infinitely.

Heuristic Random Solver

Heuristic Random Solver

The animation above shows the successive highest scores found (the animation is run at a one-second cadence) for the same puzzle in the example at the beginning of this article. The heuristic solver hit a maximum score of 148 (out of 162) after running for about 1 hour and 17 minutes.

An interesting thing to mention is that different run of the solver will return different high score at different duration. For example, sometimes, it will produce a score of 147 in under 30 minutes, however, on other times, it needs more than one hour to return the same high score. Thus the non-deterministic nature of the solver.

Another thing to note is that, given enough time to run, the solver will hit the jackpot one day. For example, for certain puzzles, it can hit a score of 154 after running for a few hours. If given enough time, it should be able to find the optimal solution.

Heuristic Random Solver (faster animation)

Heuristic Random Solver (faster animation)

This is a faster animation of the same algorithm running on the same puzzle. It allows us to “see” and have an idea of the evolution of the solution. We can practically see how the solution is forming gradually.


There are two kinds of constraints for the puzzle: hard and soft constraints. Hard constraints are constraints that you must follow. Soft constraints are constraints that are OK not to follow, however, they will cost penalty.

In this puzzle, hard constraints are:

  • day off which no driver can work on that day
  • the line that a driver can drive

Soft constraints are:

  • the drivers’ preferences
  • the penalties like driving late shift for 3 consecutive nights, having late shift followed by early shift, having number of late shifts not equal to 4

If you think about it, hard constraints are actually a good thing to have since it limits the search space. We have less to search. However, what make things complicated are the soft constraints. We don’t know whether to satisfy them or not. But we cannot ignore them else we won’t be able to find the near-optimal solution.

It is a bad idea to turn the soft constraints to become hard constraints hoping that it can reduce the dimensions of the search space. Doing this will make the solver unable to even assign all the tasks to the drivers. This is a key learning that I had.