“Greedy approach”, or more technically, “greedy algorithm”, is a useful method to tackle some of the optimization problems. In optimization problems, they usually involve step-by-step decision making. The “greedy” sense of the “greedy algorithm” actually comes from the idea that we ignore what decisions have made in previous stages, and simply focus on the sub-optimal solution in every single stage. In this way, we attempt to derive the overall optimal solution. Take the “activity selection” problem as an example:
“There are N activities for someone to choose for participation. Every activity has its starting and ending time. Activities may clash in time. For instance, activity A is from 2pm to 5pm, while activity B is from 3pm to 6pm. In this case, only one of these two can be chosen. How to choose the activities so that one is able to join the maximum number of activities?”
In this problem, we can see that we need to choose activities one by one, and when choosing each activity, we need to consider which activities are still available at that moment. One may come up with a single rule of thumb to make each decision. For instance, we could repeatedly choose the currently available activity which starts at an earliest time, without making any time clashes. However, “greedy algorithm” may not be always correct. Let us consider the “greedy” method mentioned above, i.e. we keep on choosing the earliest starting activity, and try to use this algorithm to deal with the situation below:
Number of activities (N) = 3
If we adopt the above method, we can only choose activity A. But obviously, this is not the optimal solution, because we can choose activities B and C instead. What should the correct method be? We observe that if one can choose an activity which ends earlier, then more time could be allocated for choosing other activities. Thus, we can raise another “greedy” method, i.e. repeatedly choose the earliest ending activity, without making any time clash. By considering again the above situation, we would firstly choose activity B, because it ends most early (2:00 pm), and then choose activity C (ends at 4:00 pm). This seems working. But does this always work?
The answer is YES. The explanation is as follows. Consider a list of activities as time intervals drawn in the diagram below:
In the diagram, all the activities are sorted in ascending order of ending time and the red intervals are the chosen activities by the above method.
Suppose the contrary that the above method does not yield an optimal solution, i.e. there exists an activity X (green in colour) within the above diagram and it is not chosen. 3 cases are considered:
Case (i): Activity X ends earlier than start of 1st chosen activity OR starts later than the end of last chosen activity.
This is impossible because if such activity X exists, the “greedy” method mentioned is able to choose it, and this contradicts to the real situation.
This is impossible because if such activity X exists, the “greedy” method mentioned is able to choose it, and this contradicts to the real situation.
Case (ii): Activity X lies between end of a chosen activity and start of next chosen activity.
Similar to case (i), if activity X exists like this, the “greedy” method is able to handle such case. Thus case (ii) also leads to contradiction.
Case (iii): Activity X has time clash with at least one of the chosen activites (red one).
This can be a possible situation. But the problem now is to find out a better solution than the “optimal” one found by the above “greedy” method. Case (iii) obviously does not yield a better solution, since if activity X is chosen instead, at least one of the activities in “red” must be discarded.
As a result, such “greedy” method is correct.
In this example, we learn that “greedy” methods may be very intuitive, efficient, yet wrong. To figure out whether a “greedy” method is correct, one should either provide a logical proof, or a counter-example to the method.
Sometimes, the “greedy” algorithm behind requires more thorough thinking, and the proof could be quite subtle. Let us come to another problem below.
[Modified from the problem “Saving Endeavour” (see reference)]
“There are N toys. Each toy needs to pass through 2 main processes: moulding and painting. Obviously, moulding must be done before painting. The time for moulding and painting(mi and pi respectively for ith toy) for every toy may be different. Now we only have 1 moulding machine and 1 painting machine. How should we produce the N toys in a suitable order such that the total production time is minimized?”
For example, now there are 6 toys in total. Their times for moulding and painting are tabulated as shown:
If we manufacture the toys in the order B, E, F, A, D and C,
Total time required would be 28 minutes, and this is minimal for such case.
This problem is much more complicated than the “activity selection” problem. It is because one can come up with even more possible “greedy” approaches:
- Choose the toy with minimum time for moulding first?
- Choose the toy with minimum time for painting first?
- Choose the toy with minimum sum of times for moulding and painting first?
- (And more……)
Unfortunately, the first three approaches mentioned above are all wrong. Many counter-examples can be constructed and are skipped here. But why could we think of such a lot of incorrect approaches? This is because we have not yet considered the key properties in this problem.
We can observe that we should try our best to make the moulding machine complete one job as soon as possible, so that it is readily available for production of other toys. In other words, the moulding machine should produce toys with the shorter moulding time first.
But this is NOT enough. The painting time should also be considered. As the start of painting is solely dependent on the completion of moulding, the painting machine may sometimes need to “wait” the moulding machine to finish its job. Therefore, to utilize the resources well, it is not difficult to see that we should minimize the “idle time” for painting machine. To do this, we should always produce the toys with shorter painting time as late as possible, so that the painting machine could somehow work while the moulding machine is still working.
By combining these two factors, we could then derive a “greedy” approach. We can sort the toys in an order such that for every pair of toys a and b (a < b), then min(ma, pb) < min(mb, pa). This is complicated, but makes sense, because as mentioned, we should produce the toy with shorter moulding time first, and delay the production of toys with shorter painting time. If we compare any two toys by using such mathematical expression, if either ma or pb is small enough, it would lead to earlier production of toy a.
Take the comparison between toy 2 and toy 4 as an example. Since min(1,4) = 1 is smaller than min(5,2) = 2, toy 2 should be produced before toy 4.
The proof is shown below.
The proof is shown below.
Suppose toy a is produced, and let D be the time required for painting machine to be available again (counting from start of moulding toy a), then there are 2 cases:
Case (i): D is smaller than ma.
Case (ii): D is greater than ma.
Thus, the time required to finish painting toy a would be max(D - ma, 0) + pa.
Suppose toy a is produced prior to toy b, and let D’ be the time required for painting machine to be available again after painting toy a, then
D’ = time required to paint toy a = max(D - ma, 0) + pa
As a result, time required to finish painting toy b would be
max(D’ – mb, 0) + pb = max(max(D - ma, 0) + pa - mb, 0) + pb
max(D’ – mb, 0) + pb = max(max(D - ma, 0) + pa - mb, 0) + pb
Suppose toy b is produced prior to toy b, then time required to finish painting
toy a would be max(max(D - mb, 0) + pb – ma, 0) + pa
If we claim that a shorter total time can be reached when we produce toy a prior to toy b, then
max(max(D-ma, 0) + pa-mb, 0) + pb < max(max(D-mb, 0) + pb-ma, 0) + pa
max(max(D-ma, 0), mb-pa) + pb + pa - mb < max(max(D-mb, 0), ma-pb) + pa + pb – ma
max(max(D-ma, 0), mb-pa) + pb + pa - mb < max(max(D-mb, 0), ma-pb) + pa + pb – ma
max(max(D-ma, 0), mb-pa) - mb < max(max(D-mb, 0), ma-pb) – ma
max(D-ma, 0, mb-pa) - mb < max(D-mb, 0, ma-pb) – ma
max(D, ma, ma+mb-pa) - mb - ma < max(D, mb, ma+mb-pb) – ma-mb
max(D, ma, ma+mb-pa) < max(D, mb, ma+mb-pb)
max(ma, ma+mb-pa) < max(mb, ma+mb-pb)
max(-mb, -pa) + ma + mb < max(-ma, -pb) + mb + ma
max(-mb, -pa) < max(-ma, -pb)
-min(mb, pa) < -min(ma, pb)
min(ma, pb) < min(mb, pa)
max(D-ma, 0, mb-pa) - mb < max(D-mb, 0, ma-pb) – ma
max(D, ma, ma+mb-pa) - mb - ma < max(D, mb, ma+mb-pb) – ma-mb
max(D, ma, ma+mb-pa) < max(D, mb, ma+mb-pb)
max(ma, ma+mb-pa) < max(mb, ma+mb-pb)
max(-mb, -pa) + ma + mb < max(-ma, -pb) + mb + ma
max(-mb, -pa) < max(-ma, -pb)
-min(mb, pa) < -min(ma, pb)
min(ma, pb) < min(mb, pa)
This is exactly what we have proposed.
From these two problems, we learn that “greedy” approaches could be useful in our daily lives. The “activity selection problem” is a very good example. Due to limited amount of time for most people, we often encounter situations of time clashes for various events. In order to attend as many events as possible, “greedy” approach could be used and this could optimize the utilization of time. For the “toy manufacturing problem”, it is particularly important because resources are scarce and so we need to make sure that the machines are working in the most efficient way in order to reduce the production cost.
Despite the fact that both “activity selection problem” and “toy manufacturing problem” oversimplify the real situation (since attending different event may have different satisfaction, and more machines and stages are involved in manufacture), such “greedy” approaches could serve as a simple, rough, yet efficient estimation.
References
- “Activity Selection Problem” http://en.wikipedia.org/wiki/Activity_selection_problem
- “Saving Endeavour” http://poj.org/problem?id=2751
No comments:
Post a Comment