# Greedy Algorithms
# Huffman Coding
Huffman code (opens new window) is a particular type of optimal prefix code that is commonly used for lossless data compression. It compresses data very effectively saving from 20% to 90% memory, depending on the characteristics of the data being compressed. We consider the data to be a sequence of characters. Huffman's greedy algorithm uses a table giving how often each character occurs (i.e., its frequency) to build up an optimal way of representing each character as a binary string. Huffman code was proposed by David A. Huffman (opens new window) in 1951.
Suppose we have a 100,000character data file that we wish to store compactly. We assume that there are only 6 different characters in that file. The frequency of the characters are given by:
++++++++
 Character  a  b  c  d  e  f 
++++++++
Frequency (in thousands) 45  13  12  16  9  5 
++++++++
We have many options for how to represent such a file of information. Here, we consider the problem of designing a Binary Character Code in which each character is represented by a unique binary string, which we call a codeword.
The constructed tree will provide us with:
++++++++
 Character  a  b  c  d  e  f 
++++++++
 Fixedlength Codeword  000  001  010  011  100  101 
++++++++
Variablelength Codeword 0  101  100  111  1101 1100
++++++++
If we use a fixedlength code, we need three bits to represent 6 characters. This method requires 300,000 bits to code the entire file. Now the question is, can we do better?
A variablelength code can do considerably better than a fixedlength code, by giving frequent characters short codewords and infrequent characters long codewords. This code requires: (45 X 1 + 13 X 3 + 12 X 3 + 16 X 3 + 9 X 4 + 5 X 4) X 1000 = 224000 bits to represent the file, which saves approximately 25% of memory.
One thing to remember, we consider here only codes in which no codeword is also a prefix of some other codeword. These are called prefix codes. For variablelength coding, we code the 3character file abc as 0.101.100 = 0101100, where "." denotes the concatenation.
Prefix codes are desirable because they simplify decoding. Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous. We can simply identify the initial codeword, translate it back to the original character, and repeat the decoding process on the remainder of the encoded file. For example, 001011101 parses uniquely as 0.0.101.1101, which decodes to aabe. In short, all the combinations of binary representations are unique. Say for example, if one letter is denoted by 110, no other letter will be denoted by 1101 or 1100. This is because you might face confusion on whether to select 110 or to continue on concatenating the next bit and select that one.
Compression Technique:
The technique works by creating a binary tree of nodes. These can stored in a regular array, the size of which depends on the number of symbols, n. A node can either be a leaf node or an internal node. Initially all nodes are leaf nodes, which contain the symbol itself, its frequency and optionally, a link to its child nodes. As a convention, bit '0' represents left child and bit '1' represents right child. Priority queue is used to store the nodes, which provides the node with lowest frequency when popped. The process is described below:
 Create a leaf node for each symbol and add it to the priority queue.
 While there is more than one node in the queue:
 Remove the two nodes of highest priority from the queue.
 Create a new internal node with these two nodes as children and with frequency equal to the sum of the two nodes' frequency.
 Add the new node to the queue.
For our example: (opens new window)
The pseudocode looks like:
Procedure Huffman(C): // C is the set of n characters and related information
n = C.size
Q = priority_queue()
for i = 1 to n
n = node(C[i])
Q.push(n)
end for
while Q.size() is not equal to 1
Z = new node()
Z.left = x = Q.pop
Z.right = y = Q.pop
Z.frequency = x.frequency + y.frequency
Q.push(Z)
end while
Return Q
Although lineartime given sorted input, in general cases of arbitrary input, using this algorithm requires presorting. Thus, since sorting takes O(nlogn) time in general cases, both methods have same complexity.
Since n here is the number of symbols in the alphabet, which is typically very small number (compared to the length of the message to be encoded), time complexity is not very important in the choice of this algorithm.
Decompression Technique:
The process of decompression is simply a matter of translating the stream of prefix codes to individual byte value, usually by traversing the Huffman tree node by node as each bit is read from the input stream. Reaching a leaf node necessarily terminates the search for that particular byte value. The leaf value represents the desired character. Usually the Huffman Tree is constructed using statistically adjusted data on each compression cycle, thus the reconstruction is fairly simple. Otherwise, the information to reconstruct the tree must be sent separately. The pseudocode:
Procedure HuffmanDecompression(root, S): // root represents the root of Huffman Tree
n := S.length // S refers to bitstream to be decompressed
for i := 1 to n
current = root
while current.left != NULL and current.right != NULL
if S[i] is equal to '0'
current := current.left
else
current := current.right
endif
i := i+1
endwhile
print current.symbol
endfor
Greedy Explanation:
Huffman coding looks at the occurrence of each character and stores it as a binary string in an optimal way. The idea is to assign variablelength codes to input input characters, length of the assigned codes are based on the frequencies of corresponding characters. We create a binary tree and operate on it in bottomup manner so that the least two frequent characters are as far as possible from the root. In this way, the most frequent character gets the smallest code and the least frequent character gets the largest code.
References:
 Introduction to Algorithms  Charles E. Leiserson, Clifford Stein, Ronald Rivest, and Thomas H. Cormen
 Huffman Coding (opens new window)  Wikipedia
 Discrete Mathematics and Its Applications  Kenneth H. Rosen
# Activity Selection Problem
# The Problem
You have a set of things to do (activities). Each activity has a start time and a end time. You aren't allowed to perform more than one activity at a time. Your task is to find a way to perform the maximum number of activities.
For example, suppose you have a selection of classes to choose from.
Activity No.  start time  end time 

1  10.20 A.M  11.00AM 
2  10.30 A.M  11.30AM 
3  11.00 A.M  12.00AM 
4  10.00 A.M  11.30AM 
5  9.00 A.M  11.00AM 
Remember, you can't take two classes at the same time. That means you can't take class 1 and 2 because they share a common time 10.30 A.M to 11.00 A.M. However, you can take class 1 and 3 because they don't share a common time. So your task is to take maximum number of classes as possible without any overlap. How can you do that?
# Analysis
Lets think for the solution by greedy approach.First of all we randomly chose some approach and check that will work or not.
 sort the activity by start time that means which activity start first we will take them first. then take first to last from sorted list and check it will intersect from previous taken activity or not. If the current activity is not intersect with the previously taken activity, we will perform the activity otherwise we will not perform. this approach will work for some cases like
Activity No.  start time  end time 

1  11.00 A.M  1.30P.M 
2  11.30 A.M  12.00P.M 
3  1.30 P.M  2.00P.M 
4  10.00 A.M  11.00AM 
the sorting order will be 4>1>2>3 .The activity 4> 1> 3 will be performed and the activity 2 will be skipped. the maximum 3 activity will be performed. It works for this type of cases. but it will fail for some cases. Lets apply this approach for the case
Activity No.  start time  end time 

1  11.00 A.M  1.30P.M 
2  11.30 A.M  12.00P.M 
3  1.30 P.M  2.00P.M 
4  10.00 A.M  3.00P.M 
The sort order will be 4>1>2>3 and only activity 4 will be performed but the answer can be activity 1>3 or 2>3 will be performed. So our approach will not work for the above case. Let's try another approach
 Sort the activity by time duration that means perform the shortest activity first. that can solve the previous problem . Although the problem is not completely solved. There still some cases that can fail the solution. apply this approach on the case bellow.
Activity No.  start time  end time 

1  6.00 A.M  11.40A.M 
2  11.30 A.M  12.00P.M 
3  11.40 P.M  2.00P.M 
if we sort the activity by time duration the sort order will be 2> 3 >1 . and if we perform activity No. 2 first then no other activity can be performed. But the answer will be perform activity 1 then perform 3 . So we can perform maximum 2 activity.So this can not be a solution of this problem. We should try a different approach.
# The solution
 Sort the Activity by ending time that means the activity finishes first that come first. the algorithm is given below

 Sort the activities by its ending times.
 If the activity to be performed do not share a common time with the activities that previously performed, perform the activity.
Lets analyse the first example
Activity No.  start time  end time 

1  10.20 A.M  11.00AM 
2  10.30 A.M  11.30AM 
3  11.00 A.M  12.00AM 
4  10.00 A.M  11.30AM 
5  9.00 A.M  11.00AM 
sort the activity by its ending times , So sort order will be 1>5>2>4>3.. the answer is 1>3 these two activities will be performed. ans that's the answer. here is the sudo code.

 sort: activities
 perform first activity from the sorted list of activities.
 Set : Current_activity := first activity
 set: end_time := end_time of Current activity
 go to next activity if exist, if not exist terminate .
 if start_time of current activity <= end_time : perform the activity and go to 4
 else: got to 5.
see here for coding help http://www.geeksforgeeks.org/greedyalgorithmsset1activityselectionproblem/ (opens new window)
# Changemaking problem
Given a money system, is it possible to give an amount of coins and how to find a minimal set of coins corresponding to this amount.
Canonical money systems. For some money system, like the ones we use in the real life, the "intuitive" solution works perfectly. For example, if the different euro coins and bills (excluding cents) are 1€, 2€, 5€, 10€, giving the highest coin or bill until we reach the amount and repeating this procedure will lead to the minimal set of coins.
We can do that recursively with OCaml :
(* assuming the money system is sorted in decreasing order *)
let change_make money_system amount =
let rec loop given amount =
if amount = 0 then given
else
(* we find the first value smaller or equal to the remaining amount *)
let coin = List.find ((>=) amount) money_system in
loop (coin::given) (amount  coin)
in loop [] amount
These systems are made so that changemaking is easy. The problem gets harder when it comes to arbitrary money system.
General case. How to give 99€ with coins of 10€, 7€ and 5€? Here, giving coins of 10€ until we are left with 9€ leads obviously to no solution. Worse than that a solution may not exist. This problem is in fact nphard, but acceptable solutions mixing greediness and memoization exist. The idea is to explore all the possibilies and pick the one with the minimal number of coins.
To give an amount X > 0, we choose a piece P in the money system, and then solve the subproblem corresponding to XP. We try this for all the pieces of the system. The solution, if it exists, is then the smallest path that led to 0.
Here an OCaml recursive function corresponding to this method. It returns None, if no solution exists.
(* option utilities *)
let optmin x y =
match x,y with
 None,a  a,None > a
 Some x, Some y> Some (min x y)
let optsucc = function
 Some x > Some (x+1)
 None > None
(* Changemaking problem*)
let change_make money_system amount =
let rec loop n =
let onepiece acc piece =
match n  piece with
 0 > (*problem solved with one coin*)
Some 1
 x > if x < 0 then
(*we don't reach 0, we discard this solution*)
None
else
(*we search the smallest path different to None with the remaining pieces*)
optmin (optsucc (loop x)) acc
in
(*we call onepiece forall the pieces*)
List.fold_left onepiece None money_system
in loop amount
Note: We can remark that this procedure may compute several times the change set for the same value. In practice, using memoization to avoid these repetitions leads to faster (way faster) results.
# Continuous knapsack problem
Given items as (value, weight)
we need to place them in a knapsack (container) of a capacity k
. Note! We can break items to maximize value!
Example input:
values[] = [1, 4, 5, 2, 10]
weights[] = [3, 2, 1, 2, 4]
k = 8
Expected output:
maximumValueOfItemsInK = 20;
Algorithm:
1) Sort values and weights by value/weight.
values[] = [5, 10, 4, 2, 1]
weights[] = [1, 4, 2, 2, 3]
2) currentWeight = 0; currentValue = 0;
3) FOR i = 0; currentWeight < k && i < values.length; i++ DO:
IF k  currentWeight < weights[i] DO
currentValue = currentValue + values[i];
currentWeight = currentWeight + weights[i];
ELSE
currentValue = currentValue + values[i]*(k  currentWeight)/weights[i]
currentWeight = currentWeight + weights[i]*(k  currentWeight)/weights[i]
END_IF
END_FOR
PRINT "maximumValueOfItemsInK = " + currentValue;
# Remarks
A greedy algorithm is an algorithm in which in each step we choose the most beneficial option in every step without looking into the future. The choice depends only on current profit.
Greedy approach is usually a good approach when each profit can be picked up in every step, so no choice blocks another one.