Wednesday, December 3, 2014

Maximum Subarray



I am going to write here my explanation for the the maximum subarray problem which is the task of finding the contiguous subarray within a one-dimensional array of numbers (containing at least one positive number) which has the largest sum.

Here we can find this subarray using different algorithms, Here I shall talk about two of them,

1)  Divide and Conquer :


Time Complexity: O(n log n)
Algorithmic Paradigm: Divide and Conquer

2)  Kadane’s Algorithm : 


Time Complexity: O(n)
Algorithmic Paradigm: Dynamic Programming



1)  Divide and Conquer :


This is a method of designing algorithms where we divide the problem into a number of
sub-problems of smaller size, and we conquer these sub-problems by solving them recursively, ( we keep sending it back for further division until they are a base case of the recursion )
Finally we combine the solutions of these sub-problems to get the required solution of the 
actual problem.

Now we have an array ( lets call it A ) in which there exists a sub-array ( lets call it S )
whose sum is the maximum when compared to other sub-array's,
our objective is to find S's sum.
This subarray could either have the middle element of A ( lets call it mid ) in it or not.
thus we have two cases 

1)  when mid exists inside S :

if this is the case then we take the array and look at the rest of the array from mid's 
point of view. 












here we have chosen A[3] to be the mid ( (7+0)/2 = 3.5 ) 
now to find S, we can break the problem further into 2 parts( the left half and the right half ), S will consist of 2 
sub-arrays S1 and S2 where,
S1 is the maximum subarray in the left half from A[3] to A[0] and 
S2 is the maximum subarray in the right half A[4] to A[7].
these two can be found by iteration, for S1 take A[3] alone then 3,2 then 3,2,1 keep doing this until we find the sum of all elements from mid till the first element (A[0]), all the while keeping track of the maximum sum so gained. so we will have the value of the max subarray in the left of mid ( including mid ) and we can repeat this with the remaining half of the array to find S2's sum.
In the following code I have implemented this leftsum is the S1's sum and rightsum is 
S2's sum. we can thus return (leftsum+rightsum) as S's sum.
In the following code I have implemented this leftsum is the S1's sum and rightsum is 
S2's sum. we can thus return (leftsum+rightsum) as S's sum.


# a is the array, start, mid and end are the respective array indices
def maxsubarray3(a,start,mid,end):
    # if want to find the actual subarray which gives us this maximum value for its sum
    ansstart=mid
    ansend=mid+1

    # we need to find both sums of the left maxsubarray and the right maxsubarray.
    # the sum of which will be the sum of the joint subarray (from i to j) the actual max subarray of a
    # where i and j are such that,
    
    # The subarray from i to mid_index is the maximum you can get
    # from all subarrays where i which is before mid_index

    # keeps track of all the the possible subarray sums
    tempsum=0
    # keeps track of the current top sum (for the left half)
    # initially keep it as a very negative number 
    leftsum=-1
    
    # lets take first half and find the maximum subarray,
    # start from mid and by decrementing ( therefore the -1 )
    # and go till the 1st element ( therefore till array indice start )
    # ( Note that the range(a,b,c) function  produces a list of a,a+c,a+2c,.. so on till b-1 )
    # ( we need to write start-1 to get till start )
    for i in range(mid,start-1,-1):
        tempsum=tempsum+a[i]
        if(tempsum>leftsum):
            # the new top sum
            leftsum=tempsum
            ansstart=i
            
    # just like how we got the leftsum, repeat for right sum.
    tempsum=0 
    rightsum=-1

    # we now check for all possible subarrays in the right half, ie
    # mid+1 till end ( Note that the range(a,b,c) function
    # produces a list of a,a+c,a+2c,.. so on till b-1 )
    for j in range(mid+1,end+1):
        tempsum=tempsum+a[j]
        if(tempsum>rightsum):
            # the new top sum
            rightsum=tempsum
            ansend=j

    return (rightsum+leftsum)
But we should keep in mind we have only covered the case when the max subarray we wish to find has the mid element in it, but then comes the other case.


2)  when mid is not inside S :

In this case we can further divide this case by calling the same function recursively.
until it ends up as the case discussed above or
a trivial case when the array becomes a single element.
I have implemented this with the following code.

# Actual function to solve the max subarray problem
def maxsubarray(a,start,end):
    # Base case when recursion caused the array to finally become a single element
    # the element itself is the only sum, therefore the maximum
    if(start==end):
        return a[start]

    # find mid - an integer as its going to be used as a index
    mid=int((start+end)/2)

    # now we can find the maximum of the 3 cases
    # we divide the array a into 2 parts (0 to mid) and (mid+1 to end), if the "maxsubarray" is a subarray of these 2 halfs we can find so by calling recursively
    # else if the subarray we seek is that which crosses the middle element, then we call the maxsubarray3 function
    return max(maxsubarray(a,start,mid),maxsubarray(a,mid+1,end),maxsubarray3(a,start,mid,end))
So we can see how this algorithm works. I hope. :p




2)  Kadane’s Algorithm : 


So here we shall be solving the same problem using dynamic programming.
Now dynamic programming also involves dividing the problem into simper
subproblems. 
But the difference is that in the divide and conquer method 
these subproblems are independent while we have the subproblems chosen 
for the dynamic programming solution depending on other subproblems.
So we store the solutions to subproblems and use them to save time 
memoization )  when solving other subproblems.

Lets not stray from the topic, lets consider the maximum sum subarray S, which 
we want to find could start at some index i and end at some index j. of the given 
array A.
The basic idea of this algorithm is that instead of finding S within the array A
we find the maximum subarray( Sj ) with j fixed at some index of A. This is the 
subproblem that we are concerned with, we can compare all the solutions 
to this subproblem, ( s0,s1,s2... ) to find the maximum in linear time.
these s0, s1, s2... are the sums of the subarrays S0,S1,S2.... which 
are the maximum subarrays ending at the index 0, 1, 2 etc
Now if we check closely we will see how each of these subproblems are 
related, we can say that if we take s(j-1) and add A[j] to it we get s[j]
because S[j] will include A[j] but the only question is what i should be 
taken so that we have a maximum subarray, the possibilities are 0, 1, 2.. 
till j itself. ( if its j itself then the subarray is a single element )
All these possibilities have already been tried when we computed 
the subproblem of finding max subarray at 0, 1, 2.. etc

Lets take an example, if we find s0 for an array A then it can either be A[0] 
( as the only subarray ending at the 0 index is the element itself ). 
or zero if the element is negative, as then it couldn't be the 
subarray we are looking for, ( I am talking about solving for an
array - containing at least one positive number )
Now we keep this value stored for future reference (say in a variable temp)
and go on to the element at 1, now to find s1 all we need to do add
A[1] to temp. 
See how this works? there are 2 possible subarrays ending at 1
the subarray containing only A[1] and the one containing both A[0] and 
A[1], from the temp value we have the information on which subarray
 is going to have the larger sum. So moving on we can see how 
at every index j the temp value will have the largest sum possible for 
any starting index before j, so we can find sj easily.
So once thats done all we need to do is compare these s1, s2, s3... and 
find the largest.
Thats the answer folks!

Here's my python implementation of this code, its too simple...
Therefore AWESOME!!!! I was literally stunned when i saw this 
code in wikipedia



ans_sum=maxsum_upto_j=0
    for j in range(no):
        maxsum_upto_j=max(0,maxsum_upto_j+array[j])
        ans_sum=max(ans_sum,maxsum_upto_j)

    print "The sum is "+str(ans_sum)

No comments:

Post a Comment