My Experiences: Maximum Subarray

I am going to write here my explanation for the the maximum subarray problem which is the task of finding the contiguous subarray within a one-dimensional array of numbers (containing at least one positive number) which has the largest sum.

Here we can find this subarray using different algorithms, Here I shall talk about two of them,

1) Divide and Conquer :

Time Complexity: O(n log n)
Algorithmic Paradigm: Divide and Conquer

2) Kadane’s Algorithm :

Time Complexity: O(n)
Algorithmic Paradigm: Dynamic Programming

1) Divide and Conquer :

This is a method of designing algorithms where we divide the problem into a number of

sub-problems of smaller size, and we conquer these sub-problems by solving them recursively, ( we keep sending it back for further division until they are a base case of the recursion )

Finally we combine the solutions of these sub-problems to get the required solution of the

actual problem.

Now we have an array ( lets call it A ) in which there exists a sub-array ( lets call it S )

whose sum is the maximum when compared to other sub-array's,

our objective is to find S's sum.

This subarray could either have the middle element of A ( lets call it mid ) in it or not.

thus we have two cases

1) when mid exists inside S :

if this is the case then we take the array and look at the rest of the array from mid's

point of view.

here we have chosen A[3] to be the mid ( (7+0)/2 = 3.5 )
now to find S, we can break the problem further into 2 parts( the left half and the right half ), S will consist of 2
sub-arrays S1 and S2 where,
S1 is the maximum subarray in the left half from A[3] to A[0] and
S2 is the maximum subarray in the right half A[4] to A[7].
these two can be found by iteration, for S1 take A[3] alone then 3,2 then 3,2,1 keep doing this until we find the sum of all elements from mid till the first element (A[0]), all the while keeping track of the maximum sum so gained. so we will have the value of the max subarray in the left of mid ( including mid ) and we can repeat this with the remaining half of the array to find S2's sum.
In the following code I have implemented this leftsum is the S1's sum and rightsum is
S2's sum. we can thus return (leftsum+rightsum) as S's sum.
In the following code I have implemented this leftsum is the S1's sum and rightsum is
S2's sum. we can thus return (leftsum+rightsum) as S's sum.

# a is the array, start, mid and end are the respective array indices
def maxsubarray3(a,start,mid,end):
    # if want to find the actual subarray which gives us this maximum value for its sum
    ansstart=mid
    ansend=mid+1

    # we need to find both sums of the left maxsubarray and the right maxsubarray.
    # the sum of which will be the sum of the joint subarray (from i to j) the actual max subarray of a
    # where i and j are such that,
    
    # The subarray from i to mid_index is the maximum you can get
    # from all subarrays where i which is before mid_index

    # keeps track of all the the possible subarray sums
    tempsum=0
    # keeps track of the current top sum (for the left half)
    # initially keep it as a very negative number 
    leftsum=-1
    
    # lets take first half and find the maximum subarray,
    # start from mid and by decrementing ( therefore the -1 )
    # and go till the 1st element ( therefore till array indice start )
    # ( Note that the range(a,b,c) function  produces a list of a,a+c,a+2c,.. so on till b-1 )
    # ( we need to write start-1 to get till start )
    for i in range(mid,start-1,-1):
        tempsum=tempsum+a[i]
        if(tempsum>leftsum):
            # the new top sum
            leftsum=tempsum
            ansstart=i
            
    # just like how we got the leftsum, repeat for right sum.
    tempsum=0 
    rightsum=-1

    # we now check for all possible subarrays in the right half, ie
    # mid+1 till end ( Note that the range(a,b,c) function
    # produces a list of a,a+c,a+2c,.. so on till b-1 )
    for j in range(mid+1,end+1):
        tempsum=tempsum+a[j]
        if(tempsum>rightsum):
            # the new top sum
            rightsum=tempsum
            ansend=j

    return (rightsum+leftsum)

But we should keep in mind we have only covered the case when the max subarray we wish to find has the mid element in it, but then comes the other case.

2) when mid is not inside S :

In this case we can further divide this case by calling the same function recursively.

until it ends up as the case discussed above or
a trivial case when the array becomes a single element.
I have implemented this with the following code.

# Actual function to solve the max subarray problem
def maxsubarray(a,start,end):
    # Base case when recursion caused the array to finally become a single element
    # the element itself is the only sum, therefore the maximum
    if(start==end):
        return a[start]

    # find mid - an integer as its going to be used as a index
    mid=int((start+end)/2)

    # now we can find the maximum of the 3 cases
    # we divide the array a into 2 parts (0 to mid) and (mid+1 to end), if the "maxsubarray" is a subarray of these 2 halfs we can find so by calling recursively
    # else if the subarray we seek is that which crosses the middle element, then we call the maxsubarray3 function
    return max(maxsubarray(a,start,mid),maxsubarray(a,mid+1,end),maxsubarray3(a,start,mid,end))

So we can see how this algorithm works. I hope. :p

2) Kadane’s Algorithm :

So here we shall be solving the same problem using dynamic programming.

Now dynamic programming also involves dividing the problem into simper

subproblems.

But the difference is that in the divide and conquer method

these subproblems are independent while we have the subproblems chosen

for the dynamic programming solution depending on other subproblems.

So we store the solutions to subproblems and use them to save time

( memoization ) when solving other subproblems.

Lets not stray from the topic, lets consider the maximum sum subarray S, which

we want to find could start at some index i and end at some index j. of the given

array A.

The basic idea of this algorithm is that instead of finding S within the array A

we find the maximum subarray( Sj ) with j fixed at some index of A. This is the

subproblem that we are concerned with, we can compare all the solutions

to this subproblem, ( s0,s1,s2... ) to find the maximum in linear time.

these s0, s1, s2... are the sums of the subarrays S0,S1,S2.... which

are the maximum subarrays ending at the index 0, 1, 2 etc

Now if we check closely we will see how each of these subproblems are

related, we can say that if we take s(j-1) and add A[j] to it we get s[j]

because S[j] will include A[j] but the only question is what i should be

taken so that we have a maximum subarray, the possibilities are 0, 1, 2..

till j itself. ( if its j itself then the subarray is a single element )

All these possibilities have already been tried when we computed

the subproblem of finding max subarray at 0, 1, 2.. etc

Lets take an example, if we find s0 for an array A then it can either be A[0]

( as the only subarray ending at the 0 index is the element itself ).

or zero if the element is negative, as then it couldn't be the

subarray we are looking for, ( I am talking about solving for an

array - containing at least one positive number )

Now we keep this value stored for future reference (say in a variable temp)

and go on to the element at 1, now to find s1 all we need to do add

A[1] to temp.

See how this works? there are 2 possible subarrays ending at 1

the subarray containing only A[1] and the one containing both A[0] and

A[1], from the temp value we have the information on which subarray

is going to have the larger sum. So moving on we can see how

at every index j the temp value will have the largest sum possible for

any starting index before j, so we can find sj easily.

So once thats done all we need to do is compare these s1, s2, s3... and

find the largest.

Thats the answer folks!

Here's my python implementation of this code, its too simple...

Therefore AWESOME!!!! I was literally stunned when i saw this

code in wikipedia.

ans_sum=maxsum_upto_j=0
    for j in range(no):
        maxsum_upto_j=max(0,maxsum_upto_j+array[j])
        ans_sum=max(ans_sum,maxsum_upto_j)

    print "The sum is "+str(ans_sum)

My Experiences

Wednesday, December 3, 2014

Maximum Subarray

1) Divide and Conquer :

2) Kadane’s Algorithm :

1) Divide and Conquer :

2) Kadane’s Algorithm :

No comments:

Post a Comment

Blog Archive