Algorithm to remove duplicates from a sorted array

int unique(int *array, int number)   //sorted array
{
int k = 0;
for (int i = 1; i < n; i++) {
if (a[k] != a[i]) {              //comparing the first 2 unequal numbers
a[k+1] = a[i];                 //shifting to the left if distinct
k++;
}
}
return (k+1);             //return the last index of new array i.e. the number of distinct elements
}

Advertisements

7 thoughts on “Algorithm to remove duplicates from a sorted array

  1. Your algorithm looks exactly ike C++ std::unique:

    http://www.cplusplus.com/reference/algorithm/unique/

    One comment: if all elts are already unique and sorted, then this loop unnecessarily copies elts from itself to itself. For an array of ints, performance wouldn’t matter much, but suppose it was an array of structures, then it might run a bit slower. In C++, the object copy is done via a copy constructor.

    -dennis bednar

  2. Here’s a PERL version:

    Hope the indentation is preserved, .. here goes …

    # perl version of C++ std::unique, in which we start with a sorted array of numbers.
    # NOTE: This perl version is like std::unique:
    #
    # Removes all but the first element from every consecutive group of equivalent elements in the range [first,last).
    #
    # HENCE, given input of:
    # {10, 20, 20, 30, 30, 30, 40, 40, 40, 40}
    # the output is
    # {10, 20, 30, 40, ?, ?, ?, ?, ?, ?}
    # and dupIndex returned (0-based) is 4.
    #
    # NOTE: in C++ std::unique works fine on a std::vector, but not for std::vector.
    # With vector of pointer, we run into problems with memory leaks or duplicte deletes of same pointer.
    # In order to fix the memory leak problem in C++, you need to preserve dupliacates at end.
    #
    #====================================================

    use strict;
    use warnings;

    # flag used to control behavior of unique() function, namely,
    # the elts at [$dupInx, $last) are the actual duplicates, and
    # prevents memry leaks and duplicate deletes.

    my $PRESERVE_DUPLICATES_AT_END = 0;

    sub main()
    {
    test1();
    $PRESERVE_DUPLICATES_AT_END = 1;
    test1();
    }

    sub test1()
    {
    print “=== TEST1 BEG ===\n”;
    print “TEST1: PRESERVE_DUPLICATES_AT_END=$PRESERVE_DUPLICATES_AT_END \n”;
    my @nums1 = (1, 2, 3, 4, 5, 6, 7);
    testUnique(@nums1);

    my @nums2 = (10, 20, 20, 30, 30, 30, 40, 40, 40, 40);
    testUnique(@nums2);
    print “=== TEST1 END ===\n”;
    }

    # sort an array of integers in an array
    sub testUnique(@)
    {
    my (@nums) = @_;

    printArray( “BEFORE: “, @nums );
    my $dupInx = unique( \@nums );
    printArray( “AFTER: “, @nums );
    print “INFO: dupInx=$dupInx\n”;
    print “\n”;
    }

    # Unfortunately, with perl, we must pass the array by reference in order to be able to change it.
    # given a sorted array of numbers, move all uniques to the right,
    # and return the index of the first duplicate.
    # All elts 0..$dupInx-1 are unique.
    # If all elts are unique, then returns $LEN (length of array), meaning
    # index past the end of the array.

    sub unique($)
    {
    # we must pass by reference, otherwise, upon return, caller
    # won’t see the change to the passed array

    # get array reference
    my ($aref) = @_;

    # current item that is unique (the first one at the left)
    my $k = 0;

    # length of array
    my $LEN = @{$aref};

    # find the first mismatch to right, skipping over same elts
    # $inx moves from 2nd elt to last elt at end

    for (my $inx = 1; $inx [$k] != $aref->[$inx])
    {

    ### Copy down next new elt to right of existing unique elt at index [$k]
    ### $aref->[++ $k ] = $aref->[ $inx ];
    ### next;

    # NOTE: this code may look KLUNKY, but I wanted to show the
    # nuances of the behavior.

    ++ $k;
    if ($k == $inx)
    {
    print “DEBUG: no need to SELF COPY from $inx to $k\n”;
    }
    else
    {
    if ($PRESERVE_DUPLICATES_AT_END)
    {
    # keep all duplicates at end properly by swapping aref->[$k] and aref->[$inx]
    # cirular shift 3 elts: k->temp, inx->k, temp->inx

    # swap $array[k] and $array[$inx], but pass it via ref, so that swap can change it.
    swap($aref, $k, $inx );
    }
    else
    {
    # duplicates at end are ? undefined
    $aref->[$k] = $aref->[$inx];
    }
    }
    }
    }

    printArray( “DEBUG: unique: “, @{$aref} );

    # return index of first duplicate
    # if all are unqueue
    return $k+1;
    }

    # @param aref is array reference
    # @param $k is index of 1st item to swap
    # @param $i is index of 2nd item to swap

    # swap $array[$k] and $array[$i], where $array is pointed to by $aref arg1
    # Array references *ARE REQUIRED in perl, so that array modifed by function
    # can be seen by caller, upon return.

    sub swap($$$)
    {
    my ($aref, $k, $i) = @_;
    my $temp = undef;

    $temp = $aref->[ $k ];
    $aref->[ $k ] = $aref->[ $i ];
    $aref->[ $i ] = $temp;
    }

    # prints “$msg 1,2,3,4,…,N” array, with commas between numbers.
    # if only one number, then no comma.

    sub printArray(@)
    {
    my ($msg, @array) = @_;

    print “$msg”;

    # length of array
    my $LEN = @array;

    for (my $inx = 0; $inx 0)
    {
    print “,”;
    }
    print “$array[$inx]”;
    }
    print “\n”;
    }

    main();

  3. And here’ s the output from the above program:

    === TEST1 BEG ===
    TEST1: PRESERVE_DUPLICATES_AT_END=0
    BEFORE: 1,2,3,4,5,6,7
    DEBUG: no need to SELF COPY from 1 to 1
    DEBUG: no need to SELF COPY from 2 to 2
    DEBUG: no need to SELF COPY from 3 to 3
    DEBUG: no need to SELF COPY from 4 to 4
    DEBUG: no need to SELF COPY from 5 to 5
    DEBUG: no need to SELF COPY from 6 to 6
    DEBUG: unique: 1,2,3,4,5,6,7
    AFTER: 1,2,3,4,5,6,7
    INFO: dupInx=7

    BEFORE: 10,20,20,30,30,30,40,40,40,40
    DEBUG: no need to SELF COPY from 1 to 1
    DEBUG: unique: 10,20,30,40,30,30,40,40,40,40
    AFTER: 10,20,30,40,30,30,40,40,40,40
    INFO: dupInx=4

    === TEST1 END ===
    === TEST1 BEG ===
    TEST1: PRESERVE_DUPLICATES_AT_END=1
    BEFORE: 1,2,3,4,5,6,7
    DEBUG: no need to SELF COPY from 1 to 1
    DEBUG: no need to SELF COPY from 2 to 2
    DEBUG: no need to SELF COPY from 3 to 3
    DEBUG: no need to SELF COPY from 4 to 4
    DEBUG: no need to SELF COPY from 5 to 5
    DEBUG: no need to SELF COPY from 6 to 6
    DEBUG: unique: 1,2,3,4,5,6,7
    AFTER: 1,2,3,4,5,6,7
    INFO: dupInx=7

    BEFORE: 10,20,20,30,30,30,40,40,40,40
    DEBUG: no need to SELF COPY from 1 to 1
    DEBUG: unique: 10,20,30,40,30,30,20,40,40,40
    AFTER: 10,20,30,40,30,30,20,40,40,40
    INFO: dupInx=4

    === TEST1 END ===

    Notice that the harder test case has one 10, two 20’s, 3 30’s, and 4 40’s.

    Now when the PRESERVE flag is 1, we maintain that relationship in the array returne from unique().

    However when the PRESERVE flag is 0 (OFF, false0, then we only loose one of the 20’s, and we gain an extrra 40. In an integer array this doesn’t matter. Or in C++, in a std::vector of Objects, it won’t matter.

    BUT in C+, when you have a std::vector of pointers to objects, this will cause two kinds of memory problems. FIrst, leaks due to failure to delete the 20. And second, and duplicate deletes of same 40 pointer, possibly leading to SEGMENTATION FAULTS on a LInux machine.

  4. THis SITE messed up the printArray() method that I pasted into the window ABOVE. SORRY about that. The logic for printArray is straightforward. THere is test is if $inx > 0 then insert a comma. THen print the element in the array..

  5. nice algorithm but just one comment: your coment on the return line is not completely correct: what you return is the number of distinct elements, that is ok, but is corresponds to the last index of the new array+1 (because the last index of the new array is actually k).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s