I have duplicate client records within my data file. I need to determine the level of agreement between the records.

I would like to create a variable that identifies if there is agreement within the RecordID for a variable of interest (like Gender). All variables are numeric.

RecordID     Gender
11           1-male
22           1-male
22           1-male
33           2-female
44           1-male
44           1-male
44           2-female

The number of duplicates range from 2-17 records within a RecordID.

I would like the output files to look like this:

RecordID    Gender_agreement  Count
11          1-yes               1
22          1-yes               2
33          1-yes               1
44          0-no                3

asked 02 Dec '11, 23:08

KwonC's gravatar image

KwonC
13529
accept rate: 100%


Christy, You can use the magic of SQL!

/*
    %CheckDupes(dsn=, varname=, missingval= -1);

    dsn        = This is the name of the dataset to process
    varname    = This is the name of the variable to check for value agreement
    missingval = This is the default value to use for missing values in the results.  Default= -1;
*/

%macro CheckDupes(dsn=, varname=, missingval=-1);

/*-- Magic and Wizardry is below this line --*/
proc sql;
    create view matches_&varname
    as
    select RecordID, &varname, count(*) as cnt
    from &dsn
    group by RecordID, &varname;

    create table agreement_&varname
    as
    select 
        RecordID, 
        count(distinct case when missing(&varname) then &missingval else &varname end ) 
                label='Agreement of Variables - # Unique Values' as Agreement, 
        count(*) label='\\# of Records' as NumRecords, 
        min(case when missing(&varname) then &missingval else &varname end ) 
              label='minimum value observed' as minValue, 
        max(case when missing(&varname) then &missingval else &varname end ) 
              label='maxmimum value observed' as maxValue,
        case when calculated Agreement = 1 then 'MATCH' else 'NO_MATCH' end 
              label='Match Status' as MatchStat
    from matches_&varname
    group by RecordID;
quit;

title "Match Status for Variable: &varname";
proc freq data=agreement_&varname;
tables MatchStat /missing;
run;

%mend;

To run the duplicates check for a variable just execute the macro:

%CheckDupes(dsn=yourDataSet, varname=Gender);

The default value for Missing is -1 you can change that by adding it as a parameter;

%CheckDupes(dsn=yourDataSet, varname=Gender, missingval=9999999999);

If you're processing a character variable, you'll need to change the missing value as such:

%CheckDupes(dsn=yourDataSet, varname=Gender, missingval='Missing');
link

answered 05 Dec '11, 11:01

jay.l.stevens's gravatar image

jay.l.stevens ♦
1.0k624
accept rate: 36%

edited 05 Dec '11, 11:42

You can also do it using a standard DATA step and BY processing:

proc sort data=test;
    by RecordID Gender;
run;

data test_agreement (keep=RecordID GenderAgreement Count);
    set test;
    by RecordID;
    retain GenderAgreement;
    retain Count;
    retain _PastGender;
    _PastGender = lag(Gender);
    if first.RecordID then do;
        GenderAgreement = 1;        
        Count = 1;
    end;
    else do;
        Count = Count + 1;
        if _PastGender ne Gender then do;
            GenderAgreement = 0;
        end;        
    end;
    if last.RecordID then do;
        output;
    end;
run;
link

answered 14 Feb '12, 01:42

biyectivo's gravatar image

biyectivo
461
accept rate: 20%

proc sql;

create table output as

select RecordID,

case

when count(distinct Gender) = 1 then '1-yes'

when count(distinct Gender) > 1 then '0-no'

else ' ' end as Gender_agreement ,

count(*) as Count

from datasetname

group RecordID;

quit;

link

answered 16 Feb '12, 11:11

Joy's gravatar image

Joy
212
accept rate: 0%

edited 16 Feb '12, 11:16

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Tags:

×1
×1

Asked: 02 Dec '11, 23:08

Seen: 1,234 times

Last updated: 28 Nov '12, 16:38

Related questions

powered by OSQA