Today I was confronted with the following situation: a table with two columns that contained IDs. The content was more or less something like this:
ID ALIAS 1 2 2 1 1 3 2 2
I had to eliminate the records that pointed to themselves (i.e. ID = ALIAS), because it’s a bit silly to say that you’re an alias of yourself.
I also had to eliminate the records that were essentially duplicates. In this example, the record “2, 1″ is functionally the same record as “1, 2″ and thus I didn’t need it.
Could this be solved using SQL only? I suspected it, but couldn’t find a solution fast enough. I called in the help of my colleagues and within minutes Martin Schaap helped me out:
select id, alias from t where id <> alias -- that was the easy part and not exists ( -- here comes the hard part select 1 from t where id = t.alias and alias = t.id and rowid > t.rowid );
Thanks Martin!
UPDATE: one day later, Frank Dorst (yes, that’s right, our “I once was a developer” CTO) also sent me an elegant, working solution:
select distinct least(id, alias), greatest(id, alias) from t where id <> alias;
Kudos to you too, Frank (you still got it
)!


Whitehorses is specialized in succesfully implementing Oracle SOA solutions: BPEL, OSB, WebLogic & BPM
{ 3 comments… read them below or add one }
As soon as I saw “eliminating duplicates” I thought of my favoured analytics solution – but it’s bugging me that I can’t (yet) figure out how to use it to solve your particular problem.
This is what I typically use if I have duplicate values in the ID field, disregarding any other fields:
create table t(id number, doesnt_matter number);
insert into t values (1,2);
insert into t values (2,1);
insert into t values (1,3);
insert into t values (3,2);
select * from
(select t.*, row_number() over (partition by idorder by null) rn from t)
where rn= 1;
Hi Scott,
Thanks for sharing your solution to eliminate duplicate ID values. It is very efficient to use analytics to recognize/eliminate duplicate keys, but unfortunately it doesn’t work in this case (as you already noticed). Each ID/ALIAS pair has to be transformed first to determine if it really is a duplicate.
But… your way of thinking optimizes Frank’s solution a bit further as it eliminates the need for the rather heavy DISTINCT:
select *
from (
select t.*
, row_number() over (partition by least(id, alias), greatest(id, alias) order by null) rn
from t
where id <> alias
)
where rn = 1;
However, Martin Schaap’s solution is still the most efficient, as it only involves index scanning and no extra function executions or whatsoever.
And that’s the best c-level to have – one who knows what it takes to write code!
onya Frank