Wednesday, March 7, 2012

query consecutive valid rows?

I have a question for the T-SQL gurus out there.
I need to find all Names where there are 10 consecutive "Yes"
Values. The table has upwards of 30 million records.
Does anyone have an idea how to structure this query?
CREATE TABLE (
Name varchar(50)
,Value char(1)
,dDate datetime)
Sample Data
Name Value dDate
David, Yes, 1/1/2004
David, Yes, 1/2/2004
David, No, 1/3/2004
David, Yes, 1/4/2004
David, Yes, 1/5/2004
David, Yes, 1/6/2004
Chris, Yes, 1/6/2004
David, Yes, 1/7/2004Can you be more precise? By consecutive, do you mean "on consecutive days",
or "on any days, but uninterrupted in time by a NO"?
Perhaps you can post a few more of your 30,000,000 rows, so we have
sample data for which there are some results? If possible, post
sample data as INSERT .. VALUES statements that match the
CREATE TABLE statement. Then we can paste it into Query
Analyzer and use it without correcting your errors. 'Yes' and 'No' do
not fit into a char(1) column, for example.
Finally, are the dates constrained to be time-less, and is (Name,dDate)
a key (or guaranteed to be unique)? And does 1/4/2004 mean January
4, 2004, or does it mean April 1, 2004?
Steve Kass
Drew University
Dave wrote:

>I have a question for the T-SQL gurus out there.
>I need to find all Names where there are 10 consecutive "Yes"
>Values. The table has upwards of 30 million records.
>Does anyone have an idea how to structure this query?
>
>CREATE TABLE (
> Name varchar(50)
> ,Value char(1)
> ,dDate datetime)
>Sample Data
>Name Value dDate
>David, Yes, 1/1/2004
>David, Yes, 1/2/2004
>David, No, 1/3/2004
>David, Yes, 1/4/2004
>David, Yes, 1/5/2004
>David, Yes, 1/6/2004
>Chris, Yes, 1/6/2004
>David, Yes, 1/7/2004
>
>|||Sorry for being unclear.
I mean 10 (lets just say 3 for simplicity) Yes values in a row based of
dDate (datetime). cName,nValue ,dDate form the primary key... for this
example) should be unique.
I hope that answers your questions, if not please restate and I will do
my best to be clear.
Thanks for your interest!
As you can see from
select *
from myTest
order by cName,dDate
David,Peter have 3 consecutive Yes values while Chris and Chad do not.
CREATE TABLE myTest(
cName varchar(50)
,nValue char(3)
,dDate datetime )
INSERT INTO myTest(cName,nValue ,dDate)
SELECT 'David','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Chris','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Chad','Yes','2005-08-03 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-03 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-03 17:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-05 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-05 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-05 18:26:31.077'
UNION
SELECT 'Chad','Yes','2005-08-07 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-07 16:26:31.077'
UNION
SELECT 'Chad','No','2005-08-08 16:26:31.077'
UNION
SELECT 'Chris','No','2005-08-08 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-09 16:26:31.077'
UNION
SELECT 'David','No','2005-08-10 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-11 16:26:31.077'
UNION
SELECT 'Chad','No','2005-08-13 16:26:31.077'
UNION
SELECT 'Chris','Yes','2005-08-14 16:26:31.077'|||Sorry for being unclear.
I mean 10 (lets just say 3 for simplicity) Yes values in a row based of
dDate (datetime). cName,nValue ,dDate form the primary key... for this
example) should be unique.
I hope that answers your questions, if not please restate and I will do
my best to be clear.
Thanks for your interest!
As you can see from
select *
from myTest
order by cName,dDate
David,Peter have 3 consecutive Yes values while Chris and Chad do not.
CREATE TABLE myTest(
cName varchar(50)
,nValue char(3)
,dDate datetime )
INSERT INTO myTest(cName,nValue ,dDate)
SELECT 'David','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Chris','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Chad','Yes','2005-08-03 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-03 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-03 17:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-05 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-05 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-05 18:26:31.077'
UNION
SELECT 'Chad','Yes','2005-08-07 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-07 16:26:31.077'
UNION
SELECT 'Chad','No','2005-08-08 16:26:31.077'
UNION
SELECT 'Chris','No','2005-08-08 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-09 16:26:31.077'
UNION
SELECT 'David','No','2005-08-10 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-11 16:26:31.077'
UNION
SELECT 'Chad','No','2005-08-13 16:26:31.077'
UNION
SELECT 'Chris','Yes','2005-08-14 16:26:31.077'|||Sorry for being unclear.
I mean 10 (lets just say 3 for simplicity) Yes values in a row based of
dDate (datetime). cName,nValue ,dDate form the primary key... for this
example) should be unique.
I hope that answers your questions, if not please restate and I will do
my best to be clear.
Thanks for your interest!
As you can see from
select *
from myTest
order by cName,dDate
David,Peter have 3 consecutive Yes values while Chris and Chad do not.
CREATE TABLE myTest(
cName varchar(50)
,nValue char(3)
,dDate datetime )
INSERT INTO myTest(cName,nValue ,dDate)
SELECT 'David','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Chris','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Chad','Yes','2005-08-03 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-03 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-03 17:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-05 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-05 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-05 18:26:31.077'
UNION
SELECT 'Chad','Yes','2005-08-07 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-07 16:26:31.077'
UNION
SELECT 'Chad','No','2005-08-08 16:26:31.077'
UNION
SELECT 'Chris','No','2005-08-08 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-09 16:26:31.077'
UNION
SELECT 'David','No','2005-08-10 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-11 16:26:31.077'
UNION
SELECT 'Chad','No','2005-08-13 16:26:31.077'
UNION
SELECT 'Chris','Yes','2005-08-14 16:26:31.077'|||Sorry for being unclear.
I mean 10 (lets just say 3 for simplicity) Yes values in a row based of
dDate (datetime). cName,nValue ,dDate form the primary key... for this
example) should be unique.
I hope that answers your questions, if not please restate and I will do
my best to be clear.
Thanks for your interest!
As you can see from
select *
from myTest
order by cName,dDate
David,Peter have 3 consecutive Yes values while Chris and Chad do not.
CREATE TABLE myTest(
cName varchar(50)
,nValue char(3)
,dDate datetime )
INSERT INTO myTest(cName,nValue ,dDate)
SELECT 'David','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Chris','Yes','2005-08-02 16:26:31.077'
UNION
SELECT 'Chad','Yes','2005-08-03 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-03 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-03 17:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-05 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-05 16:26:31.077'
UNION
SELECT 'Peter','Yes','2005-08-05 18:26:31.077'
UNION
SELECT 'Chad','Yes','2005-08-07 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-07 16:26:31.077'
UNION
SELECT 'Chad','No','2005-08-08 16:26:31.077'
UNION
SELECT 'Chris','No','2005-08-08 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-09 16:26:31.077'
UNION
SELECT 'David','No','2005-08-10 16:26:31.077'
UNION
SELECT 'David','Yes','2005-08-11 16:26:31.077'
UNION
SELECT 'Chad','No','2005-08-13 16:26:31.077'
UNION
SELECT 'Chris','Yes','2005-08-14 16:26:31.077'|||CREATE TABLE answers(
aName varchar(50)
,Value char(1)
,dDate datetime)
go
-- select only 3 consecutive Ys to keep it short
insert into answers values('Joe','Y','9/17/2001')
insert into answers values('Joe','Y','9/18/2001')
insert into answers values('Joe','N','9/19/2001')
insert into answers values('Joe','Y','9/20/2001')
insert into answers values('Joe','Y','9/21/2001')
insert into answers values('Jill','N','9/19/2001')
insert into answers values('Jill','Y','9/20/2001')
insert into answers values('Jill','Y','9/21/2001')
insert into answers values('Jill','Y','9/22/2001')
insert into answers values('Jack','Y','9/21/2001')
insert into answers values('Jack','Y','9/22/2001')
go
CREATE TABLE #answers( id int identity
,aName varchar(50)
,Value char(1)
,dDate datetime)
go
insert into #answers(aname,value,ddate)
select aname,value,ddate from answers
order by aname, ddate
go
select distinct aname from #answers a
where exists(select aname from #answers b
-- 3 consecutive Ys to keep it short
where a.id=b.id+2 and a.aname=b.aname)
and not exists(select aname from #answers b
-- 3 consecutive Ys to keep it short
where b.id between a.id-2 and a.id and a.aname=b.aname and b.value='N')
drop table answers
drop table #answers
it works, but it would be interesting to figure out if it performs well
enough in your invironment. Instead of using identity, you could use
row_number() if you have it|||You need keys and constraints on the data. Let me do this in Standard
SQL and leave it to you to translate into dialect. Is this what you
meant?
CREATE TABLE Quiz
(contestant_name VARCHAR(30) NOT NULL,
answer_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
answer_flag CHAR(1) DEFAULT 'Y' NOT NULL
CHECK (answer_flag IN ('Y', 'N')),
PRIMARY KEY (foo_name, answer_date));
SELECT DISTINCT Q1.person_name
FROM Quiz AS Q1
AND 'Y'
= ALL (SELECT Q2.answer
FROM Quiz AS Q2
WHERE Q2.answer_date
BETWEEN Q1.answer_date AND Q1.answer_date +
INTERVAL 9 DAY);|||>> SELECT DISTINCT Q1.person_name
FROM Quiz AS Q1
AND 'Y'
= ALL (SELECT Q2.answer
FROM Quiz AS Q2
WHERE Q2.answer_date
BETWEEN Q1.answer_date AND Q1.answer_date +
INTERVAL 9 DAY);
<<
this is a wrong answer:
1. the subquery does not take into account contestant_name. So id Joe
always answers Yes from 1/1 to 1/11, but Jill gives a No at 1/5, Joe
will not be selected.
2. INTERVAL 9 DAY clause implies that each contestant answers exactly 1
question a day, which is not the case, just look at the OP's sample data|||Here is one way to do it:
select
distinct T.cName
from (
select
T1.cName,
T1.dDate as YesDate,
coalesce(min(T2.dDate),'99991231') as NextNoDate
from myTest as T1
left outer join myTest as T2
on T2.cName = T1.cName
and T2.nValue = 'No'
and T2.dDate > T1.dDate
where T1.nValue = 'Yes'
group by T1.cName, T1.dDate
) as R
join myTest as T
on T.cName = R.cName
and T.dDate >= R.YesDate
and T.dDate < R.NextNoDate
group by T.cName, R.YesDate
having count(*) >= 3
I am assuming the key is (cName, dDate), without nValue,
but if I'm wrong, you will have to tweak this to make it do
what you want when there are Yes and No values at the
same time for the same person.
SK
Dave wrote:

>Sorry for being unclear.
>I mean 10 (lets just say 3 for simplicity) Yes values in a row based of
>dDate (datetime). cName,nValue ,dDate form the primary key... for this
>example) should be unique.
>I hope that answers your questions, if not please restate and I will do
>my best to be clear.
>Thanks for your interest!
>As you can see from
>select *
>from myTest
>order by cName,dDate
>David,Peter have 3 consecutive Yes values while Chris and Chad do not.
>
>
>CREATE TABLE myTest(
> cName varchar(50)
> ,nValue char(3)
> ,dDate datetime )
>INSERT INTO myTest(cName,nValue ,dDate)
>SELECT 'David','Yes','2005-08-02 16:26:31.077'
> UNION
>SELECT 'Peter','Yes','2005-08-02 16:26:31.077'
> UNION
>SELECT 'Chris','Yes','2005-08-02 16:26:31.077'
> UNION
>SELECT 'Chad','Yes','2005-08-03 16:26:31.077'
> UNION
>SELECT 'Peter','Yes','2005-08-03 16:26:31.077'
> UNION
>SELECT 'Peter','Yes','2005-08-03 17:26:31.077'
> UNION
>SELECT 'Peter','Yes','2005-08-05 16:26:31.077'
> UNION
>SELECT 'David','Yes','2005-08-05 16:26:31.077'
> UNION
>SELECT 'Peter','Yes','2005-08-05 18:26:31.077'
> UNION
>SELECT 'Chad','Yes','2005-08-07 16:26:31.077'
> UNION
>SELECT 'David','Yes','2005-08-07 16:26:31.077'
> UNION
>SELECT 'Chad','No','2005-08-08 16:26:31.077'
> UNION
>SELECT 'Chris','No','2005-08-08 16:26:31.077'
> UNION
>SELECT 'David','Yes','2005-08-09 16:26:31.077'
> UNION
>SELECT 'David','No','2005-08-10 16:26:31.077'
> UNION
>SELECT 'David','Yes','2005-08-11 16:26:31.077'
> UNION
>SELECT 'Chad','No','2005-08-13 16:26:31.077'
> UNION
>SELECT 'Chris','Yes','2005-08-14 16:26:31.077'
>
>

No comments:

Post a Comment