anyone have the solution or work around?
my system need to select x% ramdom record from a batch of data from sql server for validation. is this posible?
any sql statement allow to select the record ramdomly? let say total record is around 1000, example record that i need to select is:-
record no 5,48,49,50,147,148,256,257,258,411,412,413,414,415,..... so and so
can i use cursor to move around the record to read it?
if we view on performance of the system. how can i make this at max speed? is that clone this table into local access database will make this more faster?
regards
terence chua
How do you intend to use this data ? If you won't display it in tabular form, you could read the entire DataTable, then use random number generation to access different rows in random order ?
|||actually this is a system to receive customer feedback.
and now i want to basic on the total transaction to take out the % of transaction to collect customer contact information. then i will create a new contact list to allow the system to send out sms to customer to say thanks and request them to reply. just to make sure the customer did feedback to us but not our own staff key in the feedback.
this sms list will also be a report to show to top management. regarding how many sms we send out everyday.
regard
terence chua
|||You can use TABLESAMPLE (nnn PERCENT) or TABLESAMPLE (nnn ROWS) in your SELECT statement. But be carefully - TABLESAMPLE extracts not exactly nnn percent or nnn rows, i.e. two executions of, for example, 'select * from Person.Contact tablesample (10 percent)' will have thow different rowcounts. See BOL for details.
WBR, Evergray -- Words mean nothing...|||YOu could use something like this:
SELECT TOP 10 *
FROM
SomeTable
Order by NEWID()
HTH, jens Suessmeyer.
|||Yes, this kind of query will return exactly specified number of records, but by cost of performance (in common case). Query with 'tablesample' will read random number of table pages (and, yeah, may even return 0 records or double expected count) while query with 'order by newid()' should scan entire table or index and compute newid() for each record to properly sort them and select top(n). The second type of queries may be better (in performance) only with highly selective (or covered) queries with appropriate indexes for support them (and, again, always more accurate :-)). Or, if table has an 'uniqueidentifier' or 'rowguid' column (indexed) - this is may be case, too.
If the main goal is performance of such a query (as I've understand from the first post), and the query is not very selective and not covered by any index, the choice is tablesampling. Inaccuracy of this clause may be eliminated by doing tablesampling twice (or more), for example:
select top 10 percent * from
(select * from sometable
tablesample (10 percent)
where x=y
union
select * from sometable
tablesample (10 percent)
where x=y
) a
will be more accurate with percentage than one sample. Anyway, you should compare performance of queries of both types to select the better one for your needs.
This data may be used to fill out temporary table (directly or by using table variable - the last will be better choice) to process it on server side (sms via Database Mail?). But, if you doing processing in an external application, sure you can (maybe even should) extract data from server in any store which is local for that application so server will be free from take a care of it.
Good luck!
WBR, Evergray--
Words mean nothing...|||Hi Everygray,
what your mean is using more then 1 time tablesampling to make sure the record more accurate?
to make sure what i understand is correct in ur sample, "a" is temporary table? is the "tablesample(10 percent)" is the syntax for the query?
below is the code for single tablesampling?
--
select * from sometable
tablesample (10 percent)
where x=y
--
so let say my table name call "Answers" then my query should be like this?
select top 30 percent * from
(select * from Answers
tablesample (10 percent)
where date >= yesterday
union
select * from Answers
tablesample (10 percent)
where date >= yesterday
) a
now my system should able to take out the contact information and pass to external system to process and send out the sms. so the output should be in either .txt or .dat file or something else like .xml.
btw are you having any better idea or better way to done the job with sms the data directly out to my customer phone without export the data out? because now my company planing to corporate with a communication provided company to send out sms. so them may request the customer list in a txt, dat or xml file. tat's y i looking for the solution now before my management want it to be implement.
regards
terence chua
|||
see below...Terence Chua wrote:
Hi Everygray, what your mean is using more then 1 time tablesampling to make sure the record more accurate?
Terence Chua wrote:
to make sure what i understand is correct in ur sample, "a" is temporary table?
No, it's called 'derived table' (named subquery)
Terence Chua wrote:
is the "tablesample(10 percent)" is the syntax for the query? below is the code for single tablesampling?
--
select * from sometable
tablesample (10 percent)
where x=y
--
Yes
Terence Chua wrote:
so let say my table name call "Answers" then my query should be like this? select top 30 percent * from
(select * from Answers
tablesample (10 percent)
where date >= yesterday
union
select * from Answers
tablesample (10 percent)
where date >= yesterday
) a
Ok, maybe I've been not clear enough about sampling. Tablesample reads random number of PAGES, not ROWS. If you specify "tablesample (10 percent)" in your query, server will randomly select approximately 10 percent of pages on which table data is stored (let's say, 10 of 100 total) and return ALL rows from each selected page. If your table is big enough and rows are evenly distributed on data pages, resulting count of rows will be close to 10 percent. But, if, for example, your table is quite small (for example, 5 pages), 10% of total number of pages is zero, so query will result no rows at all. Or, if some of data pages in your table has small count of rows (e.g. 2-3), and some - big (e.g. 20-30), the resulting count will depend on which pages were selected (e.g. your query may return 3 rows or 300 on the same data). Or, if you specify "tablesample (10 rows)" for table which contains 300 rows at all, which are stored on 15 pages, the result again be 0 rows (10 rows from 300 are 3%, and 3% from 15 pages are 0 pages). So, if the same query executed several times against the same data returns very different results (e.g. 0, 30, 300, 100....) we may think about 'equalizing' these results. Your query above should return number of records closer to 6% of total rows than simple 'select * .. tablesample 6%'.
Terence Chua wrote:
now my system should able to take out the contact information and pass to external system to process and send out the sms. so the output should be in either .txt or .dat file or something else like .xml. btw are you having any better idea or better way to done the job with sms the data directly out to my customer phone without export the data out? because now my company planing to corporate with a communication provided company to send out sms. so them may request the customer list in a txt, dat or xml file. tat's y i looking for the solution now before my management want it to be implement.
regards
terence chua
If number of messages to send is low, you may use sp_send_dbmail stored proc to send them through your smtp server (little emails to xxxxxxx@.mobile.operator.com or so). But this is not likely your case, because it is resource consuming task (all messages are stored in a queue on the server, mailing program executes here too), at least, you should't use the same SQL Server instance for it.
As for export - there are MANY different ways to export data in any format - using linked servers (in any external database or plaintext file), simply executing query with sqlcmd and storing results (maybe using FOR XML) in plaintext file, create webservice which will return such data in desired format to your partner and so on... Again, what the goal is? Which restrictions are? ;-)
WBR, Evergray--
Words mean nothing...|||
Hi Evergray,
Thanks for your help, but when i try the sql statement in sql server Enterprise Manager, at the view. i get this error.
[Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near the keyword 'PERCENT'.
below is the code i test on hard code the date at 26th Jan 2006 and get the answer between a range.
SELECT TOP 30 PERCENT *
FROM (SELECT *
FROM Answers tablesample(10 PERCENT)
WHERE (iMinutes BETWEEN 570 AND 1020) AND (MONTH(dtAnswer) = 1) AND (YEAR(dtAnswer) = 2006) AND (iTemplateId = 1) AND (fDiscard = 0)
AND (fIncomplete = 0) AND (DAY(dtAnswer) = 26)
UNION
SELECT *
FROM Answers tablesample(10 PERCENT)
WHERE (iMinutes BETWEEN 570 AND 1020) AND (MONTH(dtAnswer) = 1) AND (YEAR(dtAnswer) = 2006) AND (iTemplateId = 1) AND (fDiscard = 0) AND
(fIncomplete = 0) AND (DAY(dtAnswer) = 26))
The first, you should add an alias for your inner query, such as
select top 30 percent *
from (...) somealias
The second... the error message you refer to... Looks like you are using SQL Server 2000 or your database compatibility level is not 9.0, as required for tablesample to work... If this is the case, you should use Jens's suggestion because tablesampling is unavailable for you.
The third. If your Answers table has an index on dtAnswer column, you'd better use two DateTime constants and BETWEEN clause in you query rather than using these Day(), Year() etc., because your query doesn't benefit from such index, always doing fullscan of table.
WBR, Evergray
--
Words mean nothing....
No comments:
Post a Comment