Conditional sampling
Caution
Applying a command filter or scope parameters when sampling compromises the validity of the sample. If you do so, Analytics writes a note to the log stating that the sample results may be invalid.
Although you can apply command filters and scope parameters in the Sampling dialog box, the steps for doing so have been omitted from the sampling procedures in this guide.
Conditional sampling restricts sample selection to records that meet a specified condition: for example, transactions originating at a particular location, or products made by a particular manufacturer.
When you perform conditional sampling, you must ensure that you are working with an accurate data set. Using a command filter to refine data while sampling can yield unexpected results. The best practice is to first extract the records that meet the condition to a new table, and then sample the new table without applying any filters.
Sampling filtered data versus filtering sampled data
When performing conditional sampling, be aware of the difference between:
- sampling filtered data
- filtering sampled data
Best practice: sampling filtered data
You have a table with 1000 records, of which 150 meet the condition “Dept 03”. You want to draw a sample of 10 records from “Dept 03”.
The best way to achieve this is to filter the table and extract the “Dept 03” records to a new table before drawing the sample. You then sample the new table, so that you draw only from “Dept 03” records. Using this method, you are sampling filtered data.
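The extract-then-sample workflow can be sketched outside Analytics as follows. This is a minimal illustration under stated assumptions, not how Analytics implements sampling: the table is a hypothetical Python list of 1000 records (150 in Dept "03"), `sample_filtered` is a hypothetical helper, and simple random sampling with `random.sample` stands in for whichever sampling method you choose in Analytics.

```python
import random

# Hypothetical table: 1000 records, the first 150 of which are in Dept "03".
# Field names and values are illustrative assumptions only.
records = [{"id": i, "Dept": "03" if i < 150 else "01"} for i in range(1000)]


def in_dept_03(record):
    """The condition that defines the subpopulation of interest."""
    return record["Dept"] == "03"


def sample_filtered(table, condition, n, seed=None):
    """Best practice: extract matching records first, then sample only those."""
    rng = random.Random(seed)
    extracted = [r for r in table if condition(r)]  # step 1: extract to a "new table"
    return rng.sample(extracted, n)                 # step 2: sample with no filter applied


sample = sample_filtered(records, in_dept_03, 10, seed=1)
print(len(sample))  # 10
```

Because every record in the extracted table already meets the condition, all 10 selected records come from “Dept 03” and the sample size is exactly what you asked for.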
Avoid: filtering sampled data
You have a table with 1000 records, of which 150 meet the condition “Dept 03”. You want to draw a sample of 10 records from “Dept 03”.
If you draw the sample of 10 records from the original table with 1000 records, and in the process apply the command filter IF Dept = "03", you are filtering sampled data.
The problem with this method is that Analytics selects 10 records from the unfiltered data set and then presents only the records that match “Dept 03”, which usually leaves fewer than the required 10 records in the sample. The resulting sample is not representative and is therefore invalid.
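The shortfall can be illustrated with a small simulation. This is a sketch under stated assumptions, not Analytics behavior: the 1000-record table is a hypothetical Python list, `filter_sampled` is a hypothetical helper, and simple random sampling with `random.sample` stands in for the Analytics sampling method. Each trial draws 10 records from the whole table and then discards those outside “Dept 03”.

```python
import random

# Hypothetical table: 1000 records, the first 150 of which are in Dept "03".
records = [{"id": i, "Dept": "03" if i < 150 else "01"} for i in range(1000)]


def in_dept_03(record):
    """The condition applied as a filter after sampling."""
    return record["Dept"] == "03"


def filter_sampled(table, condition, n, seed=None):
    """Avoid: sample the whole table, then keep only records meeting the condition."""
    rng = random.Random(seed)
    drawn = rng.sample(table, n)                # 10 records drawn from all 1000
    return [r for r in drawn if condition(r)]   # the filter silently drops the rest


# Repeat the flawed procedure with 1000 different seeds and record the sample sizes.
counts = [len(filter_sampled(records, in_dept_03, 10, seed=s)) for s in range(1000)]
print("average records kept:", sum(counts) / len(counts))
print("trials with a full sample of 10:", counts.count(10))
```

Since only 150 of the 1000 records qualify, each draw is expected to keep just 10 × 150 / 1000 = 1.5 records, far short of the 10 you wanted.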
For similar reasons, filtering an output table containing sampled records invalidates the sample.