Dr. Moataza Mahmoud Abdel Wahab
Lecturer of Biostatistics
High Institute of Public Health
University of Alexandria

Important statistical terms
Population: the set of all measurements of interest to the researcher (the collection of all responses, measurements, or counts that are of interest)
Sample: a subset of the population

Why sampling?
Get information about large populations
Lower cost
Less field time
More accuracy, i.e. can do a better job of data collection
When it is impossible to study the whole population

Target population: the population to be studied, to which the investigator wants to generalize the results
Sampling unit: the smallest unit from which a sample can be selected
Sampling frame: the list of all sampling units from which the sample is drawn
Sampling scheme: the method of selecting sampling units from the sampling frame

Types of sampling
Non-probability samples
Probability samples

Non-probability samples
Convenience sample (ease of access): the sample is selected from elements of the population that are easily accessible
Snowball sampling (friend of a friend, etc.)
Purposive sampling (judgemental): you choose who you think should be in the study
Quota sample

Non-probability samples
Probability of being chosen is unknown
Cheaper, but results cannot be generalised; potential for bias
Probability samples
Random sampling: each subject has a known probability of being selected
Allows application of statistical sampling theory to the results in order to:
Generalise
Test hypotheses
Draw conclusions

Probability samples are the best
They ensure:
Representativeness
Precision

Methods used in probability samples
Simple random sampling
Systematic sampling
Stratified sampling
Multi-stage sampling
Cluster sampling

Simple random sampling
Table of random numbers:
684257954125632140
582032154785962024
362333254789120325
985263017424503686

Systematic sampling
Sampling fraction: the ratio between sample size and population size

Cluster sampling
Cluster: a group of sampling units close to each other, i.e. crowding together in the same area or neighborhood
[Figure: cluster sampling diagram showing Sections 1 to 5]

Stratified sampling
Multi-stage sampling

Errors in sample
Systematic error (or bias)
Inaccurate response (information bias)
Selection bias
Sampling error (random error)

Type 1 error
The probability of finding a difference between our sample and the population when there really isn't one.
Known as α (or type 1 error)
Usually set at 5% (or 0.05)

Type 2 error
The probability of not finding a difference that actually exists between our sample and the population.
Known as β (or type 2 error)
Power is (1 - β) and is usually 80%

Sample size
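Quantitative
$$ n = \frac{Z^2 \sigma^2}{D^2} $$
$$ n = \frac{(\sigma_1^2 + \sigma_2^2) \times F}{D^2} $$

Qualitative
$$ n = \frac{Z^2 \pi (1 - \pi)}{D^2} $$
$$ n = \frac{2 \bar{P} (1 - \bar{P}) F}{D^2} $$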
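As a minimal sketch, the four sample size formulas above can be written as small Python functions. The function names are hypothetical, and three things are assumptions not stated in the slides: Z is taken as the two-sided standard normal critical value for the chosen confidence level, F is taken as (Z_alpha + Z_beta)^2, which reproduces the tabulated values of about 10.5 (95% confidence, 90% power) and 7.8 (95% confidence, 80% power), and the pooled proportion is taken as the simple average of the two group proportions.

```python
from statistics import NormalDist


def z_two_sided(confidence: float) -> float:
    """Two-sided standard normal critical value, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def f_factor(confidence: float, power: float) -> float:
    """Assumed definition: F = (Z_alpha + Z_beta)^2 for a two-sided test."""
    z_beta = NormalDist().inv_cdf(power)
    return (z_two_sided(confidence) + z_beta) ** 2


def n_mean(sd: float, d: float, confidence: float) -> float:
    """Quantitative, one group: n = Z^2 * sd^2 / D^2."""
    return z_two_sided(confidence) ** 2 * sd ** 2 / d ** 2


def n_two_means(sd1: float, sd2: float, d: float,
                confidence: float, power: float) -> float:
    """Quantitative, two groups: n per group = (sd1^2 + sd2^2) * F / D^2."""
    return (sd1 ** 2 + sd2 ** 2) * f_factor(confidence, power) / d ** 2


def n_proportion(p: float, d: float, confidence: float) -> float:
    """Qualitative, one group: n = Z^2 * p * (1 - p) / D^2."""
    return z_two_sided(confidence) ** 2 * p * (1 - p) / d ** 2


def n_two_proportions(p1: float, p2: float, d: float,
                      confidence: float, power: float) -> float:
    """Qualitative, two groups: n per group = 2 * P * (1 - P) * F / D^2.

    P is assumed to be the average of the two group proportions.
    """
    p_bar = (p1 + p2) / 2
    return 2 * p_bar * (1 - p_bar) * f_factor(confidence, power) / d ** 2
```

Each function returns the unrounded value; in practice the result is rounded up to the next whole subject with `math.ceil`. Because the functions use exact normal quantiles (for example 2.5758 instead of the rounded table value 2.58), the results can differ by a few subjects from hand calculations based on rounded values.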
Problem 1
A study is to be performed to determine a certain parameter in a community. A previous study gave a standard deviation of 46. If a sampling error of up to 4 is acceptable, how many subjects should be included in this study at the 99% level of confidence?

Answer
$$ n = \frac{Z^2 \sigma^2}{D^2} = \frac{2.58^2 \times 46^2}{4^2} = 880.3 \approx 881 $$

Problem 2
A study is to be done to determine the effect of two drugs (A and B) on blood glucose level (BGL). Previous studies using these drugs gave standard deviations of BGL of 8 and 12 g/dl, respectively. A confidence level of 95% and a power of 90% are required to detect a mean difference of 3 g/dl between the two groups. How many subjects should be included in each group?

Answer
$$ n = \frac{(\sigma_1^2 + \sigma_2^2) \times F}{D^2} = \frac{(8^2 + 12^2) \times 10.5}{3^2} = 242.7 \approx 243 \text{ in each group} $$
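The arithmetic in Problems 1 and 2 can be checked with a few lines of Python. This is only a sketch that plugs in the rounded values used in the answers (Z = 2.58 for 99% confidence, F = 10.5 for 95% confidence and 90% power); the variable names are illustrative.

```python
from math import ceil

# Problem 1: one group, quantitative outcome.
# Z = 2.58 (99% confidence), SD = 46, acceptable error D = 4.
n1 = (2.58 ** 2) * (46 ** 2) / (4 ** 2)

# Problem 2: two groups, quantitative outcome.
# SDs of 8 and 12, F = 10.5 (95% confidence, 90% power), difference D = 3.
n2 = (8 ** 2 + 12 ** 2) * 10.5 / (3 ** 2)

# Sample sizes are rounded up, since a fraction of a subject cannot be recruited.
print(round(n1, 1), ceil(n1))  # 880.3 881
print(round(n2, 1), ceil(n2))  # 242.7 243
```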
Problem 3
It was desired to estimate the proportion of anaemic children in a certain preparatory school. In a similar study at another school, a proportion of 30% was detected. Compute the minimal sample size required at a confidence level of 95%, accepting a difference of up to 4% from the true population proportion.

Answer
$$ n = \frac{Z^2 \pi (1 - \pi)}{D^2} = \frac{1.96^2 \times 0.3 (1 - 0.3)}{(0.04)^2} = 504.21 \approx 505 $$

Problem 4
In previous studies, the percentage of hypertensives was 70% among diabetics and 40% among non-diabetics in a certain community. A researcher wants to perform a comparative study of hypertension among diabetics and non-diabetics at a confidence level of 95% and a power of 80%. What is the minimal sample to be taken from each group, with an accepted difference of 4% from the true value?

Answer
With $\bar{P} = (0.70 + 0.40)/2 = 0.55$:
$$ n = \frac{2 \bar{P} (1 - \bar{P}) F}{D^2} = \frac{2 \times 0.55 (1 - 0.55) \times 7.8}{0.04^2} = 2413.1 \approx 2414 \text{ in each group} $$

Precision
Cost
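One way to read the precision and cost trade-off is through the single-proportion formula n = Z^2 π(1 - π) / D^2: because D is squared in the denominator, halving the acceptable difference roughly quadruples the required sample. The sketch below uses the values from Problem 3 (Z = 1.96, π = 0.30). It assumes that cost grows with the number of subjects, which the slides do not state, and the helper name `sample_size` is hypothetical.

```python
from math import ceil

Z = 1.96   # two-sided critical value for 95% confidence
P = 0.30   # anticipated proportion, as in Problem 3


def sample_size(d: float) -> int:
    """Minimal n for estimating one proportion to within +/- d."""
    return ceil(Z ** 2 * P * (1 - P) / d ** 2)


# Tighter precision (smaller D) demands a larger, costlier sample.
sizes = {d: sample_size(d) for d in (0.08, 0.04, 0.02)}
for d, n in sizes.items():
    print(f"D = {d:.2f} -> n = {n}")
```

With these inputs the required sample grows from 127 at D = 0.08 to 505 at D = 0.04 and 2017 at D = 0.02.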