
Sampling With and Without Replacement | Introduction to Sampling | Data Scientists Must Know

by Diligejy 2022. 9. 13.

https://www.youtube.com/watch?v=LnGFL_A6A6A 

1. Why do we need sampling?

a. Before we can conduct an experiment such as A/B testing or an observational study, we need to select some participants.

     - E.g. visitors of our website.

b. Participants can also be any other objects or events of interest.

c. Ideally, we would have access to the whole population.

d. E.g. study popular tweets

    - Look at all of the tweets produced every day.

e. E.g. length of books on a certain topic.

    - Look at all of the books ever written on that topic.

f. Surveying or measuring the entire population would be an enormously time-intensive and costly process.

g. Most of the time, we have no access to all possible observations.

    - Challenging to gather all observations together. 

    - E.g. About 600 million tweets are produced every day.

    - More observations are expected to be made in the future.

    - Difficult or expensive to make more observations.

h. We do not easily have access to the entire population due to cost, privacy, complexity, etc.

i. This is where sampling comes in.

j. Data scientists do not need to measure an entire population; a well-chosen sample is enough.

 

2. What is Sampling?

a. E.g. Understand sentiment in popular topics on Twitter.

b. Sampling means selecting a subset of tweets that we believe has the same proportions of topics as the entire population of tweets.
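
To make the idea of a subset with "the same proportions" concrete, here is a minimal Python sketch. It assumes a made-up population of 100,000 tweets labelled with three hypothetical topics; the topic names, weights, and sizes are illustrative and do not come from the video. It draws a simple random sample and computes topic proportions for both the population and the sample, so the two can be compared.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 tweets, each labelled with one topic.
topics = ["sports", "politics", "music"]
population = rng.choices(topics, weights=[0.5, 0.3, 0.2], k=100_000)

# Simple random sample of 1,000 tweets (no tweet is picked twice).
sample = rng.sample(population, k=1_000)

def proportions(items):
    """Return the share of each topic among the given items."""
    counts = Counter(items)
    return {topic: counts[topic] / len(items) for topic in topics}

pop_props = proportions(population)
sample_props = proportions(sample)

for topic in topics:
    print(f"{topic}: population={pop_props[topic]:.3f}, sample={sample_props[topic]:.3f}")
```

With a random sample of this size, the sample proportions usually land within a few percentage points of the population proportions, which is why a small sample can stand in for the whole population.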

 

3. Sampling with Replacement

a. A sampling unit is drawn from the population, its characteristics are recorded, and it is returned to the population before the next unit is drawn.

b. We might end up selecting and measuring the same unit more than once.

c. Items in the sample are independently drawn from the population.
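
A minimal sketch of sampling with replacement in Python, assuming a toy population of five hypothetical units. It uses the standard library's `random.choices`, which draws with replacement. Because 8 units are drawn from a population of 5, at least one unit has to appear more than once.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of five sampling units.
population = ["A", "B", "C", "D", "E"]

# random.choices draws WITH replacement: every draw sees the full population,
# so the sample may be larger than the population itself.
sample_wr = rng.choices(population, k=8)

# 8 draws from 5 units: by the pigeonhole principle some unit must repeat.
has_repeats = len(set(sample_wr)) < len(sample_wr)

print(sample_wr)
print("repeated units:", has_repeats)
```

Each draw uses the same unchanged population, which is what makes the draws independent of one another.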

 

4. Sampling without Replacement

a. A sampling unit is drawn from the population and is not returned to the population before the next unit is drawn.

b. The draws are not independent: each draw changes what remains in the population, so it changes the probabilities for the next draw.
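
A minimal sketch of sampling without replacement in Python, again assuming a toy population of five hypothetical units. It uses the standard library's `random.sample`, which never returns the same unit twice. The second half illustrates the loss of independence with a hypothetical urn of 3 red and 2 blue balls, using exact fractions.

```python
import random
from fractions import Fraction

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population of five sampling units.
population = ["A", "B", "C", "D", "E"]

# random.sample draws WITHOUT replacement: no unit can appear twice.
sample_wor = rng.sample(population, k=3)
print(sample_wor)

# Dependence between draws, shown on a hypothetical urn.
red, blue = 3, 2
total = red + blue

# Probability that the first ball is red.
p_first_red = Fraction(red, total)

# Probability that the second ball is red, given the first was red.
# Without replacement, one red ball is gone and the urn is smaller.
p_second_red_wor = Fraction(red - 1, total - 1)
# With replacement, the urn is unchanged.
p_second_red_wr = Fraction(red, total)

print("P(first red)                 =", p_first_red)
print("P(second red | first red) WOR =", p_second_red_wor)
print("P(second red | first red) WR  =", p_second_red_wr)
```

Without replacement the conditional probability (1/2) differs from the unconditional one (3/5), so the draws depend on each other; with replacement the two are equal. Note that `random.sample` raises `ValueError` if `k` is larger than the population, since the population would run out of units.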

5. Sampling WR vs Sampling WOR
