Exploration
In this case I will use simulation to explore a situation rather than to
answer a particular question. I will simulate the gender of the driver in an
OPP random check of 100 vehicles (not including large trucks and other
vehicles that require a special driver's license) on the 401 for a given day
of the year. I aim to explore how the composition of these samples can vary
when the procedure is repeated. I will look at the proportion of male drivers
in each sample, and at the mean and standard deviation of the data
accumulated when the experiment is repeated.
Plan
To estimate the probability that the driver of a car is male or female, I
used the data provided by the Ontario Ministry of Transport in its 2000
Ontario Road Safety Annual Report, Table 2.16, which gives Sex of Driver
Population by Age Groups 2000. For the simulation I will assume that only
Ontario drivers are stopped.
Analysis
From Table 2.16 of the 2000 Ontario Road Safety Annual Report I find that
there are 4,313,694 male drivers out of a total of 8,121,374 licensed Ontario
drivers. Thus the probability that a random licensed Ontario driver is male
is 0.531 (to three decimal places).
Since this is a two-outcome experiment, and if I assume that drivers are
statistically independent, the experiment suggests a Binomial probability
model.
So, with a sample of 100 vehicles, the mean number of males driving the cars
is given by
µ = np = 100 × 0.531 = 53.1 males,
with a standard deviation of
σ = sqrt(np(1 − p)) = sqrt(100 × 0.531 × 0.469) ≈ 4.99 males.
However, this does not give me an indication of what the result could be in
each individual sample. To get this view I tried a simulation, following the
instructions given in a Fathom Workshop Guide (see REFERENCE).
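As a quick check of the arithmetic above, the Binomial mean and standard deviation can be recomputed in a few lines. This is only a sketch in Python, which is not part of the original Fathom work; the variable names are illustrative, and the driver counts are the ones quoted from Table 2.16.

```python
import math

# Counts quoted in the text from Table 2.16 (2000 Ontario Road Safety Annual Report)
male_drivers = 4_313_694
total_drivers = 8_121_374
n = 100  # vehicles in one random check

# Probability that a randomly chosen licensed driver is male, to three decimals
p = male_drivers / total_drivers
p_rounded = round(p, 3)

# Binomial mean and standard deviation of the number of male drivers in a sample
mu = n * p_rounded
sigma = math.sqrt(n * p_rounded * (1 - p_rounded))

print(f"p = {p_rounded}, mean = {mu:.1f} males, sd = {sigma:.2f} males")
```

The rounded value of p is used for the mean and standard deviation so that the numbers match the hand calculation.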
Procedures for the simulation of 100 Ontario drivers.
1. I started with a new, empty Fathom document.
2. From the shelf I dragged a slider into the document.
3. By double clicking on the name V1 I changed it to probmale.
4. I changed the slider so that my scale would be approximately 0 to 1.
5. I set the slider to the probability of randomly selecting a male driver, namely 0.531.
6. I dragged a new collection from the shelf.
7. I double clicked on collection1 and renamed it Sample of drivers.
8. With the collection selected I chose New Cases from the Data menu.
9. I typed in 100 for one hundred drivers in the dialogue box and clicked the OK button.
10. I double clicked on the collection, which brought up its inspector.
11. In <new> I typed driver and pressed Enter.
12. I double clicked in the formula cell.
13. I typed in the following: if(random()<probmale) and, in the curly bracket, "male" in the top line and "female" in the bottom line.
14. I closed the formula editor by clicking the OK button.
15. I dragged a graph from the shelf into the document.
16. I dragged the driver attribute from the inspector onto the x axis. This gave me all that I needed for the simulation.
17. To get a new set of data from the simulation I chose Rerandomize from the Analyze menu, and I explored the changes that occurred each time I rerandomized.
18. I looked at the effect of changing the probability on the slider and then reset the probability to 0.531.
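The steps above can be mirrored outside Fathom. The following is a minimal Python sketch of one sample of 100 drivers, assuming the same rule as the formula in step 13 (a uniform random number below probmale gives "male"). The names PROB_MALE and sample_of_drivers are hypothetical stand-ins for the slider and the collection, not Fathom identifiers.

```python
import random
from collections import Counter

PROB_MALE = 0.531  # hypothetical stand-in for the probmale slider


def sample_of_drivers(n=100, prob_male=PROB_MALE, rng=random):
    """Return n simulated drivers; each is 'male' with probability prob_male."""
    return ["male" if rng.random() < prob_male else "female" for _ in range(n)]


# One "Rerandomize": draw a fresh sample and tally it, as the bar chart does
drivers = sample_of_drivers()
counts = Counter(drivers)
print("male:", counts["male"], "female:", counts["female"])
```

Running the last three lines again plays the role of Rerandomize, and changing PROB_MALE plays the role of moving the slider.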
For further analysis I decided to accumulate the data from each simulation
of 100 drivers, to see how these data appeared graphically, and to find their
mean and standard deviation. To do this I followed the instructions of the
Fathom tutorial.
19. I opened the inspector window and clicked the Measures tab.
20. In <new> I typed proportionOfMale for the measure's name.
21. I double clicked in the formula cell.
22. I entered the formula proportion(driver="male").
23. To ensure that it was working I rerandomized a number of times and observed the change in proportionOfMale.
24. I closed the Sample of drivers inspector.
25. With the collection selected, I chose Collect Measures from the Analyze menu.
26. I double clicked the measures collection to open its inspector.
27. I clicked on the Collect Measures tab.
28. I changed the number of measures from 5 to 200.
29. Finally I clicked Collect More Measures (this takes quite a while if the animation is on).
30. To look at the results I brought a new graph onto the page.
31. I double clicked on the Sample of drivers collection to open its inspector.
32. I dragged proportionOfMale from the inspector to the x-axis of the graph.
33. I changed the graph from a dot plot to a histogram.
34. To get the mean and standard deviation, I chose a Summary Table from the Insert menu.
35. I dragged proportionOfMale from the inspector to the top row of this table.
36. I noted the mean and obtained the standard deviation by double clicking on S1=mean() and changing it to stdDev().
REFERENCE:
Finzer, W. and Erickson, T., "Tutorial 4: Simulation - Polling Voters",
Workshop Guide for Fathom Dynamic Statistics(TM) Software Version 1.1, 2000,
p. 25-27.
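The measure-collecting steps can also be sketched in Python: draw 200 samples of 100 drivers, record the proportion of males in each, and summarise. This is an illustration under stated assumptions, not the Fathom procedure itself. The function names are hypothetical, the seed is arbitrary, and the sample standard deviation (statistics.stdev) is used on the assumption that it corresponds to Fathom's stdDev().

```python
import random
import statistics


def proportion_of_male(n, prob_male, rng):
    """Simulate one sample of n drivers and return the proportion that are male."""
    males = sum(1 for _ in range(n) if rng.random() < prob_male)
    return males / n


def collect_measures(num_samples=200, n=100, prob_male=0.531, seed=None):
    """Repeat the sample num_samples times, keeping one proportion per sample."""
    rng = random.Random(seed)
    return [proportion_of_male(n, prob_male, rng) for _ in range(num_samples)]


measures = collect_measures(seed=1)  # seed chosen arbitrarily for repeatability
mean_prop = statistics.mean(measures)
sd_prop = statistics.stdev(measures)
print(f"mean proportion = {mean_prop:.3f}, standard deviation = {sd_prop:.3f}")
```

Dropping the seed argument gives a different set of 200 proportions on each run, which is the analogue of clicking Collect More Measures again.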
The results of the simulation follow:
I introduced a slider for the probability of stopping a male driver and set
it as close to 0.531 as I could.
Through the simulation I generated a table of the sample data, from which I
generated a bar chart of the gender distribution for 100 drivers.
A typical distribution was:
[Figure: bar chart of driver gender for one sample of 100 drivers]
I then repeated the simulation 200 times, in each case noting the proportion
of male drivers in the sample. These 200 values were then plotted in a
histogram, and the mean and standard deviation of this distribution were
calculated.
Observations
Through the simulation I saw that the sample composition of male and female
drivers could change quite a bit from sample to sample, not only in terms of
totals but also in the order in which they appeared. When this process was
repeated a large number of times, the mean proportion over all the samples
was close to the one on which I based my simulation, and although this number
changed slightly as the number of repetitions was increased, it stayed
consistently close to 0.531, which I noticed is µ/n (where µ is the mean of
the Binomial distribution).
The behaviour of the standard deviation was a bit more erratic than that of
the mean, but it did move around the value of 0.05. I explored whether this
was related to any of the values that I used in the simulation. I found that
it is close to sqrt(0.531 × 0.469/100) ≈ 0.0499, and is therefore also close
to σ/n (where σ is the standard deviation of the Binomial distribution).
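The link between the simulated values and the Binomial model can be written out as a short derivation. This is a sketch in LaTeX notation; the symbols X (number of male drivers in a sample) and \hat{p} (sample proportion) are introduced here for illustration and do not appear in the original text.

```latex
% X = number of male drivers in a sample of n = 100, with p = 0.531
X \sim \mathrm{Binomial}(n, p), \qquad \hat{p} = \frac{X}{n}

% Mean of the sample proportion
E[\hat{p}] = \frac{E[X]}{n} = \frac{np}{n} = \frac{\mu}{n} = p = 0.531

% Standard deviation of the sample proportion
\mathrm{SD}(\hat{p}) = \frac{\mathrm{SD}(X)}{n} = \frac{\sqrt{np(1-p)}}{n}
  = \frac{\sigma}{n} = \frac{\sqrt{p(1-p)}}{\sqrt{n}}
  = \frac{\sqrt{0.531 \times 0.469}}{\sqrt{100}} \approx 0.0499
```

Dividing the count by n divides both the mean and the standard deviation by n. The last form shows the same quantity as the standard deviation of a single driver's outcome, sqrt(p(1 − p)), divided by sqrt(n).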