Exploration
The Fathom Workshop Guide that I used in my exploration of simulation
suggests that the distribution of the mean proportion of males in my
samples follows a Normal distribution.
Plan and Data
I plan to take the data generated by the simulation and analyse its
distribution using the tools supplied by Fathom.
Analysis
If the simulation data follows a Normal distribution, I would expect the
cumulative distribution to look like an S-shaped curve, very similar to
the one I obtained. I can get better visual confirmation that the data
follows a Normal distribution by looking at the Normal Quantile Plot
provided by Fathom.
From the Fathom Help file: A Normal Quantile Plot shows the distribution
of continuous (numeric) data. It plots the z-scores associated with the
percentile of each case if the data were normally distributed. Therefore,
if the data are Normal, the plot should show a straight line. My
simulation data is very close to the straight line shown on the plot.
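The idea behind the plot can be sketched outside Fathom in a few lines of Python. This is only an illustration: the function name, the plotting position (i + 0.5) / n and the seven sample values are my own assumptions, and Fathom may compute its percentiles differently.

```python
from statistics import NormalDist

def normal_quantile_points(values):
    """Pair each sorted value with the z-score of its percentile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # standard Normal: mean 0, standard deviation 1
    # (i + 0.5) / n is one common choice of percentile for the i-th case;
    # it is an assumption here, not necessarily what Fathom uses.
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

# Hypothetical proportions for illustration only (not my simulation data).
sample = [0.47, 0.50, 0.52, 0.53, 0.55, 0.58, 0.61]

for z, x in normal_quantile_points(sample):
    print(f"z = {z:+.2f}  value = {x:.2f}")
```

If the values were Normal, plotting them against these z-scores would give points lying close to a straight line.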
Finally, I can check whether the distribution of my simulation data has
the following properties of the Normal probability distribution:
50% of the data falls on each side of the mean
About 68% of the data falls within one standard deviation of the mean
About 95% of the data falls within two standard deviations of the mean
About 99.7% of the data falls within three standard deviations of the mean
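These percentages come from the Normal cumulative distribution function. A minimal Python sketch that reproduces them is below; the helper name share_within is my own, and the code is a cross-check rather than part of the Fathom analysis.

```python
from statistics import NormalDist

# Standard Normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Half of the distribution lies below the mean.
print(f"below the mean: {std_normal.cdf(0):.1%}")

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```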
For my simulation data the mean proportion is 0.531 and the standard
deviation is 0.0497. In the Dot Plot, each dot represents two data values.
I counted the data in each interval and divided by 200, the total number
of data values, to get the percentage within each interval. I found the
percentage of data that falls on either side of the mean is 49.5% and
50.5%, which is about 50% on each side of the mean.
The percentage of data that falls within one standard deviation
(0.48 - 0.58) of the mean is 73.5%, which is larger than the predicted 68%.
The percentage of the data that falls within two standard deviations
(0.43 - 0.63) of the mean is 97%, which is larger than the predicted 95%.
The percentage of the data that falls within three standard deviations
(0.38 - 0.68) of the mean is 100%.
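The intervals and the comparison with the Normal distribution can be laid out side by side with a short Python sketch. It uses only the mean, standard deviation and observed percentages reported in this analysis; the variable names are my own.

```python
from statistics import NormalDist

mean = 0.531   # mean proportion reported in this analysis
sd = 0.0497    # standard deviation reported in this analysis
observed = {1: 73.5, 2: 97.0, 3: 100.0}  # percentages counted from the Dot Plot

std_normal = NormalDist()  # standard Normal: mean 0, standard deviation 1

rows = []
for k, obs_pct in observed.items():
    low, high = mean - k * sd, mean + k * sd
    # Percentage a Normal distribution places within k standard deviations.
    expected_pct = 100 * (std_normal.cdf(k) - std_normal.cdf(-k))
    rows.append((k, low, high, obs_pct, expected_pct))
    print(f"{k} sd: {low:.2f} - {high:.2f}  "
          f"observed {obs_pct:.1f}%  Normal {expected_pct:.1f}%")
```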
I did not have time to explore whether the approximations that I did
(rounding off the mean and standard deviation, and counting from the Dot
Plot rather than using the raw data) biased these results.
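One of these approximations can at least be sized up. The sketch below, in Python, compares the interval endpoints rounded to two decimal places with the unrounded ones, assuming the rounded endpoints are the ones used for counting. It says nothing about the effect of counting from the Dot Plot, which would need the raw data.

```python
mean = 0.531   # mean proportion reported in the analysis
sd = 0.0497    # standard deviation reported in the analysis

widening = {}
for k in (1, 2, 3):
    exact_low, exact_high = mean - k * sd, mean + k * sd
    # Endpoints rounded to two decimal places, as quoted in the analysis.
    used_low, used_high = round(exact_low, 2), round(exact_high, 2)
    widening[k] = (used_high - used_low) - (exact_high - exact_low)
    print(f"{k} sd: exact {exact_low:.4f} - {exact_high:.4f}, "
          f"used {used_low:.2f} - {used_high:.2f}, "
          f"extra width {widening[k]:+.4f}")
```

Each rounded interval comes out slightly wider than the exact one, so rounding could have nudged the counts upward a little, but the difference is small compared with the width of the intervals.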
Conclusion
The work that I did suggests that the distribution of the data I obtained
from the simulation could be modelled by the Normal probability
distribution.