By: Andrew Collier

Re-posted from: http://www.exegetic.biz/blog/2015/10/monthofjulia-day-28-hypothesis-tests/

It’s all very well generating myriad statistics characterising your data. How do you know whether or not those statistics are telling you something interesting? Hypothesis Tests. To that end, we’ll be looking at the HypothesisTests package today.

The first (small) hurdle is loading the package.

    julia> using HypothesisTests

That wasn’t too bad. Next we’ll assemble some synthetic data.

    julia> using Distributions
    julia> srand(357)
    julia> x1 = rand(Normal(), 1000);
    julia> x2 = rand(Normal(0.5, 1), 1000);
    julia> x3 = rand(Binomial(100, 0.25), 1000);    # 25% success rate on samples of size 100
    julia> x4 = rand(Binomial(50, 0.50), 1000);     # 50% success rate on samples of size 50
    julia> x5 = rand(Bernoulli(0.25), 100) .== 1;
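The session above was run on the Julia release current at the time of writing. On Julia 1.0 and later `srand()` is no longer available, and seeding is done with `Random.seed!()` from the `Random` standard library. A minimal sketch of the same setup for a newer Julia, assuming the Distributions package is installed (the random streams differ between releases, so the numbers quoted in the rest of this post will not be reproduced exactly):

```julia
using Random          # standard library; provides Random.seed!
using Distributions   # Normal, Binomial, Bernoulli

Random.seed!(357)     # assumed replacement for srand(357) on Julia >= 1.0

x1 = rand(Normal(), 1000);               # standard normal, mean 0
x2 = rand(Normal(0.5, 1), 1000);         # mean shifted to 0.5
x3 = rand(Binomial(100, 0.25), 1000);    # 25% success rate on samples of size 100
x4 = rand(Binomial(50, 0.50), 1000);     # 50% success rate on samples of size 50
x5 = rand(Bernoulli(0.25), 100) .== 1;   # 100 Boolean outcomes, 25% success rate
```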

We’ll apply a one-sample t-test to `x1` and `x2`. The output below indicates that `x2` has a mean which differs significantly from zero while `x1` does not. This is consistent with our expectations based on the way that these data were generated. I’m impressed by the level of detail in the output from `OneSampleTTest()`: different aspects of the test are neatly broken down into sections (population, test summary and details) and there is automated high-level interpretation of the test results.

    julia> t1 = OneSampleTTest(x1)
    One sample t-test
    -----------------
    Population details:
        parameter of interest:   Mean
        value under h_0:         0
        point estimate:          -0.013027816861268473
        95% confidence interval: (-0.07587776077157478,0.04982212704903784)

    Test summary:
        outcome with 95% confidence: fail to reject h_0
        two-sided p-value:           0.6842692696393744 (not signficant)

    Details:
        number of observations:   1000
        t-statistic:              -0.40676289562651996
        degrees of freedom:       999
        empirical standard error: 0.03202803648352013

    julia> t2 = OneSampleTTest(x2)
    One sample t-test
    -----------------
    Population details:
        parameter of interest:   Mean
        value under h_0:         0
        point estimate:          0.5078522467069418
        95% confidence interval: (0.44682036100064954,0.5688841324132342)

    Test summary:
        outcome with 95% confidence: reject h_0
        two-sided p-value:           2.6256160116367554e-53 (extremely significant)

    Details:
        number of observations:   1000
        t-statistic:              16.328833826939398
        degrees of freedom:       999
        empirical standard error: 0.031101562554276502
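To see where the t-statistic in the details section comes from, it can be rebuilt by hand: it is the point estimate minus the value under the null hypothesis, divided by the empirical standard error. The sketch below uses only the `Statistics` standard library. The helper `t_statistic()` is a hypothetical name introduced here for illustration and is not part of HypothesisTests.

```julia
using Statistics   # mean, std

# Hypothetical helper (not part of HypothesisTests): one-sample t-statistic
# for the null hypothesis that the population mean equals mu0.
function t_statistic(x, mu0 = 0.0)
    n = length(x)
    se = std(x) / sqrt(n)            # empirical standard error of the mean
    return (mean(x) - mu0) / se
end

# Cross-check using the figures reported for x1 above (value under h_0 is 0).
point_estimate = -0.013027816861268473
standard_error = 0.03202803648352013
t_from_output = point_estimate / standard_error   # agrees with the reported t-statistic
```

Calling `t_statistic(x1)` on the synthetic data should give the same value that `OneSampleTTest(x1)` reports.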

Using `pvalue()` we can further interrogate the p-values generated by these tests. The values reported in the output above are for the two-sided test, but we can also look specifically at the values associated with either the left or the right tail of the distribution. This makes the outcome of the test a lot more specific.

    julia> pvalue(t1)
    0.6842692696393744
    julia> pvalue(t2)
    2.6256160116367554e-53
    julia> pvalue(t2, tail = :left)     # Not significant.
    1.0
    julia> pvalue(t2, tail = :right)    # Very significant indeed!
    1.3128080058183777e-53
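The relationship between the three p-values is easy to check directly. For a t-test the one-sided values are tail areas of a t-distribution with the reported degrees of freedom, and the two-sided value is twice the smaller of the two. A minimal sketch using `TDist` from Distributions, plugging in the t-statistic and degrees of freedom reported for `x2` (the variable names are mine):

```julia
using Distributions   # TDist, cdf, ccdf

t  = 16.328833826939398   # t-statistic reported for x2
df = 999                  # degrees of freedom reported for x2

p_left  = cdf(TDist(df), t)         # area to the left of t: essentially 1
p_right = ccdf(TDist(df), t)        # area to the right of t: vanishingly small
p_both  = 2 * min(p_left, p_right)  # two-sided p-value is double the smaller tail
```

This is why the right-tail p-value above is exactly half of the two-sided one.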

The associated confidence intervals are also readily accessible. We can choose between two-sided or left/right one-sided intervals as well as change the significance level.

    julia> ci(t2, tail = :both)         # Two-sided 95% confidence interval by default
    (0.44682036100064954,0.5688841324132342)
    julia> ci(t2, tail = :left)         # One-sided 95% confidence interval (left)
    (-Inf,0.5590572480083876)
    julia> ci(t2, 0.01, tail = :right)  # One-sided 99% confidence interval (right)
    (0.43538291818831604,Inf)
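The `ci()` calls above reflect the version of the package available when this was written. In more recent releases of HypothesisTests, as far as I am aware, the same intervals come from `confint()`, with the confidence level passed as a `level` keyword rather than a positional significance level. A minimal sketch on a small made-up sample (the data in `x` are purely illustrative, and the exact signature is an assumption that should be checked against the documentation for your installed version):

```julia
using HypothesisTests

x = [0.3, 0.7, 0.1, 0.9, 0.5, 0.6, 0.4, 0.8]   # illustrative data, mean 0.5375
t = OneSampleTTest(x)

two_sided = confint(t)                                 # two-sided 95% interval by default
left_95   = confint(t; tail = :left)                   # one-sided 95% interval, (-Inf, upper)
right_99  = confint(t; level = 0.99, tail = :right)    # one-sided 99% interval, (lower, Inf)
```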

As a second (and final) example we’ll look at `BinomialTest()`. There are various ways to call this function. First, without looking at any particular data, we’ll check whether 25 successes from 100 samples is inconsistent with a 25% success rate (obviously not and, as a result, we fail to reject this hypothesis).

    julia> BinomialTest(25, 100, 0.25)
    Binomial test
    -------------
    Population details:
        parameter of interest:   Probability of success
        value under h_0:         0.25
        point estimate:          0.25
        95% confidence interval: (0.16877973809934183,0.3465524957588082)

    Test summary:
        outcome with 95% confidence: fail to reject h_0
        two-sided p-value:           1.0 (not signficant)

    Details:
        number of observations: 100
        number of successes:    25

Next we’ll see whether the Bernoulli samples in `x5` contradict an assumed 50% success rate. Based on the way that `x5` was generated (with a 25% success rate), we are not surprised to find a vanishingly small p-value, and the hypothesis is soundly rejected.

    julia> BinomialTest(x5, 0.5)
    Binomial test
    -------------
    Population details:
        parameter of interest:   Probability of success
        value under h_0:         0.5
        point estimate:          0.18
        95% confidence interval: (0.11031122915326055,0.26947708596681197)

    Test summary:
        outcome with 95% confidence: reject h_0
        two-sided p-value:           6.147806615048005e-11 (extremely significant)

    Details:
        number of observations: 100
        number of successes:    18
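The two-sided p-values from the two `BinomialTest()` calls above can be reproduced from the binomial distribution itself. One common convention for an exact two-sided test is to double the smaller of the two tail probabilities and cap the result at 1, and that convention appears to match the values reported here. The helper `binomial_pvalue_both()` is a hypothetical name for illustration and is not part of HypothesisTests.

```julia
using Distributions   # Binomial, cdf, ccdf

# Hypothetical helper (not part of HypothesisTests): two-sided exact binomial
# p-value for k successes in n trials under success probability p.
function binomial_pvalue_both(k, n, p)
    d = Binomial(n, p)
    lower = cdf(d, k)         # P(X <= k)
    upper = ccdf(d, k - 1)    # P(X >= k)
    return min(1.0, 2 * min(lower, upper))
end

p_first  = binomial_pvalue_both(25, 100, 0.25)   # 25 successes against a 25% rate
p_second = binomial_pvalue_both(18, 100, 0.5)    # 18 successes against a 50% rate
```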

There are a number of other tests available in this package, including a range of non-parametric tests which I have not even mentioned above. HypothesisTests should certainly cover most of the bases for statistical inference. For more information, read the extensive documentation. Check out the sample code on GitHub for further examples.

The post #MonthOfJulia Day 28: Hypothesis Tests appeared first on Exegetic Analytics.