# Summary Class notes - MAT-24306

##### Course
- MAT-24306
- Bastiaan Engel
- 2020 - 2021
- Wageningen University, Wageningen
- Voeding en Gezondheid (Nutrition and Health)
107 Flashcards & Notes

## 1580684400 t-tests, confidence intervals & sample size calculation

• The response variable =
The variable y: the variable you are interested in.
• Experimental study =
Experiment where treatments can be randomly assigned to experimental units
• Which symbol denotes the population mean?
The symbol mu (μ).
• When we reject H0, we say that we have shown (proven) that the alternative (research) hypothesis is true, so the conclusion reads: "we have shown that ...".
• What does the statistical model look like?
See the image.
• Note that an equal variance sigma^2 is assumed for the two distributions

so, we assume:
• normality
• equal variance
• independence
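The image itself is not available. Assuming it shows the usual model for two independent samples (which matches the three assumptions listed above), the model can be written as follows; this is a sketch, not a copy of the lecture slide:

```latex
y_{1j} \sim N(\mu_1, \sigma^2),\ j = 1, \dots, n_1, \qquad
y_{2j} \sim N(\mu_2, \sigma^2),\ j = 1, \dots, n_2, \qquad
\text{all } y_{ij} \text{ independent}
```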
• The test statistic measures how well the data match H0. How do you calculate it?
See the image.
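The image is missing here as well. Assuming the lecture uses the standard pooled two-sample t statistic (it fits the equal-variance assumption above), the calculation would be:

```latex
t = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```

Under H0: mu1 - mu2 = 0 this t follows a t distribution with n1 + n2 - 2 degrees of freedom.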
• What does the rejection region mean?
See the image.

So reject H0 if t falls in the green area or in the red area.
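A minimal Python sketch of the rejection-region decision, assuming the critical value has already been looked up (for example with `qt()` in R or from a t table); the function name `in_rejection_region` is hypothetical:

```python
def in_rejection_region(t: float, t_crit: float, side: str = "two-sided") -> bool:
    """Return True if the test statistic t lies in the rejection region.

    t_crit is the positive critical value: the 1 - alpha/2 quantile of the
    t distribution for a two-sided test, the 1 - alpha quantile for a one-sided test.
    """
    if side == "two-sided":
        return abs(t) >= t_crit  # both tails
    if side == "right":
        return t >= t_crit  # large outcomes support Ha
    if side == "left":
        return t <= -t_crit  # small outcomes support Ha
    raise ValueError("side must be 'two-sided', 'left' or 'right'")


# t_crit = 2.101 is roughly the 0.975 quantile of t with 18 df (qt(0.975, 18) in R)
print(in_rejection_region(2.5, 2.101))   # True
print(in_rejection_region(-1.2, 2.101))  # False
```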
• P-value =
Probability, under H0, of the observed outcome of the test statistic t or anything more extreme (in the direction supporting Ha)

P-value > alpha (typically 0.05): do not reject H0
P-value < alpha (typically 0.05): reject H0
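A stdlib-only Python sketch of the two-sided P-value for a t statistic. Python's standard library has no t distribution, so this sketch integrates the t density numerically (Simpson's rule); in practice you would use R (`2 * pt(-abs(t), df)`) or a statistics package. The function names are hypothetical:

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) under H0 for T ~ t(df), using Simpson's rule.

    By symmetry, P(|T| >= |t|) = 1 - 2 * (area under the density from 0 to |t|).
    steps must be even for Simpson's rule.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * area


alpha = 0.05
p = two_sided_p_value(2.5, df=18)  # hypothetical outcome t = 2.5 with 18 df
print("reject H0" if p < alpha else "do not reject H0")  # reject H0
```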
• The general structure of the t statistic
t = (estimate - value from H0) / standard error

• here the estimate is the difference between the two sample means
• the value from H0 is often zero (but not always)
• the standard error is the (estimated) standard deviation of the estimator
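A Python sketch of t = (estimate - value from H0) / standard error for two independent samples, assuming the pooled (equal-variance) standard error from the model above; `pooled_t_statistic` and the data are hypothetical:

```python
import math
import statistics
from typing import Sequence, Tuple


def pooled_t_statistic(y1: Sequence[float], y2: Sequence[float], h0_value: float = 0.0) -> Tuple[float, int]:
    """t = (estimate - value from H0) / standard error, plus its degrees of freedom.

    Assumes two independent samples from normal distributions with equal variance.
    """
    n1, n2 = len(y1), len(y2)
    estimate = statistics.mean(y1) - statistics.mean(y2)  # difference between sample means
    # pooled estimate of the common variance sigma^2
    s2_pooled = ((n1 - 1) * statistics.variance(y1) + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))  # standard error of the estimate
    return (estimate - h0_value) / se, n1 + n2 - 2


# hypothetical measurements for two treatment groups
t, df = pooled_t_statistic([5.1, 4.9, 5.6, 5.8, 6.0], [4.2, 4.8, 4.5, 5.0, 4.4])
print(f"t = {t:.3f} with {df} df")
```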
• A confidence interval and the t-test: how are they related, and what does the formula look like?
• A confidence interval consists of all values of, e.g., mu1 - mu2 that are plausible on the basis of the observed data
• these are exactly the parameter values not rejected by the (two-sided) t-test
• this is something entirely different from the rejection region!

often a confidence interval has the following structure:

(estimate +- constant * standard error)

constant: here taken from a t distribution (for a 95% interval, its 0.975 quantile)
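A Python sketch of the structure estimate +- constant * standard error, with a check of the claim that the interval contains exactly the values not rejected by the two-sided t-test. The numbers and function names are hypothetical; the constant is assumed to come from a t table or `qt(0.975, df)` in R:

```python
def confidence_interval(estimate: float, se: float, constant: float) -> tuple:
    """(estimate - constant * se, estimate + constant * se)."""
    return estimate - constant * se, estimate + constant * se


def rejected_by_t_test(h0_value: float, estimate: float, se: float, t_crit: float) -> bool:
    """Two-sided t-test of H0: parameter = h0_value, using the same critical value."""
    return abs((estimate - h0_value) / se) >= t_crit


# hypothetical numbers: estimate of mu1 - mu2, its standard error, and t_crit ~ qt(0.975, 18)
estimate, se, t_crit = 1.2, 0.4, 2.101
low, high = confidence_interval(estimate, se, t_crit)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")

# values inside the interval are exactly the values NOT rejected by the t-test
for h0_value in (0.0, 0.5, 1.2, 2.0):
    inside = low < h0_value < high
    assert inside == (not rejected_by_t_test(h0_value, estimate, se, t_crit))
```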
• Table of type I and type II errors
See the image.
• Type I error = rejecting H0 when H0 is true
P(type I error) ≤ alpha

typically alpha = 0.05
• Type II error = not rejecting H0 when Ha is true
P(type II error) = beta

At a given alpha, a smaller beta (i.e. a larger power 1 - beta) requires a larger sample size n.
• The formula for the sample size (also called the power calculation) can be found on page 11 of the lecture notes.
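The formula on page 11 is not reproduced in these notes. As a hedged sketch, the common normal-approximation formula for comparing two means is n per group = 2 sigma^2 (z_{1-alpha/2} + z_{1-beta})^2 / delta^2; the lecture's version may differ (for example by using t quantiles), so treat this as an approximation. `sample_size_per_group` is a hypothetical name:

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta: float, sigma: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    """Approximate n per group for a two-sided comparison of two means.

    delta is the difference mu1 - mu2 to detect with power 1 - beta.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    n = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2
    return math.ceil(n)  # round up to a whole number of units


print(sample_size_per_group(delta=1.0, sigma=1.0))             # 16
print(sample_size_per_group(delta=1.0, sigma=1.0, beta=0.10))  # 22: smaller beta, larger n
```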
## 1580770800 inference on probabilities

• Which 8 steps do you take to reach your conclusion?
1. the null hypothesis H0 and the alternative hypothesis Ha
2. the test statistic
3. the distribution of the test statistic under H0 and its behaviour under Ha, i.e. large outcomes supporting Ha, or small outcomes, or both
4. the type of rejection region (left-, right-, two-sided)
5. the rejection region for a given significance level alpha (P-value approach: the outcome of the test statistic)
6. the outcome of the test statistic (P-value approach: the appropriate one- or two-sided P-value)
7. whether the value of the test statistic is within the rejection region or not (P-value approach: whether the P-value is below alpha or not)
8. your conclusion: H0 is rejected or not, plus the conclusion in words, in practical terms
• How do you calculate the confidence interval for a probability pi?
See the image.
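The image is missing; assuming it shows the usual normal-approximation interval p^ +- z * sqrt(p^(1 - p^) / n), a Python sketch is (hypothetical data and function name):

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple:
    """Normal-approximation CI for a probability pi: p_hat +- z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# hypothetical data: 120 "successes" out of n = 400
low, high = proportion_ci(120, 400)
print(f"95% CI for pi: ({low:.3f}, {high:.3f})")
```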
• Symbol E =
E is half the width of the desired confidence interval

e.g. for a desired width of 0.04, E = 0.04 / 2 = 0.02
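A sketch of how E is typically used: solving z * sqrt(p(1 - p) / n) <= E for n. The guess p = 0.5 (the worst case) is an assumption, and `sample_size_for_proportion` is a hypothetical name:

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(width: float, p_guess: float = 0.5, confidence: float = 0.95) -> int:
    """Smallest n such that z * sqrt(p (1 - p) / n) <= E, where E = width / 2."""
    margin = width / 2  # E in the notes
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


print(sample_size_for_proportion(0.04))  # 2401 (worst case p = 0.5)
```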
• What is the basic idea of Fisher's exact test?
• Fix the margins, i.e. 14, 16, 17, 13
• note that 14 and 16 were already fixed
• margins 17 and 13 are observed, but are now fixed as well
• list all tables with the same margins, but with different numbers inside
• under H0 a probability can be attached to each table
• consider all tables with a probability smaller than or equal to the probability of the observed table (this includes the observed table)
• sum the probabilities of these tables; this is the P-value.
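A stdlib-only Python sketch of the steps above. Under H0, with all margins fixed, the top-left count follows a hypergeometric distribution. The table [[5, 9], [12, 4]] is an assumption: it is the table that fits the margins 14, 16, 17, 13 and the odds 0.56 and 3 from the taste example in these notes. `fisher_exact_2x2` is a hypothetical name, and the small tolerance for ties is a choice in this sketch (R's `fisher.test()` uses a similar relative tolerance):

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (two-sided P-value, one-sided P-value for Ha: small a, i.e. pi1 - pi2 < 0).
    """
    row1, row2, col1 = a + b, c + d, a + c  # fixed margins
    n = row1 + row2

    def prob(x: int) -> float:
        # hypergeometric probability under H0 of the table with top-left cell x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    # all tables with the same margins, indexed by their top-left cell
    tables = range(max(0, col1 - row2), min(row1, col1) + 1)
    p_obs = prob(a)
    # sum over all tables that are at most as probable as the observed one
    # (small relative tolerance so that ties are not lost to rounding)
    two_sided = sum(prob(x) for x in tables if prob(x) <= p_obs * (1 + 1e-7))
    one_sided_less = sum(prob(x) for x in tables if x <= a)
    return min(1.0, two_sided), one_sided_less


# reconstructed table (assumption): rows = without / with additive, columns = like / dislike
two_sided, p_less = fisher_exact_2x2(5, 9, 12, 4)
print(f"two-sided P = {two_sided:.4f}, one-sided (Ha: pi1 - pi2 < 0) P = {p_less:.4f}")
```

In R, `fisher.test(matrix(c(5, 12, 9, 4), nrow = 2))` should give the same two-sided P-value.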
• You know the basics of Fisher's exact test for a 2x2 table; give a short summary:
• Two random samples from two populations
• often the two populations are the same population of units, either receiving one treatment or another
• often the two random samples follow from one random sample and randomization over two treatments
• we compare two population proportions, H0: pi1 - pi2 = 0
• data can be presented in a 2 x 2 table
• for Fisher's exact test we condition on the margins
• Ha may be two- or one-sided, e.g. Ha: pi1 - pi2 < 0
• if you want a one-sided test, you have to ask R for it explicitly (the `alternative` argument of `fisher.test()`, e.g. `alternative = "less"`)
• Calculate:
1. the odds of liking the taste without the additive
2. the odds of liking the taste with the additive

And how do you interpret the odds?
1. 0.56
2. 3

If the odds are smaller than 1, the taste is more likely to be disliked than liked; if the odds are larger than 1, it is more likely to be liked.
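A Python sketch of the odds calculation. The counts (without additive: 5 like, 9 dislike; with additive: 12 like, 4 dislike) are an assumption, reconstructed from the margins 14, 16, 17, 13 so that they reproduce the answers 0.56 and 3:

```python
def odds(yes: int, no: int) -> float:
    """Estimated odds = P(yes) / P(no) = count_yes / count_no."""
    return yes / no


# reconstructed counts (assumption)
odds_without = odds(5, 9)  # about 0.56: liking is less likely than not liking
odds_with = odds(12, 4)    # 3.0: liking is three times as likely as not liking

# the same odds via the estimated probability p = 5 / 14
p_without = 5 / 14
print(round(odds_without, 2), round(p_without / (1 - p_without), 2))  # 0.56 0.56
```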
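• How do you calculate the odds ratio?
OR = the ratio of the odds in the two groups; for a 2x2 table with rows = groups and columns = outcome (yes / no), cells a, b in the first row and c, d in the second: OR = (a/b) / (c/d) = ad / (bc)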
• A 95% CI would normally be OR^ +/- 1.96 * se(OR^), based on a normal approximation, but this does not work well because the distribution of OR^ is quite skewed. What do we use instead?
The distribution of ln(OR^) is much closer to a normal distribution, so the interval is constructed on the log scale.
• The distribution of ln(OR^) is approximated by a normal distribution with

mean = ln(OR), estimated by ln(OR^)

standard deviation (standard error) = square root of (1/a + 1/b + 1/c + 1/d)
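• Calculating a confidence interval for the odds ratio
This takes 2 steps (see the image): first a CI for ln(OR), ln(OR^) +/- 1.96 * sqrt(1/a + 1/b + 1/c + 1/d); then exponentiate both endpoints to get a CI for OR.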
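A Python sketch of the two steps (CI on the log scale, then exponentiate), using reconstructed counts from the taste example as an assumption (with additive: 12 like, 4 dislike; without additive: 5 like, 9 dislike); the first row is "with additive" so that OR > 1. It needs all four counts to be positive, and `odds_ratio_ci` is a hypothetical name:

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a: int, b: int, c: int, d: int, confidence: float = 0.95) -> tuple:
    """Approximate CI for the odds ratio of the 2x2 table [[a, b], [c, d]] (all counts > 0).

    Step 1: CI for ln(OR): ln(OR^) +- z * sqrt(1/a + 1/b + 1/c + 1/d).
    Step 2: exponentiate both endpoints.
    """
    or_hat = (a * d) / (b * c)  # (a/b) / (c/d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    log_or = math.log(or_hat)
    return or_hat, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# rows: with additive (12 like, 4 dislike), without additive (5 like, 9 dislike)
or_hat, low, high = odds_ratio_ci(12, 4, 5, 9)
print(f"OR = {or_hat:.1f}, 95% CI ({low:.2f}, {high:.2f})")
```

The interval is not symmetric around OR^, which reflects the skewness of OR^; with counts this small, the exact interval from `fisher.test()` may differ noticeably.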
• Summary odds and odds ratio

• We defined odds and the odds ratio OR
• An approximate CI was derived for OR
• For small counts the approximation does not work well.
• collecting all values of OR that are not rejected by Fisher's exact test gives an exact CI for OR
• the exact interval can easily be obtained from R (`fisher.test()` reports it)
• for large counts the two intervals will be practically the same
• Pearson's chi-square goodness-of-fit test, for the following example:
H0: pi1 = 0.50, pi2 = 0.25, pi3 = 0.10, pi4 = 0.15

the test statistic is due to Karl Pearson:
chi-square = sum over the K categories of (observed - expected)^2 / expected (see the image)

we reject H0 when the outcome is too large.
• When is the outcome of Pearson's chi-square goodness-of-fit test too large (so that we reject H0)?
We need the distribution of chi-square under H0 to decide whether 24.33 is large enough to be an unlikely outcome under H0.

This distribution can be approximated by a chi-square distribution with 3 degrees of freedom (K - 1 = 4 - 1).

The P-value is the area to the right of the outcome, traditionally from a chi-square distribution with (K - 1) degrees of freedom.
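A stdlib-only Python sketch of the statistic and its chi-square P-value with K - 1 degrees of freedom. The observed counts are hypothetical (the data behind 24.33 are not in these notes); the upper-tail probability uses the standard recurrence for the chi-square survival function instead of a statistics library, and the function names are hypothetical:

```python
import math


def pearson_chi_square(observed, probs):
    """Sum over categories of (observed - expected)^2 / expected, with expected = n * pi_k."""
    n = sum(observed)
    expected = [n * p for p in probs]  # expected counts under H0
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X >= x) for a chi-square distribution with df degrees of freedom.

    Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x / 2)) or Q(x; 2) = e^(-x/2).
    """
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


probs = [0.50, 0.25, 0.10, 0.15]  # H0 from the example
observed = [40, 30, 20, 10]        # hypothetical counts
stat = pearson_chi_square(observed, probs)
df = len(probs) - 1                # K - 1 = 3
print(f"chi-square = {stat:.2f}, P = {chi2_upper_tail(stat, df):.4f}")
print(f"P for the outcome 24.33: {chi2_upper_tail(24.33, 3):.2e}")
```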