Summary Class notes - MAT-24306

- MAT-24306
- Bastiaan Engel
- 2020 - 2021
- Wageningen University, Wageningen
- Nutrition and Health (Voeding en Gezondheid)
138 Flashcards & Notes

  • t-tests, confidence intervals & sample size calculation

  • The response variable =
    y, the variable that you are interested in.
  • Experimental study =
    Experiment where treatments can be randomly assigned to experimental units
  • Which symbol denotes the population mean?
    The symbol mu (μ).
  • When we reject H0, we say that we have shown (proven) that the alternative (research) hypothesis is true: "we have shown that ...".
  • What does the statistical model look like?
    See the image.
  • Note that an equal variance sigma^2 is assumed for the two distributions.

    So we assume:
    • normality 
    • equal variance 
    • independence 
  • The test statistic measures how well the data match H0. How do you calculate it?
    See the image.
  • What does the rejection region mean?
    See the image.

    So reject H0 if t falls in the green region or in the red region.
  • P-value =
    The probability, under H0, of the observed outcome of the test statistic t or anything more extreme (in the direction supporting Ha)

    With alpha = 0.05:
    P-value > 0.05, do not reject H0
    P-value < 0.05, reject H0
  • The general structure of the t statistic
    t = (estimate - value from H0) / standard error

    • here the estimate is the difference between the two sample means
    • the value from H0 is often zero (but not always)
    • the standard error is the (estimated) standard deviation of the estimator
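A minimal Python sketch of this structure for two independent samples, assuming the pooled (equal-variance) standard error that goes with the normality, equal-variance and independence assumptions listed above. The function name `pooled_t_statistic` and the example data are hypothetical; the P-value would still have to come from a t distribution with n1 + n2 - 2 degrees of freedom (a table or statistical software).

```python
import math
import statistics


def pooled_t_statistic(y1, y2, h0_difference=0.0):
    """Two-sample t statistic with a pooled variance (equal variances assumed).

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(y1), len(y2)
    # estimate: difference between the two sample means
    estimate = statistics.mean(y1) - statistics.mean(y2)
    # pooled variance: weighted average of the two sample variances
    s2_pooled = ((n1 - 1) * statistics.variance(y1)
                 + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    # standard error of the difference between the two sample means
    standard_error = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    t = (estimate - h0_difference) / standard_error
    return t, n1 + n2 - 2


# Hypothetical data for two treatment groups
t, df = pooled_t_statistic([5.1, 4.9, 5.6, 5.3], [4.2, 4.5, 4.0, 4.4])
```

The `h0_difference` argument is the "value from H0", which is zero by default but can be set to any other hypothesized difference.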
  • How are a confidence interval and the t-test related, and what does the formula look like?
    • A confidence interval consists of all values of, e.g., mu1 - mu2 that are plausible on the basis of the observed data
    • these are all parameter values not rejected by the t-test
    • this is something entirely different from the rejection region!

    often a confidence interval has the following structure:

    (estimate +- constant * standard error)

    constant --> here from a t distribution 
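A sketch of the interval structure (estimate +/- constant * standard error) in Python. The estimate, standard error and group sizes are hypothetical; the constant 2.101 is the 0.975 quantile of the t distribution with 18 degrees of freedom (two groups of 10), as read from a t table.

```python
def confidence_interval(estimate, standard_error, constant):
    """Interval of the form estimate +/- constant * standard error."""
    margin = constant * standard_error
    return estimate - margin, estimate + margin


# Hypothetical: estimated mu1 - mu2 = 1.5 with standard error 0.6, n1 = n2 = 10
T_0975_DF18 = 2.101  # t table value for a 95% interval with 18 df
lower, upper = confidence_interval(1.5, 0.6, T_0975_DF18)

# 0 lies outside the interval, so H0: mu1 - mu2 = 0 is one of the values that
# the two-sided t-test at alpha = 0.05 would reject.
zero_is_plausible = lower <= 0 <= upper
```

Every value between `lower` and `upper` would not be rejected as the H0 value by the same two-sided t-test, which is the link between the interval and the test.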
  • Table of type I and type II errors
    See the image.
  • Type I error = rejecting H0 while H0 is true
    P(type I error) ≤ alpha

    typically alpha = 0.05
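A small simulation can make P(type I error) = alpha concrete. To stay within Python's standard library, this sketch swaps the t-test for a z-test with known sigma (an assumption for illustration only): both groups are generated under H0 (equal means), and the code counts how often H0 is still rejected at alpha = 0.05. The function name and settings are hypothetical.

```python
import random
from statistics import NormalDist, mean


def type_one_error_rate(n=20, sigma=1.0, alpha=0.05, reps=2000, seed=1):
    """Fraction of simulated data sets, generated under H0 (mu1 = mu2), in which
    a two-sided z-test with known sigma rejects H0 at level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = sigma * (2 / n) ** 0.5                    # SE of the difference of two means
    rejections = 0
    for _ in range(reps):
        y1 = [rng.gauss(0.0, sigma) for _ in range(n)]
        y2 = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (mean(y1) - mean(y2)) / se
        if abs(z) > z_crit:                        # z in the two-sided rejection region
            rejections += 1
    return rejections / reps


rate = type_one_error_rate()  # close to alpha = 0.05
```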
  • Type II error = not rejecting H0 while Ha is true
    P(type II error) = beta

    At a given alpha, a smaller beta requires a larger sample size n.
  • The formula for the sample size, also called the power calculation, can be found on page 11 of the lecture notes.
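The formula from page 11 of the lecture notes is not reproduced here. As a hedged sketch, the common normal-approximation formula for comparing two means is n = 2 * sigma^2 * (z_(1-alpha/2) + z_(1-beta))^2 / delta^2 per group, where delta is the difference you want to detect; the lecture notes may use a t-based refinement. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, beta=0.20):
    """Approximate n per group for a two-sided comparison of two means
    (normal approximation, equal group sizes)."""
    z = NormalDist().inv_cdf
    n = 2 * sigma**2 * (z(1 - alpha / 2) + z(1 - beta)) ** 2 / delta**2
    return math.ceil(n)  # round up to a whole number of units


n_80_power = sample_size_per_group(delta=1.0, sigma=1.0, beta=0.20)
n_90_power = sample_size_per_group(delta=1.0, sigma=1.0, beta=0.10)
```

Making beta smaller increases z_(1-beta) and therefore n, which is the statement about type II errors above.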
  • Inference on probabilities

  • Which 8 steps lead to your conclusion?
    1. The null hypothesis H0 and the alternative hypothesis Ha
    2. the test statistic 
    3. the distribution of the test statistic under H0 and its behaviour under Ha, i.e. large outcomes supporting Ha, or small outcomes, or both 
    4. the type of rejection region (left-, right-, two-sided)
    5. the rejection region for a given significance level alpha 
      1. the outcome of the test statistic 
    6. the outcome of the test statistic 
      1. the appropriate (one- or two-sided) P-value 
    7. whether the value of the test statistic is within the rejection region, or not 
      1. whether the P-value is below alpha, or not 
    8. your conclusion: H0 is rejected or not, together with the conclusion in words, in practical terms 
  • How do you calculate the confidence interval for a probability pi?
    See the image.
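The formula in the image is not available here. Assuming the usual normal-approximation (Wald) interval p^ +/- z * sqrt(p^(1 - p^) / n), a sketch in Python with hypothetical counts and a hypothetical function name:

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Approximate CI for a probability pi: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Hypothetical data: 40 "successes" out of 100
lower, upper = wald_interval(40, 100)
```

This approximation works poorly for small n or for p^ close to 0 or 1.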
  • Symbol E =
    E is half the width of the desired confidence interval (the margin of error).

    For example, a width of 0.04 gives E = 0.04 / 2 = 0.02.
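Solving E = z * sqrt(pi(1 - pi) / n) for n gives the sample size needed for a given E, assuming the normal-approximation interval for pi. In this sketch the guess pi = 0.5 is an assumption: it gives the largest, i.e. most conservative, n. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_margin(E, pi_guess=0.5, confidence=0.95):
    """Smallest n with z * sqrt(pi * (1 - pi) / n) <= E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * pi_guess * (1 - pi_guess) / E**2)


n_needed = sample_size_for_margin(E=0.04 / 2)  # width 0.04, so E = 0.02
```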
  • What is the basic idea of Fisher's exact test?
    • Fix the margins, i.e. 14, 16, 17, 13
    • note that 14 and 16 were already fixed
    • margins 17 and 13 are observed, but now fixed as well
    • list all tables with the same margins, but with different numbers inside
    • under H0 a probability can be attached to each table
    • consider all tables with a probability smaller than or equal to the probability of the observed table (which includes the observed table)
    • sum the probabilities of these tables; this is the P-value.
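The steps above can be sketched in Python with only the standard library. Assumptions: the table is written as [[a, b], [c, d]] with the two groups as rows, and the example counts a = 5, b = 9, c = 12, d = 4 are a reconstruction (the original table is not shown, but these counts reproduce the margins 14, 16, 17, 13 and the odds 0.56 and 3 used in the odds card). The function name is hypothetical; in R, `fisher.test` performs this test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided P-value of Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b          # fixed margin: size of group 1
    col1 = a + c          # margin of column 1, conditioned upon
    n = a + b + c + d
    col2 = n - col1

    def table_probability(x):
        # Under H0 with all margins fixed, the top-left cell x is hypergeometric
        return comb(col1, x) * comb(col2, row1 - x) / comb(n, row1)

    # All tables with the same margins differ only in the top-left cell x
    possible = range(max(0, row1 - col2), min(row1, col1) + 1)
    probabilities = [table_probability(x) for x in possible]
    p_observed = table_probability(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a tiny relative tolerance guards against floating-point ties)
    return sum(p for p in probabilities if p <= p_observed * (1 + 1e-7))


p_value = fisher_exact_two_sided(5, 9, 12, 4)
```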
  • Give a short summary of the basics of Fisher's exact test for a 2 x 2 table:
    • Two random samples from two populations 
    • often the two populations are the same population of units, either receiving one treatment or another 
    • often the two random samples follow from one random sample and randomization over two treatments 
    • we compare two population proportions: H0: pi1 - pi2 = 0 
    • the data can be presented in a 2 x 2 table 
    • for Fisher's exact test we condition upon the margins 
    • Ha may be two- or one-sided, e.g. Ha: pi1 - pi2 < 0 
      • for a one-sided test you have to ask R for it explicitly
  • Calculate:
    1. the odds for liking the taste without the additive
    2. the odds for liking the taste with the additive

    And how do you interpret the odds?
    1. 0.56
    2. 3

    If the odds are smaller than 1, it is more likely that the taste is not liked; if the odds are larger than 1, it is more likely that the taste is liked.
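In Python the estimated odds follow from the counts as p^ / (1 - p^), which simplifies to (number liking) / (number not liking). The counts 5 vs 9 (without additive) and 12 vs 4 (with additive) are an assumption: the original table is not shown, but these are the counts that reproduce the answers 0.56 and 3 with group sizes 14 and 16. The function names are hypothetical.

```python
def odds_from_probability(p):
    """Odds = p / (1 - p): how many times more likely the event is than its absence."""
    return p / (1 - p)


def odds_from_counts(liked, not_liked):
    """Estimated odds p_hat / (1 - p_hat) simplify to liked / not_liked."""
    return liked / not_liked


# Assumed counts (reconstructed, not given in the notes)
odds_without = odds_from_counts(liked=5, not_liked=9)   # about 0.56
odds_with = odds_from_counts(liked=12, not_liked=4)     # 3.0
```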
  • How do you calculate the odds ratio?
    OR = (odds without additive) / (odds with additive)

    OR^ = (estimated odds without additive) / (estimated odds with additive)
  • Normally a 95% CI would be computed as OR^ +/- 1.96 * se(OR^), based on the normal approximation, but this does not work well because the distribution of OR^ is quite skewed. What do we use instead?
    The distribution of ln(OR^) is much closer to a normal distribution.
  • The distribution of ln(OR^) is approximated by a normal distribution with 

    mean = ln(OR), estimated by ln(OR^) 

    standard deviation = sqrt(1/a + 1/b + 1/c + 1/d), where a, b, c, d are the four cell counts of the 2 x 2 table
  • How do you calculate a confidence interval for the odds ratio?
    This takes 2 steps; see the image.
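A sketch of the two steps in Python, assuming they are (1) a 95% interval for ln(OR) from the normal approximation with standard deviation sqrt(1/a + 1/b + 1/c + 1/d), and (2) exponentiating both limits. The table layout [[a, b], [c, d]] (rows: without / with additive; columns: liked / not liked), the counts and the function name are assumptions; the counts are reconstructed to match the odds 0.56 and 3 in the odds card.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """OR^ = (a/b) / (c/d) = a*d / (b*c), with an approximate CI built on the log scale."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 1: interval for ln(OR)
    log_lower = math.log(or_hat) - z * se_log
    log_upper = math.log(or_hat) + z * se_log
    # Step 2: back-transform both limits to the OR scale
    return or_hat, math.exp(log_lower), math.exp(log_upper)


# Assumed counts: without additive 5 liked / 9 not; with additive 12 liked / 4 not
or_hat, lower, upper = odds_ratio_ci(5, 9, 12, 4)
```

The resulting interval is not symmetric around OR^ on the original scale, only on the log scale. For small counts like these, the notes recommend the exact interval obtained from R.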
  • Summary odds and odds ratio 

    • We defined odds and the odds ratio OR
    • An approximate CI was derived for OR
    • For small counts the approximation does not work well. 
    • collecting all values for OR that are not rejected by Fisher's exact test, we get an exact CI for OR
    • the exact interval can easily be obtained from R
    • for large counts the two intervals will be practically the same
  • Pearson's chi-square goodness-of-fit test, for the following example:
    H0: pi1 = 0.50, pi2 = 0.25, pi3 = 0.10, pi4 = 0.15

    The test statistic is due to Karl Pearson:
    (see the image)

    We reject H0 when the outcome is too large. 
  • When is the outcome of Pearson's chi-square goodness-of-fit test too large, so that we reject H0?
    We need the distribution of the chi-square statistic under H0 to decide whether the outcome 24.33 is large enough to be an unlikely outcome under H0.

    This distribution can be approximated by a chi-square distribution with 3 degrees of freedom.

    The P-value is the area to the right of the outcome, traditionally from a chi-square distribution with (K - 1) degrees of freedom, where K is the number of categories (here K = 4, so 3 degrees of freedom).
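A standard-library sketch of Pearson's statistic, the sum of (observed - expected)^2 / expected with expected = n * pi_k under H0, and of the right-tail P-value with K - 1 degrees of freedom. The observed counts are hypothetical (the counts behind the outcome 24.33 are not in these notes), and the function names are illustrative. The tail probability uses the closed-form series for the chi-square right tail with integer degrees of freedom instead of a statistics library.

```python
import math


def pearson_chi_square(observed, probabilities):
    """Sum of (observed - expected)^2 / expected, with expected = n * pi_k under H0."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_right_tail(x, df):
    """P(X >= x) for X ~ chi-square(df), integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        # even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # odd df: erfc(sqrt(y)) + exp(-y) * sum_{i=0}^{(df-3)/2} y^(i+1/2) / Gamma(i+3/2)
    series = sum(y ** (i + 0.5) / math.gamma(i + 1.5) for i in range((df - 1) // 2))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * series


# H0 from the example; the observed counts are hypothetical
pi_h0 = [0.50, 0.25, 0.10, 0.15]
observed = [40, 20, 30, 10]
statistic = pearson_chi_square(observed, pi_h0)
p_value = chi_square_right_tail(statistic, df=len(observed) - 1)  # K - 1 = 3
```

For the outcome 24.33 from the notes, `chi_square_right_tail(24.33, 3)` is well below 0.001, so H0 would be rejected.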

Latest added flashcards

How do you calculate the estimate of mu in an EMS (expected mean squares) model?
See the image.
How are the variance components estimated with EMS?
See the image.
What are the components of variance (terminology)?
See the image.
When analysts are added, what are the random effects (random parts) of a one-way model?
See the image.
The assumptions in regression, ANOVA and ANCOVA are:
  • linearity in the parameters
  • independence
  • equal variance 
  • normality 
Compare with an F-test: complete model and reduced model
Complete model (=model with interaction):
  • separate slope and separate intercept for each fertilizer, so three arbitrary lines 

reduced model (= analysis of covariance model):
  • common slope and separate intercept for each fertilizer, so three parallel lines 
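The comparison can be sketched with the usual F statistic for nested models, F = [(RSS_reduced - RSS_complete) / (df_reduced - df_complete)] / (RSS_complete / df_complete), where df are residual degrees of freedom. With three fertilizers the complete model has 6 parameters (3 intercepts, 3 slopes) and the reduced model 4 (3 intercepts, 1 common slope), so the numerator has 2 degrees of freedom. The residual sums of squares, n = 30 and the function name are hypothetical; in R, `anova(reduced, complete)` reports this F-test for two fitted `lm` models.

```python
def nested_f_statistic(rss_reduced, df_reduced, rss_complete, df_complete):
    """F statistic comparing a reduced model with a complete (larger) model.

    rss_*: residual sums of squares; df_*: residual degrees of freedom.
    """
    extra_ss = rss_reduced - rss_complete       # reduction in RSS from the extra parameters
    extra_df = df_reduced - df_complete         # number of extra parameters
    mse_complete = rss_complete / df_complete   # error variance estimate, complete model
    return (extra_ss / extra_df) / mse_complete


n = 30                        # hypothetical number of observations
df_complete = n - 6           # 3 intercepts + 3 slopes
df_reduced = n - 4            # 3 intercepts + 1 common slope
F = nested_f_statistic(rss_reduced=120.0, df_reduced=df_reduced,
                       rss_complete=100.0, df_complete=df_complete)
# Compare F with an F distribution with (2, n - 6) degrees of freedom;
# a large F means the lines are not parallel (the interaction is needed).
```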
What are the assumptions in ANCOVA?
In addition to the usual assumptions about error terms epsilon, we need to verify:

  • relationship is linear between the response y and covariate x 
  • slope is the same for all treatments (parallel lines)
  • covariate x does not depend on the treatments 

the last assumption will certainly hold when x is observed prior to random assignment of the treatments. 
How does the correction work out?
See the image.
What is the ANCOVA model?
See the image.

If z = 0, x equals the mean.
If z < 0, x is below the mean.
If z > 0, x is above the mean.
What are the differences between observational studies and experimental studies?
See the image.