# Summary: Statistics: The Art and Science of Learning from Data, Third Edition, by Alan Agresti & Christine Franklin

ISBN-10 1784483516 ISBN-13 9781784483517

This is the summary of the book "Statistics: The Art and Science of Learning from Data", Third Edition, by Alan Agresti and Christine A. Franklin. The ISBN of the book is 9781784483517 or 1784483516. This summary is written by students who study efficiently with the Study Tool of Study Smart With Chris.

• ## 1 statistics: the art and science of learning from data

• what is data?
the information we gather using experiments and surveys.
• what is statistics?
the science of learning from data.
• what is design?
design refers to planning how to obtain data on a problem of interest.
• what is descriptive statistics?
summarizing and analyzing the data that are obtained.
• what is inferential statistics?
making decisions and predictions based on the data for answering statistical questions.
• what is probability?
a framework for quantifying how likely various outcomes are.
• the population is the set of all subjects of interest. the sample is the subset of the population for which data are gathered.
• what are inferential statistics?
statistics obtained from methods of making decisions or predictions about a population, based on data obtained from a sample of that population.
• what is a (sample) statistic?
a numerical summary of a sample taken from the population.
• what is a parameter?
a numerical summary of the population.
• what is random sampling?
taking a sample from the population in such a way that each subject in the population has the same chance of being picked.
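The difference between a parameter and a statistic under random sampling can be illustrated with a small sketch. The population values below are made up for illustration, and the names `population`, `sample`, `population_mean`, and `sample_mean` are hypothetical.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 1,000 subjects (made-up values).
population = [random.randint(18, 90) for _ in range(1000)]

# Parameter: a numerical summary of the whole population.
population_mean = mean(population)

# Simple random sample: every subject has the same chance of being picked.
sample = random.sample(population, k=50)

# Statistic: a numerical summary of the sample.
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, and the statistic is used to make an inference about it.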
• what is a modal category?
the category with the highest frequency.
• what is a mode?
for a quantitative variable, the numerical value that occurs most frequently.
• ## 2 Exploring data with graphs and numerical statistics

• what is a variable?
any characteristic observed in a study
• what are observations?
the data values that we observe for a variable.
• when is a variable categorical and when is it quantitative?
it is categorical if each observation belongs to one of a set of categories, and it is quantitative if the observations take numerical values.
• what is the key characteristic of a quantitative variable?
the values must represent different magnitudes, so that it makes sense to take an average of the variable.
• when is a quantitative variable discrete, and when is it continuous?
it is discrete when its possible values form a set of separate numbers, and it is continuous when its possible values form an interval.
• what is the modal category?
the category with the highest frequency.
• what is the mode?
the numerical value of a quantitative variable that occurs most often.
• what is the proportion?
the frequency of a category divided by the total number of observations.
• proportions and percentages are relative frequencies
• what is a pareto chart?
a bar graph in which the categories are ordered by their frequency, from the highest to the lowest.
• what is the pareto principle?
a small subset of categories often contains most of the observations.
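The definitions of frequency, proportion, modal category, and Pareto ordering can be sketched as follows. The observations are made up for illustration, and all variable names are hypothetical.

```python
from collections import Counter

# Made-up observations of a categorical variable (illustrative only).
observations = ["car", "bike", "car", "bus", "car", "bike", "walk", "car"]

counts = Counter(observations)   # frequency of each category
total = sum(counts.values())     # total number of observations

# Proportion = frequency of a category / total number of observations.
proportions = {category: n / total for category, n in counts.items()}

# Pareto ordering: categories sorted from highest to lowest frequency.
pareto_order = [category for category, _ in counts.most_common()]

# Modal category: the category with the highest frequency.
modal_category = pareto_order[0]

for category in pareto_order:
    print(f"{category:5s} {counts[category]:2d} {proportions[category]:.3f}")
```

The printed rows are in the order the bars of a Pareto chart would appear, from left to right.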
• what is a dot plot, and what does it show? [figure: dot plot]
a dot plot shows one dot per observation above a number line, so it displays the frequency of every observed value of a variable.
• what is a stem-and-leaf plot, and what does it show? [figure: stem-and-leaf plot]
a stem-and-leaf plot shows the frequencies of the observations while keeping the individual values visible. in the pictured example, 16 seconds was the most frequent observation.
• what is a histogram?
a graph that uses bars to show the (relative) frequencies of the observations of a quantitative variable.
• if the set of data is small which type of graph is usually preferred?
the stem-and-leaf plot or the dot plot is usually preferred.
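For a small data set, a stem-and-leaf plot can be built by hand or with a few lines of code. The sketch below assumes non-negative whole numbers, uses the tens as the stem and the units digit as the leaf, and runs on made-up values; `stem_and_leaf` is a hypothetical helper, not a library function.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]}: the tens form the stem, the units digit the leaf.

    Assumes non-negative integers (an assumption of this sketch).
    """
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Made-up observations (for example, times in seconds).
data = [12, 15, 16, 16, 16, 21, 23, 27, 34]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Each printed row lists every observation that shares a stem, so the row length shows the frequency and the individual values stay readable.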
• what is the distribution of data, or data distribution?
the values the variable takes and the frequency of each value. a data distribution is often displayed as a histogram.
• when is a distribution called unimodal, and when is it called bimodal?
a distribution with one distinct mound is called unimodal (like a hill-shaped parabola, but as a histogram). a distribution with two distinct mounds is called bimodal (like a valley-shaped parabola, but as a histogram).
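• a symmetric distribution is not always unimodal: a bell-shaped distribution is symmetric and unimodal, but a symmetric distribution can also have two mounds.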
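• is this distribution skewed to the left or to the right? [figure: histogram]
this distribution is skewed to the left: its left tail is longer than its right tail.
• what are the tails of a distribution?
the parts of the distribution at the lowest and the highest values.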
• what is a time series?
a data set collected over time.
• what is a time plot?
a graph of a time series.
• what is a trend?
a trend is a pattern in a time plot, such as values that increase or decrease over time.
• what is the mean?
the centre of a distribution, found by adding up the observations and dividing by the number of observations (taking the average).
• what is the median?
the centre of a distribution, found by ordering the observations from smallest to largest and picking the middle value (or the average of the two middle values when the number of observations is even).
• what are the properties of the mean?
- the mean is the balance point of the data: if the observations were placed as weights on a line ordered from small to large, the line would balance at the mean.
- the mean can be highly influenced by an outlier.
- the mean is pulled to the longer tail in a skewed distribution.
• what is an outlier?
an observation that falls way out of line with the rest of the data.
• an extremely large value out in the right hand tail will pull the mean to the right.
• a symmetric distribution means mean = median
• a skewed to the right distribution means mean > median
• a skewed to the left distribution means mean < median
• what does it mean that the median is resistant to extreme observations?
that the median changes little or not at all when extreme values are present.
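The effect of an outlier on the mean and the median can be checked with a short sketch. The values are made up for illustration, and `data` and `with_outlier` are hypothetical names.

```python
from statistics import mean, median

# Made-up, symmetric observations.
data = [2, 3, 4, 5, 6]

# The same observations plus one extreme value in the right-hand tail.
with_outlier = data + [100]

print("without outlier:", mean(data), median(data))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

Without the outlier the mean and the median coincide. With it, the mean is pulled far to the right while the median barely moves, which matches the rule that a distribution skewed to the right has mean > median.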
• what is the mode?
the value that occurs most frequently.
• what is the range?
the difference between the largest and the smallest observation. largest - smallest = range
• what is the deviation of an observation?
the difference between the observation and the mean.
• what is the formula for the deviation of an observation x?
$x - \bar{x}$
• the sum of the deviations always equals zero.
• what is the variance?
an average of the squared deviations: the sum of the squared deviations divided by n - 1.
• what is the formula for the standard deviation?
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
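The range, deviations, variance, and standard deviation can be computed step by step, as in this minimal sketch. The data are made up for illustration, and the result is compared with `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
from math import isclose, sqrt
from statistics import mean, stdev

# Made-up observations.
data = [4, 8, 6, 5, 7]
n = len(data)

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Deviation of each observation: x minus the mean.
x_bar = mean(data)
deviations = [x - x_bar for x in data]

# Variance: sum of squared deviations divided by n - 1.
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation: square root of the variance.
s = sqrt(variance)

print("range:", data_range)
print("sum of deviations:", sum(deviations))
print("variance:", variance)
print("standard deviation:", round(s, 4))
print("matches statistics.stdev:", isclose(s, stdev(data)))
```

The sum of the deviations comes out as zero, which is why the deviations are squared before averaging.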
what does the standardized residual show?
how many standard errors a residual falls from 0
what does the residual standard deviation show?
the variability of y for one value of x
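how do you perform a Kruskal-Wallis nonparametric test for comparing several groups?
1. assumptions: independent random samples from the g groups.
2. H0: identical population distributions for the g groups.
3. H1: population distributions not all identical.
4. use the Kruskal-Wallis test statistic, which is based on the ranks of the observations.
5. p-value: right-tail probability from the chi-squared distribution with df = g - 1.
6. conclusion: interpret the p-value in context.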
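The steps above can be sketched in plain Python. This is a minimal illustration and not the book's procedure verbatim: it uses the common rank-sum form of the statistic, H = 12 / (N(N + 1)) · Σ(R_i² / n_i) − 3(N + 1), applies no correction for ties, and runs on made-up data. The helpers `average_ranks` and `kruskal_wallis_h` are hypothetical names. For three groups, df = 2, and the chi-squared right-tail probability has the closed form exp(−H / 2).

```python
from math import exp

def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Made-up data for g = 3 groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)

# With df = g - 1 = 2, the chi-squared right-tail probability is exp(-H / 2).
p_value = exp(-h / 2)

print(f"H = {h:.2f}, approximate p-value = {p_value:.4f}")
```

The chi-squared p-value is a large-sample approximation, so with groups this small it is only a rough guide. For other values of g, the right-tail probability would come from a chi-squared table or a statistics library.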
when would you use the Kruskal-Wallis test instead of the ANOVA F test?
when the population distributions may not be normal and the sample sizes are not large.
what is the assumption for using the median to compare groups?
the population distributions for both groups have the same shape.
when is it advisable to summarize a group by median instead of mean?
when the response distribution of the groups may be skewed.
how do you get a two-sided p-value from a single z-score?
find the one-tail probability for that z-score and multiply it by 2.
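Doubling the tail probability can be sketched with the standard normal distribution from the Python standard library. The function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    # One-tail probability of a value more extreme than |z|.
    tail = 1 - NormalDist().cdf(abs(z))
    # Double it to count both tails.
    return 2 * tail

for z in (0.5, 1.96, -2.5):
    print(f"z = {z:5.2f}  two-sided p-value = {two_sided_p_value(z):.4f}")
```

Taking the absolute value of z first means a negative z-score gives the same p-value as the positive z-score of the same size.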
what do we do if two participants are tied and would share equally good ranks?
we give both of them the average of the ranks they would occupy, so ranks 2 and 3 both become 2.5.