EDRM-612 ANOVA GLOSSARY
Begun January 28, 2009
Completed (initial draft) March 17, 2009
One-way analysis: a simple design that tests whether there is a difference among two or more groups (levels) of a single factor. Only one factor is being considered. With two groups it is equivalent to an independent-samples t-test, and \(F = t^2\). The F-test is used to determine whether there is a significant difference among two or more group means simultaneously. F is defined as the mean square between groups divided by the mean square within groups, so a large F reflects large differences among the group means relative to the variability within the groups. This is not a factorial design, because a factorial design must have at least two factors. You can have a one-way analysis of variance design with five groups or so, but a significant F alone will not tell us which group or combination of groups is actually different.
Assumptions: a thing that is accepted as true or sure to happen, without proof. The one-way ANOVA assumes:
1) Homogeneity of variance (see homoscedasticity below).
2) Normality: the dependent variable is normally distributed around its mean within each condition; that is, error is normally distributed within conditions.
3) Independent observations: the observations are independent of each other. Knowing the score of one case tells us nothing about the score of any other case.
Another way to say this is that the following conditions must be present for a one-way ANOVA:
1) Each population must have the same variance.
2) The populations of interest must be normally distributed.
3) The samples must be independent of each other.
Structural model: structural modeling works this way:
1. You state the way that you believe the variables are interrelated, often with the use of a path diagram.
2. You work out, via a set of internal rules, what the implications of this are for the variances and covariances of the variables.
3. You test whether the observed variances and covariances fit this model of them.
4. The results of the statistical testing are reported, along with parameter estimates and standard errors for the numerical coefficients in the linear equations.
5. On the basis of this information, you decide whether the model seems like a good fit to your data.
The structural model that underlies the analysis of variance is as follows. \(X_{ij}\) represents the score of Person \(i\) in Condition \(j\). We let \(\mu\) (mu) represent the mean of all subjects who could theoretically be run in the experiment, regardless of condition. The symbol \(\mu_j\) represents the population mean of Condition \(j\), and \(\tau_j\) (tau) is the degree to which the mean of Condition \(j\) deviates from the grand mean (\(\tau_j = \mu_j - \mu\)). Finally, \(\epsilon_{ij}\) (epsilon) is the amount by which Person \(i\) in Condition \(j\) deviates from the mean of his or her group (\(\epsilon_{ij} = X_{ij} - \mu_j\)). Putting these together:
\[ X_{ij} = \mu + (\mu_j - \mu) + \epsilon_{ij} = \mu + \tau_j + \epsilon_{ij} \]
Homoscedasticity (also referred to as homogeneity of variance): the condition, assumed by ANOVA, that each of the populations has the same variance. More generally, a condition of substantially equal variances in the dependent variable for the same values of the independent variable in the different populations being sampled and compared in a regression analysis or an ANOVA.
Heteroscedasticity (also referred to as heterogeneity of variance): the condition in which the populations have different variances. It violates the homogeneity-of-variance assumption and can be a problem for the analysis. A situation in which there are considerably unequal variances in the dependent variable for the same values of the independent variable in the different populations being sampled and compared in a regression analysis or an ANOVA.
MSerror: mean square error, also referred to as MSwithin. This is a measure of variation within each sample. It is calculated from each sample separately, so it does not depend on the truth or falsity of the null hypothesis.
MSwithin: also referred to as MSerror, because it is the variance of the scores within the samples (within each level of the factor), which is attributed to error.
MStreatment: also referred to as MSbetween.
It is a measure of the variation of the sample means around the grand mean (variation between samples), in contrast to MSerror, which measures variation within each sample.
Note about MSerror (MSW) and MStreatment (MSB): a large MSB relative to MSW indicates that the sample means are not very close to one another. This condition will result in a large value of F, the calculated F-statistic. The larger the value of F, the more likely it is to exceed the critical F-statistic, leading us to conclude that there is a difference between population means.
Treatment effect: in experiments, a treatment is what researchers do to subjects in the experimental group but not to those in the control group. A treatment is thus an independent variable. Treatment can be used broadly to mean almost any predictor variable. For example, in a study of the effect of changing the speed limit on the traffic accident rate, the speed limit would be the treatment. The treatment effect is the difference between the mean of treatment \(j\) (or condition \(j\)), \(\mu_j\), and the grand mean, \(\mu\): \(\tau_j = \mu_j - \mu\).
Expected value: \(E(\cdot)\) is the long-range average of a statistic; the mean value of a variable in repeated samplings or trials.
Sum of squares: the sum of the squared deviations about the mean. Not to be confused with the square of sums. Analysis of variance is in fact an analysis of sums of squares, that is, of the result of adding together the squares of deviation scores.
SStotal (read "sum of squares total"): the sum of the squared deviations of all the observations from the grand mean, regardless of which treatment produced them.
SStreat: the sum of squared deviations of the treatment means around the grand mean, multiplied by n (the number of observations per treatment). Dividing it by its degrees of freedom gives MStreat, which estimates the population variance when the null hypothesis is true. Also called the sum of squares between (SSB).
SSerror: the sum over groups of the sums of squared deviations of scores around their group's mean. The truth or falsity of the null hypothesis is irrelevant to the calculation of SSerror. It reflects the variation within each sample (SSW).
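The structural model and the sums-of-squares definitions above can be checked numerically. The Python sketch below uses small made-up scores (hypothetical, not from any study) for three conditions; it estimates \(\mu\), \(\mu_j\), \(\tau_j\), and \(\epsilon_{ij}\) from the sample, confirms that every score equals \(\mu + \tau_j + \epsilon_{ij}\), and shows that SStotal splits exactly into SStreat plus SSerror. The variable names are illustrative only.

```python
# Hypothetical scores X_ij: one inner list per condition j (illustrative data only).
groups = [
    [3.0, 5.0, 4.0, 4.0],  # condition 1
    [6.0, 7.0, 5.0, 6.0],  # condition 2
    [8.0, 7.0, 9.0, 8.0],  # condition 3
]


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


all_scores = [x for g in groups for x in g]
grand_mean = mean(all_scores)                 # sample estimate of mu
group_means = [mean(g) for g in groups]       # sample estimates of mu_j
taus = [m - grand_mean for m in group_means]  # treatment effects: tau_j = mu_j - mu
residuals = [[x - m for x in g]               # errors: epsilon_ij = X_ij - mu_j
             for g, m in zip(groups, group_means)]

# Structural model: every score is rebuilt as mu + tau_j + epsilon_ij.
for g, tau, errs in zip(groups, taus, residuals):
    for x, eps in zip(g, errs):
        assert abs(x - (grand_mean + tau + eps)) < 1e-12

# SStotal: squared deviations of all scores from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
# SStreat: squared treatment effects, each weighted by its group size n_j.
ss_treat = sum(len(g) * tau ** 2 for g, tau in zip(groups, taus))
# SSerror: squared deviations of scores from their own group mean.
ss_error = sum(eps ** 2 for errs in residuals for eps in errs)

print(ss_total, ss_treat, ss_error)  # 38.0 32.0 6.0
```

With these numbers SStreat (32) accounts for most of SStotal (38). The treatment effects sum to zero here because the groups are the same size; with unequal group sizes it is the n-weighted sum of the \(\tau_j\) that equals zero.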
dftotal: degrees of freedom are the number of values free to vary when computing a statistic; the number of pieces of information that can vary independently of one another. The df tell you how much information was used to calculate a particular statistic; for a sample variance, df is one less than the number of observations. The dftotal is always N - 1, where N is the total number of observations.
dftreat: the number of degrees of freedom between treatments is always k - 1, where k is the number of treatments.
dferror: the degrees of freedom for error are most easily thought of as what is left over, obtained by subtracting dftreat from dftotal. However, dferror can be calculated more directly as the sum of the degrees of freedom within each treatment; with k treatments and N observations, this gives N - k.
F statistic: F is obtained by dividing MStreat by MSerror. It is the ratio of explained to unexplained variance in an ANOVA; that is, the ratio of the between-group variance (treatment) to the within-group variance (error). To interpret the F ratio, you need to consult a table of F values for a particular level of statistical significance at the degrees of freedom in your study (dftreat for the numerator, dferror for the denominator). Named after R. A. Fisher, the inventor of analysis of variance.
Balanced design: an experimental design that collects the same number of observations in each treatment. Most experiments are originally designed this way. This term also refers to a randomized-blocks design in which every treatment appears in each block the same number of times.
Logarithmic transformation: useful whenever the standard deviation is proportional to the mean. Also useful when the data are markedly positively skewed, since positively skewed distributions tend toward symmetry under logarithmic transformations. If we take the numbers 10, 100, and 1000, their (base-10) logs are 1, 2, and 3. Thus, the distance between 10 and 100, in log units, is now equivalent to the distance between 100 and 1000.
In other words, taking logarithms compresses the right side of the distribution (the more positive values) more than the left side.
Square-root transformation: when your data are in the form of counts of something, the variance often increases along with the mean. To check, compare each group's mean with its variance (rather than with its standard deviation, which is what matters for the logarithmic transformation); if the variance is roughly proportional to the mean, a square-root transformation can stabilize the variance and decrease skewness. Apply the transformation and see how it affects the data. (*not completely sure I understand this)
Heavy-tailed distribution: a relatively flat distribution that has an unusually large number of observations in the tails.
Winsorized samples: these are closely related to trimmed samples (samples from which a fixed percentage of the extreme values in each tail have been removed). But in Winsorized samples, the trimmed values are replaced by the most extreme value remaining in each tail.
Fixed-model analysis of variance: the levels of the factor are chosen deliberately, and you would use the same levels when replicating the study.
Random-model analysis of variance: the levels of the factor are sampled at random from the possible levels, and a replication would use a new random sample of levels.
Magnitude of experimental effect: a significant F in an analysis of variance simply tells us that there are differences among the treatment means that cannot be attributed to error. Measures of the magnitude of experimental effect indicate how important those differences are. There are several such measures. Eta-squared (\(\eta^2\)) is one of the oldest; it is sometimes called the correlation ratio. An alternative and often better method of assessing the magnitude of the experimental effect is omega-squared (\(\omega^2\)), which is less biased.
Noncentrality parameter: when the null hypothesis is false, F follows a noncentral F distribution. The noncentrality parameter displaces that distribution in a positive direction, away from one (roughly where F falls when the null hypothesis is true), with the amount of displacement depending on the true differences among the population means. (*I don't really understand this.)
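The degrees of freedom, the F ratio, and the two effect-size measures above can be combined into a small one-way ANOVA summary. The Python sketch below is a minimal illustration using the usual one-way formulas: dftreat = k - 1, dferror = N - k, F = MStreat/MSerror, \(\eta^2 = SS_{treat}/SS_{total}\), and \(\omega^2 = (SS_{treat} - (k-1)MS_{error})/(SS_{total} + MS_{error})\). It also checks the earlier point that with only two groups \(F = t^2\), using a pooled-variance t. The function names and the data are hypothetical, and the sketch does not compute p values; those still come from an F table or a statistics package.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def one_way_anova(groups):
    """Return one-way ANOVA summary quantities for a list of groups."""
    all_scores = [x for g in groups for x in g]
    k, n_total = len(groups), len(all_scores)
    grand_mean = mean(all_scores)

    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_treat = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = 0.0
    for g in groups:  # pooled within-group variation
        m = mean(g)
        ss_error += sum((x - m) ** 2 for x in g)

    df_treat, df_error = k - 1, n_total - k  # df_total would be n_total - 1
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error

    return {
        "df_treat": df_treat,
        "df_error": df_error,
        "F": ms_treat / ms_error,
        "eta_sq": ss_treat / ss_total,
        "omega_sq": (ss_treat - df_treat * ms_error) / (ss_total + ms_error),
    }


def pooled_t(a, b):
    """Independent-samples t using a pooled variance estimate."""
    ma, mb = mean(a), mean(b)
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    pooled_var = (ssa + ssb) / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


groups = [[3, 5, 4, 4], [6, 7, 5, 6], [8, 7, 9, 8]]  # hypothetical scores
summary = one_way_anova(groups)
print(summary)

# With only two groups, the ANOVA F equals the square of the pooled t.
two = groups[:2]
print(one_way_anova(two)["F"], pooled_t(*two) ** 2)
```

Note that \(\omega^2\) comes out smaller than \(\eta^2\) for the same data, reflecting the correction for bias.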
Familywise error rate (FW): the probability that at least one Type I error (rejecting a null hypothesis when it is true) has been committed somewhere in a set of comparisons, in research involving multiple comparisons. In this context, family means a group or set of related statistical tests.
Orthogonal contrasts: orthogonal means intersecting or lying at right angles (or, more specifically, uncorrelated). Orthogonal contrasts provide independent pieces of information; with equal sample sizes, two contrasts are orthogonal when the sum of the products of their coefficients is zero.
Trimmed means: a measure of central tendency that allows the researcher to deal separately with a distribution's outliers. It is a mean computed without the extreme observations. For example, an 80% trimmed mean would be the mean calculated using only the central 80% of the values in the distribution; the high and low 10% would be eliminated (trimmed). (See also trimmed samples and trimmed statistics.)
Factor: an independent variable in an analysis of variance.
Factor level: the individual treatment conditions that make up a factor are called levels of the factor.
Factorial design: a design (not a statistical procedure) that tests every level of every factor with every level of every other factor, that is, all combinations of all levels of the independent variables (two or more factors, each studied at two or more levels). The goal of a factorial design is to determine whether the factors combine to produce interaction effects. Factorial designs are labeled by the number of levels of each factor; for example, a 2 x 3 design has two factors with two and three levels. The simplest factorial design is a 2 x 2 design, which has two factors, each with two levels, and so produces four groups (2 x 2 = 4).
Advantage one: factorial designs allow greater generalizability of the results.
Advantage two: they allow us to look at the interaction of variables. Interaction effects are some of the most interesting results.
Advantage three: economy. The effects of one variable are averaged across the levels of the other variable.
So fewer participants are needed to achieve the same power as two separate one-way experiments.
Interaction of variables: interaction exists when the effect of one factor differs across the levels of another factor. In other words, when we plot the mean scores for at least two groups on two or more variables, an interaction shows up as lines that depart significantly from parallel. Interaction effects occur when independent variables act in combination on dependent variables. Put another way, interaction effects occur when the relation between two variables differs depending on the value of a third variable.
First-order interaction: an interaction between two variables.
Second-order interaction: an interaction among three variables.
Main effect: the effect of a factor ignoring (averaging over) the other factor. If we say that tasks that involve more processing lead to better recall, we are speaking of a main effect. A main effect is significant when a factor or independent variable has a significant effect on the outcome variable; in other words, when analysis of the data reveals a difference between the levels of that factor.
Simple effect: the effect of one factor at only one level of the other factor. The analysis of simple effects can be an important technique for analyzing data that contain significant interactions, because it allows us to tease those interactions apart.
Mixed-model design: consists of one or more fixed variables (the levels of that variable are selected) and one or more random variables (the levels are obtained by random sampling).
Sampling fraction: the size of a sample as a percentage of the population from which it was drawn; the ratio of sample size to population size. (In a mixed-model design: the ratio of the number of levels of a given variable that actually are used to the potential number of levels that could have been used.)
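Several of the definitions above reduce to short calculations. The Python sketch below is illustrative only. It computes the familywise error rate for c comparisons under the added assumption that the comparisons are independent, \(1 - (1 - \alpha)^c\); checks whether two contrasts are orthogonal (equal group sizes assumed); computes a trimmed mean that drops the same proportion from each tail (0.10 per tail gives the 80% trimmed mean described above); and, for hypothetical 2 x 2 cell means, computes the two main effects, the simple effects of B at each level of A, and the interaction contrast, which is zero exactly when the plotted lines are parallel. All function names and numbers are made up for illustration.

```python
def familywise_error(alpha, c):
    """P(at least one Type I error) across c independent tests at level alpha."""
    return 1 - (1 - alpha) ** c


def are_orthogonal(a, b):
    """For equal n, two contrasts (coefficients summing to zero) are
    orthogonal when the sum of the products of their coefficients is zero."""
    assert abs(sum(a)) < 1e-12 and abs(sum(b)) < 1e-12, "not contrasts"
    return abs(sum(x * y for x, y in zip(a, b))) < 1e-12


def trimmed_mean(xs, prop_each_tail):
    """Mean after dropping prop_each_tail (< 0.5) of the sorted values from
    each end; prop_each_tail=0.10 keeps the central 80% of the values."""
    s = sorted(xs)
    cut = int(prop_each_tail * len(s))
    kept = s[cut:len(s) - cut]
    return sum(kept) / len(kept)


def two_by_two_effects(cells):
    """cells[i][j] is the mean for level i of factor A and level j of factor B."""
    (a1b1, a1b2), (a2b1, a2b2) = cells
    return {
        "main_A": (a2b1 + a2b2) / 2 - (a1b1 + a1b2) / 2,
        "main_B": (a1b2 + a2b2) / 2 - (a1b1 + a2b1) / 2,
        "B_at_A1": a1b2 - a1b1,  # simple effect of B at level 1 of A
        "B_at_A2": a2b2 - a2b1,  # simple effect of B at level 2 of A
        "interaction": (a2b2 - a2b1) - (a1b2 - a1b1),
    }


print(familywise_error(0.05, 3))                            # about 0.143
print(are_orthogonal([1, -1, 0], [1, 1, -2]))               # True
print(trimmed_mean([2, 3, 3, 4, 4, 5, 5, 6, 7, 40], 0.10))  # 4.625
print(two_by_two_effects([[10, 14], [12, 20]]))
```

In the 2 x 2 example the effect of B is 4 at A1 but 8 at A2, so the lines are not parallel and the interaction contrast is 4. The trimmed mean (4.625) is far less affected by the outlying 40 than the ordinary mean (7.9) would be.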
Partial effect: ordinary eta-squared and omega-squared represent the size of an effect (SSeffect) relative to the total variability in the experiment (SStotal). Partial measures, such as partial eta-squared, instead express the effect relative to the effect plus its own error term, \(SS_{effect}/(SS_{effect} + SS_{error})\), so that variability due to other factors in the design is excluded.
Unbalanced design: said of factorial designs when the cells contain unequal numbers of observations (unequal sample sizes). Also called a nonorthogonal factorial design.
Repeated-measures design: a research design in which subjects are measured two or more times on the dependent variable. Rather than using different subjects for each level of treatment, the subjects are given more than one treatment and are measured under each, so each subject serves as its own control. The corresponding analysis of variance can, for example, test whether statistically significant change has occurred from the pretest to the posttest. It can also be called within-subjects ANOVA, treatments-by-subjects ANOVA, or randomized-blocks ANOVA. The advantage of these designs is that they reduce overall variability by using a common subject pool for all treatments and allow us to remove subject differences from the error term, leaving the error components independent from treatment to treatment or cell to cell.
Covariance matrix: covariance is the measure of the joint (co-)variance of two variables. A covariance matrix represents the covariances among the treatments. On the main diagonal of the matrix are the variances within each treatment; the off-diagonal elements are the covariances between pairs of treatments. It is assumed that the covariance matrix shows compound symmetry (all variances equal and all covariances equal). This is a sufficient condition for the validity of a repeated-measures analysis of variance.
Intraclass correlation: a way to account for subject differences and reliability in repeated-measures testing; it reflects how similar the scores from the same subject are to one another, that is, the proportion of total variance due to differences between subjects.
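For a repeated-measures layout, the covariance matrix and the partial effect described above can be sketched directly. The Python example below assumes hypothetical scores for four subjects measured under three treatments. It builds the treatment covariance matrix (variances on the diagonal, covariances off it), reports how spread out the diagonal and off-diagonal entries are as an informal look at compound symmetry (not a formal test), and partitions SStotal into subjects, treatments, and error so that ordinary eta-squared can be compared with partial eta-squared. Names and data are illustrative only.

```python
# Hypothetical repeated-measures data: rows are subjects, columns are treatments.
scores = [
    [5.0, 7.0, 9.0],
    [3.0, 4.0, 7.0],
    [6.0, 8.0, 9.0],
    [4.0, 5.0, 7.0],
]


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


treatments = [list(col) for col in zip(*scores)]  # one list per treatment
cov_matrix = [[covariance(a, b) for b in treatments] for a in treatments]

k = len(treatments)
variances = [cov_matrix[i][i] for i in range(k)]
covs = [cov_matrix[i][j] for i in range(k) for j in range(k) if i < j]
print("variances:", variances)
print("covariances:", covs)
# Informal compound-symmetry look: how far apart are these values?
print("variance range:", max(variances) - min(variances))
print("covariance range:", max(covs) - min(covs))

# Partition SStotal for the repeated-measures design.
n = len(scores)
grand = mean([x for row in scores for x in row])
ss_total = sum((x - grand) ** 2 for row in scores for x in row)
ss_subjects = k * sum((mean(row) - grand) ** 2 for row in scores)
ss_treat = n * sum((mean(col) - grand) ** 2 for col in treatments)
ss_error = ss_total - ss_subjects - ss_treat  # subject differences removed

eta_sq = ss_treat / ss_total
partial_eta_sq = ss_treat / (ss_treat + ss_error)
print(round(eta_sq, 3), round(partial_eta_sq, 3))
```

Because the large subject differences are removed from the error term, partial eta-squared is much larger than ordinary eta-squared for these data.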
Sequence effects or carryover effects: lingering effects of an earlier experimental treatment that combine with the effects of a later treatment in a way that makes it difficult to assess the unique effects of the later treatment. In certain studies, carryover effects are desirable. In learning studies, for example, the basic data represent what is carried over from one trial to another. In most situations, however, carryover effects are considered a nuisance, something to be avoided.
Randomized blocks design: a research design in which subjects are matched on a variable the researcher wishes to control. The subjects are put into groups (blocks) of the same size as the number of treatments, and the members of each block are assigned randomly to different treatment groups.
Collinearity: when a variable is highly correlated with several of the other predictors and perhaps does not have much that is unique to itself. When those other predictors are included, the collinear predictor will have little to offer in explaining variability. More generally, collinearity is the extent to which independent (or predictor) variables in a regression analysis are correlated with one another. It causes problems in analysis because it makes it difficult to study the separate effects of the independent variables. In most usages, the main difference between collinearity and multicollinearity is two syllables.
Regression coefficient: a number indicating how much the value of a dependent variable changes for a one-unit change in an independent variable (holding any other independent variables constant). A regression coefficient is part of a regression equation. A standardized regression coefficient (one expressed in z-scores) is symbolized by the Greek letter beta (\(\beta\)); an unstandardized regression coefficient is symbolized by a lowercase b.
For example, if we were studying the relationship between education (independent variable) and annual income (dependent variable) and found that for every year of education beyond the 10th grade the expected annual income went up by $1,200, the (unstandardized) regression coefficient would be $1,200.
Standardized regression coefficient: a statistic that provides a way to compare the relative importance of different variables in a multiple regression analysis. Often symbolized as beta and called the beta weight or beta coefficient (not to be confused with the beta used to symbolize the probability of a Type II error). Also called the standard partial regression coefficient (the term partial indicates that the effects of other variables have been held constant). The unstandardized coefficient b is an asymmetric measure; beta is symmetric (in bivariate regression, beta equals Pearson's r).
Residual variance: the error that is still involved in the prediction of Y after all predictors have been taken into account. It is given as the error term in the ANOVA summary table (error variance, or within-group variance).
Multiple correlation coefficient: a multiple correlation involves more than two variables, one of which is dependent and the others independent. The object is to measure the combined influence of two or more independent variables on a dependent variable. R is the symbol for a multiple correlation coefficient. It is defined as the correlation between the criterion (Y) and the best linear combination of the predictors. \(R^2\) gives the proportion of the variance in the dependent variable that can be explained by all the independent variables taken together.
Semipartial correlation: the correlation between the criterion and a partialled predictor variable, that is, the correlation of Y with that part of X1 that is independent of X2. It partials out (controls for) a variable, but only from one of the two variables being correlated, not from the criterion.
Also called part correlation. (A partial correlation, by contrast, partials the variable out of both variables being correlated.) The formal notation may help in grasping this tricky notion: \(r_{1(2.3)}\) denotes the correlation between variables 1 and 2 after 3 has been partialled out of 2, but not out of 1.
Multivariate outliers: outliers that are not evident when looking at the variables individually, but which become evident when the scores on several variables are combined. For instance, it is not unusual to have people 6 feet tall, nor to have people weighing 125 lbs.; however, it is unusual to have people 6 feet tall who weigh 125 lbs.
Studentized residual: a t-test on the magnitude of a residual; the residual is divided by an estimate of its own standard error so that it can be evaluated like a t statistic (labeled Rstudent in some output). Large values flag possible outliers.
Variance inflation factor (VIF): the reciprocal of tolerance. It reflects the degree to which the standard error of \(b_j\) is increased because \(X_j\) is correlated with the other predictors. The higher the standard error of \(b_j\), the more that coefficient will fluctuate from sample to sample and the less confidence we can have in the particular value that we obtained. Since we want stable regression coefficients, we want variables with low VIFs, or high tolerance. It is often worth eliminating a redundant variable from the model to achieve that goal.
Tolerance: a measure of how much of one predictor cannot be predicted by the other predictors in the model. First, tolerance tells us the degree of overlap among the predictors, helping us see which predictors have information in common and which are relatively independent. Second, it alerts us to potential problems of instability in our model. With very low levels of tolerance, the stability of the model and sometimes even the accuracy of the arithmetic can be in danger. (In general usage, tolerance is also an allowable margin of error in measurement.) In multiple regression analysis, the tolerance of a predictor is the proportion of its variability not explained by the other independent variables, \(1 - R_j^2\), where \(R_j^2\) comes from regressing \(X_j\) on the other predictors.
The bigger the tolerance, the better the analysis; the smaller the tolerance, the higher the collinearity.
Stepwise regression: more or less the reverse of the backward elimination method, in that it starts with no predictors and adds them. It is a technique for calculating a regression equation that instructs a computer to find the best equation by entering independent variables in various combinations and orders. Stepwise regression combines the methods of backward elimination and forward selection: at each step, variables are subject first to the inclusion criteria of forward selection and then to the exclusion criteria of backward elimination. Variables are added and removed until none remain that meet the criteria for entry or removal. Stepwise regression is often contrasted with hierarchical regression analysis, in which the researcher, not the computer program, determines the order of the variables in the regression equation.
Cross-validation: a concept that proves to be the stumbling block for most multiple regression studies. It is done against an independent data set. For instance, we break our data into two or more subsets and derive a regression equation from the first set. We then apply the regression coefficients obtained from that sample to the data in the other sample to obtain predicted values of Y (\(\hat{Y}_{cv}\)) for the cross-validation sample. Our interest then focuses on the relationship between Y and \(\hat{Y}_{cv}\) in the new subsample.
Multicollinearity: in multiple regression analysis, multicollinearity exists when two or more independent variables are highly correlated; this makes it difficult if not impossible to determine their separate effects on the dependent variable. Also called collinearity (see above).
Mediating relationship: some variable mediates the relationship between two other variables. This mediating variable can also be called an intervening variable; that is, a variable that transmits the effects of another variable.
For example, parents transmit their social status to their children directly, but they also do so indirectly, through education, as in the following diagram, in which the child's education is the mediating variable:
parents' status → child's education → child's status
Moderating relationship: a situation in which the relationship between the independent and dependent variables changes as a function of the level of a third variable (the moderator). The moderator variable influences (moderates) the relation between the two other variables and thus produces an interaction effect.
Full model: in the regression approach to a factorial design, the model that includes all the predictors (A, B, and AB). By itself it tells us only the amount of variation that all the predictors account for simultaneously.
Reduced model: a model from which one set of predictors (for example, those for A) has been dropped. Comparing the full and reduced models shows how much variation the dropped effect accounts for, which is how the variation can be partitioned among A, B, and AB.
Analysis of covariance: an extension of ANOVA that provides a way of statistically controlling the (linear) effects of variables one does not want to examine in a study. These extraneous variables are called covariates, or control variables. (Covariates should be measured on an interval or ratio scale.) ANCOVA allows you to remove covariates from the list of possible explanations of variance in the dependent variable. It does this by using statistical techniques (such as regression to partial out the effects of covariates) rather than direct experimental methods to control extraneous variables. ANCOVA is used in experimental studies when researchers want to remove the effects of some antecedent variable; for example, pretest scores are used as covariates in pretest-posttest experimental designs. ANCOVA is also used in nonexperimental research, such as surveys of nonrandom samples, or in quasi-experiments when subjects cannot be assigned randomly to control and experimental groups. Although fairly common, the use of ANCOVA for nonexperimental research is controversial.
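For the two-predictor case, most of the regression quantities defined above can be computed from the three pairwise correlations. The Python sketch below uses hypothetical data and the standard two-predictor formulas: \(\beta_1 = (r_{Y1} - r_{Y2}r_{12})/(1 - r_{12}^2)\) (and symmetrically for \(\beta_2\)), \(R^2 = \beta_1 r_{Y1} + \beta_2 r_{Y2}\), the semipartial correlation \(r_{Y(1.2)} = (r_{Y1} - r_{Y2}r_{12})/\sqrt{1 - r_{12}^2}\), tolerance \(1 - r_{12}^2\) (with only two predictors, \(R_j^2\) is simply \(r_{12}^2\)), VIF as its reciprocal, and the unstandardized slope \(b_1 = \beta_1 s_Y / s_1\). It is a sketch for checking intuition, not a general multiple-regression routine; the helper names are illustrative.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def corr(xs, ys):
    """Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: criterion Y and two correlated predictors X1, X2.
y = [2.0, 3.0, 5.0, 4.0, 7.0, 8.0]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)

tolerance = 1 - r_12 ** 2  # share of X1 not predictable from X2
vif = 1 / tolerance        # variance inflation factor

beta1 = (r_y1 - r_y2 * r_12) / tolerance  # standardized coefficients
beta2 = (r_y2 - r_y1 * r_12) / tolerance
r_squared = beta1 * r_y1 + beta2 * r_y2
multiple_r = math.sqrt(r_squared)

# Semipartial: Y with the part of X1 that is independent of X2.
sr1 = (r_y1 - r_y2 * r_12) / math.sqrt(tolerance)

b1 = beta1 * sd(y) / sd(x1)  # unstandardized slopes and intercept
b2 = beta2 * sd(y) / sd(x2)
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

print(f"tolerance={tolerance:.3f}  VIF={vif:.3f}")
print(f"beta1={beta1:.3f}  beta2={beta2:.3f}  R={multiple_r:.3f}")
print(f"semipartial r_Y(1.2)={sr1:.3f}  b0={b0:.3f}  b1={b1:.3f}  b2={b2:.3f}")
```

Two useful identities hold here: the squared semipartial equals the increase in \(R^2\) when X1 is added to a model that already contains X2 (\(R^2 = r_{Y2}^2 + r_{Y(1.2)}^2\)), and the correlation between Y and the predicted values \(b_0 + b_1X_1 + b_2X_2\) equals R, matching the definition of the multiple correlation coefficient.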
Covariate: a variable that a researcher seeks to control for (statistically subtract the effects of) by using such techniques as multiple regression analysis (MRA) or analysis of covariance (ANCOVA). Also called concomitant variable.
Kathy Beagles