The third rudimentary model command in Mplus is BY (short for “measured by”), which defines a factor. Although statistically more complicated than the previous two, a factor simply defines a latent (unobserved) variable through its prediction of observed variables. In other words, you are telling Mplus that you have a variable that exists but cannot be measured directly (a latent variable) and that you have some measurements of behaviour proposed to be caused by this latent variable (observed or measured variables).
This is important to understand, so how about an example?
Consider the personality trait extraversion. People who are extraverted are considered outgoing and gregarious (McCrae & Costa, 1987). However, we cannot put someone’s extraversion on a bathroom scale and weigh it — nor can we pour it out of people into a test tube. Extraversion is simply a way of organizing and thinking about a common pattern of behaviours. In other words, extraversion is a latent variable and we must measure it by gathering observed variables representative of our idea of what an extraverted person is, how they behave, and the thoughts they commonly have.
In psychology, asking people questions about themselves and their behaviour is the most common form of measurement. It is no surprise that people tend to understand themselves better than anyone else (especially when it comes to internal behaviours such as beliefs, attitudes, and emotions). When measuring extraversion, we can, for instance, ask people to rate the degree to which they consider themselves talkative.
There are also other ways of gathering observed variables aside from self-report. We can hire coders to observe someone’s behaviour (e.g., code how frequently a participant approaches strangers to strike up a conversation), recruit people who know our participant (e.g., have peers rate how gregarious our participant is in general), and so on, limited only by our creativity.
Essentially, our model of reality is that the personality trait of extraversion (our latent variable) is causing specific patterns of behaviour, such as talkativeness, sociability, and gregariousness (the observed variables).
Visually, this is what it looks like:
And here is a generic syntax that would run this factor analysis:
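A minimal sketch of such syntax might look like the following. The file name extraversion.dat, the observed variable names talk, social, and greg, and the factor name extra are all hypothetical placeholders chosen for illustration; only the BY statement itself follows the pattern described above.

```mplus
TITLE:    Generic one-factor model (illustrative sketch);
DATA:     FILE IS extraversion.dat;       ! hypothetical data file
VARIABLE: NAMES ARE talk social greg;     ! hypothetical observed variables
          USEVARIABLES = talk social greg;
MODEL:    extra BY talk social greg;      ! latent factor BY its observed variables
OUTPUT:   STANDARDIZED;
```

Note that a single factor with only three observed variables is just-identified (zero degrees of freedom), so a sketch like this would yield loadings but no informative fit statistics.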
Now, let’s look at an example from a real dataset.
Here, participants were asked to think about themselves and rate the extent to which they agree with the following statements about their tendency to perspective-take (i.e., try to understand the world from another’s point of view):
- “I try to look at everybody’s side of a disagreement before I make a decision”
- “I sometimes try to understand my friends better by imagining how things look from their perspective”
- “I believe there are two sides to every question, and try to look at them both”
- “When I’m upset at someone, I usually try to put myself in his/her shoes for a while”
- “Before criticizing somebody, I try to imagine how I would feel if I were in their place”
Translating these items into Mplus and specifying a factor for them gives the following syntax:
    TITLE: Simple Confirmatory Factor Analysis;
    DATA: File is PT5.dat;
    VARIABLE: Names are PT1 PT2 PT3 PT4 PT5;
              Missing are all(-999);
              Usevariables = PT1 PT2 PT3 PT4 PT5;
    MODEL: PT by PT1 PT2 PT3 PT4 PT5; !Latent factor by observed factors
    OUTPUT: Standardized sampstat Modindices(all);
And produces the following output:
    Mplus VERSION 7.4 (Mac)
    MUTHEN & MUTHEN
    05/28/2017  10:01 AM

    INPUT INSTRUCTIONS

      TITLE: Simple Confirmatory Factor Analysis;
      DATA: File is PT5.dat;
      VARIABLE: Names are PT1 PT2 PT3 PT4 PT5;
                Missing are all(-999);
                Usevariables = PT1 PT2 PT3 PT4 PT5;
      MODEL: PT by PT1 PT2 PT3 PT4 PT5; !Latent factor by observed factors
      OUTPUT: Standardized sampstat Modindices(all);

    *** WARNING
      Data set contains cases with missing on all variables.
      These cases were not included in the analysis.
      Number of cases with missing on all variables:  8
       1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

    Simple Confirmatory Factor Analysis;

    SUMMARY OF ANALYSIS

    Number of groups                                                 1
    Number of observations                                         982

    Number of dependent variables                                    5
    Number of independent variables                                  0
    Number of continuous latent variables                            1

    Observed dependent variables

      Continuous
       PT1         PT2         PT3         PT4         PT5

    Continuous latent variables
       PT

    Estimator                                                       ML
    Information matrix                                        OBSERVED
    Maximum number of iterations                                  1000
    Convergence criterion                                    0.500D-04
    Maximum number of steepest descent iterations                   20
    Maximum number of iterations for H1                           2000
    Convergence criterion for H1                             0.100D-03

    Input data file(s)
      PT5.dat

    Input data format  FREE

    SUMMARY OF DATA

         Number of missing data patterns             2

    COVARIANCE COVERAGE OF DATA

    Minimum covariance coverage value   0.100

         PROPORTION OF DATA PRESENT

               Covariance Coverage
                  PT1           PT2           PT3           PT4           PT5
                  ________      ________      ________      ________      ________
     PT1            1.000
     PT2            0.998         0.998
     PT3            1.000         0.998         1.000
     PT4            1.000         0.998         1.000         1.000
     PT5            1.000         0.998         1.000         1.000         1.000

    SAMPLE STATISTICS

         ESTIMATED SAMPLE STATISTICS

               Means
                  PT1           PT2           PT3           PT4           PT5
                  ________      ________      ________      ________      ________
          1         3.990         3.910         4.071         3.684         3.784

               Covariances
                  PT1           PT2           PT3           PT4           PT5
                  ________      ________      ________      ________      ________
     PT1            0.829
     PT2            0.559         0.881
     PT3            0.496         0.520         0.734
     PT4            0.539         0.633         0.481         1.051
     PT5            0.574         0.607         0.512         0.671         0.996

               Correlations
                  PT1           PT2           PT3           PT4           PT5
                  ________      ________      ________      ________      ________
     PT1            1.000
     PT2            0.654         1.000
     PT3            0.635         0.646         1.000
     PT4            0.577         0.658         0.547         1.000
     PT5            0.632         0.648         0.599         0.656         1.000

         MAXIMUM LOG-LIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS -5337.842

    UNIVARIATE SAMPLE STATISTICS

         UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS

             Variable/         Mean/     Skewness/   Minimum/ % with                Percentiles
            Sample Size      Variance    Kurtosis    Maximum  Min/Max      20%/60%    40%/80%    Median

         PT1                   3.990      -0.935       1.000    1.43%       3.000      4.000      4.000
                 982.000       0.829       0.761       5.000   30.35%       4.000      5.000
         PT2                   3.910      -0.797       1.000    1.63%       3.000      4.000      4.000
                 980.000       0.882       0.368       5.000   28.16%       4.000      5.000
         PT3                   4.071      -0.953       1.000    1.22%       3.000      4.000      4.000
                 982.000       0.734       1.095       5.000   33.20%       4.000      5.000
         PT4                   3.684      -0.746       1.000    4.07%       3.000      4.000      4.000
                 982.000       1.051       0.173       5.000   20.57%       4.000      5.000
         PT5                   3.784      -0.664       1.000    2.44%       3.000      4.000      4.000
                 982.000       0.996       0.001       5.000   25.46%       4.000      5.000

    THE MODEL ESTIMATION TERMINATED NORMALLY

    MODEL FIT INFORMATION

    Number of Free Parameters                       15

    Loglikelihood

              H0 Value                       -5357.476
              H1 Value                       -5337.842

    Information Criteria

              Akaike (AIC)                   10744.952
              Bayesian (BIC)                 10818.296
              Sample-Size Adjusted BIC       10770.656
                (n* = (n + 2) / 24)

    Chi-Square Test of Model Fit

              Value                             39.268
              Degrees of Freedom                     5
              P-Value                           0.0000

    RMSEA (Root Mean Square Error Of Approximation)

              Estimate                           0.084
              90 Percent C.I.                    0.060  0.109
              Probability RMSEA <= .05           0.010

    CFI/TLI

              CFI                                0.987
              TLI                                0.974

    Chi-Square Test of Model Fit for the Baseline Model

              Value                           2686.639
              Degrees of Freedom                    10
              P-Value                           0.0000

    SRMR (Standardized Root Mean Square Residual)

              Value                              0.017

    MODEL RESULTS

                                                        Two-Tailed
                        Estimate       S.E.  Est./S.E.    P-Value

     PT       BY
        PT1                1.000      0.000    999.000    999.000
        PT2                1.091      0.039     27.699      0.000
        PT3                0.910      0.036     25.272      0.000
        PT4                1.099      0.044     24.953      0.000
        PT5                1.115      0.042     26.451      0.000

     Intercepts
        PT1                3.990      0.029    137.334      0.000
        PT2                3.909      0.030    130.471      0.000
        PT3                4.071      0.027    148.892      0.000
        PT4                3.684      0.033    112.616      0.000
        PT5                3.784      0.032    118.810      0.000

     Variances
        PT                 0.515      0.036     14.217      0.000

     Residual Variances
        PT1                0.314      0.018     17.719      0.000
        PT2                0.268      0.017     15.958      0.000
        PT3                0.308      0.017     18.414      0.000
        PT4                0.429      0.024     18.207      0.000
        PT5                0.357      0.021     17.217      0.000

    STANDARDIZED MODEL RESULTS

    STDYX Standardization

                                                        Two-Tailed
                        Estimate       S.E.  Est./S.E.    P-Value

     PT       BY
        PT1                0.788      0.015     54.078      0.000
        PT2                0.834      0.013     66.497      0.000
        PT3                0.762      0.016     48.446      0.000
        PT4                0.770      0.015     49.858      0.000
        PT5                0.801      0.014     57.054      0.000

     Intercepts
        PT1                4.383      0.104     42.175      0.000
        PT2                4.165      0.099     41.945      0.000
        PT3                4.751      0.112     42.475      0.000
        PT4                3.594      0.087     41.239      0.000
        PT5                3.791      0.091     41.522      0.000

     Variances
        PT                 1.000      0.000    999.000    999.000

     Residual Variances
        PT1                0.379      0.023     16.487      0.000
        PT2                0.304      0.021     14.555      0.000
        PT3                0.419      0.024     17.462      0.000
        PT4                0.408      0.024     17.170      0.000
        PT5                0.358      0.023     15.907      0.000

    STDY Standardization

                                                        Two-Tailed
                        Estimate       S.E.  Est./S.E.    P-Value

     PT       BY
        PT1                0.788      0.015     54.078      0.000
        PT2                0.834      0.013     66.497      0.000
        PT3                0.762      0.016     48.446      0.000
        PT4                0.770      0.015     49.858      0.000
        PT5                0.801      0.014     57.054      0.000

     Intercepts
        PT1                4.383      0.104     42.175      0.000
        PT2                4.165      0.099     41.945      0.000
        PT3                4.751      0.112     42.475      0.000
        PT4                3.594      0.087     41.239      0.000
        PT5                3.791      0.091     41.522      0.000

     Variances
        PT                 1.000      0.000    999.000    999.000

     Residual Variances
        PT1                0.379      0.023     16.487      0.000
        PT2                0.304      0.021     14.555      0.000
        PT3                0.419      0.024     17.462      0.000
        PT4                0.408      0.024     17.170      0.000
        PT5                0.358      0.023     15.907      0.000

    STD Standardization

                                                        Two-Tailed
                        Estimate       S.E.  Est./S.E.    P-Value

     PT       BY
        PT1                0.718      0.025     28.434      0.000
        PT2                0.783      0.025     30.927      0.000
        PT3                0.653      0.024     27.098      0.000
        PT4                0.789      0.029     27.454      0.000
        PT5                0.800      0.027     29.100      0.000

     Intercepts
        PT1                3.990      0.029    137.334      0.000
        PT2                3.909      0.030    130.471      0.000
        PT3                4.071      0.027    148.892      0.000
        PT4                3.684      0.033    112.616      0.000
        PT5                3.784      0.032    118.810      0.000

     Variances
        PT                 1.000      0.000    999.000    999.000

     Residual Variances
        PT1                0.314      0.018     17.719      0.000
        PT2                0.268      0.017     15.958      0.000
        PT3                0.308      0.017     18.414      0.000
        PT4                0.429      0.024     18.207      0.000
        PT5                0.357      0.021     17.217      0.000

    R-SQUARE

        Observed                                        Two-Tailed
        Variable        Estimate       S.E.  Est./S.E.    P-Value

        PT1                0.621      0.023     27.039      0.000
        PT2                0.696      0.021     33.249      0.000
        PT3                0.581      0.024     24.223      0.000
        PT4                0.592      0.024     24.929      0.000
        PT5                0.642      0.023     28.527      0.000

    QUALITY OF NUMERICAL RESULTS

         Condition Number for the Information Matrix              0.150E-01
           (ratio of smallest to largest eigenvalue)

    MODEL MODIFICATION INDICES

    Minimum M.I. value for printing the modification index    10.000

                                       M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.

    ON Statements

    PT1      ON PT3                   13.131     0.156      0.156        0.147
    PT1      ON PT4                   10.045    -0.117     -0.117       -0.131
    PT3      ON PT1                   13.131     0.153      0.153        0.162
    PT3      ON PT4                   14.998    -0.136     -0.136       -0.163
    PT4      ON PT1                   10.045    -0.159     -0.159       -0.141
    PT4      ON PT3                   14.998    -0.189     -0.189       -0.158
    PT4      ON PT5                   19.784     0.215      0.215        0.209
    PT5      ON PT4                   19.784     0.178      0.178        0.183

    WITH Statements

    PT3      WITH PT1                 13.131     0.048      0.048        0.154
    PT4      WITH PT1                 10.045    -0.050     -0.050       -0.136
    PT4      WITH PT3                 14.998    -0.058     -0.058       -0.160
    PT5      WITH PT4                 19.784     0.077      0.077        0.196

    DIAGRAM INFORMATION

      Use View Diagram under the Diagram menu in the Mplus Editor to view the diagram.
      If running Mplus from the Mplus Diagrammer, the diagram opens automatically.

      Diagram output
        /Users/Granger/Google Drive/Website/Stats Resources/Mplus/Files for post/Rudimentary analyses in

         Beginning Time:  10:01:14
            Ending Time:  10:01:14
           Elapsed Time:  00:00:00

    MUTHEN & MUTHEN
    3463 Stoner Ave.
    Los Angeles, CA  90066

    Tel: (310) 391-9971
    Fax: (310) 391-8971
    Web: www.StatModel.com
    Support: Support@StatModel.com

    Copyright (c) 1998-2015 Muthen & Muthen
There are two regions of the output that we want to pay particular attention to. The first region, under MODEL FIT INFORMATION, pertains to the model fit of our perspective-taking scale (i.e., how well our one-factor model reproduces the observed data). Most researchers report the following fit indices: Chi-square test of model fit, CFI, RMSEA, and SRMR. What these mean is a whole other post, but here are the general “rules of thumb” (Hu & Bentler, 1999):
- Chi-square test of model fit: non-significant (or as small a value as possible; this fit index is unfortunately sensitive to sample size, so researchers often discount a significant value with the right reference, e.g., Bentler, 1990)
- Comparative Fit Index (CFI): Equal to or greater than .95
- Root Mean Square Error of Approximation (RMSEA): Equal to or less than .06
- Standardized Root Mean Square Residual (SRMR): Equal to or less than .08
In the sample output, you can see that some fit indices meet or surpass our rules of thumb (CFI = .987 and SRMR = .017) and some fall short (the chi-square test of model fit is significant, with a value of 39.268 on 5 degrees of freedom and p < .001, and the RMSEA of .084 exceeds .06). Messiness like this is very common in research, but the general take-away here is that the fit of the scale is satisfactory but not great.
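To see where these fit values come from, the following is a worked sketch that recomputes them from numbers printed in the output (the two loglikelihoods, the model and baseline chi-squares, their degrees of freedom, and the 982 observations). The formulas are the common textbook definitions, and the symbols (the loglikelihoods written as ℓ, the subscript B for the baseline model) are notation introduced here for illustration. Using N rather than N − 1 in the RMSEA denominator is an assumption; both choices round to the same value in this example.

```latex
\begin{align*}
\chi^2 &= 2\,(\ell_{H_1} - \ell_{H_0})
        = 2\,\bigl(-5337.842 - (-5357.476)\bigr) = 39.268 \\[4pt]
\mathrm{RMSEA} &= \sqrt{\frac{\max(\chi^2 - df,\ 0)}{df \cdot N}}
        = \sqrt{\frac{39.268 - 5}{5 \times 982}} \approx 0.084 \\[4pt]
\mathrm{CFI} &= 1 - \frac{\max(\chi^2 - df,\ 0)}{\max(\chi^2_B - df_B,\ \chi^2 - df,\ 0)}
        = 1 - \frac{34.268}{2676.639} \approx 0.987 \\[4pt]
\mathrm{TLI} &= \frac{\chi^2_B / df_B - \chi^2 / df}{\chi^2_B / df_B - 1}
        = \frac{268.664 - 7.854}{268.664 - 1} \approx 0.974
\end{align*}
```

The sketch makes the sample-size sensitivity of the chi-square visible: the same misfit per degree of freedom is divided by N inside the RMSEA, which is why the RMSEA can be read more comfortably across samples of different sizes.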
The second region we need to pay attention to is the Standardized Model Results, STDYX Standardization. Here we have what are called our factor loadings (or lambdas; under the Estimate column in the PT BY rows), which are kind of like correlations between the observed variables and the latent variable. In general, you want factor loadings no lower than .40, and higher is better. In this example, the standardized loadings range from .76 to .83, so our items are loading on the latent factor very well, which is a good sign!
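As a worked illustration of how the standardized loadings relate to the unstandardized results, here is a sketch for the first item, PT1, using values from the MODEL RESULTS section: the unstandardized loading (1.000), the factor variance (0.515), and the PT1 residual variance (0.314). The symbols λ, ψ, and θ for these three quantities are conventional notation assumed here; they do not appear in the Mplus output itself.

```latex
\begin{align*}
\lambda_1^{\text{STDYX}}
  &= \frac{\lambda_1 \sqrt{\psi}}{\sqrt{\lambda_1^2\,\psi + \theta_1}}
   = \frac{1.000 \times \sqrt{0.515}}{\sqrt{1.000^2 \times 0.515 + 0.314}}
   = \frac{0.7176}{0.9105} \approx 0.788 \\[4pt]
R^2_1 &= \bigl(\lambda_1^{\text{STDYX}}\bigr)^2
   = \frac{0.515}{0.829} \approx 0.621
\end{align*}
```

Both results match the output: 0.788 is the STDYX loading for PT1, and 0.621 is its R-SQUARE, meaning the latent factor accounts for about 62% of the variance in that item. The numerator on its own, 0.718, is the loading reported under STD Standardization.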
Finally, if you happen to use Mplus Diagrammer instead of Mplus editor, Mplus will produce sweet diagrams such as this to help you visualize your factor analysis:
And that is about it for the basics of how to use and interpret the BY command! And now for some Mplus syntax humor: Good BY see you later;
References