Rudi[M]entary Model Commands in Mplus – part 3: BY

The third rudimentary model command in Mplus is BY, which defines a factor (BY is short for “measured by”). Although statistically more complicated than the previous two commands, a factor simply defines a latent (unobserved) variable through its prediction of observed variables. In other words, you are telling Mplus that you have a variable that exists but cannot be measured directly (what is called a latent variable) and that you have some measurements of behaviour proposed to be caused by this latent variable (what are called observed or measured variables).

This is important to understand, so how about an example?

Consider the personality trait extraversion. People who are extraverted are considered outgoing and gregarious (McCrae & Costa, 1987). However, we cannot put someone’s extraversion on a bathroom scale and weigh it — nor can we pour it out of people into a test tube. Extraversion is simply a way of organizing and thinking about a common pattern of behaviours. In other words, extraversion is a latent variable and we must measure it by gathering observed variables representative of our idea of what an extraverted person is, how they behave, and the thoughts they commonly have.

In psychology, asking people questions about themselves and their behaviour is the most common form of measurement. It is no surprise that people tend to understand themselves better than anyone else (especially when it comes to internal behaviours such as beliefs, attitudes, and emotions). When measuring extraversion, we can, for instance, ask people to rate the degree to which they consider themselves talkative.

There are also other ways of gathering observed variables aside from self-report. We can hire coders to observe someone’s behaviour (e.g., code how frequently a participant approaches strangers to strike up a conversation), recruit people who know our participant (e.g., have peers rate how gregarious our participant is in general), and so on, as far as our creativity takes us.

Essentially, our model of reality is that the personality trait of extraversion (our latent variable) is causing specific patterns of behaviour, such as talkativeness, sociability, and gregariousness (the observed variables).

Visually, this is what it looks like:

[Figure: factor example]

And here is a generic syntax that would run this factor analysis:

[Screenshot: generic Mplus syntax]
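Since the screenshot may not display, here is a minimal sketch of what generic syntax for a single factor could look like. All of the names are hypothetical (extra for the latent extraversion factor; talk, social, and greg for three observed variables; mydata.dat for the data file), so treat it as an illustration rather than the exact syntax from the original image:

```
TITLE:
	Generic one-factor sketch (hypothetical names);

DATA:
	File is mydata.dat;           ! hypothetical data file

VARIABLE:
	Names are talk social greg;   ! hypothetical observed variables

MODEL:
	extra BY talk social greg;    ! latent factor BY its observed variables
```

The name to the left of BY is the latent variable, which does not exist in the data file, and the names to the right are the observed variables that measure it. By default, Mplus fixes the loading of the first variable listed after BY at 1 to give the factor a scale, which is why that loading shows up as 1.000 with no standard error in unstandardized output.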

Now, let’s look at an example from a real dataset.

Here, participants were asked to think about themselves and rate the extent to which they agree with the following statements about their tendency to perspective-take (i.e., try to understand the world from another’s point of view):

  1. “I try to look at everybody’s side of a disagreement before I make a decision”
  2. “I sometimes try to understand my friends better by imagining how things look from their perspective”
  3. “I believe there are two sides to every question, and try to look at them both”
  4. “When I’m upset at someone, I usually try to put myself in his/her shoes for a while”
  5. “Before criticizing somebody, I try to imagine how I would feel if I were in their place”

Translating these items into Mplus and specifying their factor gives the following syntax:


TITLE:
	Simple Confirmatory Factor Analysis;

DATA:
	File is PT5.dat;

VARIABLE:
	Names are PT1 PT2 PT3 PT4 PT5;
	Missing are all(-999);
	Usevariables = PT1 PT2 PT3 PT4 PT5;

MODEL:
	PT by PT1 PT2 PT3 PT4 PT5;
	!Latent factor by observed factors

OUTPUT:
	Standardized sampstat Modindices(all);

And produces the following output:


Mplus VERSION 7.4 (Mac)
MUTHEN & MUTHEN
05/28/2017  10:01 AM

INPUT INSTRUCTIONS

  TITLE:
  	Simple Confirmatory Factor Analysis;

  DATA:
  	File is PT5.dat;

  VARIABLE:
  	Names are PT1 PT2 PT3 PT4 PT5;
  	Missing are all(-999);
  	Usevariables = PT1 PT2 PT3 PT4 PT5;

  MODEL:
  	PT by PT1 PT2 PT3 PT4 PT5;
  	!Latent factor by observed factors

  OUTPUT:
  	Standardized sampstat Modindices(all);

*** WARNING
  Data set contains cases with missing on all variables.
  These cases were not included in the analysis.
  Number of cases with missing on all variables:  8
   1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

Simple Confirmatory Factor Analysis;

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         982

Number of dependent variables                                    5
Number of independent variables                                  0
Number of continuous latent variables                            1

Observed dependent variables

  Continuous
   PT1         PT2         PT3         PT4         PT5

Continuous latent variables
   PT

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03

Input data file(s)
  PT5.dat

Input data format  FREE

SUMMARY OF DATA

     Number of missing data patterns             2

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              PT1           PT2           PT3           PT4           PT5
              ________      ________      ________      ________      ________
 PT1            1.000
 PT2            0.998         0.998
 PT3            1.000         0.998         1.000
 PT4            1.000         0.998         1.000         1.000
 PT5            1.000         0.998         1.000         1.000         1.000

SAMPLE STATISTICS

     ESTIMATED SAMPLE STATISTICS

           Means
              PT1           PT2           PT3           PT4           PT5
              ________      ________      ________      ________      ________
      1         3.990         3.910         4.071         3.684         3.784

           Covariances
              PT1           PT2           PT3           PT4           PT5
              ________      ________      ________      ________      ________
 PT1            0.829
 PT2            0.559         0.881
 PT3            0.496         0.520         0.734
 PT4            0.539         0.633         0.481         1.051
 PT5            0.574         0.607         0.512         0.671         0.996

           Correlations
              PT1           PT2           PT3           PT4           PT5
              ________      ________      ________      ________      ________
 PT1            1.000
 PT2            0.654         1.000
 PT3            0.635         0.646         1.000
 PT4            0.577         0.658         0.547         1.000
 PT5            0.632         0.648         0.599         0.656         1.000

     MAXIMUM LOG-LIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS -5337.842

UNIVARIATE SAMPLE STATISTICS

     UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS

         Variable/         Mean/     Skewness/   Minimum/ % with                Percentiles
        Sample Size      Variance    Kurtosis    Maximum  Min/Max      20%/60%    40%/80%    Median

     PT1                   3.990      -0.935       1.000    1.43%       3.000      4.000      4.000
             982.000       0.829       0.761       5.000   30.35%       4.000      5.000
     PT2                   3.910      -0.797       1.000    1.63%       3.000      4.000      4.000
             980.000       0.882       0.368       5.000   28.16%       4.000      5.000
     PT3                   4.071      -0.953       1.000    1.22%       3.000      4.000      4.000
             982.000       0.734       1.095       5.000   33.20%       4.000      5.000
     PT4                   3.684      -0.746       1.000    4.07%       3.000      4.000      4.000
             982.000       1.051       0.173       5.000   20.57%       4.000      5.000
     PT5                   3.784      -0.664       1.000    2.44%       3.000      4.000      4.000
             982.000       0.996       0.001       5.000   25.46%       4.000      5.000

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION

Number of Free Parameters                       15

Loglikelihood

          H0 Value                       -5357.476
          H1 Value                       -5337.842

Information Criteria

          Akaike (AIC)                   10744.952
          Bayesian (BIC)                 10818.296
          Sample-Size Adjusted BIC       10770.656
            (n* = (n + 2) / 24)

Chi-Square Test of Model Fit

          Value                             39.268
          Degrees of Freedom                     5
          P-Value                           0.0000

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.084
          90 Percent C.I.                    0.060  0.109
          Probability RMSEA <= .05           0.010

CFI/TLI

          CFI                                0.987
          TLI                                0.974

Chi-Square Test of Model Fit for the Baseline Model

          Value                           2686.639
          Degrees of Freedom                    10
          P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

          Value                              0.017

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 PT       BY
    PT1                1.000      0.000    999.000    999.000
    PT2                1.091      0.039     27.699      0.000
    PT3                0.910      0.036     25.272      0.000
    PT4                1.099      0.044     24.953      0.000
    PT5                1.115      0.042     26.451      0.000

 Intercepts
    PT1                3.990      0.029    137.334      0.000
    PT2                3.909      0.030    130.471      0.000
    PT3                4.071      0.027    148.892      0.000
    PT4                3.684      0.033    112.616      0.000
    PT5                3.784      0.032    118.810      0.000

 Variances
    PT                 0.515      0.036     14.217      0.000

 Residual Variances
    PT1                0.314      0.018     17.719      0.000
    PT2                0.268      0.017     15.958      0.000
    PT3                0.308      0.017     18.414      0.000
    PT4                0.429      0.024     18.207      0.000
    PT5                0.357      0.021     17.217      0.000

STANDARDIZED MODEL RESULTS

STDYX Standardization

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 PT       BY
    PT1                0.788      0.015     54.078      0.000
    PT2                0.834      0.013     66.497      0.000
    PT3                0.762      0.016     48.446      0.000
    PT4                0.770      0.015     49.858      0.000
    PT5                0.801      0.014     57.054      0.000

 Intercepts
    PT1                4.383      0.104     42.175      0.000
    PT2                4.165      0.099     41.945      0.000
    PT3                4.751      0.112     42.475      0.000
    PT4                3.594      0.087     41.239      0.000
    PT5                3.791      0.091     41.522      0.000

 Variances
    PT                 1.000      0.000    999.000    999.000

 Residual Variances
    PT1                0.379      0.023     16.487      0.000
    PT2                0.304      0.021     14.555      0.000
    PT3                0.419      0.024     17.462      0.000
    PT4                0.408      0.024     17.170      0.000
    PT5                0.358      0.023     15.907      0.000

STDY Standardization

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 PT       BY
    PT1                0.788      0.015     54.078      0.000
    PT2                0.834      0.013     66.497      0.000
    PT3                0.762      0.016     48.446      0.000
    PT4                0.770      0.015     49.858      0.000
    PT5                0.801      0.014     57.054      0.000

 Intercepts
    PT1                4.383      0.104     42.175      0.000
    PT2                4.165      0.099     41.945      0.000
    PT3                4.751      0.112     42.475      0.000
    PT4                3.594      0.087     41.239      0.000
    PT5                3.791      0.091     41.522      0.000

 Variances
    PT                 1.000      0.000    999.000    999.000

 Residual Variances
    PT1                0.379      0.023     16.487      0.000
    PT2                0.304      0.021     14.555      0.000
    PT3                0.419      0.024     17.462      0.000
    PT4                0.408      0.024     17.170      0.000
    PT5                0.358      0.023     15.907      0.000

STD Standardization

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 PT       BY
    PT1                0.718      0.025     28.434      0.000
    PT2                0.783      0.025     30.927      0.000
    PT3                0.653      0.024     27.098      0.000
    PT4                0.789      0.029     27.454      0.000
    PT5                0.800      0.027     29.100      0.000

 Intercepts
    PT1                3.990      0.029    137.334      0.000
    PT2                3.909      0.030    130.471      0.000
    PT3                4.071      0.027    148.892      0.000
    PT4                3.684      0.033    112.616      0.000
    PT5                3.784      0.032    118.810      0.000

 Variances
    PT                 1.000      0.000    999.000    999.000

 Residual Variances
    PT1                0.314      0.018     17.719      0.000
    PT2                0.268      0.017     15.958      0.000
    PT3                0.308      0.017     18.414      0.000
    PT4                0.429      0.024     18.207      0.000
    PT5                0.357      0.021     17.217      0.000

R-SQUARE

    Observed                                        Two-Tailed
    Variable        Estimate       S.E.  Est./S.E.    P-Value

    PT1                0.621      0.023     27.039      0.000
    PT2                0.696      0.021     33.249      0.000
    PT3                0.581      0.024     24.223      0.000
    PT4                0.592      0.024     24.929      0.000
    PT5                0.642      0.023     28.527      0.000

QUALITY OF NUMERICAL RESULTS

     Condition Number for the Information Matrix              0.150E-01
       (ratio of smallest to largest eigenvalue)

MODEL MODIFICATION INDICES

Minimum M.I. value for printing the modification index    10.000

                                   M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.

ON Statements

PT1      ON PT3                   13.131     0.156      0.156        0.147
PT1      ON PT4                   10.045    -0.117     -0.117       -0.131
PT3      ON PT1                   13.131     0.153      0.153        0.162
PT3      ON PT4                   14.998    -0.136     -0.136       -0.163
PT4      ON PT1                   10.045    -0.159     -0.159       -0.141
PT4      ON PT3                   14.998    -0.189     -0.189       -0.158
PT4      ON PT5                   19.784     0.215      0.215        0.209
PT5      ON PT4                   19.784     0.178      0.178        0.183

WITH Statements

PT3      WITH PT1                 13.131     0.048      0.048        0.154
PT4      WITH PT1                 10.045    -0.050     -0.050       -0.136
PT4      WITH PT3                 14.998    -0.058     -0.058       -0.160
PT5      WITH PT4                 19.784     0.077      0.077        0.196

DIAGRAM INFORMATION

  Use View Diagram under the Diagram menu in the Mplus Editor to view the diagram.
  If running Mplus from the Mplus Diagrammer, the diagram opens automatically.

  Diagram output
    /Users/Granger/Google Drive/Website/Stats Resources/Mplus/Files for post/Rudimentary analyses in

     Beginning Time:  10:01:14
        Ending Time:  10:01:14
       Elapsed Time:  00:00:00

MUTHEN & MUTHEN
3463 Stoner Ave.
Los Angeles, CA  90066

Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support: Support@StatModel.com

Copyright (c) 1998-2015 Muthen & Muthen

 

There are two regions of the output that we want to pay particular attention to. The first is MODEL FIT INFORMATION, which tells us how well our one-factor model of the perspective-taking scale reproduces the observed data. Most researchers report the following fit indices: chi-square test of model fit, CFI, RMSEA, and SRMR. What these mean is a whole other post, but here are the general “rules of thumb” (Hu & Bentler, 1999):

  • Chi-square test of model fit: non-significant (or as small a value as possible; this fit index is unfortunately sensitive to large sample sizes, so people can often shrug off a significant value with the right reference, e.g., Bentler, 1990)
  • Comparative Fit Index (CFI): Equal to or greater than .95
  • Root Mean Square Error of Approximation (RMSEA): Equal to or less than .06
  • Standardized Root Mean Square Residual (SRMR): Equal to or less than .08

In the sample output, you can see that some fit indices meet or surpass our rules of thumb (CFI = .987 and SRMR = .017) and some are edging on problematic (the chi-square test of model fit is significant, with a value of 39.27 on 5 degrees of freedom and p < .001, and the RMSEA of .084 is above .06). Messiness like this is very common in research, but the general take-away here is that the scale is satisfactory but not great.
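As a rough check on where these numbers come from, the fit values can be reproduced by hand from the output above. The formulas below are the standard textbook definitions rather than anything printed in the output itself, and the RMSEA is written with N in the denominator (some texts use N − 1, which rounds to the same value here). Here p = 5 is the number of observed variables, q = 15 is the number of free parameters, N = 982 is the number of observations, and the subscript B refers to the baseline model:

```latex
\begin{align*}
df &= \frac{p(p+3)}{2} - q = \frac{5 \cdot 8}{2} - 15 = 5 \\
\chi^2 &= 2\,(LL_{H1} - LL_{H0}) = 2\,(-5337.842 + 5357.476) = 39.268 \\
\mathrm{RMSEA} &= \sqrt{\frac{\max(\chi^2 - df,\, 0)}{df \cdot N}}
  = \sqrt{\frac{34.268}{5 \cdot 982}} \approx 0.084 \\
\mathrm{CFI} &= 1 - \frac{\max(\chi^2 - df,\, 0)}{\max(\chi^2_B - df_B,\, \chi^2 - df,\, 0)}
  = 1 - \frac{34.268}{2676.639} \approx 0.987
\end{align*}
```

The 15 free parameters are the 5 intercepts, 4 estimated loadings (the first is fixed at 1), 1 factor variance, and 5 residual variances listed under MODEL RESULTS.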

The second region we need to pay attention to is STANDARDIZED MODEL RESULTS, under STDYX Standardization. Here we have what are called our factor loadings (or lambdas; the Estimate column under PT BY), which in a one-factor model like this one can be read as correlations between each observed variable and the latent variable. In general, you want factor loadings no lower than .40, but higher is even better. In this example, the loadings range from .76 to .83, so our items are loading on the latent factor very well, which is a good sign!
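To see how the standardized loadings relate to the unstandardized MODEL RESULTS, here is a worked example for PT1. It assumes the usual formula for a one-factor model, where λ is the unstandardized loading, ψ is the factor variance, and θ is the item's residual variance (the symbols are just labels chosen here; the output does not use them):

```latex
\begin{align*}
\lambda^{\mathrm{STDYX}}_{PT1}
  &= \frac{\lambda \sqrt{\psi}}{\sqrt{\lambda^2 \psi + \theta}}
   = \frac{1.000 \cdot \sqrt{0.515}}{\sqrt{1.000^2 \cdot 0.515 + 0.314}}
   = \frac{0.718}{0.910} \approx 0.788 \\
R^2_{PT1} &= \left(\lambda^{\mathrm{STDYX}}_{PT1}\right)^2 = \frac{0.515}{0.829} \approx 0.621
\end{align*}
```

The numerator, 0.718, is the value printed for PT1 under STD Standardization, and the squared standardized loading matches the R-SQUARE entry for PT1: about 62% of the variance in that item is accounted for by the factor.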

Finally, if you happen to use the Mplus Diagrammer instead of the Mplus Editor, Mplus will produce sweet diagrams such as this to help you visualize your factor analysis:

[Figure: PT factor diagram]

And that is about it for the basics of how to use and interpret the BY command! And now for some Mplus syntax humor: Good BY see you later;

References

Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2), 238-246.

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55.

McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52(1), 81-90.
