This study was based on data from a psychiatric outpatient clinic in Trondheim, Norway. Patients were referred by general practitioners or other mental health clinics, and completed all instruments before starting treatment. Data were collected on a digital platform from February to November 2020, and informed consent was given electronically. There were no exclusion criteria, but patients diagnosed with certain specific disorders (e.g., psychosis and obsessive-compulsive disorder) received outpatient treatment elsewhere and were not represented in this sample. A total of 857 patients agreed to participate and 145 declined. Fifteen patients completed the forms twice, and the most recent submission was deleted.
Forty-three patients did not answer all items. Of these, 26 did not answer at least one question on one of the three instruments (mean age 33.44 years, 18 women) and were excluded. The final sample consisted of 831 patients with a mean age of 30.03 years (SD = 9.99, median = 27, range = 18–72), of whom 510 (61%) were female.
Data on ICD-10 diagnoses were extracted in November 2020. As a result, no diagnosis was available for some patients who had only just started treatment. In this sample, 638 patients (77%) had received an ICD-10 mental and behavioral disorder diagnosis at the time of data extraction. More women than men had been diagnosed (see Table 1). The most common diagnoses were mood disorders (37%) and anxiety disorders (34%). A total of 193 patients (23%) had comorbid diagnoses (at least two diagnoses from subsections of ICD-10 Chapter V), and of these, 99 (12%) had been diagnosed with both a mood disorder (F30-F39) and an anxiety or stress-related disorder (F40-F49).
The majority of patients scored above the cutoff for depression and anxiety (total score ≥ 10 on the PHQ-9 and GAD-7; see Table 1). Women scored significantly higher on the GAD-7 and were more likely to score above the cutoff on both the PHQ-9 and the GAD-7.
Patients with a mood disorder and no anxiety disorder (n = 211) scored significantly higher, and more often above the cutoff, on the PHQ-9, and higher on the WSAS, than patients with an anxiety disorder and no mood disorder (n = 185; PHQ-9 t = 3.35, p < 0.001, χ² = 6.27, p = 0.012; WSAS t = 4.05, p < 0.001). Patients with an anxiety disorder and no mood disorder scored higher on the GAD-7, but not significantly more often above the cutoff (GAD-7 t = −2.26, p = 0.024, χ² = 1.72, p = 0.189).
Patients with comorbid diagnoses (n = 193) scored significantly higher, and more often above the cutoff, on all instruments compared with patients with a single diagnosis (n = 445; PHQ-9 t = −4.95, p < 0.001, χ² = 15.88, p < 0.001; GAD-7 t = −4.02, p < 0.001, χ² = 13.61, p < 0.001; WSAS t = −2.60, p = 0.001).
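The group comparisons above pair a t statistic for mean differences with a χ² statistic for the proportion scoring above the cutoff. The following is a minimal sketch of both statistics using only the Python standard library. It assumes a pooled-variance (Student) t-test and an uncorrected Pearson χ² on a 2 × 2 table; the text does not state which variants were used, and the function names and input values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled-variance t statistic for two independent groups (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (df = 1, no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = groups, columns = above/below cutoff."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Illustrative values only, not study data
t_value = student_t([1, 2, 3], [2, 4, 6])
chi2_value = chi_square_2x2(10, 20, 30, 40)
```

The sign of t depends only on which group is entered first, which is why both positive and negative t values appear in the comparisons reported above.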
The nine-item Patient Health Questionnaire-9 (PHQ-9) measures the severity of depression and can also be used as a diagnostic tool. It comes with a diagnostic algorithm, but using the sum score with a cutoff of ≥ 10 has been suggested to be more sensitive for detecting depression. The PHQ-9 uses a 4-point Likert scale ranging from 0 (not at all) to 3 (nearly every day). Its psychometric properties have been extensively tested [25,26,27], and it has demonstrated good properties as a measure of severity in a large psychiatric sample. The psychometric properties of the Norwegian version have been tested in adolescent girls and adult women with and without eating disorders [28, 29].
The seven-item Generalized Anxiety Disorder-7 (GAD-7) was developed to detect and measure the severity of generalized anxiety disorder. However, it has also been shown to work well as a measure of other anxiety symptoms [16, 30]. The GAD-7 uses a 4-point Likert scale identical to that of the PHQ-9. It is considered a reliable and valid measure of anxiety symptoms in heterogeneous psychiatric outpatient samples, including in Norway and the United States [14, 16]. The PHQ-9 and GAD-7 are available in multiple languages.
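The sum-score rule described for the PHQ-9 and GAD-7 (items coded 0 to 3, total score ≥ 10 treated as above the cutoff) can be sketched as follows. This is an illustration only: the function names and the example responses are hypothetical and are not taken from the study's data pipeline.

```python
from typing import Sequence

CUTOFF = 10  # sum-score cutoff applied to both the PHQ-9 and the GAD-7


def sum_score(responses: Sequence[int], n_items: int) -> int:
    """Return one respondent's sum score; every item must be coded 0-3."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("item responses must be integers from 0 to 3")
    return sum(responses)


def above_cutoff(total: int, cutoff: int = CUTOFF) -> bool:
    """True when the sum score meets or exceeds the cutoff."""
    return total >= cutoff


# Hypothetical respondent (illustrative values, not study data)
phq9_total = sum_score([2, 1, 3, 2, 1, 0, 1, 1, 0], n_items=9)
gad7_total = sum_score([1, 1, 0, 2, 1, 0, 1], n_items=7)
```

Rejecting incomplete response vectors rather than summing them mirrors the fact that no imputation was performed in this study.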
The Work and Social Adjustment Scale (WSAS) measures functional impairment. It consists of five items that assess impairment in daily functioning (work, household chores, social leisure, private leisure, and relationships), each scored on a 9-point Likert scale from 0 (not at all impaired) to 8 (very severely impaired). The psychometric properties of the WSAS have been demonstrated in various studies, including in a Norwegian outpatient setting and in England, where it has been suggested to be a good complement to the PHQ-9 and GAD-7.
Stata was used for data preparation and for testing group differences. Mplus version 8.4 was used for confirmatory factor analysis (CFA), measurement invariance (MI) testing, and structural equation modeling (SEM). Missing item responses amounted to less than 0.01% on all variables. Little's MCAR test was non-significant (PHQ-9 p = 0.88, GAD-7 p = 0.78, WSAS p = 0.73), which is consistent with the data being missing completely at random. No imputation was performed.
The WLSMV (weighted least squares mean and variance adjusted) estimator was used because it is less prone to bias than other estimators for ordinal data. Several fit indices were used: χ² as a measure of absolute fit, the root mean square error of approximation (RMSEA) as a parsimony-corrected index, and two comparative fit indices, the comparative fit index (CFI) and the Tucker-Lewis index (TLI). Values near or below 0.06 for the RMSEA and above 0.95 for the CFI and TLI were taken to indicate good fit.
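As a small illustration of how these thresholds combine, the check below flags each index separately. It treats the thresholds as hard boundaries (RMSEA ≤ 0.06, CFI and TLI ≥ 0.95), which is a simplification of "near or below" and "above"; the function name and the example values are hypothetical.

```python
from typing import Dict


def fit_checks(rmsea: float, cfi: float, tli: float) -> Dict[str, bool]:
    """Compare each fit index with the threshold stated in the text.

    Hard boundaries are an assumption made for this sketch; in practice
    values close to a threshold call for judgment rather than a yes/no.
    """
    return {
        "rmsea": rmsea <= 0.06,
        "cfi": cfi >= 0.95,
        "tli": tli >= 0.95,
    }


# Hypothetical model results (not taken from the study)
checks = fit_checks(rmsea=0.055, cfi=0.97, tli=0.94)
good_fit = all(checks.values())
```

Reporting the indices separately makes it visible which index is responsible when a model falls short of the combined criterion.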
A bifactor model was specified using the bifactor-(S − 1) approach, in which one domain serves as the reference for the general factor and the remaining domain is modeled with a specific factor. Bifactor-(Sc − 1) was estimated with a specific cognitive group factor, using the somatic domain as the reference. Bifactor-(Ss − 1) was estimated with a specific somatic group factor, using the cognitive domain as the reference.
Internal consistency was measured with composite reliability, which has been proposed as a superior alternative to other measures. Values between 0.7 and 0.9 were taken to indicate satisfactory internal consistency. Discriminant validity was assessed with CFA confidence intervals, using the upper limit (UL) of the standardized 95% confidence interval for the correlation between factors. UL < 0.8 indicates no problem, 0.8–0.9 a marginal problem, 0.9–1.0 a moderate problem, and greater than 1.0 a severe problem. Omega hierarchical was estimated, and values above 0.8 were interpreted as indicating a predominantly unidimensional construct. Additionally, unidimensionality was inferred if omega hierarchical for the general factor was greater than 0.7, the percentage of uncontaminated correlations (PUC) was less than 0.8, and the explained common variance (ECV) of the general factor was greater than 0.6.
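The indices in this paragraph can be computed from standardized loadings. The sketch below uses the textbook formulas for composite reliability, omega hierarchical, ECV, and PUC, and applies the decision rules stated above. It is not the authors' Mplus-based computation. Representing each item as a (general loading, specific loading) pair, and coding reference-domain items in a bifactor-(S − 1) model with a specific loading of 0.0, are assumptions made here for illustration, as are all names and example values.

```python
from typing import Sequence, Tuple

Item = Tuple[float, float]  # (general loading, specific loading), standardized


def composite_reliability(loadings: Sequence[float]) -> float:
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)."""
    total = sum(loadings) ** 2
    error = sum(1.0 - l ** 2 for l in loadings)
    return total / (total + error)


def omega_hierarchical(domains: Sequence[Sequence[Item]]) -> float:
    """Share of total-score variance attributable to the general factor."""
    general = sum(g for dom in domains for g, _ in dom) ** 2
    specific = sum(sum(s for _, s in dom) ** 2 for dom in domains)
    residual = sum(1.0 - g ** 2 - s ** 2 for dom in domains for g, s in dom)
    return general / (general + specific + residual)


def ecv(domains: Sequence[Sequence[Item]]) -> float:
    """Explained common variance of the general factor."""
    general = sum(g ** 2 for dom in domains for g, _ in dom)
    specific = sum(s ** 2 for dom in domains for _, s in dom)
    return general / (general + specific)


def puc(domains: Sequence[Sequence[Item]]) -> float:
    """Proportion of item pairs that do not share a specific factor."""
    n = sum(len(dom) for dom in domains)
    # Only domains that actually carry a specific factor contaminate pairs
    shared = sum(len(dom) * (len(dom) - 1) // 2
                 for dom in domains if any(s != 0.0 for _, s in dom))
    return 1.0 - shared / (n * (n - 1) // 2)


def classify_ul(ul: float) -> str:
    """Discriminant-validity category for the upper CI limit of a correlation."""
    if ul < 0.8:
        return "none"
    if ul < 0.9:
        return "marginal"
    return "moderate" if ul <= 1.0 else "severe"


def unidimensional(omega_h: float, puc_value: float, ecv_value: float) -> bool:
    """Combined rule from the text: omega_h > 0.7, PUC < 0.8, ECV > 0.6."""
    return omega_h > 0.7 and puc_value < 0.8 and ecv_value > 0.6


# Hypothetical 4-item model: the first domain is the reference (no specific factor)
model = [[(0.7, 0.0), (0.7, 0.0)], [(0.7, 0.4), (0.7, 0.4)]]
cr = composite_reliability([0.7, 0.8, 0.6])
omega_h, ecv_value, puc_value = omega_hierarchical(model), ecv(model), puc(model)
```

How the category boundaries at exactly 0.8, 0.9, and 1.0 are assigned is a choice made in this sketch, since the ranges in the text share their endpoints.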
Measurement invariance (MI) was evaluated sequentially for configural, metric, and scalar invariance, with each step adding further equality constraints. Configural invariance was supported if the pattern of free and fixed loadings was equivalent across sex, i.e., the number of factors and the indicator-factor patterns were the same for men and women. If configural invariance was supported, metric invariance was tested by additionally constraining the factor loadings to be equal across groups. If metric invariance was achieved, scalar invariance was assessed by constraining the item thresholds to be equal across groups. Scalar invariance implies that differences in latent means are unbiased and can be considered true gender differences. We followed the recommendations of Millsap and Yun-Tein and of Pendergast and colleagues for testing MI with ordered categorical measures. The Mplus DIFFTEST function was used to compare model fit. However, it has been suggested that ΔCFI ≥ −0.01 and ΔRMSEA < 0.015 are superior criteria for assessing MI, compared with relying on a non-significant Δχ². Thus, ΔCFI and ΔRMSEA were used as the guiding thresholds. For concurrent validity, latent path modeling with SEM was used with the bifactor-(S − 1) model.
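The sequential decision rule based on ΔCFI and ΔRMSEA can be sketched as below. This is a hedged illustration: it assumes the configural model already fits acceptably, computes each difference as the more constrained model minus the less constrained one, and uses hypothetical names and fit values rather than Mplus output.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    cfi: float
    rmsea: float


def step_holds(less_constrained: Fit, more_constrained: Fit) -> bool:
    """Apply the criteria from the text: delta CFI >= -0.01 and delta RMSEA < 0.015."""
    delta_cfi = more_constrained.cfi - less_constrained.cfi
    delta_rmsea = more_constrained.rmsea - less_constrained.rmsea
    return delta_cfi >= -0.01 and delta_rmsea < 0.015


def highest_invariance(configural: Fit, metric: Fit, scalar: Fit) -> str:
    """Return the most constrained level supported, stopping at the first failure.

    Assumes the configural model itself has acceptable fit.
    """
    if not step_holds(configural, metric):
        return "configural"
    if not step_holds(metric, scalar):
        return "metric"
    return "scalar"


# Hypothetical fit values (not taken from the study)
level_a = highest_invariance(Fit(0.985, 0.050), Fit(0.983, 0.049), Fit(0.980, 0.051))
level_b = highest_invariance(Fit(0.985, 0.050), Fit(0.983, 0.049), Fit(0.960, 0.070))
```

In the second example the scalar model loses more than 0.01 in CFI relative to the metric model, so only metric invariance would be retained and latent mean comparisons would not be warranted.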