
A Critical Analysis of the Tender Points in Fibromyalgia

R. Norman Harden MD, Gadi Revivo DO, Sharon Song PhD, Devi Nampiaparampil MD, Gary Golden DO, Marie Kirincic MD, Timothy T. Houle PhD
DOI: http://dx.doi.org/10.1111/j.1526-4637.2006.00203.x; pp. 147-156; first published online: 1 March 2007

ABSTRACT

Objective. To pilot methodologies designed to critically assess the American College of Rheumatology's (ACR) diagnostic criteria for fibromyalgia.

Design. Prospective, psychophysical testing.

Setting. An urban teaching hospital.

Subjects. Twenty-five patients with fibromyalgia and 31 healthy controls (convenience sample).

Interventions. Pressure pain threshold was determined at the 18 ACR tender points and five sham points using an algometer (dolorimeter).

Outcome Measures. The patients' “algometric total scores” (sums of the patients' average pain thresholds at the 18 tender points) were derived, as well as pain thresholds across sham points.

Results. The “algometric total score” could differentiate patients with fibromyalgia from normals with an accuracy of 85.7% (P < 0.001). Even a single tender point had a diagnostic accuracy between 75% and 89%. Although fibromyalgics had less pain across sham points than across ACR tender points, sham points also could be used for diagnosis (85.7%; Ps < 0.001). Hierarchical cluster analysis showed that three points could be used for a classification accuracy equivalent to the use of all 18 points.

Conclusions. There was a significant difference in the “algometric total score” between patients with fibromyalgia and controls, and we suggest this quantified (although subjective) approach may represent a significant improvement over the current diagnostic scheme, but this must be tested vs other painful conditions. The points specified by the ACR were only modestly superior to sham points in making the diagnosis. Most importantly, this pilot suggests single points, smaller groups of points, or sham points may be as effective in diagnosing fibromyalgia as the use of all 18 points, and suggests methodologies to definitively test that hypothesis.

  • Fibromyalgia
  • Tender Points
  • Diagnostic Criteria
  • Algometer
  • Pressure Pain Threshold

Introduction

Chronic pain is highly prevalent in North America [1,2] and Europe [3–6] with median point prevalence in adults of 15%[7]. Fibromyalgia (FM) probably accounts for a significant portion of the prevalence of chronic pain. The prevalence of FM is estimated to be 2% in the United States, occurring mostly in women in the mid-30s to late 50s. Goldenberg [8] estimated that FM affects approximately six million Americans. This population accounts for about 20% of patient visits to rheumatologists in North America [9,10]. FM is a disorder characterized by widespread pain with point tenderness at defined areas and is often associated with sleep disturbance, fatigue, and morning stiffness [11]. This multisymptomatic syndrome has been extensively studied [8,12–19] and contemporary research implicates abnormalities of sensory processing, sleep architecture, and neuroendocrine function as potential etiologies [13,20–28].

This rather common syndrome can result in significant loss of function and has been equated to rheumatoid arthritis in its disabling effects [29]. As there is currently no “gold standard” diagnostic test or cure for FM, treatments are most often focused on relieving pain and regulating the sleep cycle [30–33]. Physical modalities are commonly used; heat/cold treatments, massage, stretching, and range of motion exercises can be helpful [34–37]. Supervised aerobic conditioning may be helpful as long as care is taken not to overfatigue the patient [35,38–47]. Oral medications are often prescribed, but the efficacy of any specific agent is not uniform [48,49]. Antidepressants with significant sedative and analgesic properties are the mainstays of therapy, and can also help with the insomnia that is prevalent in the syndrome. Many indirect treatments such as cognitive behavioral and biofeedback therapy can also be helpful [50–57].

In 1990, the American College of Rheumatology (ACR) published the current standard for the classification of FM [11]. The first requirement is widespread pain for at least 3 months. Pain must be both above and below the waist, present on the right and left sides of the body, and axial muscle pain must also be present (cervical spine, anterior chest, thoracic spine). The second criterion is report of pain upon palpation in at least 11 of 18 designated tender points. Examination should consist of approximately 4 kg of pressure applied by the thumb or finger to each pressure point.

The dual criteria of widespread pain and tender points result in 88.4% sensitivity and 81.1% specificity for diagnosing FM. Thus far, this is the best combination of sensitivity and specificity available for diagnostic criteria of this syndrome [11]. The current tender point diagnostic criteria focus on the patient's report of pain when a specific force is applied (“approximately” 4 kg/cm²) [11]. A study of 316 health maintenance organization members with fibromyalgia found that tender points predicted only 3.0% and tender point severity ratings predicted only 8.3% of the variance in distress. A minimal difference was found between the variance predicted for physical vs psychological distress [58].

There are, however, limitations to the ACR diagnostic criteria. They are entirely subjective and qualitative, centering on the patient's report of pain and the physician's interpretation of behaviors related to pain such as withdrawing from the stimulus, grimacing, or crying. The subjective and qualitative nature of these diagnostic criteria could potentially be (modestly) improved with quantification of the tender points.

The present study specifically defined the concept of “tender point” by measuring the algometric force at which the patient feels pain at a specific site [11]. This measurement procedure produces a quantitative value, conceptualized as a “pain threshold,” which can then be added to the pain thresholds of other designated tender points to create an “algometric total score” [59,60]. This pilot study was designed to explore the utility of such an algometric score (specifically the sum of the pain thresholds of all 18 ACR points) in the diagnosis of FM. An additional, and perhaps more important, goal of this pilot was to investigate whether a smaller number of tender points would be sufficient to diagnose fibromyalgia.

Methods

Design

A prospective, psychophysical pilot study.

Setting

This trial was conducted at urban academic tertiary care centers.

Subjects

Subjects were recruited from a variety of sources in order to ensure patient diversity. These sources included: 1) the Northwestern University Medical Faculty Foundation rheumatology practice; 2) the Rehabilitation Institute of Chicago's (RIC) Arthritis Clinic; 3) the RIC's Center for Pain Studies; and 4) FM support groups associated with the Arthritis Foundation.

Fifty-six subjects participated including 25 FM patients (aged 25–59; average age 46.2, 20 women, 5 men) who were previously diagnosed with FM, and 31 “normals” (a convenience sample in this pilot, with no pain diagnosis) (aged 25–52; average age 35.8, 18 women, 13 men). The groups did not differ on gender distribution (P > 0.05) but the FM group was significantly older than the normal subjects (P < 0.001).

Inclusion criteria for FM patients included at least a 3-month history of pain consistent with the ACR criteria [11]. Exclusion criteria included a known history of cardiac or peripheral vascular disease, diabetes, inflammatory joint disease, pregnancy, an abnormal exercise test, or use of medications that would alter response to exercise (including nitrates, beta blockers, and calcium channel blockers). After the tender point testing, some of the FM subjects went on to participate in an open label exercise trial (R. Norman Harden, MD, Gadi Revivo, DO, Sharon Song, PhD, Devi Nampiaparampil, MD, Gary Golden, DO, Marie Kirincic, MD, and Timothy T. Houle, PhD, unpublished).

Interventions

Pressure algometry was used in this study to measure the least amount of force applied to elicit a verbal report of “pain” at each point [59]. This pain pressure threshold was assessed at the 18 tender points specified by the ACR, as well as five control points. These five “sham” points were located in regions not specifically associated with FM: glabella, middle biceps (bilateral), and hamstring (bilateral) (Table 1). Two physicians conducted all the algometric assessments, and several group discussions about proper technique were deliberately conducted in an attempt to minimize variability. The “n” of this limited pilot was insufficient to conduct formal reliability studies between the examiners.

Table 1

Arbitrary number, name, and anatomic location of tender or sham points with their cluster membership

Number | Tender Point Name | Anatomical Location | Cluster
2 (R, L) | Occiput | Suboccipital muscle insertions | 1
3 (R, L) | Second rib | Upper lateral to the second costochondral junction | 3
5 (R, L) | Lateral epicondyle | 2 cm distal to the epicondyles | 3
6 (R, L) | Knee | Medial fat pad proximal to the joint line | 3
7 (R, L) | Occiput | Suboccipital muscle insertions | 3
8 (R, L) | Trapezius | Midpoint of upper border of muscle | 3
9 (R, L) | Supraspinatus | Origins, above scapula near the medial border | 2
10 (R, L) | Gluteal | Upper outer quadrants of buttocks in anterior fold of muscle | 2
11 (R, L) | Greater trochanter | Posterior to the trochanteric prominence | 2

Number | Sham Point Name | Anatomical Location
1 (M) | Glabella | Superior to orbital ridge at midline
4 (R, L) | Middle biceps | Midway between acromion process and elbow joint
12 (R, L) | Hamstring | Midway between gluteal folds and knee joint on the femoral axis

  • R = right; L = left; M = midline.

A Fischer dolorimeter® [61] (the “algometer”) with a rubber disc of 1 cm² was applied at a 90° vertical angle to all 18 ACR tender point sites and to the sham sites. Previous studies have shown pressure threshold measures obtained using algometers with a 1 cm² contact area to have acceptable interrater and intrarater reliability of pressure scores over time [18,61–64]. Pressure was steadily increased at a rate of approximately 1 kg per second, and each site was tested in succession for each of two trials, allowing for recovery time between trials (approximately 10 minutes between retest at each point) [11,64]. Subjects were instructed to indicate verbally when they first felt pain [11]. Pressure application was then stopped, and the pressure reached was recorded.

These values were obtained twice for each point and then averaged, a method shown to be reliable [65]. The averages for each point were then summed to create an “algometric total score.” Similar procedures have been demonstrated to have adequate intrarater and interrater reliability in previous work [38,42,66].
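The scoring procedure described above (two trials per point, averaged, then summed) can be sketched in a few lines. This is only an illustration: the function name, the point labels used as dictionary keys, and the threshold values are hypothetical and are not taken from the study data.

```python
from statistics import mean


def algometric_total_score(trial_1, trial_2):
    """Average the two trials at each point, then sum the per-point averages.

    Both arguments map a point label (e.g. "R2") to a pressure pain threshold.
    Returns (total_score, per_point_averages).
    """
    if trial_1.keys() != trial_2.keys():
        raise ValueError("both trials must cover the same points")
    per_point = {p: mean([trial_1[p], trial_2[p]]) for p in trial_1}
    return sum(per_point.values()), per_point


# Hypothetical thresholds for three points only (the study summed all 18).
trial_1 = {"R2": 2.0, "L2": 2.5, "R3": 4.0}
trial_2 = {"R2": 3.0, "L2": 2.5, "R3": 5.0}
total, per_point = algometric_total_score(trial_1, trial_2)
print(total)  # 9.5
```

In practice the same function would be called with all 18 ACR points for the total score, or with the five sham points for a sham-point sum.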

Statistical Analysis

Data were analyzed using SPSS (Statistical Package for the Social Sciences) 11.5 for Windows (SPSS, Inc., Chicago, IL, USA). Differences between diagnostic groups, genders, and tender points were evaluated using Analysis of Covariance (ANCOVA) procedures. Because the two diagnostic groups differed on average age, and age was inversely related to pain threshold (r(55) = −0.39, P < 0.01), ANCOVA was used to assess group and gender differences on algometric scores while controlling for age. Levene's test was used to assess the homogeneity of variances assumption.

Binary logistic regressions were used to examine the ability of single algometric scores or the total algometric score to correctly discern between the FM and normal groups. Finally, hierarchical cluster analysis was used to group algometric points into meaningful clusters. Squared Euclidean distances served as the similarity measure and Ward's method was used to cluster the participants. The best cluster solution was selected from the dendrogram plot as the solution that exhibited the greatest discrepancy between multiple point clusters.
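For readers less familiar with the clustering criterion named above, the standard textbook definitions can be written as follows. The notation is introduced here for illustration only and is an assumption, not the authors' own: x and y are vectors of scores being compared, A and B are clusters, and the barred terms are cluster centroids.

```latex
% Squared Euclidean distance between two score vectors
d(x, y) = \sum_{j} (x_j - y_j)^2

% Ward's merging cost: the increase in total within-cluster
% sum of squares when clusters A and B are joined
\Delta(A, B) = \frac{|A|\,|B|}{|A| + |B|} \, \lVert \bar{x}_A - \bar{x}_B \rVert^2
```

At each step Ward's method merges the pair of clusters with the smallest merging cost, so a large jump in cost on the dendrogram marks a natural place to stop merging.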

Results

The age-adjusted means and standard errors of algometric scores for all tender points are presented as a function of diagnosis and gender in Table 2. For all subjects, the relationships between tender points were substantial, with interpoint correlations ranging from r = 0.72 to r = 0.96 (all Ps < 0.001). Cronbach's alpha, a measure of internal consistency, further revealed that the points were indeed very consistent within participants, α = 0.989. For this reason, an algometric total score [59,60] could conceptually be computed as the unweighted sum of all nonsham tender points (number of points = 18). This total score was also divided into summary scores for right- and left-sided points, and upper and lower points (lower points = gluteal, greater trochanter, medial knees; upper points = ant scalene, 2nd intercostal space, lateral epicondyle, occiput, trapezius, supraspinatus). A great deal of symmetry was observed. The left-sided points were nearly perfectly correlated (using intraclass correlation) with right-sided points, r(55) = 0.99, P < 0.0001. Furthermore, upper and lower points were also very highly correlated (using intraclass correlation), r(55) = 0.92, P < 0.0001.
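The internal consistency statistic reported above follows the usual formula for Cronbach's alpha, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score), where k is the number of points. A minimal sketch is given below; the function name and the small data set are hypothetical and chosen only to show the calculation, not to reproduce the study's value.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per participant).

    Each row holds that participant's threshold at each of k points.
    """
    k = len(scores[0])
    # Sample variance of each point (column) across participants.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each participant's summed score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four participants, three points, perfectly consistent
# within each participant, so alpha should be (numerically) 1.
scores = [
    [2.0, 2.0, 2.0],
    [4.0, 4.0, 4.0],
    [6.0, 6.0, 6.0],
    [8.0, 8.0, 8.0],
]
alpha = cronbach_alpha(scores)
```

Values close to 1, such as the 0.989 reported here, indicate that a participant who is sensitive at one point tends to be sensitive at all of them.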

Table 2

Adjusted means and standard errors of myalgic tender points as a function of diagnosis and gender, after controlling for age

Side/Point No. | Normal, Male | Normal, Female | Fibromyalgia, Male | Fibromyalgia, Female
1 (sham) | 5.5 (0.4) | 5.0 (0.3) | 2.4 (0.6) | 1.9 (0.3)
R2 | 3.3 (0.25) | 2.4 (0.21) | 1.8 (0.4) | 1.5 (0.2)
L2 | 3.1 (0.2) | 2.4 (0.2) | 1.9 (0.4) | 1.3 (0.2)
R3 | 6.3 (0.4) | 4.3 (0.4) | 2.7 (0.7) | 1.6 (0.4)
L3 | 6.1 (0.4) | 4.1 (0.3) | 2.8 (0.6) | 1.5 (0.3)
R4 (sham) | 5.8 (0.4) | 4.3 (0.3) | 2.3 (0.6) | 1.8 (0.3)
L4 (sham) | 5.5 (0.4) | 4.2 (0.3) | 2.4 (0.6) | 1.8 (0.3)
R5 | 5.7 (0.5) | 4.6 (0.4) | 3.0 (0.7) | 2.0 (0.4)
L5 | 5.8 (0.4) | 4.5 (0.4) | 2.9 (0.7) | 2.0 (0.4)
R6 | 7.5 (0.5) | 5.7 (0.4) | 4.0 (0.7) | 2.3 (0.4)
L6 | 7.0 (0.5) | 5.8 (0.4) | 4.4 (0.7) | 2.6 (0.4)
R7 | 6.2 (0.4) | 4.4 (0.3) | 2.6 (0.6) | 1.8 (0.3)
L7 | 6.3 (0.4) | 4.6 (0.3) | 2.9 (0.6) | 1.7 (0.3)
R8 | 6.7 (0.4) | 5.1 (0.4) | 3.4 (0.7) | 1.8 (0.4)
L8 | 6.2 (0.4) | 4.9 (0.4) | 3.2 (0.7) | 1.9 (0.4)
R9 | 8.1 (0.5) | 6.9 (0.4) | 3.9 (0.8) | 2.5 (0.4)
L9 | 8.2 (0.6) | 7.3 (0.5) | 4.0 (0.9) | 2.6 (0.5)
R10 | 8.8 (0.5) | 7.4 (0.4) | 5.1 (0.8) | 2.6 (0.4)
L10 | 8.7 (0.5) | 7.4 (0.5) | 4.4 (0.9) | 2.7 (0.4)
R11 | 8.5 (0.5) | 6.8 (0.5) | 4.8 (0.9) | 2.7 (0.4)
L11 | 8.2 (0.5) | 7.1 (0.4) | 4.8 (0.8) | 3.0 (0.4)
R12 (sham) | 9.1 (0.5) | 7.8 (0.4) | 6.1 (0.8) | 3.0 (0.4)
L12 (sham) | 9.1 (0.6) | 7.5 (0.5) | 4.6 (0.9) | 3.3 (0.5)

  • R = right; L = left. Values are adjusted mean (standard error).

A 2 × 2 ANCOVA was conducted using gender and diagnosis as independent factors (controlling for age) and the algometric total score as a dependent measure. Our ratio of men to women was 1:4, and this is consistent with published data on the gender distribution of FM [67,68]. Participants with FM (M = 50.28) had significantly lower pain thresholds than the normal participants (M = 108.43), F(1, 51) = 50.34, P < 0.001. In addition, female participants (M = 67.06) had significantly lower pain thresholds than male participants (M = 91.65), F(1, 51) = 12.07, P < 0.001. Because the analysis of the group main effects violated the homogeneity of variances assumption of ANCOVA (women and those in the control group exhibited more variability than men and the FM group), a nonparametric equivalent, the Mann–Whitney test, was conducted on the age-adjusted scores and supports the interpretation of the results. No additional effect (interaction) was observed between the combination of gender and diagnosis, as the differences between gender and diagnosis were uniform across the four combinations of these factors. Figure 1 illustrates the main effects for gender and diagnosis.

Figure 1

Boxplot of myalgic total score displaying median, interquartile range, and outliers as a function of diagnosis and gender.

Classification

A logistic regression was used to examine the ability of the algometric points to correctly discern between the FM and normal groups. Using only the algometric total score, participants could be correctly classified into their respective groups with an accuracy of 85.7%. Using a more traditional metric, fibromyalgia patients could be identified with a sensitivity = 0.84 and specificity = 0.87. This degree of accuracy resulted in a positive predictive value = 0.84, and was significantly better than chance, χ2(1) = 47.61, P < 0.001. Because of the unexpected differences between the groups on age, age could independently predict group status with 73.2% accuracy. After controlling for age (by entering it into the equation first), algometric score increased classification accuracy by 17.9% (P = 0.008), demonstrating that observed algometric score differences between the groups are in excess of that simply based on the age differences.

Evaluation of the classification errors using the total algometric score revealed that four (7.1%) normals were misclassified as fibromyalgic and four (7.1%) fibromyalgic patients were misclassified as normals. High-threshold fibromyalgic men (N = 3; mean total algometric score = 73.2) were occasionally misclassified as normals, and low-threshold normal women (N = 4; mean total algometric score = 53.6) were misclassified as fibromyalgic, with one fibromyalgic woman classified as normal.
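The reported accuracy, sensitivity, specificity, and positive predictive value can be recovered from the error counts above. The sketch below assumes the 2 × 2 table implied by the text (25 FM patients and 31 normals, with four misclassifications in each direction); the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic accuracy measures from a 2 x 2 classification table."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # FM patients correctly identified
        "specificity": tn / (tn + fp),   # normals correctly identified
        "ppv": tp / (tp + fp),           # positive predictive value
    }


# Counts inferred from the text: 25 FM (4 missed), 31 normals (4 misclassified).
m = diagnostic_metrics(tp=21, fn=4, fp=4, tn=27)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.857
# sensitivity: 0.840
# specificity: 0.871
# ppv: 0.840
```

These values agree with the 85.7% accuracy, 0.84 sensitivity, 0.87 specificity, and 0.84 positive predictive value reported for the total score.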

The algometric total score, using all of the tender points, was not necessary for better-than-chance classification. Figure 2 displays the worst-case classification accuracy as a function of number of tender points used in classification. Worst-case accuracy was used as it represents the worst possible classification scheme for this sample, and as such is a conservative measure of discrimination. Total score accuracy (85.7% in this sample) is obtained with as few as six points; and a statistically significant improvement (80% accuracy) over chance classification is seen with the use of three points (see Figure 2). Because of very high interpoint correlation (internal consistency), we examined the algometric scores for meaningful diagnostic subgroups.

Figure 2

Worst-case classification accuracy as a function of number of tender points used. *P < 0.05 greater than chance prediction.

Hierarchical Cluster Analysis

A hierarchical cluster analysis was conducted on the tender points to examine whether they could be grouped in meaningful clusters (the sham points were not used). A three-cluster solution emerged that heavily relied on the similarity of right and left pairs, which could have been predicted given the extremely high correlation between the pairs (pairs of points were more similar than any two different points). However, three meaningful clusters did seem to emerge with groups of the pairs forming larger clusters. Cluster 1 contained only the points from 2 (see Table 1). Cluster 2 contained points 9, 10, and 11. Finally, cluster 3 contained points 3, 5, 6, 7, and 8. To examine whether the sampling of points from these clusters could improve classification accuracy while minimizing the number of tested points, a logistic regression was conducted using one point from each cluster. Using randomly selected points L(eft)2, L9, and R(ight)3, classification accuracy was found to be equivalent to the use of the total score (from 18 points) and was 87.5%. In terms of statistical sensitivity and specificity, the use of one point from each cluster resulted in 88% sensitivity and 87.1% specificity.

Sham Points

To evaluate the relationship between the mean of the sham points and the mean of the tender points, a repeated measures ANOVA was conducted. Sham points (M = 4.4) had a higher mean threshold value than tender points (M = 4.2), F(1, 54) = 17.17, P < 0.001. Although normals had a higher threshold on both sham and tender points, F(1, 54) = 90.57, P < 0.001, the stated relationship between sham and tender points was equivalent for both participants with FM and normals (no interaction). Using age as a covariate did not significantly affect any of these relationships.

To examine whether the groups could be differentiated solely on their thresholds with the sham points (1, R4, L4, R12, and L12; Table 1), a logistic regression was calculated using the sum of sham points as the sole predictor. Interestingly, the use of the sham points was significantly better than chance, χ2(1) = 48.6, P < 0.001, and was equivalent to the use of the total myalgic score (classification accuracy of 85.7%; sensitivity 84%, specificity 87.1%). The results suggest that although the sham points had higher thresholds than the tender points, they were equally efficient in discriminating between the participants with FM from the normal subjects in this sample. This pilot did not address external validation/discrimination.

Discussion

References concerning pain due to palpable tender areas in muscle began to appear in the European medical literature in the early 1800s. Virchow coined the term “muscular rheumatism” in 1852 to describe palpable changes in muscle as a complication of rheumatic fever [69]. In the early 1900s, Gowers discussed musculoskeletal pain in a variety of conditions including “lumbago,” which he thought was due to inflammation, and coined the term “fibrositis”[70]. Stockman was concurrently discussing “connective tissue hyperplasia,” which came to be an early hypothetical pathophysiology of the “fibrositic” condition [71]. Due to the work of Travell, Simons, Wolfe, Yunus, Bennett, and many others [11,19,72–74], clinicians began to entertain a distinction between the regional myofascial pain syndrome and what came to be called FM [46].

Fibrositis, and later FM, was characterized as a more systemic process, often associated with sleep disruption and sometimes with affective diagnoses [18,75,76]. Results from a survey of practitioners in the American Pain Society show that 88% believe that FM is a distinct, legitimate clinical entity, and 86% indicate that myofascial pain syndrome is distinct from FM [46].

Fibromyalgia syndrome passed through several temporary criteria [18,19,74,77–80] and early proposals for diagnostic criteria for FM [18] stimulated increased study of the misunderstood and under-researched syndrome, and led to increased acceptance of its clinical validity [80,81]. This research eventually led to the development of formal, consensus-based diagnostic criteria, which were subjected to experimental validation [78,79], and were ultimately officially endorsed by the ACR [11].

The publication of the ACR criteria, tenderness in 11/18 specifically designated tender points, for diagnosing FM was a great step forward in our efforts to understand the syndrome and they have provided an important framework for communication and research regarding FM. Our study found that in FM subjects, there was a significant difference in pain thresholds between the designated ACR tender point sites and sham points. The lower pain thresholds at the ACR-selected tender points suggest that there may be true significance to those points. However, the 18 selected “tender” points have not been proven to be unique or significant; they simply represent the combined opinion of the consensus panel. These points are often tender in “normals” and have not been shown to be externally valid (specific) [76].

There may be other significant problems with the ACR criteria [76,82–84]. For example, the diagnostic system rests on the palpation of these 18 points [66,75] with finger pressure of “approximately” 4 kg/cm² [11]. This method has produced questionable interrater/examiner reliability [83,85,86]. This problem has been partially improved by the use of mechanical devices that quantify this stimulus (the algometer) [59,61,86,87].

A significant problem with the criteria is its reliance on a patient's subjective response to a nonspecific stimulus [66] that is assessed by a biased operator. The patient's response to palpation may be influenced by a variety of psychological and sociological factors [67,88]. There is no viable method to blind either the subject or the examiner, and the locations of the points are readily available to the lay public (particularly over the Internet). In summary, these criteria are subjective, nonspecific, and thus are predictably quite controversial [82,83].

This pilot study was designed to begin development of a quantitative measurement methodology and to critically assess the 11 of 18 tender points of the ACR criteria. Our analysis of these exploratory data found significant differences in what we have called the “total algometric score” between the normal controls and FM patients, after controlling for age. We tested each of the 18 points for pain threshold using a dolorimeter and then summed these scores to produce the total algometric score [59,60]. This pilot recruited a convenience sample thought to be age appropriate to compute the normative data. Since the pilot was designed, the epidemiological perspective has changed as to the age range most affected by FM (earlier thought to be a syndrome of middle-aged women, a concept that is changing) [81,89]. Thus our design forced an accommodation for age differences in the analysis, which complicates interpretation. The results concerning age are very interesting, and should be studied independently. These age and gender data should influence inclusion considerations in future research in fibromyalgia.

The FM patients had significantly lower thresholds at both the sham and ACR tender points. This information is consistent with reports indicating that perhaps FM is a syndrome of disordered central processing of pain (sensitization) [90–94] and may challenge the notion that there is special significance to the points selected by the ACR.

There was very little intraindividual variability in the scores. Relationships between all points within a subject, left- vs right-sided points, upper vs lower points, and sham vs tender points, were extremely high. This invariability, when combined with consistent differences in threshold levels between FM and control groups, enabled single points, smaller groups of points, or sham points to be as effective in classifying individuals into their respective diagnostic groups as the use of all 18 points.

The FM patients have shown decreased perception threshold for cold pain [90,91,94], threshold for heat pain [90,91], tolerance for cold pain [90,91], heat pain tolerance [90,91], aberrations of cold perception [90], abnormal thermal windup and after sensations [94], a decreased spinal nociceptive flexion reflex threshold [91], and germane to the present work, an abnormal summation of mechanical stimuli [93]. Gracely et al. [92] have also shown evidence that there is augmentation in cortical and subcortical areas. Therefore, this research is consistent with the ‘central augmentation/sensitization hypothesis,’ and suggests the 11/18 points of the ACR are neither necessary nor specific. Our data suggest a smaller number of points in all four quadrants would be sufficient to detect this central nociceptive augmentation/sensitization syndrome with good sensitivity and specificity. This pilot indicates that as few as three sites may provide enough information to at least distinguish fibromyalgics from normals. Petzke et al. [17] report that three paired sites may be sufficient. A definitive (considerably larger) trial should be designed to confirm this, and to assess specificity of points and sets of points in distinguishing FM from other pain disorders.

The current study used hierarchical clustering techniques to explore the possibility of creating meaningful clusters of patients and tender points. Although theoretically interesting, the small size of the sample renders the cluster solutions very sample dependent. A more powerful replication of the present study may very well reveal different cluster solutions. However, the idea of using such meaningful clusters of points to reduce the needed number of tested points appears to be an obtainable goal and this can and should be addressed promptly in future studies. A logical starting point for future research may be a single representative point in each body quadrant.

Interestingly, gender differences, and not tender point sites, appeared to pose the largest problem in correctly distinguishing between FM and normals, with normal women and fibromyalgic men at times being classified into the wrong groups. The mechanisms of gender-based disease proclivity are unclear, particularly in FM [67,68,95]. Some studies suggest that there are pressure pain differences between normal men and women [62], while others do not [63]. It is possible that mechanically induced pressure is more likely to show sex differences than other noxious stimuli [96]. In one particular study women showed lower pressure pain tolerance than men [64]. Diffuse noxious inhibitory controls may work to attenuate temporal summation in men, but not in women (or perhaps FM patients) [97]. Although the current study was not sufficiently powered to properly detect interactions between gender and diagnostic category, the relatively large gender effects observed in this study might reach statistical significance with only modest increases in sample size, and this aspect should be definitively addressed in future studies.

Acknowledgments

The authors would like to thank the many clinicians and researchers whose efforts made this work possible. Many thanks to the Northwestern Memorial Faculty Foundation, Department of Rheumatology, and the Center for Pain Studies at the Rehabilitation Institute of Chicago for their subject referrals. The work would have been impossible without the kind support of the Helen M. Galvin Center for Health and Fitness, and its director Jeff Jones, and the Lawrence and Nancy Glick Pain Research Fund, as well as the patient editing and technical support of Henry Caporoso and Karin Shook.
