and validating Thymometry: a simple visual scale for recording subjective outcomes.
Dr. Michael F. D'Souza MD FRCGP FFPHM FRSA
Thymometry is a geometrically exact series of eleven emoticons. A full half-circle smile represents a subjective score of “100% Happy”, while the arcs of subsequent smiles reduce by intervals of 20% down to a “neutral” straight line. Thereafter the smile turns into a frown, with the same 20% intervals, until it becomes a full upside-down semicircle. This represents a subjective score of “100% Unhappy”.
Thymometry has been designed as a brief metric for self-perceived quality of life and similar subjective outcomes (Fig 1). It has now been in use for over 30 years in primary care settings and has been administered in a wide range of paper and electronic formats. This is a discussion of its value and validity derived from data provided by 8648 subjects: 7322 were patients in routine general practice consultations, 802 took part in a random population survey, and the remaining 524 were participants in primary care outcome studies.
A summary of our findings:
· All 8648 subjects appeared to understand the meaning of the Thymometry scale.
· It is simple and brief enough to be applied in routine clinical care and has the advantage of being commonsensical and trans-cultural. In the general population survey of adults in Kingston upon Thames, UK, the scale was one of the least omitted data points (e.g. age was omitted three times as often). The majority of the affluent population surveyed rated themselves as happy on the Thymometry scale (Fig 2).
· In early touch-screen surveys of GP consulters, males recorded themselves as being slightly happier than females, and the young (0-30 yrs) and older (60+) age groups recorded higher happiness scores than those of middle age (Fig 3). These interesting observations have been replicated by more recent studies using more elaborate measures (viz. the Gross National Happiness Index with its 33 indicators).
· The Thymometry scale discriminated between GP patients in a logical and expected way: only 43% of those diagnosed as depressed recorded themselves as being happy, compared with 71% of those attending for immunizations prior to going on holiday (Fig 4).
· When used to measure Quality of Life it showed good correlation
with the EuroQol, the briefest currently used gold standard. Fig 5 (Still in preparation)
· The Thymometry scores correlated appropriately with the McGill pain scales in a very small study (Fig 6).
· It also correlated very well with measures of Depression
such as the Zung questionnaire. Fig 7
· Unlike analogue line systems, which rely on conceptualization and on the meaning of verbal captions, a faces scale provides an additional reference to feelings. It had a within-subject repeatability of 99.7% (after 3 minutes).
· Its eleven-point scale was more discriminatory than a binary or three-point scale. On the other hand, it is uncertain whether increasing the number of points on the faces scale adds anything of clinical importance, but we have designed a 201-point electronic version to explore this.
· Using the eleven-point scale, patients on average rated as “important” a change of 1.625 points or more, i.e. the difference that would prompt them to change treatments.
· It seems to be an adequate measure of the subjective feelings associated with clinical change over time in comparison studies of symptom response to different antibiotics (Fig 8).
· In studies on hay fever sufferers, variations in Thymometry scores mirrored changes in locally measured pollen counts.
· In trials comparing interventions in bronchitis and depression, sequential Thymometry scores changed sensitively and appropriately with the observed clinical condition.
· Because of its subjective nature, Thymometry does have limitations as a metric. It is not a ratio scale, and its scores must not be averaged or analyzed by t-tests.
· Thymometry may, however, be legitimately used to compare populations. The concept of calculating a “Bentham score” for populations is suggested as one way of doing this. A “Bentham score” is the sum of every individual's happiness score taken as a proportion of a notional “perfect score” in which every individual scores the maximum on the Thymometry scale, i.e. exhibits the “greatest good for the greatest number of people”. A simple chi-squared test can then legitimately be used to estimate the significance of differences between the Bentham scores of populations, or between pre- and post-intervention scores in the same population (Fig 9).
· At present Thymometry is being used successfully in a smartphone App designed to assess response to addiction therapy.
· Because of its brevity and validity, Thymometry could become incorporated into routine clinical practice and into many other situations where the views of patients, clients, customers or citizens are needed to evaluate outcomes.
ways of measuring subjective outcomes are needed by Patients, Clinicians, Researchers, and Healthcare managers. Patients need
to know how well any treatments they are offered have suited fellow sufferers and clinicians and others get additional support
from this sort of subjective data when making therapeutic choices. Indeed because it is increasingly common for medical interventions
to be given specifically to improve quality of life, clinicians need to have some easily applicable measure of this in order
to demonstrate effective practice. Those engaged in pragmatic health service research have a special need of tools that can be applied within the natural setting without disrupting normal clinical practice. Finally, healthcare managers, who have the task of allocating large resources across a wide range of clinical interventions, require unbiased patient-centered outcomes that compare their options irrespective of disease groups or special pleading.
Unfortunately all the existing tools for measuring subjective
outcomes have intrinsic credibility problems and are often too specialized or of inappropriate length to achieve these ends.
Ruta and colleagues 2 have suggested six features that the ideal tool should possess:
(1) Measure the aspects and effects of the illness that the patient decides are most important;
(2) Allow the patient to score the chosen variables;
(3) Be a sensitive measure of within-person change over time;
(4) Be applicable to the whole spectrum of illness seen in primary care;
(5) Be capable of measuring the effects of a wide variety of
(6) Be brief and simple enough to complete in a 7-10 minute consultation.
Achieving this is not easy but
the “Thymometer” (feelings measurer) scale has been devised to try to meet all these requirements and more.
Thymometry is similar in principle to the well-established COOP-WONCA charts 3,4,5 and the Andrews & Withey D-T scale.6 However, Thymometry offers an accurate eleven-point scale of geometrically precise faces, rather than an essentially Likert-type seven-point series with indeterminate inter-point intervals. This means that Thymometry scores have greater discriminating power to compare the value of interventions in all symptomatic conditions. Furthermore, being a graphical scale, Thymometry can achieve a broad measure of independence from differences in culture, language and education. Indeed the wide bibliography on cross-cultural use etc. that has been generated for the COOP-WONCA charts is very likely to be applicable to Thymometry. Finally, Thymometry tackles the inherent problems of subjective variability by a process of continuous calibration to derive acceptable averages from mass data collections.
The series of studies reported later explored the conventional measures of validity for Thymometry and compared it against the EuroQol 7, Zung 8 and McGill Pain 9 questionnaires. However, attempts to compare it with the best-known generic instrument, the SF-36 10, 11, had to be abandoned because too few patients in these natural-setting studies were prepared to fill in the very long forms provided. It is easy to agree with Jenkinson 12 that “The SF-36 is not suitable for use within a consultation, which detracts from its clinical usefulness.” Fortunately, EuroQol has been validated against the SF-36 13, so we have employed the EuroQol as our comparison gold standard.
Most of the studies reported here took place in general practices in Kingston & Richmond Health Authorities in London, UK. The catchment area population was 320,000.
to be discussed
· How did this faces scale come to be conceived and developed?
· Do patients understand and prefer this faces scale?
· What change on this scale do patients judge to be important to them?
· How internally consistent and repeatable is the scale, and what is its inter-rater reliability?
· How sensitive is it to change? (Appropriate responsiveness)
· What is its Face & Content validity?
· What is its concurrent criterion validity against clinical records & the EuroQol as a gold standard?
· What is its Construct, Biological, Convergent & Discriminant validity?
· How feasible is it to use in routine service and what is its power for use in trials?
of the Thymometer scale – The Bentham Score
Our initial studies used conventional 100mm analogue lines to
get outcome scores. However this method relies on subjects conceptualizing the position on a line as being meaningful in terms
of feeling and understanding the written text attached to it. This is difficult for those with language and educational difficulties
so in subsequent studies investigating the symptomatic impact of a range of antibiotics on bronchitis we decided to try using
a scale with a 7-point series of faces very similar to the Withey D-T scale simultaneously with 100mm analogue lines. We found
that patients much preferred these faces scales and completed them twice as often as analogue lines. However, there was, as expected, a loss in power to discriminate from having so few points on the scale. We therefore concentrated on trying to achieve a satisfactory compromise by developing our scale with a larger series of faces.
We adopted the term Thymometry
for using this sort of scale as it means the measurement of feeling and had already been coined by Peck in 1967 as a measure
of pain related to the number of decibels of sound.1 However, our intention was to apply it in a much more generic
way. To this end we have been refining and developing it over the last thirty years, while using it in the context of a number
of primary care comparative trials and surveys.
The choice of the number of points on this scale has been a contentious issue. The justification for having more than just three faces (“Happy”, “Neutral” and “Unhappy”) lies in the power of the scale to make finer discriminations whenever needed. It is possible to reduce an eleven-point data scale retrospectively to three points (see Fig 1), but not vice versa. The current touch-screen version of the scale in our App has only ten points: it omits the extreme unhappiness point for purely aesthetic reasons. We have experimented with a twenty-one-point scale marked in percentage points above and below a neutral mid-point, with alternate points both on and between faces. In use it was clear that the between-faces points were used far less frequently, so it was decided that a simple eleven-point scale was all that was needed. Aside from its low power of discrimination, another problem with the original Withey D-T scale was that it had unspecified expression sizes. Intuitively it was felt that the human ability to distinguish fine detail in expressions might be important. Therefore the scale was constructed to have frowns and smiles that were mathematically exact proportions of an arc corresponding to the relevant percentage differences. The ideal version of this would be a single face on a screen in which a slider moved the mouth from the straight-line neutral position in 1% intervals to a maximum 100% semicircular smile or frown.
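The construction of the mouth can be sketched in code. This is only an illustration under stated assumptions: the text does not say which property of the arc is scaled, so the sketch assumes that the mouth has a fixed width and that the angle subtended by the arc grows linearly from 0 (straight line) at 0% to 180 degrees (semicircle) at 100%. The function name `mouth_arc` and its parameters are hypothetical.

```python
import math

def mouth_arc(score_percent, width=1.0, n_points=21):
    """Return (x, y) points of the mouth for a score from -100 to +100.

    Assumption (not stated in the paper): the subtended angle of the arc is
    proportional to the score, and the mouth corners stay a fixed width apart.
    Positive scores give a smile, negative scores a frown (y axis points up).
    """
    if not -100 <= score_percent <= 100:
        raise ValueError("score_percent must lie between -100 and 100")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    fraction = abs(score_percent) / 100
    if fraction == 0:
        # Neutral face: a straight horizontal line.
        return [(-width / 2 + width * i / (n_points - 1), 0.0)
                for i in range(n_points)]
    theta = fraction * math.pi                   # angle subtended by the arc
    radius = (width / 2) / math.sin(theta / 2)   # keeps the corners width apart
    sign = 1 if score_percent > 0 else -1
    points = []
    for i in range(n_points):
        a = -theta / 2 + theta * i / (n_points - 1)
        x = radius * math.sin(a)
        # Depth relative to the line joining the corners: 0 at the corners.
        depth = radius * (math.cos(a) - math.cos(theta / 2))
        points.append((x, -sign * depth))        # a smile dips below the corners
    return points

# The eleven faces: +100% down to -100% in steps of 20%.
eleven_faces = [mouth_arc(score) for score in range(100, -101, -20)]
```

Under the same assumption, the slider version described above would simply call `mouth_arc` with each integer score from -100 to 100.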
Irrespective of the number of points used, the Thymometry scale is not a ratio scale. Ratio scales, by their nature, are usually taken as indicative of an objective truth (e.g. the ratio scale for temperature means that a temperature of 3 K is true throughout the universe). Thus, using the average of units of measurement on ratio scales is a legitimate approach to inter-group comparison. However, measures such as Thymometry are intrinsically variable both between and within individuals. Therefore it is not acceptable to assume that each individual grades the same level of happiness by the same intervals as any other. Indeed it is commonplace to observe that different subjects describe the same experience in either hyperbolic or conservative ways. Thus Thymometry is an elastic ruler that can only be used with reference back to within-person measurements, i.e. as an interval scale that holds true over short time intervals within each individual. We measured its within-subject repeatability after 3 minutes as 99.7%.
Nevertheless, even as an interval scale, group scatter-gram comparisons can be done using the idea of a reference concept. We have called this concept “the Absolute Bentham” after the great Utilitarian thinker Jeremy Bentham. Bentham argued that the utilitarian target for interventions should be “The greatest happiness of the greatest number.” An “Absolute Bentham” in terms of Thymometry would be scored when every individual in a population recorded a score of 100% (i.e. as happy as possible). Most populations contain individuals who have lower than absolute scores. If all the individual scores are summed and taken as a proportion of the “Absolute Bentham”, we get a proportion that is amenable both to statistical testing and to use in graphical presentations. This helps ordinary people without statistical training to “see for themselves” the clear-cut results of any evaluation without recourse to statistics (Fig 9).
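The calculation can be illustrated with a short sketch. Several points here are assumptions rather than the published method: the eleven faces are coded 0 (100% unhappy) to 10 (100% happy) so that scores can be summed, and the chi-squared comparison is read as a 2x2 test of points achieved against points missed in each population. The function names and the example scores are hypothetical.

```python
import math

MAX_POINT = 10  # assumption: faces coded 0 (100% unhappy) to 10 (100% happy)

def bentham_score(scores):
    """Sum of individual scores as a proportion of the 'Absolute Bentham'."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(s < 0 or s > MAX_POINT for s in scores):
        raise ValueError("scores must lie between 0 and MAX_POINT")
    return sum(scores) / (MAX_POINT * len(scores))

def chi_squared_2x2(scores_a, scores_b):
    """Pearson chi-squared (1 df, no continuity correction).

    Each population contributes one row: (points achieved, points missed),
    where missed = Absolute Bentham total minus points achieved.
    Returns (chi-squared statistic, p-value).
    """
    table = []
    for scores in (scores_a, scores_b):
        achieved = sum(scores)
        table.append((achieved, MAX_POINT * len(scores) - achieved))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical pre- and post-intervention scores for the same four people.
before = [4, 5, 6, 5]
after = [7, 8, 6, 9]
print(bentham_score(before), bentham_score(after))
print(chi_squared_2x2(before, after))
```

Treating each scale point as an independent count is a strong simplification, so a sketch like this is better suited to showing how the proportion is formed than to settling questions of significance.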
The concepts of Quality of Life and Usefulness
In addition to these analytical issues, Thymometry can be applied to measure a wide range of conceptual outcomes.
The concept of Quality of Life was measured on the Thymometry scale as the response to the question “How are you feeling about every aspect of living?” (e.g. 100% = as happy as possible). This applied the standard WHO definition of health to the conventional health inquiry “How are you?”
To undertake comparative trials of hay fever remedies, another outcome measure was also developed: the concept of a treatment’s “Usefulness”. This required each patient undergoing treatment to synthesize into a single Thymometry score how they felt about the treatment after balancing its symptom relief against its side effects and ease of use.
The following groups have been studied:
· Healthy Medical centre Staff
Consulting patients of all
· Patient Carers (for surrogate responses)
· Members of the general public (for normative data).
The following types of studies have been done using Thymometry
# Comparative trials of Hay fever treatments
# Studies of antibiotics for bronchitis
# Mass survey of consulting GP patients
# Comparative study of antibiotic treatments
# Touch-screen validation studies against Euroqol
# Studies on Depressed Patients
# Specific studies on clinical staff
# An App for helping with harmful habits (see https://www.i-bet-me.com)
The majority of this study sample consisted of patients consulting their General Practitioners.
The sole entry criterion for participation was that the patient consented to complete a mark-sense form on arrival when they
consulted for any presenting problem. GPs gave no prior guidance on how to complete the forms ensuring that the data
consisted of only the patient's views.
Follow-up was not formalized but occurred on return for routine care, and patients' previous scores were not made available to them.
The Construct, Biological, Convergent & Discriminant validity of Thymometry
Construct Validity: tests the degree to which a test measures what it claims, or purports, to be measuring. Not one of the 8648 subjects in our studies reported any difficulty in understanding what the faces meant, and Thymometry’s within-subject repeatability after a few minutes was 99.7%.
Biological Validity: tests whether scientifically accurate information, used in an unbiased way, conveys a biological idea. We showed that changes in local pollen counts were reflected appropriately by changes in the Thymometry scores of hay fever sufferers.
Convergent Validity: is reflected by the results of a measure correlating with results from other measures intended to measure the same concept. This was shown to occur in painful conditions, where McGill pain scores correlated appropriately with Thymometry (Fig 6), and in depression, where Zung scores correlated well with Thymometry (Fig 7). The change in Thymometry scores in depression at two weeks and at four weeks showed a consistent gradient across the spectrum of clinical change. The index of responsiveness 14 was calculated as the change in scores of patients reporting themselves "a little better" divided by the SD of change in scores for patients reporting themselves "about the same"; it was not calculated for patients reporting "a little worse" because of small numbers in this group.
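The index of responsiveness described above can be written out as a short calculation. This is a sketch under assumptions: "the change in scores" of the improved group is taken to be the mean change, the SD is the sample standard deviation, and the example change scores are invented for illustration only.

```python
import statistics

def responsiveness_index(changes_little_better, changes_about_same):
    """Mean change in the 'a little better' group divided by the SD of
    change in the 'about the same' group.

    Assumptions: mean change is intended for the numerator, and the sample
    (n - 1) standard deviation for the denominator.
    """
    mean_change = statistics.mean(changes_little_better)
    sd_stable = statistics.stdev(changes_about_same)  # needs >= 2 values
    if sd_stable == 0:
        raise ValueError("stable group shows no variation; index is undefined")
    return mean_change / sd_stable

# Hypothetical changes in Thymometry score (points on the eleven-point scale).
little_better = [1, 2, 3]
about_same = [-1, 0, 1]
print(responsiveness_index(little_better, about_same))
```

A larger index means that the change reported by improving patients stands out more clearly from the background variation seen in patients who report no change.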
Discriminant validity: tests whether concepts or measurements that are supposed to be unrelated are,
in fact, unrelated. This has not yet been formally checked.
Sensitivity to Change:
This is tested by appropriate responsiveness to situations. It was shown to occur in studies on antibiotics (see Fig 8). A further study was done on our depressed patients, asking them to complete the statement "I would be happy to try any suitable treatment if it made me better by a change of how many … faces?" Responses ranged from 0.5 to 3, with an average of 1.625. Therefore a two-point change on this scale was considered to be “important”.
It is now becoming widely accepted that seeking guidance from
formal outcome evidence could prove a useful way of dealing with the complexities and expense of modern healthcare.
Not only might investigating effectiveness improve quality, but also comparative studies of efficiency might be the best way
to contain costs. Outcome research should expose those clinical activities which, however well intentioned and established,
are either merely placebos or on more detailed examination found to be harmful to patients. It should also highlight areas
where resources could beneficially be redeployed e.g. shifting from hospital to community services.
However, it is well recognized that in a few serious conditions, such as hypertension and malignancy, underlying health may not be reflected by the presence of distressing symptoms and/or the perception of lowered quality of life. Nevertheless, most causes of ill health do cause unpleasant symptoms either immediately or eventually, and having a suitable tool that can easily capture generic symptom change should prove of practical use, provided reasonable caveats are observed.
Even busy front-line clinicians now have access to IT decision support based on
the collated results of outcome research such as the Cochrane reports. However because current clinical practice is so diverse
and intricate, it has rarely been possible to get large enough numbers recruited into double blind trials to satisfactorily
discriminate between the many interventions available for common conditions. Indeed the costs and difficulties of doing such gold standard studies (giving sick people placebos, persuading clinicians to randomize, etc.) suggest that we should consider less ambitious approaches to influencing practice. Having a simple measure like Thymometry, which can easily be incorporated into routine clinical data-gathering with the results automatically analyzed and fed back, could improve the management of all situations where the goal is to improve quality of life or symptoms.
Thymometry has been designed to serve this purpose and it provides a useful measure of within-person changes over time. From the many studies we have reported here, Thymometry appears to be both valid and responsive. Despite its brevity it emerged as appropriately responsive to both interventions and secular changes. Two innovative concepts were required to create this scale. The first was to use a mathematically exact expression to reflect back to individuals their own general feelings over a recent period of time. The second was to continuously calibrate the scale by concomitantly measuring how much importance users ascribed to change on the scale.
Despite its simplicity, Thymometry is based on two paradigm shifts. Firstly, instead of focusing on specialized details of disease assessment to measure change, it confines its assessment of outcome to the measurement of a patient’s overall perceptions of change and value. Secondly, it takes a view of health problems which assumes that most diseases can be usefully generalized as being either temporary or chronic disturbances of physiological regulation, which are frequently reflected in patients’ current symptomatology. It is the interaction of treatment effects with this symptomatology that produces a patient’s perception of change and value. Thus, by design, it uses the patient’s own synthesis of events in response to therapy to evaluate any community or healthcare intervention. This occurs irrespective of the nature of the condition the intervention is intended to ameliorate, but it obviously also includes some measure of placebo effect. Thymometry itself is a hybrid tool, using a combination of speed of response, change in a Health Index and the patient’s endpoint assessment of the usefulness of interventions. Within its limitations it could be used in a wide range of situations.
Acknowledgements: I am grateful for the collaboration of all the patients and
General practitioners in Kingston upon Thames & Richmond. I am also grateful to Lilly and Upjohn for their financial support.
Conflicts of interest: None.
Note on Copyright: Patents can obstruct pro bono science so Thymometry has copyright but no patent. The sole purpose of this is
to prevent commercial agencies profiting from using it without giving a share of their earnings to pro bono research. Most
pro bono researchers will be able to use it for free in exchange for a copy of their completed database, which we can add
to our pooled information source.
1. Peck RE. Headache: The Journal of Head and Face Pain, 1967. Wiley Online Library.
2. Wilkin D, Hallam I, Doggett M. Measures of need and outcome for primary health care. Oxford: Oxford University Press, 1992.
3. Ruta DA, Garratt AM, Leng M, Russell IT. A new approach to quality of life: the patient-generated index. Medical Care 1994; 32:1109-26.
4. Kinnersley P, Peters T, Stott N. Measuring functional health status in primary care using the COOP-WONCA charts: acceptability, range of scores, construct validity, reliability, and sensitivity to change. Br J Gen Pract 1994;
6. Wong-Chung D, Mateijsen N, West R, Ravel L, Van Weel C. Assessing the functional status during an asthma attack with Dartmouth COOP charts. Family Practice 1991; 8:404-8.
7. Yodfat Y. Functional status in the treatment of heart failure by captopril: a multi-centre, controlled, double blind study in family practice. Family Practice 1991; 8:409-11.
8. Andrews FM, Withey SB. Social Indicators of well being: Americans' perceptions of Life Quality. New York, 19 Plenum Press
9. Beaumont G. Human Psychopharmacology: Clinical and 1994 - Wiley Online Library
Melzack, 1975), the New York Heart Association Index (Kossmann, 1964), Cancer Inventory of Problem Situations (Heinrich et al., 1984), the Sickness Impact Profile (Bergner et al., 1981), the Nottingham Health Profile (Hunt et al., 1985) and Euroquol (Euroquol Group,
8. Zung Self-Rating Depression Scale. https://en.wikipedia.org/wiki/Zung_Self-Rating_Depression_Scale. The Zung Self-Rating Depression Scale was designed by Duke University psychiatrist William W.K. Zung MD (1929-1992).
9. Melzack R. The short-form McGill Pain Questionnaire. 1987.
10. Garratt A, Ruta D, Abdulla MI, Buckingham JK, Russell IT. The SF-36 health survey questionnaire: an outcome measure suitable for routine use in the NHS? BMJ 1993; 306:1440-4.
11. Brazier JE, Jones NMB, O'Cathain A, Thomas KJ, Usherwood T, Westlake L. Validating the SF-36 health survey questionnaire: a new outcome measure for primary care. BMJ 1992; 305:160-4.
12. Jenkinson C. MYMOP, a patient generated measure of outcomes. BMJ 1996; 313:626.
13. EuroQoL Quality of Life Scale (EQ-5D). BioPsychoSocial Assessment Tools for the Elderly - Assessment Summary Sheet. Test: EuroQoL Quality of Life Scale (EQ-5D). Year: 1990; revised 1993.
14. Felce D. Defining and applying the concept of quality of life. J Intellect Disabil Res 1997 Apr; 41 (Pt 2):126-35.
15. D’Souza MF, Tooley M, Charlton JRH. Hay fever treatments - which should be tried first? J Royal College of General Practitioners 1987; 37:296-30.
16. Charlton et al. A method for evaluating therapy for hay fever - a comparison of four treatments. J Clin Allergy 1983; 13:329-335.
17. Bowling A. Measuring Health - A review of quality of life measurement scales. Open University Press, 1991. ISBN 0-335-15435-2.
18. Guyatt GH, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chron Dis 1987;
19. Guyatt GH, Kirshner B, Jaeschke R. Measuring health status: what are the necessary measurement properties? J Clin Epidemiol 1992; 45:1341-5.
20. Guyatt GH, Eagle DJ, Sackett B, Willan A, Griffith L, McIlroy W, et al. Measuring quality of life in the frail elderly. J Clin Epidemiol 1993; 46:1433-44.
21. Cairns J. Measuring health outcomes. BMJ 1996; 313:6.
22. Primrose WR, Seymore DG, Ball AE, Russell EM. Rate of completion of forms should have been analysed by age. BMJ 1996; 313:626.
23. Ruta D, Garratt A. Reliability of such instruments needs to be proved. BMJ 1996;
24. Charlton JRH, D’Souza MF, Tooley M, Silver R. A community trial strategy for evaluating treatment for symptomatic conditions. Statistics in Medicine 1985; 4:11-21.
25. Katz JN, Larson MG, Phillips CB. Comparative measurement sensitivity of short and longer health status instruments. Medical Care 1992;
26. Helliwell J, Layard R, Sachs J. World Happiness Report. 2011.