Evaluating and validating Thymometry: a simple visual scale for recording subjective outcomes


Dr. Michael F. D'Souza MD FRCGP FFPHM FRSA,




Thymometry is a geometrically exact series of eleven emoticons. A full semicircular smile represents a subjective score of “100% Happy”, and the arcs of subsequent smiles reduce by intervals of 20% down to a “neutral” straight line. Thereafter the smile turns into a frown, deepening by the same 20% intervals until it becomes a full upside-down semicircle, which represents a subjective score of “100% Unhappy”.

Thymometry has been designed as a brief metric for self-perceived quality of life and similar subjective outcomes (Fig 1). It has now been in use for over 30 years in primary care settings and has been administered in a wide range of paper and electronic formats. This is a discussion of its value and validity derived from data provided by 8648 subjects: 7322 were patients attending routine general practice consultations, 802 took part in a random population survey, and the remaining 524 were participants in primary care outcome studies.


A summary of our findings:


·       All 8648 subjects appear to understand the meaning of the Thymometry scale.


·       It is simple and brief enough to be applied in routine clinical care and has the advantages of being commonsensical and trans-cultural. In the general population survey of adults in Kingston upon Thames, UK, the scale was one of the least omitted data points (e.g. age was omitted three times as often). The majority of the affluent population surveyed rated themselves as Happy on the Thymometry scale (Fig 2).


·         In early touch screen surveys of GP Consulters, Males recorded themselves as being slightly happier than Females and the young (0-30yrs) and the older (60+) age groups were found to record higher happiness scores than those of middle age. Fig. 3. These interesting observations have been replicated by more recent studies using more elaborate measures. (Viz. The Gross National Happiness Index with its 33 indicators)


·       The Thymometry scale discriminated between GP patients in a logical and expected way: only 43% of those diagnosed as depressed recorded themselves as being happy, compared with 71% of those attending for immunizations prior to going on holiday. Fig 4


·       When used to measure Quality of Life it showed good correlation with the EuroQol, the briefest currently used gold standard. Fig 5 (Still in preparation)


·       The Thymometry scores correlated appropriately with the McGill pain scales in a very small study. Fig 6


·        It also correlated very well with measures of Depression such as the Zung questionnaire. Fig 7


·       Unlike analogue line systems, which rely on conceptualizations and the meaning of verbal captions, a faces scale provides an additional reference to feelings. It had a within-subject repeatability of 99.7% (after 3 minutes).


·       Its eleven-point scale was more discriminatory than a binary or three-point scale. On the other hand, it is uncertain whether increasing the number of points on a faces scale adds anything of clinical relevance, but we have designed a 201-point electronic version to explore this.


·       On the eleven-point scale, patients on average rated a change of 1.625 points or more as “important”, i.e. the difference that would prompt them to change treatments.


·       It seems to be an adequate measure of subjective feelings associated with clinical change over time in comparison studies of symptom response to different Antibiotics. Fig 8




·       In studies on Hay fever sufferers, variations in their Thymometry scores mirrored changes in locally measured pollen counts


·       In trials comparing interventions in Bronchitis and Depression sequential Thymometry scores changed sensitively and appropriately with the observed clinical condition


·       Because of its subjective nature, Thymometry does have limitations as a metric. It is not an acceptable ratio scale, so scores must not be averaged or analyzed by t-tests.


·       Thymometry may, however, be legitimately used to compare populations. The concept of calculating a “Bentham score” for populations is suggested as one way of doing this. A “Bentham score” is the sum of every individual's happiness score taken as a proportion of a notional “perfect score” in which every individual scores the maximum on the Thymometry scale, i.e. exhibits the “greatest good for the greatest number of people”. A simple Chi-squared test can then be legitimately used to estimate significance between the Bentham scores of populations or between pre- and post-intervention scores in the same population. Fig 9


·       At present Thymometry is being successfully used in a smartphone App designed to assess response to addiction therapy (q.v.).


·       Because of its brevity and validity, Thymometry could become incorporated into routine clinical practice and into many other situations where the views of patients, clients, customers or citizens are needed to evaluate outcomes.




Practicable ways of measuring subjective outcomes are needed by patients, clinicians, researchers and healthcare managers. Patients need to know how well any treatments they are offered have suited fellow sufferers, and clinicians and others gain additional support from this sort of subjective data when making therapeutic choices. Indeed, because it is increasingly common for medical interventions to be given specifically to improve quality of life, clinicians need an easily applicable measure of it in order to demonstrate effective practice. Those engaged in pragmatic health service research have a special need for tools that can be applied within the natural setting without disrupting normal clinical practice. Finally, healthcare managers, who have the task of allocating large resources across a wide range of clinical interventions, require unbiased patient-centered outcomes that compare their options irrespective of disease groups or special pleading.

 Unfortunately all the existing tools for measuring subjective outcomes have intrinsic credibility problems and are often too specialized or of inappropriate length to achieve these ends. Ruta and colleagues 2 have suggested six features that the ideal tool should possess:


(1) Measure the aspects and effects of the illness that the patient decides are most important;

(2) Enable the patient to score the chosen variables; 

(3) Be a sensitive measure of within person change over time; 

(4) Be applicable to the whole spectrum of illness seen in primary care; 

(5) Be capable of measuring the effects of a wide variety of care;

(6) Be brief and simple enough to complete in a 7-10 minute consultation. 


Achieving this is not easy but the “Thymometer” (feelings measurer) scale has been devised to try to meet all these requirements and more.

The Thymometer is similar in principle to the well-established COOP-WONCA charts 3,4,5 and the Andrews & Withey D-T scale.6 However, Thymometry offers an accurate eleven-point scale of geometrically precise faces, rather than an essentially Likert-type 7-point series with indeterminate inter-point intervals. This means that Thymometry scores have greater discriminating power to compare the value of interventions in all symptomatic conditions. Furthermore, being a graphical scale, Thymometry can achieve a broad measure of independence from differences in culture, language and education. Indeed, the wide bibliography on cross-cultural use that has been generated for the COOP-WONCA charts is very likely to be applicable to Thymometry. Finally, Thymometry tackles the inherent problems of subjective variability by a process of continuous calibration to derive acceptable averages from mass data collections.


The series of studies reported later explored the conventional measures of validity for Thymometry and compared it against EuroQol 7, Zung 8 and the McGill Pain Questionnaire.9 However, attempts to compare it with the best-known generic instrument, the SF-36 10, 11, had to be abandoned because too few patients in these natural-setting studies were prepared to fill in the very long forms provided. It is easy to agree with Jenkinson 12 that “The SF-36 is not suitable for use within a consultation, which detracts from its clinical usefulness.” Fortunately EuroQol has been validated against the SF-36 13, so we have employed EuroQol as our comparison gold standard.






Most of the studies reported here took place in general practices in the Kingston & Richmond Health Authorities in London, UK. The catchment area population was 320,000.


Issues to be discussed


·       How did this faces scale come to be conceived and developed?

·       Do patients understand and prefer this faces scale?

·       What change on this scale do patients judge to be important to them?

·       How internally consistent and repeatable is the scale, and what is its inter-rater reliability?

·       How sensitive is it to change? (Appropriate responsiveness)

·       What is its Face & Content validity?

·       What is its concurrent criterion validity against clinical records and the EuroQol as a gold standard?

·       What is its Construct, Biological, Convergent & Discriminant validity?

·       How feasible is it to use in routine service and what is its power for use in trials?



The development of the Thymometer scale – The Bentham Score


Our initial studies used conventional 100mm analogue lines to get outcome scores. However, this method relies on subjects conceptualizing the position on a line as being meaningful in terms of feeling, and on their understanding the written text attached to it. This is difficult for those with language and educational difficulties, so in subsequent studies investigating the symptomatic impact of a range of antibiotics on bronchitis we decided to try using a scale with a 7-point series of faces, very similar to the Withey D-T scale, simultaneously with 100mm analogue lines. We found that patients much preferred these faces scales and completed them twice as often as analogue lines. However there was, as expected, a loss of power to discriminate from having so few points on the scale. We therefore concentrated on trying to achieve a satisfactory compromise by developing our scale with a larger series of faces.


We adopted the term Thymometry for using this sort of scale as it means the measurement of feeling and had already been coined by Peck in 1967 as a measure of pain related to the number of decibels of sound.1 However, our intention was to apply it in a much more generic way. To this end we have been refining and developing it over the last thirty years, while using it in the context of a number of primary care comparative trials and surveys.


The choice of the number of points on this scale has been a contentious issue. The justification for having more than just three faces (“Happy”, “Neutral” and “Unhappy”) lies in the power of the scale to make finer discriminations whenever needed. It is possible to reduce an eleven-point scale retrospectively to three points (see Fig 1), but not vice versa. The touch-screen version of the scale currently used in our App has only ten points: it omits the extreme unhappiness point for purely aesthetic reasons. We have experimented with a twenty-one-point scale marked in percentage points above and below a neutral mid-point, with alternate points both on and between faces. In use it was clear that the between-faces points were used far less frequently, so it was decided that a simple eleven-point scale was all that was needed. Aside from its low power of discrimination, another problem with the original Withey D-T scale was that it had unspecified expression sizes. Intuitively it was felt that the human ability to distinguish precise differences in expression might be important. Therefore the scale was constructed to have frowns and smiles that were mathematically exact proportions of an arc corresponding to the relevant percentage differences. The ideal version of this would be a single face on a screen in which a slider made the straight-line neutral position move in 1% intervals to a maximum 100% semicircular smile or frown.
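The construction of the faces can be sketched in code. What follows is a minimal illustration and not the author's implementation: it assumes that a score of p% corresponds to a mouth arc sweeping p% of a semicircle between two fixed mouth corners, which is only one possible reading of "exact proportions of an arc". The names FACE_PERCENTS, mouth_arc and collapse_to_three, and the mouth_width parameter, are hypothetical.

```python
import math

# The eleven faces as signed percentages: +100 = "100% Happy",
# 0 = neutral straight line, -100 = "100% Unhappy" (20% steps).
FACE_PERCENTS = [100, 80, 60, 40, 20, 0, -20, -40, -60, -80, -100]


def mouth_arc(signed_percent, mouth_width=1.0):
    """Return the geometry of the mouth for a given signed percentage.

    ASSUMPTION: the percentage scales the sweep angle of a circular arc
    drawn between two fixed mouth corners, so 100% is a full semicircle
    (sweep of 180 degrees) and 0% is a straight line.
    """
    if not -100 <= signed_percent <= 100:
        raise ValueError("signed_percent must lie between -100 and 100")

    sweep = math.pi * abs(signed_percent) / 100.0  # arc angle in radians
    if sweep == 0:
        return {"kind": "neutral", "sweep_deg": 0.0,
                "radius": math.inf, "depth": 0.0}

    half_width = mouth_width / 2.0
    radius = half_width / math.sin(sweep / 2.0)   # circle through both corners
    depth = half_width * math.tan(sweep / 4.0)    # how far the arc bulges
    kind = "smile" if signed_percent > 0 else "frown"
    return {"kind": kind, "sweep_deg": math.degrees(sweep),
            "radius": radius, "depth": depth}


def collapse_to_three(signed_percent):
    """Reduce any point on the scale to Happy / Neutral / Unhappy."""
    if signed_percent > 0:
        return "Happy"
    if signed_percent < 0:
        return "Unhappy"
    return "Neutral"


if __name__ == "__main__":
    for p in FACE_PERCENTS:
        arc = mouth_arc(p)
        print(p, arc["kind"], round(arc["sweep_deg"], 1), collapse_to_three(p))
```

Under the same assumption, calling mouth_arc with percentages in steps of 1 rather than 20 would serve the slider version. The collapse function also shows why the reduction to three points is possible after the fact while the reverse is not: the finer information is discarded.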

Irrespective of the number of points used, the Thymometry scale is not a ratio scale. Ratio scales, by their nature, are usually taken as indicative of an objective truth (e.g. on the ratio scale for temperature, a temperature of 3 K means the same throughout the universe). Thus, using the average of units of measurement on ratio scales is a legitimate approach to inter-group comparison. However, measures such as Thymometry are intrinsically variable both between and within individuals. Therefore it is not acceptable to assume that each individual grades the same level of happiness by the same intervals as any other. Indeed it is commonplace to observe that different subjects describe the same experience in either hyperbolic or conservative ways. Thus Thymometry is an elastic ruler that can only be used with reference back to within-person measurements, i.e. as an interval scale that holds true over short time intervals within each individual. We measured its within-subject repeatability after 3 minutes as 99.7%.


Nevertheless, even as an interval scale, group scatter-gram comparisons can be done using the idea of a reference concept. We have called this concept “the Absolute Bentham” after the great Utilitarian thinker Jeremy Bentham. Bentham argued that the utilitarian target for interventions should be “The greatest happiness of the greatest number.” An “Absolute Bentham” in terms of Thymometry would be scored when every individual in a population recorded a score of 100% (i.e. as happy as possible). Most populations contain individuals who have lower than absolute scores. If all the individual scores are summed and taken as a proportion of the “Absolute Bentham”, we get a proportion that is amenable to both statistical testing and use in graphical presentations. This helps ordinary, non-statistically trained people to “see for themselves” the clear-cut results of any evaluation without recourse to statistics (Fig 9).
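As a rough illustration of the Bentham score, the sketch below computes the proportion for a group and compares two groups with a Pearson Chi-squared statistic. Several things here are assumptions rather than the author's stated method: faces are coded 0 (100% Unhappy) to 10 (100% Happy), the comparison treats each group's achieved points and shortfall from the perfect score as the two columns of a 2x2 table, and the function names are hypothetical. Treating scale points as independent counts is one reading of the text, not a validated procedure.

```python
def bentham_score(face_points, max_points=10):
    """Sum of individual scores as a proportion of the 'Absolute Bentham'.

    ASSUMPTION: each face is coded 0..max_points, with max_points meaning
    "as happy as possible".
    """
    if not face_points:
        raise ValueError("at least one score is required")
    for p in face_points:
        if not 0 <= p <= max_points:
            raise ValueError("score outside the scale: %r" % (p,))
    achieved = sum(face_points)
    perfect = max_points * len(face_points)  # everyone at the maximum
    return achieved / perfect


def chi_squared_2x2(a, b, c, d):
    """Pearson Chi-squared (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


def compare_populations(group_a, group_b, max_points=10):
    """Chi-squared statistic (1 degree of freedom) for two groups.

    ASSUMPTION: rows are the groups; columns are points achieved and
    points short of the perfect score.
    """
    rows = []
    for group in (group_a, group_b):
        achieved = sum(group)
        shortfall = max_points * len(group) - achieved
        rows.append((achieved, shortfall))
    return chi_squared_2x2(rows[0][0], rows[0][1], rows[1][0], rows[1][1])


if __name__ == "__main__":
    # Purely illustrative values, not data from the studies.
    before = [4, 5, 6, 3, 5]
    after = [7, 8, 6, 7, 9]
    print(bentham_score(before), bentham_score(after))
    print(compare_populations(before, after))
```

With one degree of freedom, a statistic above 3.84 corresponds to p < 0.05. The same function can be applied to pre- and post-intervention scores from one population, although paired data would normally call for a paired test.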


The concepts of Quality of Life and Usefulness


In addition to these analytical issues, Thymometry can be applied to measure a wide range of conceptual outcomes. The concept of Quality of Life was measured on the Thymometry scale as a response to the question “How are you feeling about every aspect of living?” (e.g. 100% = as happy as possible). This applied the standard WHO definition of health to the conventional health inquiry “How are you?”


To undertake comparative trials on hay fever remedies, another outcome measure was also developed: the concept of a treatment’s “Usefulness”. This required each patient undergoing treatment to synthesize into a single Thymometry score how they felt about the treatment after balancing their symptom relief against the treatment’s side effects and ease of use.




The following groups have been studied:


·       Healthy Medical centre Staff

·       Consulting patients of all ages.

·       Patient Carers (for surrogate responses)

·       Members of the general public (for normative data).



The following types of studies have been done using Thymometry


# Comparative trials of Hay fever treatments

# Comparative studies of antibiotics for bronchitis

# Mass survey of consulting GP patients

# Comparative study of antibiotic treatments

# Touch-screen validation studies against Euroqol

# Studies on Depressed Patients

# Specific studies on clinical staff & carers

# An App for helping with harmful habits see



The majority of this study sample consisted of patients consulting their General Practitioners.  The sole entry criterion for participation was that the patient consented to complete a mark-sense form on arrival when they consulted for any presenting problem.  GPs gave no prior guidance on how to complete the forms ensuring that the data consisted of only the patient's views.

Follow-up was not formalized but occurred on return for routine care, and the patients' previous scores were not made available to them.





The Construct, Biological, Convergent & Discriminant validity of Thymometry


Construct Validity: tests the degree to which a test measures what it claims, or purports, to be measuring. Not one of the 8648 subjects in our studies reported any difficulties in understanding what the faces meant. And Thymometry’s within subject repeatability after a few minutes was 99.7 %.


Biological Validity: tests that scientifically accurate information that is used in an unbiased way conveys a biological idea. We showed that changes in local pollen counts were reflected appropriately by changes in the Thymometry scores of hay fever sufferers.


Convergent Validity: is reflected by the results of a measure correlating with results from other measures intended to measure the same concept.

This was shown to occur in painful conditions, where McGill pain scores correlated appropriately with Thymometry (Fig 6), and in depression, where Zung scores correlated well with Thymometry (Fig 7). The change in Thymometry scores in depression at two weeks and at four weeks showed a consistent gradient across the spectrum of clinical change. The index of responsiveness 14 was calculated as the change in scores of patients reporting themselves "a little better" divided by the SD of change in scores for patients reporting themselves "about the same"; it was not calculated for patients reporting "a little worse" because of small numbers in this group.
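The index of responsiveness defined above can be sketched as follows. This is a minimal illustration under two assumptions not stated in the text: "the change in scores" is taken to be the mean change in the "a little better" group, and the SD is the sample standard deviation (n - 1 denominator). The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def responsiveness_index(changes_little_better, changes_about_same):
    """Mean change in improvers divided by SD of change in stable patients.

    ASSUMPTIONS: mean change is used for the numerator and the sample
    standard deviation (n - 1) for the denominator.
    """
    if not changes_little_better:
        raise ValueError("need at least one 'a little better' patient")
    if len(changes_about_same) < 2:
        raise ValueError("need at least two 'about the same' patients")
    spread = stdev(changes_about_same)
    if spread == 0:
        raise ValueError("no variation in the 'about the same' group")
    return mean(changes_little_better) / spread


if __name__ == "__main__":
    # Illustrative changes in face points between visits, not study data.
    little_better = [2, 1, 3, 2]
    about_same = [0, 1, -1, 0, 1, -1]
    print(responsiveness_index(little_better, about_same))
```

A larger index means that the change seen in patients who report improvement stands out more clearly from the background fluctuation in patients who report no change.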


Discriminant validity: tests whether concepts or measurements that are supposed to be unrelated are, in fact, unrelated.  This has not yet been formally checked.


Sensitivity to Change: This is tested by appropriate responsiveness to situations. It was shown to occur in studies on antibiotics (see Fig 8). A further study was done on our depressed patients, asking them to complete the statement "I would be happy to try any suitable treatment if it made me better by a change of how many … faces?" Responses ranged from 0.5 to 3 with an average score of 1.625. Therefore a two-point change on this scale was considered to be “important” to patients.








It is now becoming widely accepted that seeking guidance from formal outcome evidence could prove a useful way of dealing with the complexities and expense of modern healthcare.  Not only might investigating effectiveness improve quality, but also comparative studies of efficiency might be the best way to contain costs. Outcome research should expose those clinical activities which, however well intentioned and established, are either merely placebos or on more detailed examination found to be harmful to patients. It should also highlight areas where resources could beneficially be redeployed e.g. shifting from hospital to community services.


However, it is well recognized that in a few serious conditions, such as hypertension and malignancy, underlying health may not be reflected by the presence of distressing symptoms and/or the perception of lowered quality of life. Nevertheless, most causes of ill health do cause unpleasant symptoms either immediately or eventually, and having a suitable tool that can easily capture generic symptom change should prove of practical use, provided reasonable caveats are observed.


Even busy front-line clinicians now have access to IT decision support based on the collated results of outcome research such as the Cochrane reports. However, because current clinical practice is so diverse and intricate, it has rarely been possible to recruit large enough numbers into double-blind trials to discriminate satisfactorily between the many interventions available for common conditions. Indeed the costs and difficulties of doing such gold standard studies (giving sick people placebos, persuading clinicians to randomize etc.) do suggest that we should consider less ambitious approaches to influencing practice. Having a simple measure like Thymometry that can easily be incorporated into routine clinical data-gathering, with the results automatically analyzed and fed back, could improve the management of all situations where the goal is to improve quality of life or symptoms.


Thymometry has been designed to serve this purpose and it provides a useful measure of within-person changes over time. From the many studies we have reported here, Thymometry appears to be both valid and responsive. Despite its brevity it emerged as appropriately responsive to both interventions and secular changes. Two innovative concepts were required to create this scale. The first was to use a mathematically exact expression to reflect back to the individual his or her own general feelings over a recent period of time. The second was to continuously calibrate this scale by the concomitant measurement of how the users ascribed importance to change on the scale.


Despite its simplicity Thymometry is based on two paradigm shifts. Firstly, instead of focusing on specialized details of disease assessment to measure change, it confines its assessment of outcome to the measurement of a patient’s overall perceptions of change and value. Secondly, it takes a view of health problems which assumes that most diseases can be usefully generalized as being either temporary or chronic disturbances of physiological regulation, which are frequently reflected by patients’ current symptomatology. It is the interaction of treatment effects with this symptomatology that produces a patient’s perception of change and value. Thus, by design, it uses the patient’s own synthesis of events in response to therapy to evaluate any community or healthcare intervention. This occurs irrespective of the nature of the condition the intervention is intended to ameliorate, but it obviously also includes some measure of placebo effect. Thymometry itself is a hybrid tool, using a combination of speed of response, change in a Health Index and the patient’s endpoint assessment of the usefulness of interventions. Within its limitations it could be used in a wide range of situations.


Acknowledgements: I am grateful for the collaboration of all the patients and General practitioners in Kingston upon Thames & Richmond. I am also grateful to Lilly and Upjohn for their financial support.  


Conflicts of interest: None.


Note on Copyright: Patents can obstruct pro bono science so Thymometry has copyright but no patent.  The sole purpose of this is to prevent commercial agencies profiting from using it without giving a share of their earnings to pro bono research. Most pro bono researchers will be able to use it for free in exchange for a copy of their completed database, which we can add to our pooled information source.






1. Peck RE. Headache: The Journal of Head and Face Pain, 1967.


2. Wilkin D, Hallam I, Doggett M. Measures of need and outcome for primary health care. Oxford: Oxford University Press, 1992


3. Ruta DA, Garratt AM, Leng M, Russell IT. A new approach to quality of life: the patient-generated index. Medical Care 1994; 32: 1109-26.


4. Kinnersley P, Peters T, Stott N. Measuring functional health status in primary care using the COOP-WONCA charts: acceptability, range of scores, construct validity, reliability, and sensitivity to change. Br J Gen Pract 1994; 44: 545-9.


6. Wong-Chung D, Mateijsen N, West R, Ravel L, Van Weel C. Assessing the functional status during an asthma attack with Dartmouth COOP charts. Family Practice 1991; 8: 404-8.


7. Yodfat Y. Functional status in the treatment of heart failure by captopril: a multi-centre, controlled, double blind study in family practice. Family Practice 1991; 8: 409-11.


8 Andrews F.M. & Withey S.B. Social Indicators of well being: Americans perceptions of Life Quality. New York, 19 Plenum Press


9 G Beaumont - Human Psychopharmacology: Clinical and 1994 - Wiley Online Library



8  Zung Self-Rating Depression Scale - The Zung Self-Rating Depression Scale was designed by Duke University psychiatrist William W.K. Zung MD (1929-1992)


9 Melzack R. The short-form McGill Pain Questionnaire. 1987.



10. Garratt A, Ruta D, Abdulla MI, Buckingham JK, Russell IT. The SF-36 health survey questionnaire: an outcome measure suitable for routine use in the NHS? BMJ 1993; 306: 1440-4.


11. Brazier JE, Jones NMB, O'Cathain A, Thomas KJ, Usherwood T, Westlake L. Validating the SF-36 health survey questionnaire: a new outcome measure for primary care. BMJ 1992; 305: 160-4.


12 Jenkinson, C. (1996). MYMOP, a patient generated measure of outcomes. BMJ 313: 626-626



13  EuroQoL Quality of Life Scale (EQ-5D)

BioPsychoSocial Assessment Tools for the Elderly - Assessment Summary Sheet . Test: EuroQoL Quality of Life Scale (EQ-5D). Year: 1990; revised 1993.


14 Felce D. Defining and applying the concept of quality of life. J Intellect Disabil Res 1997; 41 (Pt 2): 126-35.


15 D’Souza MF, Tooley M, Charlton JRH. Hay fever treatments: which should be tried first? J Royal College of General Practitioners 1987; 37: 296-30.


16 A Method for Evaluating Therapy for Hay fever - A Comparison of Four Treatments. Charlton et al J.Clin.Allergy 1983; 13: p.329-335.


17 A Bowling (1991) Measuring Health - A review of quality of life measurement scales ISBN 0-335-15435-2 Open University Press


18. Guyatt GH, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chron Dis 1987; 40: 171-8.


19. Guyatt GH, Kirshner B, Jaeschke R. Measuring health status: what are the necessary measurement properties? J Clin Epidemiol 1992; 45: 1341-5.


20. Guyatt GH, Eagle DJ, Sackett B, Willan A, Griffith L, McIlroy W, et al. Measuring quality of life in the frail elderly. J Clin Epidemiol 1993; 46: 1433-44.


21 Cairns, J. (1996). Measuring health outcomes. BMJ 313: 6-6


22 Primrose, W R, Seymore, D G, Ball, A E, Russell, E M (1996). Rate of completion of forms should have been analysed by age. BMJ 313: 626-626


23 Ruta, D., Garratt, A. (1996). Reliability of such instruments needs to be proved. BMJ 313: 626-627


24 Charlton JRH, D’Souza MF, Tooley M, Silver R (1985). A Community Trial Strategy for Evaluating Treatment for Symptomatic Conditions. Statistics in Medicine, Vol. 4, 11-21.


25 Katz JN, Larson MG, Phillips CB. Comparative measurement sensitivity of short and longer health status instruments. Medical Care 1992; 30:917-25.


26 Helliwell J, Layard R, Sachs J. World Happiness Report. 2011.
