From Chemometrics to Genealogy

Bernard Vandeginste


News

Welcome to the weblog of Bernard Vandeginste. I hope you will enjoy my articles on events (gebeurtenissen), on professional items in the field of chemometrics, and on my hobby, the genealogical search for my ancestors.


Friday, February 25, 2011 #

Test results of EQAS samples reported by a laboratory are transformed into z-scores. Absolute z-scores below 2 are satisfactory, those between 2 and 3 are questionable, and those over 3 are unsatisfactory (see my blog on z-scores). Under normal circumstances we expect natural variability to keep the scores between -2 and +2, with only an occasional value below -2 or above +2. However, even when all z-scores obtained over time remain within the interval [-2,+2], there might still be a problem with the laboratory.

Indeed, it is very unlikely that an unbiased laboratory would generate a series of z-scores all with the same sign; such a series indicates the presence of a laboratory (+method) bias. This can be checked by evaluating the RSZ statistic (Rescaled Sum of Z), RSZ = Sum(z)/sqrt(m), where m is the number of z-scores. Moreover, it is very unlikely that z-scores are found at one of the extremes of the [-2,+2] interval all the time. This is evaluated by calculating the RSSZ statistic (Reduced Sum of Squares of Z), RSSZ = Sum(z^2)/m. Both the RSZ and the RSSZ values can be evaluated with the appropriate statistical test.
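As an illustration, here is a minimal Python sketch of the two statistics. The function names and the example series are hypothetical. The test shown for RSZ rests on an assumption not spelled out above: if the z-scores are independent and standard normal (no bias, nominal spread), RSZ is itself standard normal, and m times RSSZ follows a chi-square distribution with m degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def rsz(z_scores):
    """Rescaled Sum of Z: Sum(z) / sqrt(m)."""
    m = len(z_scores)
    return sum(z_scores) / sqrt(m)


def rssz(z_scores):
    """Reduced Sum of Squares of Z: Sum(z^2) / m."""
    m = len(z_scores)
    return sum(z * z for z in z_scores) / m


def rsz_p_value(z_scores):
    """Two-sided p-value for RSZ, ASSUMING independent N(0, 1) z-scores."""
    return 2 * (1 - NormalDist().cdf(abs(rsz(z_scores))))


# Hypothetical series: every score is 'satisfactory' (|z| < 2),
# but all scores have the same sign.
biased = [1.2, 0.9, 1.5, 1.1, 1.4, 1.0]

print("RSZ    =", round(rsz(biased), 2))
print("RSSZ   =", round(rssz(biased), 2))
print("p(RSZ) =", round(rsz_p_value(biased), 4))
```

For this series RSZ exceeds 1.96, so the bias is detected at the 5% level although no individual z-score is even questionable. To test RSSZ in the same spirit, m*RSSZ would be compared with a chi-square critical value for m degrees of freedom, taken from a table or a statistics library.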

It is quite striking that several accreditation bodies (e.g. the Standards Council of Canada, the European Directorate for the Quality of Medicines of the Council of Europe and the CANMET-MMSL ISO document) incorporate this time aspect in their guidelines, while WADA does not. This is despite the fact that in doping testing the detection of systematic errors (or bias) is of great importance.

We should nevertheless warn against an over-interpretation of z-scores:

- Comparing z-scores between rounds or between laboratories has to be done with great caution. A single laboratory operating consistently in line with the fitness for purpose criterion would typically produce z-scores in successive rounds covering the range –2 to +2: the following set [0.6, -0.8, 0.3, 1.7, 0.7, -0.1] would be typical. The small ups and downs between the scores do not indicate a change in performance – they arise by chance. So 1.7 is not ‘worse’ than 0.3: it does not indicate deterioration in performance!!

- Because of this ‘natural variation’ it is not sensible to make a ‘league table’ of laboratories (or to attribute points) based on their z-scores in a round. It is not valid to claim that a laboratory scoring 0.3 in a round is better than another scoring 1.7.

- Judgments based on time-averaged z-scores require caution as well. Averages of z-scores obtained on a number of different analytes should not be used: they may well hide the fact that one of the analytes consistently gives a poor z-score. Averages of scores from the same analyte over several rounds need expert interpretation on a statistical basis, as we indicated above.
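The last warning can be made concrete with a small Python sketch. The analyte names and z-scores below are invented for illustration, and the 1.96 cut-off again assumes independent, standard normal z-scores for an unbiased laboratory.

```python
from math import sqrt
from statistics import mean

# Hypothetical z-scores of one laboratory over four rounds, per analyte.
z_by_analyte = {
    "analyte_A": [0.2, -0.3, 0.1, -0.2],
    "analyte_B": [2.6, 2.8, 2.5, 2.7],      # consistently questionable
    "analyte_C": [-1.6, -1.8, -1.5, -1.7],  # consistently negative
}

# Averaging over all analytes: positive and negative scores cancel.
all_scores = [z for scores in z_by_analyte.values() for z in scores]
overall_mean = mean(all_scores)

# Evaluating each analyte separately with RSZ = Sum(z) / sqrt(m).
rsz_by_analyte = {
    name: sum(scores) / sqrt(len(scores))
    for name, scores in z_by_analyte.items()
}
flagged = [name for name, value in rsz_by_analyte.items() if abs(value) > 1.96]

print("overall mean z:", round(overall_mean, 2))
for name, value in rsz_by_analyte.items():
    print(name, "RSZ =", round(value, 2))
print("flagged:", flagged)
```

The overall mean looks unremarkable, whereas the per-analyte evaluation flags the two analytes with a persistent bias.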

posted @ 11:58 PM | Feedback (14)

The WADA Athlete Biological Passport Guidelines (Jan 2010) prescribe that each athlete's blood sample is analyzed in duplicate. Because of instrumental variation and for other reasons, these duplicates will practically always give different results. The Guideline defines a maximum accepted difference between duplicates for Hemoglobin (0.1 g/dL) and for Reticulocytes Percentage (0.15 if the first measurement is < 1% and 0.25 if the first measurement is > 1%). In principle the accepted maximum difference between duplicates should be equal to the repeatability of the method. By imposing these values WADA requires a certain minimal performance. Several aspects need some more discussion.
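The acceptance rule as quoted above can be sketched in a few lines of Python. This is only an illustration of the limits mentioned in this post, not WADA's own procedure: the function name and analyte labels are hypothetical, the difference is taken as an absolute difference, and a first reticulocyte measurement of exactly 1% (which the limits as quoted do not cover) is arbitrarily assigned the wider limit.

```python
def duplicates_acceptable(analyte, first, second):
    """Check a pair of duplicates against the limits quoted in the text.

    analyte: "HGB" (hemoglobin, g/dL) or "RET%" (reticulocytes percentage).
    ASSUMPTIONS: absolute difference, 'less than or equal' counts as a
    pass, and a first RET% value of exactly 1 gets the 0.25 limit.
    """
    if analyte == "HGB":
        limit = 0.1
    elif analyte == "RET%":
        limit = 0.15 if first < 1.0 else 0.25
    else:
        raise ValueError(f"no limit defined for analyte {analyte!r}")
    # Small tolerance so that binary rounding does not reject a
    # difference that equals the limit on paper.
    return abs(first - second) <= limit + 1e-9


print(duplicates_acceptable("HGB", 14.2, 14.25))  # difference 0.05 g/dL
print(duplicates_acceptable("HGB", 14.2, 14.5))   # difference 0.3 g/dL
print(duplicates_acceptable("RET%", 0.8, 1.0))    # difference 0.2, limit 0.15
print(duplicates_acceptable("RET%", 1.2, 1.4))    # difference 0.2, limit 0.25
```

Note that the same difference of 0.2 in reticulocytes percentage fails or passes depending only on the value of the first measurement.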

First, one should define what a duplicate analysis really is. From the Guideline it appears that WADA means a duplicate injection. Such (bad) practice only includes the instrumental variation and disregards the effect of poor sample preparation (e.g. homogenization). In analytical method validation practice all steps should be duplicated, including the sample preparation. I guess this also applies to clinical tests.

Second, both test results, the 1st one and the 2nd one, have the same status. There is no statistical basis for deciding which test result is to be reported. It is therefore unclear on which grounds WADA decided that only the first result should be reported and not, for example, the mean of the two results. Two analyses have been performed and it is logical to use all information obtained on the athlete's blood sample.
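A short worked equation shows what is gained by reporting the mean. It assumes that the two results x_1 and x_2 are independent and share the same repeatability standard deviation, written here as sigma_r (a symbol introduced for this illustration):

```latex
\bar{x} = \frac{x_1 + x_2}{2},
\qquad
\operatorname{Var}(\bar{x}) = \frac{\sigma_r^2 + \sigma_r^2}{4} = \frac{\sigma_r^2}{2},
\qquad
\sigma_{\bar{x}} = \frac{\sigma_r}{\sqrt{2}} \approx 0.71\,\sigma_r
```

Under these assumptions the mean of the duplicates is about 29% less variable than either single result, so reporting only the first result discards information that has already been paid for.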

Third, the Guideline is quite vague on what to do when a duplicate analysis fails the requirement of maximum difference. It mentions that a new duplicate analysis is necessary. This implicitly means that the results of the first duplicate are void. However, too large a difference between duplicates in fact means that the method performs outside specifications. In principle this should be evaluated by the analysis of control samples. If the deviation persists, then the instrument or other factors should be checked. It is only when the method is back within specifications that the measurement of samples can be continued. In this case it is logical that the results obtained with the second duplicate are reported. Without such an additional check with control samples and/or an indication of an instrument malfunction, four (statistically) equivalent test results are collected on a single sample. Again, there is no statistical justification for reporting only the first result of the second duplicate. It would be better to consider all four results together and to check for a possible outlier.

As it is now described by WADA, a test laboratory may repeat the duplicate analysis until (by chance) good duplicates are obtained and then report the so-called 'best' result. Such practice is unacceptable as it masks possibly large test variations.
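How easily chance produces 'good' duplicates can be estimated with a small Python sketch. It is a simplified model, not a description of any actual laboratory: it assumes that both results of a duplicate are independent and normally distributed with the same repeatability standard deviation s_r, so that their difference has standard deviation s_r*sqrt(2). The function names and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pass_probability(limit, s_r):
    """Chance that one duplicate pair differs by at most `limit`.

    ASSUMES independent, normally distributed results with repeatability
    standard deviation s_r; the difference then has SD s_r * sqrt(2).
    """
    sd_diff = s_r * sqrt(2)
    return 2 * NormalDist().cdf(limit / sd_diff) - 1


def pass_within(limit, s_r, attempts):
    """Chance of at least one acceptable pair in `attempts` tries."""
    p = pass_probability(limit, s_r)
    return 1 - (1 - p) ** attempts


# Hypothetical poorly performing method: the repeatability SD is as
# large as the allowed difference between duplicates (0.1 g/dL).
for attempts in (1, 2, 3, 5):
    chance = pass_within(limit=0.1, s_r=0.1, attempts=attempts)
    print(attempts, "attempt(s):", round(chance, 2))
```

In this model a method whose repeatability is far too poor for the limit still passes roughly half of the time at the first attempt, and almost always within a few repeats, which is exactly how repeating duplicates can hide a large test variation.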

posted @ 10:29 AM | Feedback (14)