Dear Tom,

I had the pleasure of reading your paper "An Improved Method for Analyzing Sparse and Irregularly Distributed SST Data on a Regular Grid: The Tropical Pacific Ocean," written in collaboration with Robert Livezey and Samuel Shen, and here are my remarks. First I go through the paper and make specific comments; then I draw some general conclusions. I will refer to Figures R1-R4, which I produced to illustrate my points and which you can reach at http://rainbow.ldeo.columbia.edu/~alexeyk/GlobSSTpaper/index.html (let me know if you have any problems accessing them -- I can always put them into our anonymous ftp directory if that is preferred). I will also refer to our Atlantic paper (AP), where the reduced space OS is introduced, and to our global SST paper (GP), which describes the global analysis.

Abstract, second-to-last sentence. ". . . only very weak SST anomalies may be reliably reconstructed. . ." I think this statement is unwarranted. In the context of optimal estimation, under the assumption that you know the space and time covariances, you recover the optimal solution, which does not have to be weak. I guess you mean here your negative experience with the estimation of NINO3.4 for the periods before 1870 and 1915-1925. However, that experience does not imply such a sweeping statement: our estimates of NINO3.4 for these periods, for example, are of believable size, and their reliability is measured by their theoretical error estimates (GP, Fig. 17).

P. 3, Equation (1). Although it looks exactly like SRLS, there is an important hidden difference: the climatology mu(n) here depends on time, while in SRLS it was constant for the entire reconstruction period.

P. 4, Item (2.4). "Modes of spatial variability that are not supported by the available data . . ." In the context of optimal interpolation, if the covariances are precisely correct, all modes *are* supported by the available data.

P. 4-6, Section 2.1, Removal of Low Frequency Anomalies. I think what you are doing here is extremely dangerous. You sensed some of the problems yourself -- the significant differences between averages over intervals of different length (p. 5, 2nd par.). For another illustration, look at my Fig. R1, which shows an x-t profile of the MOHSST5 data for the equatorial Pacific (averages of the two 5x5 boxes bordering the equator). In the area (130W-140W, Eq) there are almost no data available at all until the 1960s, so there is simply no way to properly estimate 181-month means for this location. You somehow manage to do it, probably because you apply your 5-point zonal and 3-point meridional filtering of the data in interpolative mode. This means that the estimate of the low-frequency anomaly in this dynamically very important location is completely at the mercy of your (arbitrary) interpolation (smoothing) procedure. Furthermore, my guess is that in your analysis procedure the long-term trends of the solution are determined by these low-frequency anomalies, which are defined in such a crude, inaccurate way. In terms of long-term trends, then, your analysis adds no information beyond the trends of primitively smoothed raw data -- a great loss. In our analysis we do not do anything like that, and I think we are better off without it. Parker et al. in their GISST2.2 do something similar -- separation of a global warming trend -- and I have some arguments why they should not do that; still, what they do is not as dangerous as what you do.
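To make the 181-month-mean point concrete, here is a minimal sketch (Python; the data layout -- a monthly anomaly series with NaN for missing months -- and the coverage threshold are my own illustrative assumptions, not anything from your paper) of a running mean that refuses to return a value unless most of the window contains real observations. For the (130W-140W, Eq) box such a guarded estimate would simply be undefined before the 1960s, which is exactly my point: whatever your procedure produces there comes from the smoothing, not from the data.

    # Minimal sketch: a centered 181-month running mean that only returns a
    # value when a large enough fraction of the window contains observations.
    # The 2/3 coverage threshold is an illustrative choice, not from the paper.
    import numpy as np

    def guarded_running_mean(anom, half_width=90, min_coverage=2.0/3.0):
        """anom: 1-D array of monthly anomalies, NaN where no data exist."""
        n = len(anom)
        out = np.full(n, np.nan)
        for t in range(n):
            lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
            window = anom[lo:hi]
            good = np.isfinite(window)
            if good.mean() >= min_coverage:   # enough real data in the window
                out[t] = window[good].mean()
            # otherwise leave NaN: the low-frequency anomaly is not defined here
        return out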
P. 8, Formula (3). Is this a completely heuristic formula that serves your purpose, or is there some statistical meaning behind it?

P. 8, bottom. "Since the recentered anomalies are built up by a series of increment fits, with an assumption of zero anomaly first guess for the first analysis, projection of the anomalies onto the increment EOFs is justified and completely represent G." Well, since your analysis represents *the entire anomalies* by the set of increment EOFs anyway, what difference does it make which set of EOFs to use? Moreover, the set of EOFs of the total anomalies (which we use) *is* in some sense optimal for representing them; surely it will also resolve the increments reasonably well. So why are you so anxious to use the EOFs of increments?

P. 9, bottom. "...show anomalies between +1 and +4 C..." I do not see these anomalies in the MOHSST5 dataset, not in March 1985 anyway (see my Fig. R3). Is something wrong with your data?

P. 10, 2nd par. You throw away entire "superobservations" for 5x5 boxes. I think this is too much. If a superobservation is suspicious, the estimate of its observational error (in our analysis framework) can be increased, but to throw it away completely... I think the data are too precious for that, and the percentage you throw away (3-8%) represents a really huge amount of data (in individual ship reports).

P. 13, 3rd par, and Table 1. Similar experiments are described in GP, page 14, starting from the 2nd paragraph. If the numbers from them, 0.37 for the '80s and 0.48 for the '10s, are squared to compare them with Table 1, we get 0.14 and 0.23, which is better than the roughly corresponding numbers in Table 1. Note that our numbers are for the entire globe; the average for the tropical Pacific would be substantially smaller. Also, in Table 1 you may want to separate the discrepancy between the NCEP OI and your test fields into two parts -- the part projected onto your EOFs and the part that is not (as we did in GP) -- and then optimize only the projected part, since you have no control over the unprojected part of the NCEP OI anyway (a minimal sketch of this decomposition follows after the next comment).

P. 15, inside 2nd par. ". . . a larger portion of the signal strength is also captured by the new method (Fig. 3)" How come? With 8 patterns you capture more of the NCEP OI variance than with 20+ patterns of SRLS? The next sentence, "Incrementally adjusting a first guess, rather than gridding a full (recentered) anomaly is one reason for the new method improvement," does not really explain anything, since, as I wrote above, the entire field of your new analysis is in any case represented by your increment EOFs.
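On the projected/unprojected decomposition I mention in the Table 1 comment: here is a minimal sketch (Python; the variable names and layout -- EOF patterns as columns of a matrix over the ocean grid points -- are my own assumptions). Only the projected part is a fair target when tuning an EOF-based analysis; the residual is unreachable no matter how the amplitudes are fit.

    # Minimal sketch: split the difference between a test field (e.g. NCEP OI)
    # and an analysis into the part lying in the span of the analysis EOFs and
    # the residual that no combination of those EOFs can represent.
    import numpy as np

    def split_by_eofs(E, d):
        """E: (npoints, nmodes) EOF patterns; d: (npoints,) difference field."""
        coeffs, *_ = np.linalg.lstsq(E, d, rcond=None)  # least-squares projection
        projected = E @ coeffs        # part resolvable by the EOFs
        residual = d - projected      # part outside the EOF space
        return projected, residual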
P. 15, last par, and Fig. 4. I think you make an extremely unfair comparison to us here. Our analysis is on a 5x5 grid, and you compare it to the NCEP OI on a 1x1 grid. The fair comparison would be to average the NCEP OI onto the 5x5 grid and show the RMS difference (a minimal sketch of such box-averaging follows these comments); precisely such a figure is shown in GP as Fig. 6c. The maximum in the equatorial Pacific there is 0.5 C -- smaller than the maximum in the panels of your Fig. 2. I understand that you want to show the comparison on the 1x1 grid, and, I guess, you linearly interpolate our analysis onto that grid and show the comparison with the NCEP OI -- maximum value 0.9 C. You do not explain to the reader that you linearly interpolate our analysis, and that because of this the RMSE you present contains all of the actual 1x1 NCEP OI variability which was never *supposed* to be resolved in our 5x5 analysis. By doing that you create the misleading impression that our *method* is inferior to your new method. In fact, exactly the opposite is true. If we run the OS analysis with our standard technique, with 1x1 EOFs from the NCEP OI, for the tropical Pacific MOHSST5 data only, the RMSE against the NCEP OI is not larger but smaller than yours (Fig. R2, bottom panel), and its SD is not smaller but slightly larger than yours (Fig. R2, top panel). The difference between your Fig. 4 and my Fig. R2 is not in the *method* but in the settings (choice of EOFs, resolution). The settings of GP were somewhat different from those of your new analysis, so Fig. R2 provides a fair comparison, while your Fig. 4 does not. The following sentence, "Several factors may contribute to the excessive flattening in K97, including their use of EOFs based on 5 Deg ship data, which are more noisy and less sharp than 1 Deg OI data," does us no justice: of course, if the purpose is to produce an analysis that looks like the NCEP OI on a 1x1 grid, then there is obviously no better choice of EOFs than those of the NCEP OI on a 1x1 grid. We did not do this in K97 because we were playing a different game, with somewhat different goals.

P. 16, 1st sentence. "Other possible factors include their use of a large number of EOFs, which may compromise the goodness of fit for those modes that are supported . . ." We developed a tuning technique for choosing the optimal number of EOFs (Fig. 4 of AP, Fig. 5 of GP, and the corresponding text) and applied it. Also, Fig. R2 seems to show that we *do not* compromise the goodness of fit. ". . . and their approach of gridding the entire anomaly using the EOFs, which may not accurately represent all of the anomaly variance" Once again, with your increment EOFs you ultimately *do* represent the entire anomaly, so what difference does it make which set is used?

Now, in your paper you use for illustration the analyses for the following 12 individual months (in chronological order): Jan 1910, Jan 1912, Jan 1940, Jan 1943, Jan 1956, Jan 1958, Jan 1973, Jan 1974, Jan 1983, Mar 1985, Jul 1987, May 1988. In this order I illustrate these months by the rows of the multi(3)-page Fig. R3. Each row contains: the MOHSST5 observations for the month; our "standard" (GP) analysis (5x5 resolution); a similar analysis for Pacific-only data with EOFs from the NCEP OI at 5x5 resolution; and the same as the previous but at 1x1 resolution (this last one should be compared directly to your new analyses, and was used in Fig. R2).

P. 16, end of 2nd par. "Mostly because of our data checking, the new method avoids much of the problem." (and also the following sentence). I think this is an overstatement. Checking or no checking, your method produces a +0.5 anomaly where it should be -0.5, and in a dynamically very important place of the tropical Pacific. I think this should be openly recognized as a failure of the method. What amazes me, however, is where these bad data in the eastern equatorial Pacific actually are. I do not see them in the March 1985 MOHSST5 field (Fig. R3); I only see one little 0.5 C bullseye around 140W. Is this what spoils your analysis? And what are you checking for in this month?

P. 17, 2nd par. You say here that our anomalies are "considerably weaker" and more "spatially spread out" than yours, ignoring the fact that ours are presented on a 5x5 grid. I think this is an unfair comparison and an unfair statement. Average your anomalies onto a 5x5 grid, and then we will see whose are weaker and whose are more spread out! Or, alternatively, we can look at Fig. R3, where in the last column I show our analysis for the tropical Pacific only, with NCEP OI EOFs, at 1x1 resolution. For January 1983 we reach a maximum of 5 C (while the NCEP OI barely reaches 5.5 C), and I think this is fine -- after all, we are not supposed to resolve all of the variance. Your new analysis (as well as SRLS; your Fig. 5) actually reaches larger values (5.5 over a larger area) than the NCEP OI -- this is not supposed to happen, and I find it quite suspicious. Our 1x1 field also reproduces the NCEP OI field better in the NW part of the analysis domain.
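To be completely explicit about what I mean by a fair comparison above (Fig. 4 and P. 17): average the 1x1 fields onto the 5x5 grid before computing the RMS difference. A minimal sketch (Python; it assumes plain latitude-longitude arrays whose dimensions are multiples of 5, and it ignores the land mask and latitude weighting for brevity):

    # Minimal sketch: block-average 1x1-degree fields onto a 5x5 grid and
    # compare two analyses on that common grid.
    import numpy as np

    def to_5x5(field_1x1):
        nlat, nlon = field_1x1.shape      # assumed to be multiples of 5
        return field_1x1.reshape(nlat // 5, 5, nlon // 5, 5).mean(axis=(1, 3))

    def rms_diff_on_5x5(analysis_1x1, reference_1x1):
        diff = to_5x5(analysis_1x1) - to_5x5(reference_1x1)
        return np.sqrt(np.mean(diff ** 2))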
"For March 1985 the K97 analysis effectively smoothes out the influence of the bad SST superobservations in the eastern-equatorial Pacific, but again it tends to spatially spread out the anomaly too much, particularly -1C anomaly near 165W which is given a north-south spread." Frankly, I find the paternalistic tone of this statement intolerable, since this "smoothing out of the influence of the bad SST superobservations" is the major problem for your own analysis of this month, and the analysis you produced for this month is dynamically completely wrong on the equator. We produced a much better field, and you accentuate "a north-south spread"? As far as this spread is concerned, it appears in our analysis as a connection (on the sparse 5x5 grid) of two -1 C minima, one on the equator and another at 10N (this second minimum is completely missing from your new analysis, by the way). If we use the NCEP OI set of EOFs, this spread disappears (and the 10N minimum, unfortunately, becomes weaker -- but it still exists! -- in our analysis).

"A similar excessive weakening and spreading of the anomaly is evident in the K97 analysis for May 1988." What happens in 1988 is much more complicated; we describe it in detail in GP and illustrate it with Figures 13 and 14. I also had a detailed discussion with Kevin Trenberth about this particular month; you might want to look at that discussion, presented at http://rainbow.ldeo.columbia.edu/~alexeyk/GlobSSTpaper/index.html Simply speaking, the EOFs of ship data *are* seriously deficient for analyzing this month. If we use NCEP OI EOFs at 5x5 resolution (Fig. R3), things get much better. Our analysis at 1x1 resolution (Fig. R3) is actually better (closer to the NCEP OI) than your new analysis.

P. 17, 3rd par, last sentence. "80 observations" -- should this be "superobservations"?

P. 18, 1st par, Fig. 12. I think the general tendencies are very similar to what can be concluded from Fig. 2 of GP.

P. 18, 2nd par, Fig. 13. In Fig. R4 I show our estimates of NINO3.4 from two sources: our "standard" analysis of GP (red lines) and our analysis of the tropical Pacific only with NCEP OI EOFs (black lines). For each analysis I show two products: OS (solid lines), which uses time connections from both directions, and KF (dashed lines), which uses only connections from previous times. In principle, the difference between the solid and dashed lines should be comparable to half of the difference between the curves in your Figure 13. In reality, our differences for the Pacific-only analysis are incomparably smaller than yours, implying, by your own argument, a much higher stability of our method. Needless to say, for our "standard" analysis, which uses all the data on the globe and is thus even more robust, these differences are almost invisible. Unless the sampling is very bad, all four curves are very close in Fig. R4. This demonstrates that for integral quantities like NINO3.4, under decent sampling, it does not really matter which set of EOFs is used. When the sampling is really bad, the "standard" analysis makes believable corrections to the flaws of the Pacific-only run (1865-1870, for example).
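For definiteness, by NINO3.4 I mean throughout the SST anomaly averaged over 5S-5N, 170W-120W. A minimal sketch of the area average (Python; the grid layout and variable names are my own assumptions):

    # Minimal sketch: area-weighted NINO3.4 average (5S-5N, 170W-120W) from an
    # anomaly field on a regular latitude-longitude grid.
    import numpy as np

    def nino34(anom, lat, lon):
        """anom: (nlat, nlon) SST anomalies; lat in deg N; lon in deg E (0-360)."""
        in_lat = (lat >= -5) & (lat <= 5)
        in_lon = (lon >= 190) & (lon <= 240)          # 170W-120W
        sub = anom[np.ix_(in_lat, in_lon)]
        w = np.cos(np.deg2rad(lat[in_lat]))[:, None]  # simple cos(lat) weights
        return np.nansum(sub * w) / np.sum(w * np.isfinite(sub))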
P. 18, last par. "Large differences between the forwards and backwards run indicate times when we should have less confidence in the analysis due to insufficient sampling." An even more sensible measure would be the error estimates for the spatial averages, which our method provides (AP, GP).

P. 19, 2nd par. "While broadly similar, the K97 anomalies for 1910 and 1912 are weaker and broader". Once again, our anomalies are on a 5x5 grid and yours are on a 1x1 grid; to make a proper comparison you have to average yours onto the 5x5 grid. Or, alternatively, look at our 1x1 analyses in Fig. R3 -- are they weaker and broader? "The K97 bulls eye in the Central Pacific is in a sampled region . . ." For one thing, this -1.5 C anomaly has far too large a scale to be called a "bulls eye." And it is in a sampled region, as you note, so its magnitude is probably correct. Your analysis for the same month, however, shows a -2 C anomaly of larger magnitude and smaller spatial scale in a *non-sampled* region -- that is much more like a bulls eye resulting from overfitting! I suggest you either remove this "bulls eye" comment about us or add one about your own analysis for the same month. ". . . the K97 analysis uses only three months of data for any analysis . . ." This is a WRONG STATEMENT. Our analysis is the result of a minimization of a cost function that is global in time and space, and as such it uses the data from the entire 1856-1991 period for analyzing every month (that is why it is called the Optimal Smoother!). A schematic form of this cost function is written out just before my conclusions below. ". . . more sensitive to the formation of bulls eyes associated with isolated data. The new analysis . . ." I will be happy to discuss this in greater detail, but at present I believe our method will be systematically better than yours at preventing bulls eyes.

P. 19, last par. "In 1943 the K97 analysis is weaker on equator . . ." Once again, you cannot directly compare the weakness of two fields on different grid sizes; but worse than that, you somehow "lost" the -2 C contour in the presentation of our Jan 1943 field (compare your Fig. 18 and my Fig. R3). With this contour put back, our anomaly on the 5x5 grid is stronger than yours on the 1x1 grid!

P. 20, 2nd par. "The K97 analysis for January 1956 and 1958 shows isolated bullseyes in both the cool and the warm patterns" These are of too large a scale to be bulls eyes. We change the set of EOFs (Fig. R3) and get something quite similar to yours. ". . . caused by that method's slightly greater sensitivity to sampling gaps". We can discuss this, but I strongly doubt that our method has a greater sensitivity to sampling gaps than yours, and the comparison with Fig. R3 mentioned above supports my opinion. "For January 1973 and 1974 the K97 analysis has anomalies spread out north-south far more than the new method's anomalies. Also in 1973, . . ." Once again, this is a question of resolution. We changed the EOFs (see Fig. R3), and the difference goes away, right? But once again, you lost the -2 C contour in presenting our analysis in your Fig. 22, bottom (compare with Fig. R3!).

P. 22, 1st par. "Comparisons shown here indicate that our method may better represent the shape and strength of anomalies. . ." I do not think this was actually shown. With the evidence I provided above, exactly the opposite seems more plausible.
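Before the conclusions, let me write out schematically what I mean by the global-in-time minimization (the exact notation and details are in AP and GP; what follows is only a schematic summary from memory, in LaTeX notation). The reduced space OS solution is the set of EOF amplitudes alpha(t) for all months at once that minimizes a cost function of roughly the form

    S[\alpha] = \sum_t \left[ (H E \alpha(t) - T^o(t))^T R^{-1} (H E \alpha(t) - T^o(t))
                + \alpha(t)^T \Lambda^{-1} \alpha(t) \right]
              + \sum_t (\alpha(t+1) - A \alpha(t))^T Q^{-1} (\alpha(t+1) - A \alpha(t)),

where E holds the EOF patterns, H samples the analysis at the observed locations, T^o are the observations with error covariance R, \Lambda is the a priori covariance of the amplitudes, and A and Q describe the transition model and its error. Because of the transition terms, the estimate for any given month is coupled to the data of every other month; it is not a three-month window.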
CONCLUSIONS

1. Separate accounting for low-frequency variability is a very attractive idea, but I do not think the solution you suggest is a good one. I think it is quite dangerous to estimate this variability as simple 15-year running means of the raw data. There must be better ways to estimate it.

2. The connection between the reduced space OS and your new method is closer than one might expect. It is known (Gelb's book, for example) that the OS can be presented as an optimal linear combination of two KF runs going in opposite directions in time. What you present in your new method is a (non-optimal) simple average of two such runs. Both of these runs can in fact be interpreted as a KF for the following system. All observations have errors of equal size, white in time and space. The model of time transitions is diagonal in EOF space and depends on the data sampling at the next step (we can consider the data sampling for the entire dataset known in advance); it predicts simple persistence for supported modes and damped persistence for unsupported ones. The error of this model is assumed to be diagonal in EOF space, infinitely large for supported modes and infinitely small for unsupported ones. I claim that such a system will produce a KF identical to your forwards and backwards runs. Now, the model I have described is clearly suboptimal (inferior to the one we use in GP), with an inadequate error specification, and the two individual runs are combined into a final product suboptimally (see the P.S. below). The specification of the observational errors as uniform is also clearly inadequate. This gives an almost precise mathematical statement that your method must be inferior to the reduced space OS.

3. From an ideological point of view, we start from a strict statistical system of assumptions, produce the exact solution, make sure it is consistent with the a priori assumptions, and obtain theoretical error estimates for the solution; spatial averages of such solutions are what is called "Optimal Averages" (Smith et al. 1994), and their theoretical errors can also be estimated. Compared to that, your scheme, though quite sophisticated, is very arbitrary.

4. I suggest that in a collaborative effort we adopt *our* scheme in general, for the reasons given in 2 and 3 above, either without the low-frequency variability accounting (for the reason given in 1) or with a special effort devoted to accounting for it properly. I suggest that we use your "measures of support" for diagnostic purposes, and that we incorporate your data-checking technique -- but rather than removing the flagged data, heuristically increase their a priori observational error estimates.
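P.S. For completeness, the two-filter form of the smoother that I refer to in point 2 of the conclusions (it can be found, for example, in Gelb's book; I write it here schematically, from memory) combines the forward-filter estimate \hat{x}_f with covariance P_f and the backward-filter estimate \hat{x}_b with covariance P_b as

    \hat{x}_s = P_s \left( P_f^{-1} \hat{x}_f + P_b^{-1} \hat{x}_b \right),
    \qquad P_s = \left( P_f^{-1} + P_b^{-1} \right)^{-1},

i.e. an inverse-covariance-weighted combination, not a simple average. The simple average of the two runs is optimal only in the special case when their error covariances happen to be equal.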