Dear Tom,

I had the pleasure of reading your paper "An Improved Method for Analyzing Sparse and Irregularly Distributed SST Data on a Regular Grid: The Tropical Pacific Ocean," written in collaboration with Robert Livezey and Samuel Shen, and here are my remarks. First I go through the paper and make specific comments; then I draw some general conclusions. I will refer to Figures R1-R4, which I produced to illustrate my points and which you can reach at http://rainbow.ldeo.columbia.edu/~alexeyk/GlobSSTpaper/index.html (let me know if you have any problems accessing them -- I can always put them into our anonymous ftp directory if that is preferred). I will also refer to our Atlantic paper (AP), where the reduced space OS is introduced, and to our global SST paper (GP), which describes the global analysis.

Abstract, second-to-last sentence. ". . . only very weak SST anomalies may be reliably reconstructed. . ." I think this statement is unwarranted. In the context of optimal estimation, under the assumption that you know the space and time covariances, you recover the optimal solution, which does not have to be weak. I guess you mean here your negative experience with the estimation of NINO3.4 for the periods before 1870 and 1915-1925. However, that experience does not imply such a sweeping statement: our estimates of NINO3.4 for these periods, for example, are of believable size, and their reliability is measured by their theoretical error estimates (GP, Fig. 17).

P. 3, Equation (1). Although it looks exactly like SRLS, there is an important hidden difference: the climatology mu(n) here depends on time, while in SRLS it was constant for the entire reconstruction period.

P. 4, Item (2.4). "Modes of spatial variability that are not supported by the available data . . ." In the context of optimal interpolation, if the covariances are precisely correct, all modes *are* supported by the available data.

P. 4-6, Section 2.1, Removal of Low Frequency Anomalies. I think what you are doing here is extremely dangerous. You sensed some of the problems yourself -- the significant differences between averages over intervals of different length (p. 5, 2nd par.). For another illustration, look at my Fig. R1, which shows an x-t profile of the MOHSST5 data for the equatorial Pacific (averages of the two 5x5 boxes bordering the equator). In the area (130W-140W, Eq) there are almost no data available at all until the 1960s, so there is simply no way to properly estimate 181-month means for this location. You somehow manage to do it, probably because you apply your 5-point zonal and 3-point meridional filtering of the data in interpolative mode. This means that the estimate of the low-frequency anomaly in this dynamically very important location is completely at the mercy of your (arbitrary) interpolation (smoothing) procedure. Furthermore, my guess is that in your analysis procedure the long-term trends of the solution are determined by these low-frequency anomalies, which are defined in such a crude, inaccurate way. In terms of long-term trends, then, your analysis adds no information beyond the trends of primitively smoothed raw data -- a great loss. In our analysis we do not do anything like that, and I think we are better off without it. Parker et al. in their GISST2.2 do something similar -- separation of a global warming trend -- and I have some arguments why they should not do that; still, what they do is not as dangerous as what you do.
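To make the 181-month-mean point concrete, here is a minimal sketch (Python; the data layout -- a monthly anomaly series with NaN for missing months -- and the coverage threshold are my own illustrative assumptions, not anything from your paper) of a running mean that refuses to return a value unless most of the window contains real observations. For the (130W-140W, Eq) box such a guarded estimate would simply be undefined before the 1960s, which is exactly my point: whatever your procedure produces there comes from the smoothing, not from the data.

    # Minimal sketch: a centered 181-month running mean that only returns a
    # value when a large enough fraction of the window contains observations.
    # The 2/3 coverage threshold is an illustrative choice, not from the paper.
    import numpy as np

    def guarded_running_mean(anom, half_width=90, min_coverage=2.0/3.0):
        """anom: 1-D array of monthly anomalies, NaN where no data exist."""
        n = len(anom)
        out = np.full(n, np.nan)
        for t in range(n):
            lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
            window = anom[lo:hi]
            good = np.isfinite(window)
            if good.mean() >= min_coverage:   # enough real data in the window
                out[t] = window[good].mean()
            # otherwise leave NaN: the low-frequency anomaly is not defined here
        return out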
P. 8, Formula (3). Is this a completely heuristic formula that serves your purpose, or is there some statistical meaning behind it?

P. 8, bottom. "Since the recentered anomalies are built up by a series of increment fits, with an assumption of zero anomaly first guess for the first analysis, projection of the anomalies onto the increment EOFs is justified and completely represent G." Well, since your analysis represents *the entire anomalies* by the set of increment EOFs anyway, what difference does it make which set of EOFs to use? Moreover, the set of EOFs of the total anomalies (which we use) *is* in some sense optimal for representing them; surely it will also resolve the increments reasonably well. So why are you so anxious to use the EOFs of increments?

P. 9, bottom. "...show anomalies between +1 and +4 C..." I do not see these anomalies in the MOHSST5 dataset, not in March 1985 anyway (see my Fig. R3). Is something wrong with your data?

P. 10, 2nd par. You throw away entire "superobservations" for 5x5 boxes. I think this is too much. If a superobservation is suspicious, the estimate of its observational error (in our analysis framework) can be increased, but to throw it away completely... I think the data are too precious for that, and the percentage you throw away (3-8%) represents a really huge amount of data (in individual ship reports).

P. 13, 3rd par, and Table 1. Similar experiments are described in GP, page 14, starting from the 2nd paragraph. If the numbers from them, 0.37 for the '80s and 0.48 for the '10s, are squared to compare them with Table 1, we get 0.14 and 0.23, which is better than the roughly corresponding numbers in Table 1. Note that our numbers are for the entire globe; the average for the tropical Pacific would be substantially smaller. Also, in Table 1 you may want to separate the discrepancy between the NCEP OI and your test fields into two parts -- the part projected onto your EOFs and the part that is not (as we did in GP) -- and then optimize only the projected part, since you have no control over the unprojected part of the NCEP OI anyway (a minimal sketch of this decomposition follows after the next comment).

P. 15, inside 2nd par. ". . . a larger portion of the signal strength is also captured by the new method (Fig. 3)" How come? With 8 patterns you capture more of the NCEP OI variance than with 20+ patterns of SRLS? The next sentence, "Incrementally adjusting a first guess, rather than gridding a full (recentered) anomaly is one reason for the new method improvement," does not really explain anything, since, as I wrote above, the entire field of your new analysis is in any case represented by your increment EOFs.
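On the projected/unprojected decomposition I mention in the Table 1 comment: here is a minimal sketch (Python; the variable names and layout -- EOF patterns as columns of a matrix over the ocean grid points -- are my own assumptions). Only the projected part is a fair target when tuning an EOF-based analysis; the residual is unreachable no matter how the amplitudes are fit.

    # Minimal sketch: split the difference between a test field (e.g. NCEP OI)
    # and an analysis into the part lying in the span of the analysis EOFs and
    # the residual that no combination of those EOFs can represent.
    import numpy as np

    def split_by_eofs(E, d):
        """E: (npoints, nmodes) EOF patterns; d: (npoints,) difference field."""
        coeffs, *_ = np.linalg.lstsq(E, d, rcond=None)  # least-squares projection
        projected = E @ coeffs        # part resolvable by the EOFs
        residual = d - projected      # part outside the EOF space
        return projected, residual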
P. 15, last par, and Fig. 4. I think you make an extremely unfair comparison to us here. Our analysis is on a 5x5 grid, and you compare it to the NCEP OI on a 1x1 grid. The fair comparison would be to average the NCEP OI onto the 5x5 grid and show the RMS difference (a minimal sketch of such box-averaging follows these comments); precisely such a figure is shown in GP as Fig. 6c. The maximum in the equatorial Pacific there is 0.5 C -- smaller than the maximum in the panels of your Fig. 2. I understand that you want to show the comparison on the 1x1 grid, and, I guess, you linearly interpolate our analysis onto that grid and show the comparison with the NCEP OI -- maximum value 0.9 C. You do not explain to the reader that you linearly interpolate our analysis, and that because of this the RMSE you present contains all of the actual 1x1 NCEP OI variability which was never *supposed* to be resolved in our 5x5 analysis. By doing that you create the misleading impression that our *method* is inferior to your new method. In fact, exactly the opposite is true. If we run the OS analysis with our standard technique, with 1x1 EOFs from the NCEP OI, for the tropical Pacific MOHSST5 data only, the RMSE against the NCEP OI is not larger but smaller than yours (Fig. R2, bottom panel), and its SD is not smaller but slightly larger than yours (Fig. R2, top panel). The difference between your Fig. 4 and my Fig. R2 is not in the *method* but in the settings (choice of EOFs, resolution). The settings of GP were somewhat different from those of your new analysis, so Fig. R2 provides a fair comparison, while your Fig. 4 does not. The following sentence, "Several factors may contribute to the excessive flattening in K97, including their use of EOFs based on 5 Deg ship data, which are more noisy and less sharp than 1 Deg OI data," does us no justice: of course, if the purpose is to produce an analysis that looks like the NCEP OI on a 1x1 grid, then there is obviously no better choice of EOFs than those of the NCEP OI on a 1x1 grid. We did not do this in K97 because we were playing a different game, with somewhat different goals.

P. 16, 1st sentence. "Other possible factors include their use of a large number of EOFs, which may compromise the goodness of fit for those modes that are supported . . ." We developed a tuning technique for choosing the optimal number of EOFs (Fig. 4 of AP, Fig. 5 of GP, and the corresponding text) and applied it. Also, Fig. R2 seems to show that we *do not* compromise the goodness of fit. ". . . and their approach of gridding the entire anomaly using the EOFs, which may not accurately represent all of the anomaly variance" Once again, with your increment EOFs you ultimately *do* represent the entire anomaly, so what difference does it make which set is used?

Now, in your paper you use for illustration the analyses for the following 12 individual months (in chronological order): Jan 1910, Jan 1912, Jan 1940, Jan 1943, Jan 1956, Jan 1958, Jan 1973, Jan 1974, Jan 1983, Mar 1985, Jul 1987, May 1988. In this order I illustrate these months by the rows of the multi(3)-page Fig. R3. Each row contains: the MOHSST5 observations for the month; our "standard" (GP) analysis (5x5 resolution); a similar analysis for Pacific-only data with EOFs from the NCEP OI at 5x5 resolution; and the same as the previous but at 1x1 resolution (this last one should be compared directly to your new analyses, and was used in Fig. R2).

P. 16, end of 2nd par. "Mostly because of our data checking, the new method avoids much of the problem." (and also the following sentence). I think this is an overstatement. Checking or no checking, your method produces a +0.5 anomaly where it should be -0.5, and in a dynamically very important place of the tropical Pacific. I think this should be openly recognized as a failure of the method. What amazes me, however, is where these bad data in the eastern equatorial Pacific actually are. I do not see them in the March 1985 MOHSST5 field (Fig. R3); I only see one little 0.5 C bullseye around 140W. Is this what spoils your analysis? And what are you checking for in this month?

P. 17, 2nd par. You say here that our anomalies are "considerably weaker" and more "spatially spread out" than yours, ignoring the fact that ours are presented on a 5x5 grid. I think this is an unfair comparison and an unfair statement. Average your anomalies onto a 5x5 grid, and then we will see whose are weaker and whose are more spread out! Or, alternatively, we can look at Fig. R3, where in the last column I show our analysis for the tropical Pacific only, with NCEP OI EOFs, at 1x1 resolution. For January 1983 we reach a maximum of 5 C (while the NCEP OI barely reaches 5.5 C), and I think this is fine -- after all, we are not supposed to resolve all of the variance. Your new analysis (as well as SRLS; your Fig. 5) actually reaches larger values (5.5 over a larger area) than the NCEP OI -- this is not supposed to happen, and I find it quite suspicious. Our 1x1 field also reproduces the NCEP OI field better in the NW part of the analysis domain.
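To be completely explicit about what I mean by a fair comparison above (Fig. 4 and P. 17): average the 1x1 fields onto the 5x5 grid before computing the RMS difference. A minimal sketch (Python; it assumes plain latitude-longitude arrays whose dimensions are multiples of 5, and it ignores the land mask and latitude weighting for brevity):

    # Minimal sketch: block-average 1x1-degree fields onto a 5x5 grid and
    # compare two analyses on that common grid.
    import numpy as np

    def to_5x5(field_1x1):
        nlat, nlon = field_1x1.shape      # assumed to be multiples of 5
        return field_1x1.reshape(nlat // 5, 5, nlon // 5, 5).mean(axis=(1, 3))

    def rms_diff_on_5x5(analysis_1x1, reference_1x1):
        diff = to_5x5(analysis_1x1) - to_5x5(reference_1x1)
        return np.sqrt(np.mean(diff ** 2))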
"For March 1985 the K97 analysis effectively smoothes out the influence of the bad SST superobservations in the eastern-equatorial Pacific, but again it tends to spatially spread out the anomaly too much, particularly -1C anomaly near 165W which is given a north-south spread." Frankly, I find the paternalistic tone of this statement intolerable, since this "smoothing out of the influence of the bad SST superobservations" is the major problem for your own analysis of this month, and the analysis you produced for this month is dynamically completely wrong on the equator. We produced a much better field, and you accentuate "a north-south spread"? As far as this spread is concerned, it appears in our analysis as a connection (on the sparse 5x5 grid) of two -1 C minima, one on the equator and another at 10N (this second minimum is completely missing from your new analysis, by the way). If we use the NCEP OI set of EOFs, this spread disappears (and the 10N minimum, unfortunately, becomes weaker -- but it still exists! -- in our analysis).

"A similar excessive weakening and spreading of the anomaly is evident in the K97 analysis for May 1988." What happens in 1988 is much more complicated; we describe it in detail in GP and illustrate it with Figures 13 and 14. I also had a detailed discussion with Kevin Trenberth about this particular month; you might want to look at that discussion, presented at http://rainbow.ldeo.columbia.edu/~alexeyk/GlobSSTpaper/index.html Simply speaking, the EOFs of ship data *are* seriously deficient for analyzing this month. If we use NCEP OI EOFs at 5x5 resolution (Fig. R3), things get much better. Our analysis at 1x1 resolution (Fig. R3) is actually better (closer to the NCEP OI) than your new analysis.

P. 17, 3rd par, last sentence. "80 observations" -- should this be "superobservations"?

P. 18, 1st par, Fig. 12. I think the general tendencies are very similar to what can be concluded from Fig. 2 of GP.

P. 18, 2nd par, Fig. 13. In Fig. R4 I show our estimates of NINO3.4 from two sources: our "standard" analysis of GP (red lines) and our analysis of the tropical Pacific only with NCEP OI EOFs (black lines). For each analysis I show two products: OS (solid lines), which uses time connections from both directions, and KF (dashed lines), which uses only connections from previous times. In principle, the difference between the solid and dashed lines should be comparable to half of the difference between the curves in your Figure 13. In reality, our differences for the Pacific-only analysis are incomparably smaller than yours, implying, by your own argument, a much higher stability of our method. Needless to say, for our "standard" analysis, which uses all the data on the globe and is thus even more robust, these differences are almost invisible. Unless the sampling is very bad, all four curves are very close in Fig. R4. This demonstrates that for integral quantities like NINO3.4, under decent sampling, it does not really matter which set of EOFs is used. When the sampling is really bad, the "standard" analysis makes believable corrections to the flaws of the Pacific-only run (1865-1870, for example).
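For definiteness, by NINO3.4 I mean throughout the SST anomaly averaged over 5S-5N, 170W-120W. A minimal sketch of the area average (Python; the grid layout and variable names are my own assumptions):

    # Minimal sketch: area-weighted NINO3.4 average (5S-5N, 170W-120W) from an
    # anomaly field on a regular latitude-longitude grid.
    import numpy as np

    def nino34(anom, lat, lon):
        """anom: (nlat, nlon) SST anomalies; lat in deg N; lon in deg E (0-360)."""
        in_lat = (lat >= -5) & (lat <= 5)
        in_lon = (lon >= 190) & (lon <= 240)          # 170W-120W
        sub = anom[np.ix_(in_lat, in_lon)]
        w = np.cos(np.deg2rad(lat[in_lat]))[:, None]  # simple cos(lat) weights
        return np.nansum(sub * w) / np.sum(w * np.isfinite(sub))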
P. 18, last par. "Large differences between the forwards and backwards run indicate times when we should have less confidence in the analysis due to insufficient sampling." An even more sensible measure would be the error estimates for the spatial averages, which our method provides (AP, GP).

P. 19, 2nd par. "While broadly similar, the K97 anomalies for 1910 and 1912 are weaker and broader". Once again, our anomalies are on a 5x5 grid and yours are on a 1x1 grid; to make a proper comparison you have to average yours onto the 5x5 grid. Or, alternatively, look at our 1x1 analyses in Fig. R3 -- are they weaker and broader? "The K97 bulls eye in the Central Pacific is in a sampled region . . ." For one thing, this -1.5 C anomaly has far too large a scale to be called a "bulls eye." And it is in a sampled region, as you note, so its magnitude is probably correct. Your analysis for the same month, however, shows a -2 C anomaly of larger magnitude and smaller spatial scale in a *non-sampled* region -- that is much more like a bulls eye resulting from overfitting! I suggest you either remove this "bulls eye" comment about us or add one about your own analysis for the same month. ". . . the K97 analysis uses only three months of data for any analysis . . ." This is a WRONG STATEMENT. Our analysis is the result of a minimization of a cost function that is global in time and space, and as such it uses the data from the entire 1856-1991 period for analyzing every month (that is why it is called the Optimal Smoother!). A schematic form of this cost function is written out just before my conclusions below. ". . . more sensitive to the formation of bulls eyes associated with isolated data. The new analysis . . ." I will be happy to discuss this in greater detail, but at present I believe our method will be systematically better than yours at preventing bulls eyes.

P. 19, last par. "In 1943 the K97 analysis is weaker on equator . . ." Once again, you cannot directly compare the weakness of two fields on different grid sizes; but worse than that, you somehow "lost" the -2 C contour in the presentation of our Jan 1943 field (compare your Fig. 18 and my Fig. R3). With this contour put back, our anomaly on the 5x5 grid is stronger than yours on the 1x1 grid!

P. 20, 2nd par. "The K97 analysis for January 1956 and 1958 shows isolated bullseyes in both the cool and the warm patterns" These are of too large a scale to be bulls eyes. We change the set of EOFs (Fig. R3) and get something quite similar to yours. ". . . caused by that method's slightly greater sensitivity to sampling gaps". We can discuss this, but I strongly doubt that our method has a greater sensitivity to sampling gaps than yours, and the comparison with Fig. R3 mentioned above supports my opinion. "For January 1973 and 1974 the K97 analysis has anomalies spread out north-south far more than the new method's anomalies. Also in 1973, . . ." Once again, this is a question of resolution. We changed the EOFs (see Fig. R3), and the difference goes away, right? But once again, you lost the -2 C contour in presenting our analysis in your Fig. 22, bottom (compare with Fig. R3!).

P. 22, 1st par. "Comparisons shown here indicate that our method may better represent the shape and strength of anomalies. . ." I do not think this was actually shown. With the evidence I provided above, exactly the opposite seems more plausible.
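Before the conclusions, let me write out schematically what I mean by the global-in-time minimization (the exact notation and details are in AP and GP; what follows is only a schematic summary from memory, in LaTeX notation). The reduced space OS solution is the set of EOF amplitudes alpha(t) for all months at once that minimizes a cost function of roughly the form

    S[\alpha] = \sum_t \left[ (H E \alpha(t) - T^o(t))^T R^{-1} (H E \alpha(t) - T^o(t))
                + \alpha(t)^T \Lambda^{-1} \alpha(t) \right]
              + \sum_t (\alpha(t+1) - A \alpha(t))^T Q^{-1} (\alpha(t+1) - A \alpha(t)),

where E holds the EOF patterns, H samples the analysis at the observed locations, T^o are the observations with error covariance R, \Lambda is the a priori covariance of the amplitudes, and A and Q describe the transition model and its error. Because of the transition terms, the estimate for any given month is coupled to the data of every other month; it is not a three-month window.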
CONCLUSIONS

1. Separate accounting for low-frequency variability is a very attractive idea, but I do not think the solution you suggest is a good one. I think it is quite dangerous to estimate this variability as simple 15-year running means of the raw data. There must be better ways to estimate it.

2. The connection between the reduced space OS and your new method is closer than one might expect. It is known (Gelb's book, for example) that the OS can be presented as an optimal linear combination of two KF runs going in opposite directions in time. What you present in your new method is a (non-optimal) simple average of two such runs. Both of these runs can in fact be interpreted as a KF for the following system. All observations have errors of equal size, white in time and space. The model of time transitions is diagonal in EOF space and depends on the data sampling at the next step (we can consider the data sampling for the entire dataset known in advance); it predicts simple persistence for supported modes and damped persistence for unsupported ones. The error of this model is assumed to be diagonal in EOF space, infinitely large for supported modes and infinitely small for unsupported ones. I claim that such a system will produce a KF identical to your forwards and backwards runs. Now, the model I have described is clearly suboptimal (inferior to the one we use in GP), with an inadequate error specification, and the two individual runs are combined into a final product suboptimally (see the P.S. below). The specification of the observational errors as uniform is also clearly inadequate. This gives an almost precise mathematical statement that your method must be inferior to the reduced space OS.

3. From an ideological point of view, we start from a strict statistical system of assumptions, produce the exact solution, make sure it is consistent with the a priori assumptions, and obtain theoretical error estimates for the solution; spatial averages of such solutions are what is called "Optimal Averages" (Smith et al. 1994), and their theoretical errors can also be estimated. Compared to that, your scheme, though quite sophisticated, is very arbitrary.

4. I suggest that in a collaborative effort we adopt *our* scheme in general, for the reasons given in 2 and 3 above, either without the low-frequency variability accounting (for the reason given in 1) or with a special effort devoted to accounting for it properly. I suggest that we use your "measures of support" for diagnostic purposes, and that we incorporate your data-checking technique -- but rather than removing the flagged data, heuristically increase their a priori observational error estimates.
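P.S. For completeness, the two-filter form of the smoother that I refer to in point 2 of the conclusions (it can be found, for example, in Gelb's book; I write it here schematically, from memory) combines the forward-filter estimate \hat{x}_f with covariance P_f and the backward-filter estimate \hat{x}_b with covariance P_b as

    \hat{x}_s = P_s \left( P_f^{-1} \hat{x}_f + P_b^{-1} \hat{x}_b \right),
    \qquad P_s = \left( P_f^{-1} + P_b^{-1} \right)^{-1},

i.e. an inverse-covariance-weighted combination, not a simple average. The simple average of the two runs is optimal only in the special case when their error covariances happen to be equal.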