Notes on Argo data
==================

I have recently spent several months processing ocean data, and a significant part of that time went on the Argo data (and data from earlier profiling or PALACE buoys) available over the last decade. It seemed worthwhile summarising some of the lessons I have learnt. Overall the message is that Argo is now an extremely valuable data source for the oceans, but more effort on quality issues is needed to make the most of the data.

Sections 1, 2 and 5 below relate specifically to data from the ARGO GDACs (Global Data Assembly Centres, i.e. USGODAE and CORIOLIS) while sections 3 and 4 relate to all profiler data. Some of the issues may be specific to our processing system (and in retrospect there are one or two things I would have done differently); they are included to provide context and to show some of the issues that can arise in practical applications.

Bruce Ingleby
26 January 2007

1. Argo NetCDF Format
=====================

There are violations of the existing Argo format. These are fairly rare but are a significant nuisance, particularly c).

a) PLATFORM_NUMBER defined as STRING8 - sometimes the strings are only 5 characters long. (5-digit WMO numbers should be padded with blanks.)

b) WMO_INST_TYPE defined as STRING4 - sometimes the strings are only 3 characters long, very occasionally 0 characters long.

c) NaN (not-a-number) values in the data arrays such as TEMP/PSAL_ADJUSTED, seen occasionally in Indian Ocean data. (Because I check for valid values level-by-level this can result in my profile having a mixture of PSAL_ADJUSTED and PSAL values - not advisable apparently, but the current format doesn't help, see note 2b.)

2. Format and archive wish list
===============================

a) We would prefer fewer, larger files: monthly data by ocean basin rather than daily data. I briefly tried concatenating several daily files together using ncrcat (see http://nco.sf.net), but this did not work - "record variable exists and is size zero".
It also seems a bit wasteful of bandwidth that the files aren't compressed.

b) From my point of view it is much simpler if there is one set of arrays (could be the _ADJUSTED ones) which contain the best estimates of the variables - whether original or modified in some way. Reading both TEMP and TEMP_ADJUSTED and deciding which to use, and similarly for the _QC variables, is an added, unnecessary complication.

c) I may have made a few errors in retyping buoy numbers or dates from the greylist (section 5) into my code (although given time I could have automated the reformatting of the greylist). Ideally there would be additional variables in the main NetCDF records ("greylist" indicators for the PRES, TEMP or PSAL profiles) which users could choose to use or not.

3. Argo data from GDAC, GTSPP and WOD05
=======================================

a) Early profilers are in the GDAC and GTSPP data with a five digit buoy identifier. Later profilers are stored by GDAC with a seven digit identifier and by GTSPP under that identifier prefixed with a Q. Thus it is easy to compare data from the same buoy between GDAC and GTSPP. For WOD05 data we store the cruise number, which makes it more difficult to pick out the same buoy.

b) As expected there are considerable overlaps between the Argo data held in the three archives. However, on occasion the stored times differ by one or more hours, in which case our system will not detect the reports as duplicates and both reports may be assimilated.

c) I found a number of pre-Argo profilers (1999/2000) in the North Atlantic that are temperature only in the GDAC and GTSPP, but where WOD05 has salinity as well (and sometimes additional profiles - possibly from WOCE archives?).

d) Sometimes the GTSPP report is a "cleaned up" version of the GDAC report, e.g. unphysical pressures or temperatures removed, or sometimes the salinities are missing - generally after the buoy has been "greylisted".
In principle I am in favour of flagging values rather than removing them, but some values are such junk there seems little point in archiving them.

4. Argo quality issues
======================

During my recent work I have seen many more instances of "bad" Argo data than I expected. In part this is due to the large (by ocean standards) data volume available in recent years. Most Argo data are generally of good quality but a significant minority have data quality problems. This section describes the physical data; flags and greylist are described in the next section. Table 1 (at bottom) gives an idea of the scale of profile rejections.

a) "Frozen profiles": in some cases identical (or almost identical) temperature and salinity profiles are reported month after month from the same buoy (although the latitude/longitude change). In a few cases there is an intermittent update: one particular profile is repeated several times, then a different profile is repeated several times etc.
   - it is fairly obvious that this is happening if a sequence of reports from a buoy is examined/overplotted
   - however if single reports are examined separately the problem can easily go undetected
   - if the profile is recently "frozen" it can look quite good by comparison with the background
   - my estimate is that 40-50 buoys have been afflicted by this problem (which I was previously unaware of); some have stopped reporting now

b) Bias problems: salinity biases have been recognised (and to some extent addressed) for a number of years; it is now clear that a few buoys have temperature biases. For our QC system "large" biases tend to be rejected and pose less of a problem than "moderate" biases. It is usually easiest to detect biases at lower levels in the profile where variability is less (sometimes our system rejects these lower levels; if more than half of the profile is flagged the whole profile will be rejected).
In poorly observed or very variable regions it is sometimes difficult to decide if there is a measurement bias or not.

c) Pressure sensor problems: there can be "junk" values - sometimes in the middle of an otherwise sensible profile. There can also be more subtle problems including apparent biases. Of course the T-S relationship is unaffected and it can be difficult to decide if an apparent upwards/downwards shift is physical or erroneous.
   - a handful of values at 2200 m and below pass our QC procedures; we should change this so that they are rejected automatically

d) A few buoys had wild temperature zig-zags in the top 100 m or so and apparently reasonable values below. (In our system, and in the greylist, it is not currently possible to reject part of a profile so the whole profiles were rejected.)

e) I came across one apparent position error - from buoy 3900077 in January 2005.

5. Argo greylist and quality flags
==================================

The Argo greylist, which lists suggested rejections, is available from either
   ftp://ftp.ifremer.fr/ifremer/argo/ar_greylist.txt
   ftp://usgodae2.usgodae.org/pub/outgoing/argo/ar_greylist.txt
My recent experience suggests that the greylist is useful but incomplete.

a) The greylist has not been updated for the last six months - a disincentive to operational use.

b) A significant number of buoy problems are not included in the greylist (my estimate is between one third and one half), including some "frozen profiles". There are also instances where I felt the rejection should start a few months earlier, or should include temperature as well where the greylist just indicated a salinity problem. There were a smaller number of instances where the greylist indicated a problem but I decided against a profile rejection because the automatic QC dealt adequately with any errors.

c) In some cases the greylist indicated a problem starting at the point where a particular buoy disappears from the archive - superfluous from a user point of view.
d) The two GDACs gave different answers to the question: "Does the greylist apply just to the real-time data or to delayed-mode data as well?" I suspect that some confusion exists amongst the processing centres as well. In general I applied profile rejections to all data modes, with a few exceptions where the delayed-mode data incorporated a salinity bias correction.

e) For delayed mode and adjusted data I used the Argo TEMP/PSAL_ADJUSTED_QC flags - with some misgivings because I didn't have time to examine them in any depth. In general the numbers are not large and these are likely to be a second-order effect, whereas whole profile rejection (greylisting) is a first-order effect. (I came across a case in 2002/2003 where Argo QC flags had been raised against salinities from 6900169 although a bias correction had been applied and the profiles looked reasonable.)

Table 1. Summary of suspect cruises rejected 1990-2005

Year  No rej  PFL  XBT   CTD  Temp  Salt   Num
                        /OSD              ARGO
1990       4    0    1     3     3     1
1991       6    0    4     2     4     2
1992       6    1    1     4     5     1
1993       4    1    1     2     2     2
1994       3    0    0     3     2     1
1995       7    1    4     2     5     2
1996       3    2    1     0     3     0
1997       5    4    0     1     2     3
1998      18   18    0     0     6    12
1999      16   16    0     0     2    14   311
2000      20   19    0     1     3    17   371
2001      14   13    0     1     0    14   572
2002       8    8    0     0     1     7   888
2003      49   47    2     0    24    25  1264
2004     115  114    1     0    78    37  1855
2005     165  164    1     0   111    54  2555

Notes
1. These are "prior" rejections - the automated QC system can reject other reports. If there is a "prior" temperature rejection, the whole report, including salinity, is rejected.
2. The number of distinct buoys in the ARGO GDAC archive (Num ARGO) is included for comparison with the PFL rejections.
3. If a particular buoy is rejected from (mid-)2003 until its failure in 2004 it is counted in both years, whereas the greylist generally just gives the rejection start date.

Addendum 4 May 2006

For 2006 v1c, 100 PFL floats were rejected: 69 Temp, 31 Salt. I did not individually reject SOLO FSI floats as these were subject to a block reject.
I also noticed that a significant proportion (possibly a third) of the 2005 rejects were for SOLO FSI floats - hence the reduction from 2005 to 2006. It seems that most (if not all) of the "Frozen profiles" (see above) were from SOLO FSI floats.
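
The "frozen profile" symptom described in section 4a (the same temperature and salinity profile repeated in consecutive reports from one buoy) can be checked for automatically along the following lines. This is only a minimal sketch in Python, not the procedure used to produce the rejections in Table 1: the function names, the (temperature, salinity) layout of a profile, the tolerances and the minimum run length of 3 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One profile = a sequence of (temperature, salinity) pairs, one per level.
# This layout is an assumption made for the example.
Profile = Sequence[Tuple[float, float]]


def profiles_match(a: Profile, b: Profile,
                   temp_tol: float = 0.001, sal_tol: float = 0.001) -> bool:
    """True if both profiles have the same number of levels and every
    level agrees within the (hypothetical) tolerances."""
    if len(a) != len(b) or len(a) == 0:
        return False
    return all(abs(ta - tb) <= temp_tol and abs(sa - sb) <= sal_tol
               for (ta, sa), (tb, sb) in zip(a, b))


def flag_frozen(profiles: Sequence[Profile],
                min_repeats: int = 3) -> List[bool]:
    """Flag every profile that belongs to a run of at least min_repeats
    near-identical consecutive reports. The profiles must all come from
    one buoy and be in time order."""
    flags = [False] * len(profiles)
    start = 0  # index of the first profile in the current run
    for i in range(1, len(profiles) + 1):
        # A run ends at the end of the series or when the profile changes.
        if i == len(profiles) or not profiles_match(profiles[i - 1],
                                                    profiles[i]):
            if i - start >= min_repeats:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags
```

For example, a buoy that reports the same profile three times and then a different one would have its first three reports flagged and the fourth left alone. The intermittent-update case in 4a (one profile repeated several times, then another) shows up as two separate flagged runs. As noted in 4a, a single report examined on its own gives no indication of the problem, so a check of this kind only makes sense over a time-ordered sequence from one buoy.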