Notes on Argo data
==================

I have recently spent several months processing ocean data; 
a significant part of this time was spent on the Argo data (and 
data from earlier profiling or PALACE buoys) available over the 
last decade.  
It seemed worthwhile summarising some of the lessons I have 
learnt.  Overall the message is that Argo is now an extremely 
valuable data source for the oceans, but more effort on quality 
issues is needed to make the most of the data.  

Sections 1, 2 and 5 below relate specifically to data from the 
Argo GDACs (Global Data Assembly Centres, i.e. USGODAE and CORIOLIS) 
while sections 3 and 4 relate to all profiler data.  
Some of the issues may be specific to our processing system 
(and in retrospect there are one or two things I would have done 
differently); they are included to provide context and to show 
some of the issues that can arise in practical applications.  

  Bruce Ingleby    26 January 2007


1. Argo NetCDF Format
=====================

There are violations of the existing Argo format.  These are fairly 
rare but are a significant nuisance, particularly c).  

a) PLATFORM_NUMBER defined as STRING8 - sometimes the strings 
are only 5 characters long.  
(5-digit WMO numbers should be padded with blanks.)

b) WMO_INST_TYPE defined as STRING4 - sometimes the strings 
are only 3 characters long, very occasionally 0 characters long.  

c) NaN (not-a-number) values in the data arrays such as 
TEMP/PSAL_ADJUSTED, seen occasionally in Indian Ocean data.  
(Because I check for valid values level-by-level this can result 
in my profile having a mixture of PSAL_ADJUSTED and PSAL values 
- not advisable apparently, but the current format doesn't help, 
see note 2b.)
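
As an illustration only (a Python sketch, not part of our 
processing system), a reader can guard against all three 
violations when decoding a file.  The helper names pad_field and 
mask_nan are hypothetical, and the fill value 99999.0 is an 
assumption that should be checked against the _FillValue 
attribute of the file being read.  

```python
import numpy as np

ASSUMED_FILL = 99999.0  # assumption: check the file's _FillValue attribute


def pad_field(raw, width):
    """Return a character field as a blank-padded string of fixed width."""
    if isinstance(raw, bytes):
        raw = raw.decode("ascii", errors="replace")
    # Treat NUL padding like blanks, then pad (or truncate) to the width.
    text = str(raw).replace("\x00", " ").strip()
    return text.ljust(width)[:width]


def mask_nan(values, fill_value=ASSUMED_FILL):
    """Replace NaN entries with the fill value so they read as missing."""
    arr = np.asarray(values, dtype=float)
    return np.where(np.isnan(arr), fill_value, arr)


platform = pad_field("49001", 8)       # short PLATFORM_NUMBER, case a)
inst_type = pad_field(b"846", 4)       # short WMO_INST_TYPE, case b)
psal = mask_nan([35.1, float("nan")])  # NaN in a data array, case c)
```

Mapping NaN to the fill value means the later level-by-level 
validity check sees one kind of "missing" rather than two.  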

2. Format and archive wish list 
===============================

a) We would prefer fewer, larger files: monthly data by ocean basin 
rather than daily data.  I briefly tried concatenating several 
daily files together using ncrcat (see http://nco.sf.net), 
but this did not work - "record variable exists and is size zero".  
It also seems a bit wasteful of bandwidth that the files aren't 
compressed.  

b) From my point of view it is much simpler if there is one 
set of arrays (could be the _ADJUSTED ones) which contain 
the best estimates of the variables - whether original or 
modified in some way.  Reading both TEMP and TEMP_ADJUSTED 
and deciding which to use, and similarly for the _QC variables, 
is an added, unnecessary complication.  
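
Until the format provides a single set of arrays, a user has to 
build the best estimate themselves.  The Python sketch below is 
one possible approach, not our system's code: it chooses per 
profile rather than level-by-level, so adjusted and original 
values are never mixed (see note 1c).  The names usable and 
best_estimate are hypothetical; accepting QC flags '1' and '2' 
and using 99999.0 as the fill value are assumptions.  

```python
import numpy as np

ASSUMED_FILL = 99999.0     # assumption: check the file's _FillValue
GOOD_FLAGS = ("1", "2")    # assumption: flags accepted as usable


def usable(values, flags):
    """Boolean mask of levels with a finite, non-fill value and a good flag."""
    values = np.asarray(values, dtype=float)
    flags = np.asarray(flags, dtype=str)
    return (np.isfinite(values) & (values != ASSUMED_FILL)
            & np.isin(flags, GOOD_FLAGS))


def best_estimate(raw, raw_qc, adj, adj_qc):
    """Pick one source for the whole profile; unusable levels become NaN."""
    adj_ok = usable(adj, adj_qc)
    if adj_ok.any():
        return "ADJUSTED", np.where(adj_ok, np.asarray(adj, dtype=float), np.nan)
    raw_ok = usable(raw, raw_qc)
    return "RAW", np.where(raw_ok, np.asarray(raw, dtype=float), np.nan)


# Invented example: the adjusted array has one NaN level and one good level.
source, values = best_estimate(
    raw=[35.0, 34.8], raw_qc=["1", "1"],
    adj=[float("nan"), 34.81], adj_qc=["4", "1"],
)
```

Choosing per profile loses the NaN level here rather than 
filling it from the original array, which is the price of not 
mixing the two sources.  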

c) I may have made a few errors in retyping buoy numbers 
or dates from the greylist (section 5) into my code 
(although given time I could have automated the 
reformatting of the greylist).  Ideally there would be 
additional variables in the main NetCDF records 
("greylist" indicators for the PRES, TEMP or PSAL 
profiles) which users could choose to use or not.  
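
For what it is worth, the retyping could be avoided with a few 
lines of code.  The Python sketch below assumes a comma-separated 
layout (platform number, parameter name, start date as YYYYMMDD, 
optional end date, then further fields); that layout is an 
assumption and should be checked against the actual file.  The 
function names and the sample entries are invented for 
illustration.  

```python
import csv
from datetime import date, datetime


def parse_greylist(lines):
    """Parse greylist lines into dicts; header and short lines are skipped."""
    entries = []
    for row in csv.reader(lines):
        if len(row) < 3:
            continue
        platform, parameter, start = (field.strip() for field in row[:3])
        if not platform.isdigit():
            continue  # header or comment line
        end = row[3].strip() if len(row) > 3 else ""
        entries.append({
            "platform": platform,
            "parameter": parameter,
            "start": datetime.strptime(start, "%Y%m%d").date(),
            "end": datetime.strptime(end, "%Y%m%d").date() if end else None,
        })
    return entries


def is_greylisted(entries, platform, parameter, when):
    """True if the platform/parameter is listed for the given date."""
    return any(
        e["platform"] == platform
        and e["parameter"] == parameter
        and e["start"] <= when
        and (e["end"] is None or when <= e["end"])
        for e in entries
    )


# Invented sample in the assumed layout (not real greylist entries).
SAMPLE = [
    "PLATFORM,PARAMETER,START,END,QC,COMMENT,DAC",
    "9999999,PSAL,20040615,,3,invented example,XX",
    "9999998,TEMP,20030101,20031231,4,invented example,XX",
]
entries = parse_greylist(SAMPLE)
```

An open end date is treated as "still greylisted", which matches 
the way the greylist generally gives only a start date.  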


3. Argo data from GDAC, GTSPP and WOD05
=======================================

a) Early profilers are in the GDAC and GTSPP data with a 
five-digit buoy identifier.  Later profilers are stored by GDAC 
with a seven-digit identifier and by GTSPP with the same 
identifier prefixed with a Q.  Thus it is easy to compare data 
from the same buoy between GDAC and GTSPP.  For WOD05 data we 
store the cruise number, making it more difficult to pick out 
the same buoy.  

b) As expected there are considerable overlaps between the Argo 
data held in the three archives.  However, on occasion the stored 
times differ by one or more hours; in this case our system does 
not detect the reports as duplicates and both reports may be 
assimilated.  
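
One possible remedy, sketched in Python, is a duplicate check 
that strips the GTSPP "Q" prefix and allows a time tolerance.  
This is not how our system currently works; the function names, 
the 3-hour and 0.1-degree tolerances and the example reports are 
all assumptions made for illustration.  It does not help with 
WOD05, where the buoy identifier is not stored.  

```python
from datetime import datetime, timedelta


def normalise_id(identifier):
    """Strip blanks and a leading GTSPP-style 'Q' from a buoy identifier."""
    ident = identifier.strip().upper()
    return ident[1:] if ident.startswith("Q") else ident


def is_duplicate(a, b, time_tol=timedelta(hours=3), pos_tol=0.1):
    """Same buoy, times within time_tol, positions within pos_tol degrees.

    Longitude wrap-around at the date line is ignored in this sketch.
    """
    same_buoy = normalise_id(a["id"]) == normalise_id(b["id"])
    close_time = abs(a["time"] - b["time"]) <= time_tol
    close_pos = (abs(a["lat"] - b["lat"]) <= pos_tol
                 and abs(a["lon"] - b["lon"]) <= pos_tol)
    return same_buoy and close_time and close_pos


# Invented reports: same profile, times two hours apart.
gdac = {"id": "1900123", "time": datetime(2004, 5, 1, 6, 0),
        "lat": 10.00, "lon": -30.00}
gtspp = {"id": "Q1900123", "time": datetime(2004, 5, 1, 8, 0),
         "lat": 10.02, "lon": -30.01}
match = is_duplicate(gdac, gtspp)
```

Since profilers report only every few days, a tolerance of a few 
hours should carry little risk of merging distinct profiles from 
the same buoy.  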

c) I found a number of pre-Argo profilers (1999/2000) 
in the North Atlantic that are temperature only in the GDAC 
and GTSPP, but where WOD05 has salinity as well (and sometimes 
additional profiles - possibly from WOCE archives?).  

d) Sometimes the GTSPP report is a "cleaned up" version of the 
GDAC report, eg unphysical pressures or temperatures removed, 
or sometimes the salinities are missing - generally after the 
buoy has been "greylisted".  In principle I am in favour of 
flagging values rather than removing them but some values are 
such junk there seems little point in archiving them.  

4. Argo quality issues
======================

During my recent work I have seen many more instances of "bad" 
Argo data than I expected.  In part this is due to the large 
(by ocean standards) data volume available in recent years.  
Most Argo data are of good quality but a significant 
minority have data quality problems.  This section describes the 
physical data; flags and greylist are described in the next section.  
Table 1 (at bottom) gives an idea of the scale of profile rejections.  

a) "Frozen profiles": in some cases identical (or almost identical) 
temperature and salinity profiles are reported month after month 
from the same buoy (although the latitude/longitude change).  In 
a few cases 
there is an intermittent update: one particular profile is repeated 
several times, then a different profile is repeated several times etc.  
- it is fairly obvious that this is happening if a sequence of reports 
from a buoy are examined/overplotted 
- however if single reports are examined separately the problem can 
easily be undetected
- if the profile is recently "frozen" it can look quite good by 
comparison with the background
- my estimate is that 40-50 buoys have been afflicted by this problem 
(which I was previously unaware of), some have stopped reporting now 
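
Because the problem only shows up across a sequence of reports, 
an automatic check has to compare consecutive cycles from the 
same buoy.  A minimal Python sketch follows; the function name 
find_frozen and the tolerance of 0.001 are assumptions, and this 
is not a check that our system contains.  

```python
import numpy as np


def find_frozen(profiles, tol=0.001):
    """Return indices of cycles that repeat the previous cycle's values.

    profiles: list of per-cycle value arrays (e.g. temperature by level),
    in time order, all from one buoy.
    """
    frozen = []
    for i in range(1, len(profiles)):
        prev = np.asarray(profiles[i - 1], dtype=float)
        curr = np.asarray(profiles[i], dtype=float)
        # Profiles must have the same number of levels to be compared.
        if (prev.shape == curr.shape and prev.size > 0
                and np.allclose(prev, curr, rtol=0.0, atol=tol)):
            frozen.append(i)
    return frozen


# Invented temperature profiles from four consecutive cycles.
cycles = [
    [20.0, 10.0, 4.0],
    [20.0, 10.0, 4.0],      # identical to the previous cycle
    [20.0004, 10.0, 4.0],   # almost identical
    [19.5, 9.8, 4.0],       # a genuine change
]
suspect = find_frozen(cycles)
```

Comparing each cycle with the one before also catches the 
intermittent case, since every repeat within a block is flagged 
even though the first profile of each new block is not.  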

b) Bias problems: salinity biases have been recognised (and to some 
extent addressed) for a number of years; it is now clear that a few 
buoys have temperature biases.  For our QC system "large" biases 
tend to be rejected and pose less of a problem than "moderate" biases.  
It is usually easiest to detect biases at lower levels in the profile 
where variability is less (sometimes our system rejects these lower 
levels; if more than half of the profile is flagged, the whole profile 
is rejected).  In poorly observed or very variable regions it is 
sometimes difficult to decide if there is a measurement bias or not.  

c) Pressure sensor problems:  there can be "junk" values - sometimes 
in the middle of an otherwise sensible profile.  There can also be 
more subtle problems including apparent biases.  Of course the T-S 
relationship is unaffected and it can be difficult to decide if an 
apparent upwards/downwards shift is physical or erroneous.  
- a handful of values at 2200 m and below pass our QC procedures; 
we should change the system to reject them automatically 
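
The proposed deep-value rejection and the half-profile rule 
mentioned in b) can be written down in a few lines.  This Python 
sketch is illustrative only: the function names are hypothetical, 
and applying the 2200 limit directly to the reported pressure 
values is an assumption.  

```python
def flag_deep_levels(pressures, max_pressure=2200.0):
    """Flag every level whose pressure is at or beyond the limit."""
    return [p >= max_pressure for p in pressures]


def reject_whole_profile(level_flags):
    """Reject the profile if more than half of its levels are flagged."""
    n = len(level_flags)
    return n > 0 and sum(level_flags) > n / 2


# Invented profile with two levels at or below the limit.
flags = flag_deep_levels([10.0, 500.0, 2200.0, 2500.0])
reject = reject_whole_profile(flags)
```

With exactly half of the levels flagged the profile is kept, 
since the rule as stated requires more than half.  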

d) A few buoys had wild temperature zig-zags in the top 100m or so 
and apparently reasonable values below.  
(In our system, and in the greylist, it is not currently possible 
to reject part of a profile so the whole profiles were rejected.) 

e) I came across one apparent position error - from buoy 3900077 
in January 2005.  

5. Argo greylist and quality flags
==================================

The Argo greylist, which lists suggested rejections, is available 
from either of 
ftp://ftp.ifremer.fr/ifremer/argo/ar_greylist.txt
ftp://usgodae2.usgodae.org/pub/outgoing/argo/ar_greylist.txt
My recent experience suggests 
that the greylist is useful but incomplete.  

a) The greylist has not been updated for the last six months 
- a disincentive to operational use.  

b) A significant number of buoy problems are not included in the 
greylist (my estimate is between one third and one half), 
including some "frozen profiles".  There are also instances 
where I felt the rejection should start a few months earlier 
or should include temperature as well where the greylist 
just indicated a salinity problem.  There were a smaller number 
of instances where the greylist indicated a problem but I 
decided against a profile rejection because the automatic QC 
dealt adequately with any errors.

c) In some cases the greylist indicated a problem starting 
at the point where a particular buoy disappears from the 
archive - superfluous from a user point of view.  

d) The two GDACs gave different answers to the question: 
"Does the greylist apply just to the real-time data or to 
delayed-mode data as well?"  I suspect that some confusion 
exists amongst the processing centres as well.  In general 
I applied profile rejections to all data modes with a few 
exceptions where the delayed-mode data incorporated a 
salinity bias correction.  

e) For delayed mode and adjusted data I used the Argo 
TEMP/PSAL_ADJUSTED_QC flags - with some misgivings because 
I didn't have time to examine them in any depth.  
In general the numbers are not large and these are 
likely to be a second-order effect, whereas whole-profile 
rejection (greylisting) is a first-order effect.  
(I came across a case in 2002/2003 where Argo QC flags had 
been raised against salinities from 6900169 although a 
bias correction had been applied and the profiles looked 
reasonable.)  


Table 1.  Summary of suspect cruises rejected 1990-2005

Year  No rej   PFL  XBT  CTD    Temp Salt     Num
                        /OSD                 ARGO
1990       4     0    1    3       3    1
1991       6     0    4    2       4    2
1992       6     1    1    4       5    1
1993       4     1    1    2       2    2
1994       3     0    0    3       2    1
1995       7     1    4    2       5    2
1996       3     2    1    0       3    0
1997       5     4    0    1       2    3
1998      18    18    0    0       6   12
1999      16    16    0    0       2   14     311
2000      20    19    0    1       3   17     371
2001      14    13    0    1       0   14     572
2002       8     8    0    0       1    7     888
2003      49    47    2    0      24   25    1264
2004     115   114    1    0      78   37    1855
2005     165   164    1    0     111   54    2555

Notes

1. These are "prior" rejections - the automated QC system 
can reject other reports.  If there is a "prior" temperature 
rejection, the whole report, including salinity is rejected.  
2. The number of distinct buoys in the ARGO GDAC archive 
(Num ARGO) is included for comparison with the PFL rejections.  
3. If a particular buoy is rejected from (mid-)2003 until 
its failure in 2004 it is counted in both years, whereas 
the greylist generally just gives the rejection start date.   
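
The comparison suggested in note 2 is simple arithmetic, shown 
here as a short Python sketch using the PFL and Num ARGO columns 
of Table 1 for 1999-2005.  The resulting percentages are only a 
rough guide: the PFL rejections are not restricted to buoys in 
the GDAC archive, and note 3 means a buoy can be counted in more 
than one year.  The names used are invented for illustration.  

```python
# (PFL rejections, Num ARGO) per year, copied from Table 1.
TABLE_1 = {
    1999: (16, 311),
    2000: (19, 371),
    2001: (13, 572),
    2002: (8, 888),
    2003: (47, 1264),
    2004: (114, 1855),
    2005: (164, 2555),
}


def rejection_rate(pfl_rejected, num_argo):
    """PFL rejections as a percentage of distinct buoys in the archive."""
    return 100.0 * pfl_rejected / num_argo


rates = {year: rejection_rate(rej, total)
         for year, (rej, total) in TABLE_1.items()}

for year in sorted(rates):
    rej, total = TABLE_1[year]
    print(f"{year}  {rej:4d} / {total:4d}  = {rates[year]:4.1f}%")
```

On these figures the ratio is lowest in 2002 and highest in 
2005.  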

Addendum   4 May 2006

For 2006 v1c, 100 PFL floats were rejected: 69 Temp, 31 Salt.  
I did not individually reject SOLO FSI floats as these were 
subject to a block reject.  I also noticed that a significant 
proportion (possibly a third) of the 2005 rejects were for 
SOLO FSI floats - hence the reduction from 2005 to 2006.  
It seems that most (if not all) of the "Frozen profiles" 
(see above) were from SOLO FSI floats.