Geography: Address Matching Research
The purpose of this paper is to summarise research findings on the quality of address information used for planning and processing the 2001 Census and consider developments planned for 2011.
The General Register Office for Scotland (GROS) has long used the postcode as a fundamental tool for Census enumeration and output planning. An in-house GIS facility continuously maps and maintains postcode boundaries covering the entire land area of Scotland against Ordnance Survey’s (OS) digital mapping (Land-Line). Postcode-based geo-coding has been developed so that GROS can assign any address in Scotland to any higher area geography used in the output of Census products (as well as in other areas of GROS business).
The Royal Mail’s Postcode Address File (PAF) is our sole source of postcode data, supplemented by local authorities’ planning information on new developments, which helps us to locate and digitise new postcodes. Postcode boundaries are continuously quality assured against their corresponding high-quality OS Address Points.
Based on the Scottish experience of the 1991 Census, the GROS strategy for 2001 was to freeze geography as late as possible and use the frozen set of live small user postcodes used in Enumeration to underpin all Census geographies. The January 2001 target for freezing struck a balance between the need for current information and the lead-time required for the production of geography end products for the field operation. An index of postcodes created after the freeze was then maintained from updates supplied during processing, so that each such postcode could be replaced with one from the frozen set. Addresses from the December 2000 version of PAF were pre-listed in Enumerators’ Record Books.
A geography database set up as part of the enumeration area planning exercise was used extensively during the processing of Census forms to check the validity of postcodes. Postcode queries raised during this process were passed to the geography team for resolution and various automated and manual systems were used to ensure the final Census database contained postcodes that were accurate and valid and in the frozen set.
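The validity check described above can be sketched as a lookup against the frozen set, with a fall-back to the index of post-freeze postcodes. This is a minimal illustration only, not the system GROS used: the function names, the status labels and the sample postcodes are all hypothetical.

```python
import re

# Hypothetical sample data for illustration; not taken from any GROS dataset.
FROZEN_POSTCODES = {"EH1 1AA", "EH1 1AB", "G1 1AA"}
# Hypothetical index: postcode created after the freeze -> replacement from the frozen set.
POST_FREEZE_INDEX = {"EH1 1ZZ": "EH1 1AB"}


def normalise_postcode(raw):
    """Upper-case, drop non-alphanumerics, and put one space before the last three characters."""
    compact = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    if len(compact) < 5:
        # Too short to be a full postcode; return it unchanged so it fails the lookup.
        return compact
    return compact[:-3] + " " + compact[-3:]


def resolve_postcode(raw):
    """Return (postcode, status), where status is 'frozen', 'replaced' or 'query'."""
    postcode = normalise_postcode(raw)
    if postcode in FROZEN_POSTCODES:
        return postcode, "frozen"
    if postcode in POST_FREEZE_INDEX:
        return POST_FREEZE_INDEX[postcode], "replaced"
    # Anything else would be passed on for manual resolution.
    return None, "query"
```

In this sketch a captured value such as `eh11aa` resolves to a frozen postcode, a post-freeze postcode is swapped for its frozen replacement, and anything else is flagged as a query, mirroring the referral to the geography team.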
An internal report on 2001 Data Collection Processes highlighted PAF addresses that were found during enumeration to be demolished or derelict, duplicated, or non-existent.
Wholesale matching of planned and captured addresses proved impossible because the datasets are not easy to standardise for automated comparison. An exercise in Edinburgh, a largely urban local authority, therefore focussed on non-existent addresses highlighted by the 2001 Census. Considerable IT and administrative resources were used to extract the data, clean or re-format it where necessary and complete an automated comparison with the current PAF to see whether the addresses were still non-existent. A total of 4,741 non-pre-listed address records (addresses collected by enumerators and subsequently captured by Lockheed Martin) were extracted from the Post Edit Pre Imputation (PEPI) database. The 252 records with blank, incomplete or nonsense address information were excluded, and the remaining 4,489 records were compared against the current PAF.
- 3,181 (71 percent) were matched automatically to the current PAF;
- The other 1,308 records failed automatic matching. Of these, 630 (14 percent of the 4,489 compared) were eventually matched to PAF manually, and 678 (15 percent), mainly flats, were not found on PAF and so were never matched.
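The figures above can be checked with a few lines of arithmetic. The short sketch below uses only the counts reported in this paper; the variable names are illustrative, and each percentage is taken against the records compared with PAF rather than against the records that failed automatic matching.

```python
# Counts reported for the Edinburgh exercise.
extracted = 4741          # non-pre-listed records extracted from PEPI
excluded = 252            # blank / incomplete / nonsense addresses
auto_matched = 3181       # matched automatically to the current PAF
manual_matched = 630      # matched manually after failing automatic matching
unmatched = 678           # not found on PAF

compared = extracted - excluded               # records compared against PAF
failed_auto = manual_matched + unmatched      # records that failed automatic matching


def percent_of_compared(count):
    """Percentage of the compared records, rounded to the nearest whole number."""
    return round(100 * count / compared)


summary = {
    "compared": compared,
    "failed_auto": failed_auto,
    "auto_pct": percent_of_compared(auto_matched),
    "manual_pct": percent_of_compared(manual_matched),
    "unmatched_pct": percent_of_compared(unmatched),
    "matched_overall_pct": percent_of_compared(auto_matched + manual_matched),
}
print(summary)
```

The three categories sum to the number of records compared, and automatic plus manual matching together account for roughly 85 percent of them.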
Many address records fail automatic matching simply because different sources can hold the same address in so many formats. This is also borne out by the results of a sample test carried out on the same datasets by a Royal Mail Bureau on our behalf.
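To illustrate why format differences defeat exact comparison, the sketch below normalises address strings before matching them. It is a deliberately simple example, not the method used by GROS or Royal Mail: the abbreviation table, function names and sample addresses are assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger and context-aware
# (for example, "st" can also mean "saint").
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "pl": "place"}


def normalise_address(raw):
    """Lower-case, replace punctuation (except '/') with spaces, and expand abbreviations."""
    text = re.sub(r"[^a-z0-9/ ]", " ", raw.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def match_address(raw, reference_addresses):
    """Return the reference address whose normalised form equals that of raw, else None."""
    index = {normalise_address(ref): ref for ref in reference_addresses}
    return index.get(normalise_address(raw))


reference = ["12 HIGH STREET, EDINBURGH", "5/2 LEITH WALK, EDINBURGH"]

# Differences in case, punctuation and abbreviation are absorbed by normalisation.
print(match_address("12 High St., Edinburgh", reference))

# A flat written in a different convention still fails and needs manual review.
print(match_address("Flat 2, 5 Leith Walk, Edinburgh", reference))
```

Simple normalisation handles case, punctuation and common abbreviations, but structural differences, such as two conventions for writing a flat number, still leave records for manual matching.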
Until recently, legal opinion was that providing information on addresses to the Royal Mail Address Management Centre (RMAMC) raised issues regarding the confidentiality of personal Census information. Solicitors now confirm that such action poses no confidentiality risk to personal information, and our address anomaly findings are being reviewed with Royal Mail and Ordnance Survey.
Developments for the future
Three factors contributed to our decision not to proceed further with matching of addresses from the 2001 Census: the launch of the Scottish Assessors’ Portal, which provides Internet access to the Valuation Roll and Council Tax information; operational difficulties in matching captured and planned addresses; and a planned Address Check task in preparation for the 2006 Census Test. Instead, an in-house address matching exercise was set up to compare the Assessors’ Portal addresses with what was found in the field. The anomalies discovered are the subject of exchanges between GROS and the Lothian Joint Valuation Board. The Assessors’ Portal does, however, appear to be an attractive option for address identification, and more work will be done in liaison with Assessors on issues such as the quality of their address lists.
Address quality needs to improve if there is to be a high-quality, accurate national address base. Such a base would not only enhance the quality of GROS postcode geography products but could also enable post-out of Census forms and revolutionise Enumeration, the most expensive aspect of a conventional Census.
GROS currently plans to extend the 2004 Address Check exercise to 60,000 addresses in the 2006 Census Test and envisages local authority consultation to further assess the feasibility and timescale of creating a definitive address list tailored to Census requirements - specifically excluding non-residential addresses and highlighting unoccupied, multi-occupied and Communal Establishment (CE) addresses. GROS findings will inform the next phase of the Assessors’ Project which will consider data quality improvement. The Test will provide valuable information about how GROS will take forward address issues in advance of the 2011 Census when all Scottish Councils will be involved.