Sunday 25 April 2010

Getting boundaries right

I've got to grips with the boundary data that the Ordnance Survey released earlier this month. The data was in shape files and, with the help of OSM::SK53, I have extracted OSM style data from them. Shape files are a twenty-year-old format that can include a projection file (*.prj). That was the problem that had stopped me using the data before. With Jerry's help the projection file was altered, and after that converting from the projection OS uses to the one OSM uses was easy.

If you are thinking of using OS shape files be warned: don't trust their *.prj files - they are incomplete.
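For anyone wanting to look inside a *.prj file before trusting it, here is a minimal Python sketch that reports which coordinate-system components its WKT text contains. The post does not say what was missing from the OS files, so treating a missing TOWGS84 datum shift as the thing to look for is purely an assumption. The function name `prj_summary` and the sample text are made up for illustration and are not taken from a real OS file.

```python
import re

def prj_summary(wkt):
    """Report which WKT components appear in the text of a .prj file.

    Hypothetical helper: it only looks for keywords followed by '[',
    it does not validate the projection parameters themselves.
    """
    keywords = ["PROJCS", "GEOGCS", "DATUM", "SPHEROID",
                "TOWGS84", "PROJECTION", "UNIT"]
    return {k: re.search(r"\b" + k + r"\s*\[", wkt) is not None
            for k in keywords}

# Illustrative WKT for the British National Grid, typed in by hand.
# It deliberately has no TOWGS84 entry (assumed, not confirmed, to be
# the kind of gap the post is warning about).
sample = (
    'PROJCS["British_National_Grid",'
    'GEOGCS["GCS_OSGB_1936",'
    'DATUM["D_OSGB_1936",SPHEROID["Airy_1830",6377563.396,299.3249646]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],'
    'PROJECTION["Transverse_Mercator"],'
    'PARAMETER["False_Easting",400000.0],'
    'PARAMETER["False_Northing",-100000.0],'
    'PARAMETER["Central_Meridian",-2.0],'
    'PARAMETER["Scale_Factor",0.9996012717],'
    'PARAMETER["Latitude_Of_Origin",49.0],'
    'UNIT["Metre",1.0]]'
)

summary = prj_summary(sample)
missing = [k for k, present in summary.items() if not present]
print("missing components:", missing)
```

To run it against a real file, read the *.prj as text and pass the string to `prj_summary`. A component that is reported as present can still hold wrong values, so this is only a first check.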

Once the shape files were in the OSM projection I could extract polygons or polylines from them. I created a Python script that extracts either a named parish boundary polygon or the numbered polylines that make up the coastline. I spent a few mind-numbing hours loading each of the parishes that fall on the outside edge of the county of the East Riding of Yorkshire. I created a relation for each parish, and deleted each of the sections of way that were duplicated between neighbouring parishes. The outer edges of these parishes also mark the edge of the county, so the relations for the county and the region have improved too.
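As a rough illustration of the last step, turning one parish polygon into OSM style data, here is a minimal sketch using only the Python standard library. It is not the script described above. The function name `ring_to_osm`, the example coordinates, the parish name and the choice of `admin_level=10` are all assumptions made for the example, and it takes a ring that has already been read from the shape file and converted to latitude and longitude.

```python
import xml.etree.ElementTree as ET

def ring_to_osm(name, ring, admin_level="10"):
    """Build OSM XML for one closed boundary way plus its relation.

    ring: list of (lon, lat) tuples, already in lat/lon degrees.
    Negative ids are used so an editor treats the objects as new.
    Hypothetical helper; the tag choices are assumptions.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]          # make sure the ring is closed

    root = ET.Element("osm", version="0.6", generator="parish-sketch")
    node_ids = []
    next_id = -1
    for lon, lat in ring[:-1]:           # closing point reuses the first node
        ET.SubElement(root, "node", id=str(next_id),
                      lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        node_ids.append(next_id)
        next_id -= 1
    node_ids.append(node_ids[0])         # close the way on its first node

    way_id = next_id
    way = ET.SubElement(root, "way", id=str(way_id))
    for nid in node_ids:
        ET.SubElement(way, "nd", ref=str(nid))
    ET.SubElement(way, "tag", k="boundary", v="administrative")

    rel = ET.SubElement(root, "relation", id=str(way_id - 1))
    ET.SubElement(rel, "member", type="way", ref=str(way_id), role="outer")
    for k, v in (("type", "boundary"), ("boundary", "administrative"),
                 ("admin_level", admin_level), ("name", name)):
        ET.SubElement(rel, "tag", k=k, v=v)
    return root

# Made-up square, not a real parish.
square = [(-0.50, 53.80), (-0.40, 53.80), (-0.40, 53.85), (-0.50, 53.85)]
doc = ring_to_osm("Example Parish", square)
print(ET.tostring(doc, encoding="unicode"))
```

This writes every parish as its own closed way, so neighbouring parishes would still duplicate their shared edge. Splitting the ways at the shared sections and reusing them in both relations, as described above, is the manual part that this sketch leaves out.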

I have also updated the coastline ways from the Boundary Line data set from OS. I managed to leave them broken overnight, which OSM::PA94 spotted and helped to fix. The coastline is not often updated, as it's not part of the normal rendering process for Mapnik, so I'll have to wait to see what Mapnik makes of it.

I have noticed a few things as part of this. The boundaries sometimes follow a stream or river, but sometimes the boundary leaves the river briefly, probably because the river has moved, but the boundary hasn't. The boundaries do not follow the centre line of roads. They clearly lie at one side or the other and at certain points you can see where the boundary jinks across to the other side.

Lastly, I have to say the match between the boundaries and the surveyed areas is very close, and I'm very comfortable using the OS data for boundaries, especially because there is no better way to survey this data on the ground.

I will write up the detail of the steps involved if there is any interest.
