GML in R

In this post I explain how to load .gml geographic data in R. Tutorials for spatial analysis in R mainly use ‘ESRI Shapefiles’ (for example, the Introduction to visualising spatial data in R by Robin Lovelace, James Cheshire and others). ‘ESRI Shapefiles’ are easy to load in R and are perfect for a first approach. However, GML files are better suited to long-term archiving, and they may play an important role when you work directly with archived data. This post aims to help with importing GML in R, as this file format is not so well documented.

What is .gml?

  • an open format
  • it uses XML to express geographical features (as plain text, it works well with a VCS such as Git)
  • it is recommended for (long-term) archiving

Prerequisite

  • GDAL: the library for reading and writing raster and vector geospatial data formats
  • rgdal: the R bindings to GDAL
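
Before going further, it can help to check that both prerequisites are actually in place. The following is only a minimal sketch: it assumes the GDAL command-line tools are meant to be on the PATH, and the variable names tools_found and has_rgdal are made up for this illustration.

```r
# Look up the GDAL command-line tools used in this post on the PATH;
# an empty string means the tool was not found.
tools_found <- Sys.which(c("ogr2ogr", "ogrinfo"))
print(tools_found)

# TRUE if the rgdal package is installed, FALSE otherwise
has_rgdal <- requireNamespace("rgdal", quietly = TRUE)
print(has_rgdal)

# If rgdal is available, check that its GDAL build includes the GML driver
if (has_rgdal) {
  print("GML" %in% rgdal::ogrDrivers()$name)
}
```

If one of the tools comes back as an empty string, or the last check prints FALSE, the import steps in this post will probably fail for reasons unrelated to the .gml file itself.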

Get some data (for Linux)

I use the dataset from Palmisano, A.:

  • (2012) Diachronic and spatial distribution of Khabur ware in the early second millennium BC: http://dx.doi.org/10.5334/data.1334754978

This is how to get the data from the command line on Linux. On other operating systems, download the data and convert the shapefile to .gml.

wget http://discovery.ucl.ac.uk/1344126/1/data.rar
unrar e data.rar
rm data.rar
ogr2ogr -f "GML" "../media/2015-10-20--Vector-points.gml" \
  "2015-10-20--sites.shp"

Get info about your .gml

The first thing to do before trying to import a .gml file is to get information about its layers. For this example, the geographic file is “2015-10-20--Vector-points.gml”, stored in the directory ../media. On the command line, try:

ogrinfo "../media/2015-10-20--Vector-points.gml"
## INFO: Open of `../media/2015-10-20--Vector-points.gml'
##       using driver `GML' successful.
## 1: sites (Point)

or, from the R console:

library(rgdal)
ogrListLayers("../media/2015-10-20--Vector-points.gml")
## [1] "sites"
## attr(,"driver")
## [1] "GML"
## attr(,"nlayers")
## [1] 1

This output indicates that there is one layer, named “sites”, in the file 2015-10-20--Vector-points.gml.

Import GML in R with readOGR()

The function to import spatial data in R is

readOGR()

The tricky part is that the meaning of the arguments changes with the driver: reading an ESRI Shapefile is different from reading GML or GPX.

For GML

  • “dsn” is the path to the file
  • “layer” is the name of the layer
vector <- readOGR(dsn="../media/2015-10-20--Vector-points.gml",
                layer = "sites",
                encoding =  "UTF-8"
                )

For the sake of comparison

Reading the ESRI Shapefile “2015-10-20--sites.shp” with readOGR():

  • “dsn” is the path to the directory that contains the file (without the file name)
  • “layer” is the name of the shapefile (without the .shp extension)
    vector <- readOGR(dsn = "../media",
                      layer = "2015-10-20--sites",
                      encoding =  "UTF-8"
                     )
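
The difference between the two calls above can be summed up in a small helper. This is only a sketch: ogr_args() is a hypothetical function written for this post, not part of rgdal, and it only covers the two formats discussed here.

```r
# Hypothetical helper (not part of rgdal): derive the `dsn` and `layer`
# arguments that readOGR() expects from a file path.
ogr_args <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "shp") {
    # ESRI Shapefile: dsn is the directory, layer is the file name
    # without its extension
    list(dsn = dirname(path),
         layer = tools::file_path_sans_ext(basename(path)))
  } else if (ext == "gml") {
    # GML: dsn is the file itself; the layer name is stored inside the
    # file, so it has to be looked up (for example with ogrListLayers())
    list(dsn = path, layer = NULL)
  } else {
    stop("unsupported extension: ", ext)
  }
}

str(ogr_args("../media/2015-10-20--sites.shp"))
str(ogr_args("../media/2015-10-20--Vector-points.gml"))
```

For the shapefile, both arguments can be derived from the path alone. For the GML file, layer is left as NULL on purpose, because the layer name (“sites” here) cannot be guessed from the file name.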

Things I found confusing

The output of ogrListLayers() depends on the data source name, whose interpretation varies by driver. As a result, it reports only one of the files even if you have an ESRI Shapefile and a GML file in the same directory.

../media
├── 2015-10-20--sites.dbf
├── 2015-10-20--sites.shp
├── 2015-10-20--sites.shx
├── 2015-10-20--Vector-points.gfs
└── 2015-10-20--Vector-points.gml

If you call this function on the directory shown above, passing only dsn = "../media", it lists only the ‘ESRI Shapefile’:

    ogrListLayers("../media")
## [1] "2015-10-20--sites"
## attr(,"driver")
## [1] "ESRI Shapefile"
## attr(,"nlayers")
## [1] 1

At first, this output made me think that I had a problem with my driver and that my GML file was not recognised. But changing the data source name to dsn = "../media/2015-10-20--Vector-points.gml" shows that there is a GML file too.

    ogrListLayers("../media/2015-10-20--Vector-points.gml")
## [1] "sites"
## attr(,"driver")
## [1] "GML"
## attr(,"nlayers")
## [1] 1

However, dsn = "../media/2015-10-20--sites.shp" works too:

    ogrListLayers(dsn = "../media/2015-10-20--sites.shp")
## [1] "2015-10-20--sites"
## attr(,"driver")
## [1] "ESRI Shapefile"
## attr(,"nlayers")
## [1] 1
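
One way to avoid this confusion is to never pass the directory itself, and to query each file as its own data source instead. The sketch below assumes that only .gml and .shp files are of interest; spatial_sources() is a hypothetical helper, and the demonstration runs on empty placeholder files in a temporary directory rather than on the real data.

```r
# Hypothetical helper: list every .gml and .shp file in a directory so that
# each one can be passed as its own data source name.
spatial_sources <- function(dir) {
  list.files(dir, pattern = "\\.(gml|shp)$", full.names = TRUE,
             ignore.case = TRUE)
}

# Self-contained demonstration on empty placeholder files
demo_dir <- file.path(tempdir(), "media-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("sites.shp", "sites.dbf", "sites.shx",
                                  "points.gml", "points.gfs")))
sources <- spatial_sources(demo_dir)
print(basename(sources))

# With real files and rgdal installed, each source can then be inspected:
# lapply(spatial_sources("../media"), rgdal::ogrListLayers)
```

The sidecar files (.dbf, .shx, .gfs) are filtered out, so every remaining path is a data source name that can be inspected on its own.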

Closing words

Now you can read any .gml file you want.