Oslo Summer School in Comparative Social Science Studies 2013
Using Spatial Data in the Social Sciences: An Applied Survey
Lecturer: Professor Roger Bivand,
Department of Economics, NHH,
Norwegian School of Economics, Norway
Main disciplines: Human Geography, Economics,
Research Methods, Political Science, Sociology
This course is fully booked!
Dates: 22 - 26 July 2013
Course Credits: 10 pts (ECTS)
Limitation: 20 participants
The use of spatial data in the social sciences has an established position, dating back to the first studies of urban poverty in the 1800s, and studies of elections in the 1900s. New sources of spatial data obtained through self-reporting – for example geotagged tweets and volunteered information – blur the distinction both between quantitative and qualitative data and between researcher and informant. Using the kinds of data that are becoming available does, however, depend on researchers obtaining an adequate knowledge of the challenges involved in representation and analysis, including those of spatio-temporal data.
The course is intended to provide a survey of topics in the representation and analysis of spatial data in the social sciences. Research practices vary across disciplines, with opportunities for learning from one another when using data derived from similar sources. It has been claimed that geographical information is becoming pervasive, and that digital representations of our surroundings are increasingly entering into daily life. But are these representations unproblematic? Do the available representations impact our choices with regard to understanding and/or inference? Should we expect to extend aspatial methods of analysis to spatial data without modifying, or at least challenging, their assumptions? These are the key topics to be addressed in the course, which will necessarily be open-ended, because the various disciplines using spatial data may reach different conclusions.
Monday 22 July:
1. Representing spatial data:
Spatial data is keyed to reference systems just as temporal data is keyed to time zones. Knowledge of how they are constructed is important in integrating data by spatial (temporal) position. Spatial objects have position in space (and time), and use of spatial data implies understanding of these objects. The objects used often inherit characteristics from their sources, be they point locations from GPS or geocoded addresses, administrative boundaries used to aggregate observations, transects or trajectories, or pixels captured by earth observation satellites.
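As a minimal sketch of why the reference system matters, the following compares a great-circle distance with a naive calculation that treats decimal degrees as planar coordinates. It is written in plain Python so that it is self-contained, not in the R used for the course examples. The spherical Earth radius, the function names, and the approximate city coordinates are all illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_degree_distance(lat1, lon1, lat2, lon2):
    """Euclidean distance computed directly on degrees (not a valid length)."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Approximate, illustrative coordinates (latitude, longitude) in decimal degrees.
oslo = (59.91, 10.75)
bergen = (60.39, 5.32)

print(round(haversine_km(*oslo, *bergen), 1), "km along the great circle")
print(round(naive_degree_distance(*oslo, *bergen), 3), "'degrees' if treated as planar")
```

At about 60 degrees north, a degree of longitude is roughly half as long as a degree of latitude, so treating geographical coordinates as planar overstates east-west separation considerably.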
Tuesday 23 July:
2. Visualizing spatial data:
Mapping may be used to associate names or statements with spatial position, often also implying contextualization. Choices in visualizing spatial data affect the ways in which users perceive content, so creators of visualizations should be aware of alternatives. The ease with which content can be exposed and manipulated on base maps from Google Maps and Google Earth, or OpenStreetMap, is seductive, but calls for caution. Thematic mapping complements topographic mapping by adding visual representations of observations of attributes or variables, which may be measured on various scales, such as presence/absence, intensity, or rate. Colour keys may also influence the perception of users, for example of crime hotspots, which can be made to look more or less alarming, depending on the intentions of the content creator.
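The influence of the colour key can be sketched by assigning the same values to map classes in two different ways. The example below is plain Python, chosen for self-containment and not taken from the course materials. The counts are hypothetical, and the two helper functions are illustrative simplifications of equal-interval and quantile classification.

```python
def equal_interval_classes(values, k):
    """Assign each value to one of k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would fall just outside the last class, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]


def quantile_classes(values, k):
    """Assign each value to one of k classes with equal numbers of members.

    Ties are broken by position in the input, a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes


# Hypothetical counts for eight areas, with one extreme value.
counts = [1, 2, 2, 3, 3, 4, 5, 50]

print(equal_interval_classes(counts, 4))  # [0, 0, 0, 0, 0, 0, 0, 3]
print(quantile_classes(counts, 4))        # [0, 0, 1, 1, 2, 2, 3, 3]
```

With equal intervals, only one area is shown in the darkest class and the rest look identical; with quantiles, two areas appear as "hotspots" although one of them has a count of 5. Neither key is wrong, but they invite different readings of the same data.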
Wednesday 24 July:
3. The support of spatial data:
Support is the term used to describe the link between the observation and the spatial entity used for observation. Often the entities are not chosen to suit the data generation processes, but are those “to hand”. Using data such as tweet locations opens up the risk of the ecological fallacy, frequently also seen as the modifiable areal unit problem (MAUP). Would the observed values change if the shape and placing of the entity were manipulated? Support is closely tied to the design of observations, sampling schemes, and of course electoral re-districting.
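A small sketch of how the answer can change with the zoning: the same hypothetical precinct votes are aggregated into districts in two ways. The grid of votes, the fixed number of voters per precinct, and the function names are all invented for illustration, and plain Python is used only for self-containment.

```python
VOTERS_PER_PRECINCT = 100

# Hypothetical votes for party A in a 3 x 3 grid of precincts.
votes_a = [
    [90, 90, 90],
    [35, 35, 35],
    [35, 35, 35],
]

# Two zonings of the same nine precincts, each into three districts.
by_rows = [[(r, c) for c in range(3)] for r in range(3)]
by_cols = [[(r, c) for r in range(3)] for c in range(3)]


def district_shares(districts):
    """Share of the vote for party A in each district."""
    return [
        sum(votes_a[r][c] for r, c in d) / (VOTERS_PER_PRECINCT * len(d))
        for d in districts
    ]


def seats_won(districts):
    """Number of districts in which party A has a majority."""
    return sum(share > 0.5 for share in district_shares(districts))


print(district_shares(by_rows), seats_won(by_rows))
print(district_shares(by_cols), seats_won(by_cols))
```

Party A has 480 of 900 votes in both cases, but wins one district when the precincts are grouped by rows and all three when they are grouped by columns. The observed district-level values depend on the placing of the boundaries, not only on the underlying votes.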
Thursday 25 July:
4. Spatial processes and autocorrelation:
In situations in which values of a variable of interest can be predicted from its near neighbours, the assumption of independence of observations is not sustained. The presence of spatial processes expressed as spatial autocorrelation may be used to enhance models. However, their presence also affects inference in models which do not take them into account. There are a number of ways to represent relationships between observations, expressing approximations to unobserved spatial processes. These may be used for testing for spatial autocorrelation, but such tests assume that our understanding of the data generation process is adequate, without omitted covariates or inappropriate functional forms.
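One widely used measure of spatial autocorrelation is Moran's I. The sketch below computes it for values on a simple chain of areas, where each area neighbours the one before and the one after. The binary, unstandardised weights and the chain layout are simplifying assumptions made for illustration, and the code is plain Python, not the R packages used for the course examples.

```python
def chain_neighbours(n):
    """Neighbour lists for n areas in a row: each area touches the adjacent ones."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def morans_i(values, neighbours):
    """Moran's I with binary weights (w_ij = 1 if j is a neighbour of i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights, here the total number of neighbour links.
    s0 = sum(len(js) for js in neighbours.values())
    cross = sum(dev[i] * dev[j] for i, js in neighbours.items() for j in js)
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


nb = chain_neighbours(6)
clustered = [1, 1, 1, 0, 0, 0]    # similar values sit next to each other
alternating = [1, 0, 1, 0, 1, 0]  # neighbours always differ

print(morans_i(clustered, nb))    # positive: about 0.6
print(morans_i(alternating, nb))  # negative: about -1.0
```

The two patterns contain the same values, three ones and three zeros, and differ only in their arrangement. The statistic depends on the chosen representation of neighbour relationships, which is itself an approximation to the unobserved spatial process.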
Friday 26 July:
5. Modelling spatial data:
Finally, the course will survey the fitting of spatial regression models for continuous and discrete response variables, as applied in spatial econometrics, political science, and other disciplines. Extensions to eigenvector filtering, spatial quantile regression, and local spatial regression will also be mentioned. It will be pointed out that, in applied work, the best model may be one in which no residual spatial process is found; the most parsimonious choice may be a correctly specified model with no “spatial story” of spillovers or other unobserved causal factors. However, on occasion, spatial processes are helpful in modelling, sometimes because it is not possible to observe the covariates that are proxied by relationships between neighbouring observations.
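For orientation, two common spatial regression specifications for a continuous response can be written as below. The notation is an assumption following common usage in the spatial econometrics literature and is not taken from the course materials: W is an n x n spatial weights matrix, and \rho and \lambda are spatial dependence parameters.

```latex
% Spatial lag model: the response depends on neighbouring responses
y = \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)

% Spatial error model: the dependence is in the disturbances
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

When \rho = 0 or \lambda = 0, each reduces to the ordinary linear model y = X\beta + \varepsilon, which is the case in which no residual spatial process is found.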
The course will be taught as lectures with practical examples, most of which may be reproduced using R and contributed R packages. If you wish to track the examples as well as the lecture presentation, and/or would like to use the practical examples to assist in absorbing the material and in planning your written assignment, please bring a laptop with R installed. A script to permit required contributed packages to be installed will also be made available here immediately before the course commences.
Basic essential readings
In order to gain most benefit from the course, it is desirable that participants have a reasonable grasp of some or many of the underlying conceptualisations, and how they have been reflected in software tools. Starting from Ward & Gleditsch, work back through the other suggestions until you feel confident that you understand what is going on in terms of concepts and methods, including the use of software (especially R and its Spatial Task View).
- Roger S. Bivand, Edzer J. Pebesma, and Virgilio Gómez-Rubio, 2008. Applied Spatial Data Analysis with R. Springer, New York (ch. 1-5, see discount info, or forthcoming second edition ch. 1-6 if published in time).
- John Fox, 2009. A Mathematical Primer for Social Statistics. Thousand Oaks, CA, Sage (QASS 159).
- David O'Sullivan and David J. Unwin, 2010. Geographic Information Analysis. Hoboken, NJ: Wiley (ch. 1-4, 7, 8, 11, 12).
- Michael D. Ward and Kristian Skrede Gleditsch, 2008. Spatial Regression Models. Thousand Oaks, CA, Sage (QASS 155).
- Stephen Wise, 2002. GIS Basics. London, Taylor & Francis (or GIS Fundamentals, the forthcoming second edition if published in time).
Syllabus reading list (preliminary)
- Altman, M. and M. P. McDonald. 2011. BARD: Better Automated Redistricting. Journal of Statistical Software, 42(4): 1-28.
- Bivand, R. S. 2008. Implementing representations of space in economic geography. Journal of Regional Science, 48: 1-27.
- Bivand, R. S. 2009. Applying Measures of Spatial Autocorrelation: Computation and Simulation. Geographical Analysis, 41(4): 375-384.
- Bivand, R. S. 2010. Exploratory Spatial Data Analysis. In Manfred Fischer and Arthur Getis (eds) Handbook of Applied Spatial Analysis. Springer, Heidelberg, pp. 219-254.
- Bivand, R. S. 2012. After "Raising the Bar": applied maximum likelihood estimation of families of models in spatial econometrics. Estadística Española, 54(177): 71-88.
- Bivand, R. S. and S. Szymanski, 2000. Modelling the spatial impact of the introduction of Compulsory Competitive Tendering. Regional Science and Urban Economics, 30: 203-219.
- Briant A., Combes, P. P. and M. Lafourcade. 2010. Dots to boxes: do the size and shape of spatial units jeopardize economic geography estimations? Journal of Urban Economics, 67:287-302.
- Dray, S. et al., 2012. Community ecology in the age of multivariate multiscale spatial analysis. Ecological Monographs, 82:257-275.
- Gelfand A. E. 2010. Misaligned Spatial Data: The Change of Support Problem. In Alan E. Gelfand et al. Handbook of Spatial Statistics. Chapman & Hall/CRC, Boca Raton, 517-539.
- Getis A. 2010. Spatial Autocorrelation. In Manfred Fischer and Arthur Getis (eds) Handbook of Applied Spatial Analysis. Springer, Heidelberg, pp. 255-278.
- Gibbons, S. and H. G. Overman. 2012. Mostly Pointless Spatial Econometrics? Journal of Regional Science, 52(2): 172–191.
- Gotway, C. A. and L. J. Young. 2002. Combining incompatible spatial data. Journal of the American Statistical Association, 97(458): 632-648.
- Graham, M. 2010. Neogeography and the Palimpsests of Place. Tijdschrift voor Economische en Sociale Geografie 101(4): 422–436.
- Graham, M. and M. Zook. 2011. Visualizing Global Cyberscapes: Mapping User Generated Placemarks. Journal of Urban Technology 18(1): 115-132.
- Griffith, D. A. 2010. Spatial Filtering. In Manfred Fischer and Arthur Getis (eds) Handbook of Applied Spatial Analysis. Springer, Heidelberg, pp. 301-318.
- Haining, R. P. 2010. The Nature of Georeferenced Data. In Manfred Fischer and Arthur Getis (eds) Handbook of Applied Spatial Analysis. Springer, Heidelberg, pp. 197-217.
- Lee, B. A. et al. 2008. Beyond the census tract: patterns and determinants of racial segregation at multiple geographic scales. American Sociological Review, 73: 766-791.
- Le Gallo, J. and B. Fingleton. 2012. Measurement errors in a spatial context. Regional Science and Urban Economics, 42: 114–125.
- LeSage, J. P. and R. K. Pace, 2010. Spatial Econometric Models. In Manfred Fischer and Arthur Getis (eds) Handbook of Applied Spatial Analysis. Springer, Heidelberg, pp. 355-376.
- Longley, P. A. and J. A. Cheshire. 2012. Identifying Spatial Concentrations of Surnames. International Journal of Geographical Information Science, 26(2): 309-325.
- Menon, C. 2012. The bright side of MAUP: Defining new measures of industrial agglomeration. Papers in Regional Science, 91(1): 3-28.
- McMillen, D. P. 2010. Issues in Spatial Data Analysis. Journal of Regional Science, 50: 119–141.
- McMillen, D. P. 2012. Perspectives on Spatial Econometrics: Linear Smoothing with Structured Models. Journal of Regional Science, 52(2): 192–209.
- McMillen, D. P. 2013. Quantile Regression for Spatial Data. Springer, Heidelberg.
- Ratcliffe, J. H. and M. J. McCullagh. 1999. Hotbeds of crime and the search for spatial accuracy. Journal of Geographical Systems, 1(4): 385-398.
- Revelli, F. 2003. Reaction or interaction? Spatial process identification in multi-tiered government structures. Journal of Urban Economics, 53: 29-53.
- Revelli, F. 2006. Performance rating and yardstick competition in social service provision. Journal of Public Economics, 90: 459-475.
- Revelli, F. and P. Tovmo, 2007. Revealed yardstick competition: local government efficiency patterns in Norway. Journal of Urban Economics, 62: 121-134.
- Shelton, T., Zook, M. and M. Graham. 2012. The Technology of Religion: Mapping Religious Cyberscapes. The Professional Geographer 64(4): 602-617.
- Wakefield J. C. and H. Lyons. 2010. Spatial Aggregation and the Ecological Fallacy. In Alan E. Gelfand et al. Handbook of Spatial Statistics. Chapman & Hall/CRC, Boca Raton, 541-558.
- Waller L and B. Carlin. 2010. Disease mapping. In Alan E. Gelfand et al. Handbook of Spatial Statistics. Chapman & Hall/CRC, Boca Raton, 217-243.
- Weidmann, N. B. and K. Skrede Gleditsch. 2010. Mapping and Measuring Country Shapes. The R Journal, 2(1): 18-24.
- Wheeler D. C. and A. Páez, 2010. Geographically Weighted Regression. In Manfred Fischer and Arthur Getis (eds) Handbook of Applied Spatial Analysis. Springer, Heidelberg, pp. 461-486.
- Zook, M., Graham, M., Shelton, T. and S. Gorman. 2010. Volunteered Geographic Information and Crowdsourcing Disaster Relief: A Case Study of the Haitian Earthquake. World Medical & Health Policy 2(2): 7-33.
Additional readings (preliminary)
- Yongwan Chun and Daniel A. Griffith, forthcoming. Spatial Statistics and Geostatistics: Theory and Applications for Geographic Information Science and Technology. Thousand Oaks, CA, Sage.
- Carlo Gaetan and Xavier Guyon. 2010. Spatial statistics and modeling, New York, Springer.
Roger Bivand is a British geographer educated at Cambridge and the London School of Economics, and is Professor of Geography in the Department of Economics at NHH Norwegian School of Economics. He is active in the development of contributed software for analysing spatial data using the R statistical language, and is an Ordinary Member of the R Foundation.