The Department of Geography’s Spatial Data Science ‘pathway’ offers undergraduates outside of Computer Science a unique opportunity to complete a suite of modules focused on the fundamentals of data science in Python. We believe that our spatial data science modules not only give you a valuable set of tools for undertaking research at undergraduate and graduate levels, but also offer you a competitive advantage in today’s job market by helping you to stand out from the crowd.
What is Spatial Data Science?
According to Hal Varian, Google Chief Economist and UC Berkeley Professor, data science is:
“The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that’s going to be a hugely important skill in the next decades.”
Hal Varian [ref]
So if data science is about using code and algorithms to extract, process, visualise, analyse, and communicate insights derived from large volumes of data, then spatial data science is — on one level — simply about the practice of using spatial data in this process.
And when you start to look, it turns out that spatial data is everywhere because so much of ‘big data’ — everything from credit card purchases to Oyster card taps and Airbnb property listings — happens somewhere. Spatial data science can be used in retail and transportation planning, in risk management (especially reinsurance), in churn prediction and targeted marketing, in real estate (e.g. PropTech) and location-based services… the list goes on!
All of this means that people with the skills to handle spatial data are in demand, a point that Carto has also made.
However, the use of locational data presents special challenges for the data scientist: not only is spatial data stored differently from ‘regular’ data, but the presence of geography in a data set can also undermine some of the basic assumptions upon which statistical approaches depend. The short version of this is that observations are not independent: the fact that one wealthy person lives in an area makes it more likely that other wealthy people live nearby because households do not choose where to live at random. You might have heard of it as ‘birds of a feather flock together’, but in practical analytical terms it means that patterns that look significant to a naive analyst may not actually mean all that much.
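To make the point about non-independence concrete, here is a minimal sketch of Moran's I, a standard measure of spatial autocorrelation. The six areas arranged in a row, their values, and the function names `morans_i` and `line_weights` are all invented for illustration; they are not taken from the pathway's teaching materials.

```python
def line_weights(n):
    """Binary weights for n areas in a row: each area neighbours the ones beside it."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Moran's I: (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2, with d = value - mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Hypothetical 'wealth' scores for six neighbouring areas (assumed values).
weights = line_weights(6)
clustered = [10, 10, 10, 1, 1, 1]     # similar values sit next to each other
alternating = [10, 1, 10, 1, 10, 1]   # neighbours are as different as possible

print(round(morans_i(clustered, weights), 3))    # about 0.6
print(round(morans_i(alternating, weights), 3))  # about -1.0
```

Both lists contain exactly the same values, so any non-spatial summary (mean, variance) is identical. Only the arrangement differs, and that is what Moran's I picks up: positive when similar values cluster, negative when neighbours differ. In practice you would normally rely on an established library such as PySAL rather than hand-rolled code like this.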
The Spatial Data Science Pathway
The SDS pathway is composed of a suite of taught and online/self-directed modules:
- Code Camp (an online ‘boot camp’, freely available to all)
- Foundations of Spatial Data Science
- Principles of Spatial Data Science
- Applications of Spatial Data Science
- Directed Readings (currently for Geography students only)
The modules are designed to build, one on top of the other, so that it is possible to complete the pathway despite having little or no prior experience of computer programming.
If you have already studied programming…
If you already have some experience in coding — particularly if it’s with the Python programming language — then we will consider applications from students wanting to skip the Foundations module and start with Principles. In rare situations we will also consider applications by Computer Science students wishing to jump in at the deep end with the Applications module, provided that they familiarise themselves with the spatial analytical concepts and methods through self-study of suggested reading.
We reserve the right to refuse applications to begin with Principles or Applications if we have reason to believe that a student has insufficient experience to cope with the pace and content of the module. Students wishing to start with Principles or Applications are therefore strongly advised to speak with us well in advance of their registration for the module.
We also expect them to review the content of the preceding modules and complete any practicals in order to ensure that they are comfortable with the materials covered. We are not in a position to provide additional support in these cases.
How We Teach
Most modules are a mix of lectures and practicals, typically an interactive 1–2 hour lecture and a 2–3 hour practical scheduled to run in a computer cluster with support from the lecturer and graduate teaching assistants (GTAs).
Our philosophy is that you will learn a lot more about what real (spatial) data science entails by working with real-world data sets than with the kinds of heavily ‘curated’ data that you might have encountered in a statistics class. Real data is messy, uncertain, and frequently incomplete, so we think it’s important you see how the spatial data science workflow functions from end to end: acquisition, cleaning, transformation, and analysis.
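As a rough illustration of what ‘end to end’ means, the sketch below walks a tiny, made-up listings table through the four stages using only the Python standard library. The data, the column names, and the function names (`acquire`, `clean`, `transform`, `analyse`) are assumptions for the sake of the example, not the code used in the modules.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented 'messy' data: one missing price and one unusable latitude.
RAW = """id,lat,lon,price
1,51.51,-0.12,120
2,51.52,-0.11,
3,n/a,-0.10,95
4,51.49,-0.13,80
5,51.51,-0.12,100
"""


def acquire(text):
    """Acquisition: read the raw records (here from a string, not a real source)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: keep only rows whose coordinates and price parse as numbers."""
    kept = []
    for row in rows:
        try:
            kept.append({
                "id": row["id"],
                "lat": float(row["lat"]),
                "lon": float(row["lon"]),
                "price": float(row["price"]),
            })
        except ValueError:
            continue  # blank or non-numeric field: drop the record
    return kept


def transform(rows):
    """Transformation: assign each record to a grid cell (coordinates rounded to 0.01 degrees)."""
    return [dict(r, cell=(round(r["lat"], 2), round(r["lon"], 2))) for r in rows]


def analyse(rows):
    """Analysis: mean price per grid cell."""
    by_cell = defaultdict(list)
    for r in rows:
        by_cell[r["cell"]].append(r["price"])
    return {cell: mean(prices) for cell, prices in by_cell.items()}


cleaned = clean(acquire(RAW))
result = analyse(transform(cleaned))
print(len(cleaned), "usable records")
print(result)
```

Two of the five records are lost at the cleaning stage, which is typical of real data. Deciding whether to drop, repair, or impute such records is itself an analytical choice.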
Most of our teaching is done using the Python programming language, and to get you up and running we offer both a Docker image and an installation script for Anaconda Python. We strongly encourage you to install the required tools on your own computer so that you can work more quickly and effectively.
Who We Are
Find out about the people involved in the SDS pathway on the About page.
To read more about our teaching and learning, see blog posts with the Teaching tag.