Originally submitted: October 2023
This program is the culmination of an introductory Python course. I got the idea for it from this literature review I had done previously, in which the authors of the reviewed article claimed that Airbnb is disrupting the rental market in New Zealand without providing any proof. My aim was to produce some exploratory data visualisations to see whether any obvious correlations emerge between the number and price of Airbnb listings and residential rental properties. The scope of this project was limited to producing the visualisations, but I hope to extend it with some statistical analysis at some point.
You can find the Python script, input files and a set of outputs in the GitHub repository here. The program is capable of producing six different map types for a given location, with three cities (Auckland, Wellington and Christchurch) included in the input data. I have included two example outputs here so that you can get an idea of what the program does without having to run it yourself.
Example output 1: Ratio of rental counts to Airbnb counts in Christchurch, by SA2 and number of bedrooms
Example output 2: Ratio of median rental prices to median Airbnb prices in Christchurch, by SA2 and number of bedrooms
As a Master of Applied Data Science student with a particular interest in spatial data science, I knew that I wanted to take this project as an opportunity to make some form of interactive map in Python.
In terms of what data I wanted to visualise with that map, I took inspiration from a journal article I had reviewed for another course which I felt needed improvement. This article, “Disrupting the regional housing market: Airbnb in New Zealand” (Campbell et al., 2019), failed to actually prove whether or not Airbnb is disrupting the regional housing market as the title suggests. In my review, I suggested that the authors could have used census data on the number of rental properties and average weekly rent and compared these to the number of “entire home” Airbnb listings and their nightly rate (multiplied by seven). I am keen to carry out this suggestion myself and see what the results are.
At a minimum, I aim to produce an interactive map which displays the following:
The process began with gathering the necessary data. Getting the Airbnb data was simple, as it was readily available from the Inside Airbnb website in CSV format (Inside Airbnb, 2023).
However, obtaining rental property data proved to be more difficult. Originally I had planned to use Trade Me’s API to get data on rental listings from their Property section, but Trade Me declined my application to access their API. Instead I had to rely on my back-up plan of using Stats NZ rental data (Stats NZ–Tatauranga Aotearoa, 2018a). This turned out to be a blessing in disguise, as I soon realised that it made more sense to use the Stats NZ data since it is supposed to reflect the entire rental stock of the country, whereas the Trade Me rental listings would only reflect rental properties which were on the market (and on that platform) at the time.
The Stats NZ data that I used was from the 2018 Census, and at its lowest level was divided into areal units called SA2s, which are comparable to suburbs. In order to map this data I also needed to download the SA2 geographies from Stats NZ, which I chose to do in shapefile format (Stats NZ–Tatauranga Aotearoa, 2018b). Because both the SA2 shapefile and the rental data files were large when covering the whole of New Zealand, I decided at this point to narrow my focus to the three largest cities – Auckland, Wellington and Christchurch.
Now that I had these files downloaded, I needed to get them into Python as dataframes. This ought to have been straightforward, and it was, except for the rental data. I downloaded this using the data tool NZ.Stat, which has its fair share of quirks and thankfully is in the process of being replaced. The Excel export option claimed to generate an XLS file, but I worked out that the resulting file needed to be converted to XLSX in order to be read correctly.
Once all the data was in, it needed to be cleaned and filtered. Once again the rental data stood out here, as it had multilevel columns (one level for bedrooms, one for price categories) which I needed to merge into a single level.
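As a rough illustration of that flattening step, here is a minimal pandas sketch. The frame is a tiny made-up stand-in for the NZ.Stat export (the SA2 names, labels and counts are hypothetical), and joining the two levels with " | " is just one possible naming convention, not necessarily the one the program uses.

```python
import pandas as pd

# Hypothetical stand-in for the NZ.Stat export: the outer column level is the
# number of bedrooms and the inner level is the weekly-rent category.
columns = pd.MultiIndex.from_tuples([
    ("1 bedroom", "$200 - $299"),
    ("1 bedroom", "$300 - $399"),
    ("2 bedrooms", "$200 - $299"),
    ("2 bedrooms", "$300 - $399"),
])
rent = pd.DataFrame(
    [[5, 3, 2, 7], [0, 4, 1, 6]],
    index=["SA2 A", "SA2 B"],  # made-up SA2 names
    columns=columns,
)

# Join each (bedrooms, category) tuple into a single label,
# e.g. "1 bedroom | $200 - $299", leaving a one-level column index.
rent.columns = [" | ".join(levels) for levels in rent.columns]
```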
As I just mentioned, the rental data uses price categories (e.g. “$200 - $299”, and an open-ended “$600 and over”), whereas the Airbnb data gives the exact price for every Airbnb listing. This meant I had to get a bit creative with how I chose to interpret these categories and compare them to the Airbnb prices. While I could not calculate an average price with this data, I could still find the median price category, which is arguably the more suitable measure of central tendency anyway. When it came to calculating the relative prices I chose to use the midpoint of each price category (e.g. “$200 - $299” would be $249.50), but the open-ended upper category complicated things. I settled this somewhat informally by researching the upper quartile market rents of the relevant cities on the Tenancy Services website, noting the highest price for each location, and finding that they were all close to the nice round number of $1000 (Tenancy Services, 2023). Treating $1000 as the upper bound gave a midpoint of $800 for the “$600 and over” category.
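The category handling described above can be sketched in plain Python. The helper names and exact label formats are assumptions for illustration, and the $1000 cap comes from the informal Tenancy Services check. The median is taken as the first category whose cumulative count reaches half the total, one common convention for grouped data. The last helper applies the nightly-rate-times-seven conversion suggested in the review.

```python
import re

# Assumption: "$600 and over" is capped at $1000, per the informal
# Tenancy Services check described above.
UPPER_BOUND = 1000


def category_bounds(label):
    """Return (low, high) dollar bounds for a label like "$200 - $299"."""
    numbers = [int(n) for n in re.findall(r"\d+", label.replace(",", ""))]
    if "and over" in label:
        return numbers[0], UPPER_BOUND
    return numbers[0], numbers[1]


def category_midpoint(label):
    """Midpoint of a rent category, e.g. "$200 - $299" -> 249.5."""
    low, high = category_bounds(label)
    return (low + high) / 2


def median_category(counts):
    """Return the category containing the median rental, or None if empty.

    `counts` maps category labels to household counts and must be ordered
    from cheapest to most expensive (dicts keep insertion order).
    """
    total = sum(counts.values())
    if total == 0:
        return None  # no rentals in this SA2, so the median is NA
    cumulative = 0
    for label, count in counts.items():
        cumulative += count
        if cumulative >= total / 2:
            return label


def weekly_airbnb_price(nightly_price):
    """Convert an Airbnb nightly rate to a weekly figure (x 7)."""
    return nightly_price * 7
```

For example, an SA2 with counts of 2, 5 and 1 across the “$200 - $299”, “$300 - $399” and “$400 - $499” categories has “$300 - $399” as its median category, which would then be compared at its midpoint of $349.50.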
Choosing a formula for the relative counts and prices (or ratios) was an interesting part of the process, as I needed to consider how zero values would be treated. My initial instinct was to take the rental count (for example) and divide it by the Airbnb count, but in cases where there are no Airbnb listings this would result in a division by zero error. So instead I came up with the following approach:
This results in a ratio where, if we consider more rental properties and fewer Airbnbs to be a good thing, then a bigger value is better. For example, a ratio of 6 means there are six times as many rental properties as Airbnbs. Meanwhile, a ratio of -2 means there are two times as many Airbnbs as rental properties. I decided to keep the same logic for comparing the prices – not because I think that higher rental prices are a positive thing, but because Airbnb prices being higher than rental prices is what is threatening the rental market.
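The formula itself is not reproduced in this write-up, so what follows is only one possible scheme consistent with the examples above, not necessarily the one the program uses. It assumes that a zero on either side is treated as 1, so nothing is divided by zero, and that an SA2 with no rentals and no Airbnbs is left as NA. The function name is hypothetical.

```python
def signed_ratio(rentals, airbnbs):
    """Signed ratio of rentals to Airbnbs (counts or prices) for one SA2.

    Positive values: rentals outnumber Airbnbs by that factor (6 -> six times
    as many). Negative values: Airbnbs outnumber rentals (-2 -> twice as many).
    Assumption: a zero on either side is treated as 1 to avoid dividing by
    zero, and an SA2 with neither rentals nor Airbnbs returns None (NA).
    """
    if rentals == 0 and airbnbs == 0:
        return None
    if rentals >= airbnbs:
        return rentals / max(airbnbs, 1)
    return -airbnbs / max(rentals, 1)


print(signed_ratio(12, 2))  # 6.0
print(signed_ratio(3, 6))   # -2.0
print(signed_ratio(5, 0))   # 5.0
```

Under this scheme equal values give 1 and no result ever falls strictly between -1 and 1, so a map colour scale needs a deliberately chosen midpoint rather than one inferred from the data.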
I ran into a couple of hurdles while using the Plotly Express library to create my maps. Firstly, I found that plotting the maps required excessive amounts of memory, sometimes enough to cause a memory error. I mitigated this by using the TopoJSON library to simplify the geometry (essentially reducing the number of points that have to be plotted) and by deleting the map objects once the maps had been written to HTML files (as they otherwise continued to take up memory).
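To give a sense of what simplifying the geometry involves, here is a minimal pure-Python sketch of the Douglas–Peucker algorithm, which drops vertices lying within a tolerance of a straighter line. It is an illustration rather than the code the program runs. The topojson package applies this kind of simplification (through its toposimplify option) to the arcs shared between polygons, which keeps neighbouring SA2 borders aligned in a way that simplifying each polygon independently would not guarantee.

```python
import math


def _point_line_distance(point, start, end):
    """Perpendicular distance from `point` to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Drop vertices that lie within `tolerance` of a simplified line."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior vertex furthest from the line between the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = _point_line_distance(points[i], start, end)
        if dist > max_dist:
            index, max_dist = i, dist
    if index == 0 or max_dist <= tolerance:
        return [start, end]  # every interior vertex is close enough to drop
    # Otherwise keep that vertex and simplify each side of it recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # the shared vertex appears only once
```

A larger tolerance removes more vertices, trading boundary detail for a smaller, faster-to-render map.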
These strategies helped, but I was still struggling to plot the rental price map in particular, as its categorical structure meant that it had a discrete scale instead of a continuous one. After a lot of research, I learned that the “animation_frame” argument that I was using to split map layers by number of bedrooms does not work correctly with discrete data in cases where not all of the categories are present in the first layer (Plotly, n.d.). This was not too much of an issue since I could easily convert the scale to be continuous, which also made it consistent with the other maps.
However, using a continuous scale can be problematic in cases where there are NA/NaN values. For the price maps, SA2s which did not have any rental properties/Airbnbs would receive a zero, but I did not want this to be misinterpreted as their median price being $0. To solve this, I manually defined the colour scale to start off with a “discrete” NA category and then go on as a continuous scale. I used a similar technique to define the midpoint for the ratio maps.
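Plotly accepts a continuous colour scale as a list of [position, colour] pairs running from 0 to 1, and repeating a position creates a hard step instead of a blend. The sketch below uses both ideas; the function names, the 5% NA band and the specific colours are assumptions. For the NA band to work, the placeholder value (0 for the price maps) must sit at the bottom of the colour range, with real values falling above the band.

```python
def colourscale_with_na(colours, na_colour="lightgrey", na_share=0.05):
    """Colour scale with a flat NA band at the bottom, then a gradient.

    The lowest `na_share` of the range is one flat colour for the NA
    placeholder; `colours` are spread evenly over the remainder. Repeating
    the position `na_share` makes a hard edge instead of a blend.
    """
    scale = [[0.0, na_colour], [na_share, na_colour]]
    step = (1 - na_share) / (len(colours) - 1)
    for i, colour in enumerate(colours):
        scale.append([na_share + i * step, colour])
    scale[-1][0] = 1.0  # guard against float drift: the scale must end at 1
    return scale


def midpoint_position(vmin, vmax, midpoint=0.0):
    """Where `midpoint` falls on a 0-1 colour scale spanning vmin..vmax."""
    return (midpoint - vmin) / (vmax - vmin)


def diverging_scale(vmin, vmax, low="#b2182b", mid="#f7f7f7", high="#2166ac"):
    """Diverging scale centred on 0; assumes vmin < 0 < vmax."""
    return [[0.0, low], [midpoint_position(vmin, vmax), mid], [1.0, high]]
```

Either list could then be passed to Plotly Express through color_continuous_scale, with range_color fixing the value range so that the positions line up with the data. For the ratio maps, Plotly Express's color_continuous_midpoint parameter is an alternative way to centre a diverging scale.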
When using the program with the included datasets, all the user needs to do is run main() and follow the prompts to enter the relevant filenames, city, and their chosen filename prefix in order to get an HTML file of each map type. If desired, they can comment out any map types they do not want.
The user can run the program with other NZ.Stat or Inside Airbnb datasets if desired; they will just need to edit the global constants to conform with the new datasets. This flexibility means that the program will work for other cities and/or time periods.
Future edits or additions to this program could include:
Campbell, M., McNair, H., Mackay, M., & Perkins, H. C. (2019). Disrupting the regional housing market: Airbnb in New Zealand. Regional Studies, Regional Science, 6(1), 139-142. https://doi.org/10.1080/21681376.2019.1588156
Inside Airbnb (2023, September 2). Get the data. Retrieved September 22, 2023, from http://insideairbnb.com/get-the-data
Plotly (n.d.). Intro to animations in Python. Retrieved October 20, 2023, from https://plotly.com/python/animations/#current-animation-limitations-and-caveats
Stats NZ–Tatauranga Aotearoa (2018a). Weekly rent paid by household, by number of bedrooms, 2006, 2013, and 2018 Censuses. NZ.Stat. Retrieved October 9, 2023, from https://nzdotstat.stats.govt.nz/wbos/Index.aspx
Stats NZ–Tatauranga Aotearoa (2018b). Statistical Area 2 Higher Geographies 2018 (generalised). Stats NZ Geographic Data Service. Retrieved September 22, 2023, from https://datafinder.stats.govt.nz/layer/95065-statistical-area-2-higher-geographies-2018-generalised/
Tenancy Services (2023, October). Market rent. Retrieved October 20, 2023, from https://www.tenancy.govt.nz/rent-bond-and-bills/market-rent/