Airbnbs vs. rental properties, visualised

Originally submitted: October 2023

This program is the culmination of an introductory Python course. I got the idea for it from this literature review I had done previously, of an article whose authors claimed that Airbnb is disrupting the rental market in New Zealand without providing any proof. My aim was to produce some exploratory data visualisations to see whether any obvious correlations emerge between the number and price of Airbnb listings and those of residential rental properties. The scope of this project was limited to producing the visualisations, but at some point I hope to extend it with some statistical analysis.

You can find the Python script, input files and a set of outputs in the GitHub repository here. The program can produce six different map types for a given location, and the input data covers three cities (Auckland, Wellington and Christchurch). I have included two example outputs here so that you can get an idea of what the program does without having to run it yourself.

Example output 1: Ratio of rental counts to Airbnb counts in Christchurch, by SA2 and number of bedrooms

Example output 2: Ratio of median rental prices to median Airbnb prices in Christchurch, by SA2 and number of bedrooms

Problem and proposed solution

As a Master of Applied Data Science student with a particular interest in spatial data science, I knew that I wanted to take this project as an opportunity to make some form of interactive map in Python.

In terms of what data I wanted to visualise with that map, I took inspiration from a journal article I had reviewed for another course which I felt needed improvement. This article, “Disrupting the regional housing market: Airbnb in New Zealand” (Campbell et al., 2019), failed to actually prove whether or not Airbnb is disrupting the regional housing market as the title suggests. In my review, I suggested that the authors could have used census data on the number of rental properties and average weekly rent and compared these to the number of “entire home” Airbnb listings and their nightly rate (multiplied by seven). I am keen to carry out this suggestion myself and see what the results are.

At a minimum, I aim to produce an interactive map which displays the following:

A choropleth map layer coloured by the count of residential rental property listings for each areal unit at the smallest scale possible (i.e. suburb), relative to the count of Airbnb “entire home” listings within the same areal unit.
A second choropleth map layer coloured by the average weekly rent price within each areal unit, relative to the average Airbnb “entire home” nightly rate multiplied by seven within the same areal unit.

Development process

The process began with gathering the necessary data. Getting the Airbnb data was simple, as it was readily available from the Inside Airbnb website in CSV format (Inside Airbnb, 2023).

However, obtaining rental property data proved to be more difficult. Originally I had planned to use Trade Me’s API to get data on rental listings from their Property section, but Trade Me declined my application to access their API. Instead I had to rely on my back-up plan of using Stats NZ rental data (Stats NZ–Tatauranga Aotearoa, 2018a). This turned out to be a blessing in disguise, as I soon realised that it made more sense to use the Stats NZ data since it is supposed to reflect the entire rental stock of the country, whereas the Trade Me rental listings would only reflect rental properties which were on the market (and on that platform) at the time.

The Stats NZ data that I used was from the 2018 Census, and at its most detailed level it was divided into areal units called SA2s (Statistical Area 2), which are comparable to suburbs. In order to map this data I also needed to download the SA2 geographies from Stats NZ, which I chose to do in shapefile format (Stats NZ–Tatauranga Aotearoa, 2018b). Because these files, along with the rental data files, were so large when covering the whole of New Zealand, I decided at this point to narrow my focus to the three largest cities: Auckland, Wellington and Christchurch.

Now that I had these files downloaded, I needed to get them into Python as dataframes. This ought to have been straightforward, and it was, except for the rental data. I downloaded this using the data tool NZ.Stat, which has its fair share of quirks and thankfully is in the process of being replaced. The Excel export option claimed to generate an XLS file, but I worked out that the resulting file needed to be converted to XLSX in order to be read correctly.

Once all the data was in, it needed to be cleaned and filtered. Once again the rental data stood out here, as it had multilevel columns (one level for bedrooms, one for price categories) which I needed to merge into a single level.
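As a rough sketch of that merge, assuming the converted XLSX file is read with pandas using a two-row header (something like pd.read_excel(path, header=[0, 1]), which also needs the openpyxl package installed), the two column levels can be joined into single labels. The column labels, SA2 names and counts below are made up for illustration and are not the actual NZ.Stat layout; flatten_columns is a hypothetical helper name.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame, sep: str = " | ") -> pd.DataFrame:
    """Return a copy of df whose MultiIndex columns are joined into single string labels."""
    out = df.copy()
    # Iterating over a MultiIndex yields one tuple per column, e.g. ("1 bedroom", "$200 - $299")
    out.columns = [sep.join(str(level).strip() for level in col) for col in out.columns]
    return out


# Hypothetical two-level header: number of bedrooms on top, price category underneath
columns = pd.MultiIndex.from_tuples(
    [
        ("1 bedroom", "$200 - $299"),
        ("1 bedroom", "$600 and over"),
        ("2 bedrooms", "$200 - $299"),
        ("2 bedrooms", "$600 and over"),
    ]
)
# Made-up counts for two made-up SA2s
raw = pd.DataFrame([[12, 0, 30, 3], [5, 1, 9, 0]], index=["SA2 A", "SA2 B"], columns=columns)

flat = flatten_columns(raw)
print(flat.columns.tolist())
# ['1 bedroom | $200 - $299', '1 bedroom | $600 and over', '2 bedrooms | $200 - $299', '2 bedrooms | $600 and over']
```

Using a separator that never appears in the labels makes it easy to split the bedroom and price parts apart again later if needed.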

As I just mentioned, the rental data uses price categories (e.g. “$200 - $299”, and an open-ended “$600 and over”), whereas the Airbnb data gives the exact price for every listing. This meant I had to get a bit creative in how I interpreted these categories and compared them to the Airbnb prices. While I could not calculate an average price from this data, I could still find the median price category, which is arguably the more suitable measure of central tendency anyway. When it came to calculating the relative prices, I did choose to use the midpoint of each price category (e.g. “$200 - $299” would be $249.50), but the open-ended upper category complicated things. I settled this somewhat informally by researching the upper quartile market rents of the relevant cities on the Tenancy Services website, noting the highest price for each location, and finding that they were all close to the nice round number of $1000 (Tenancy Services, 2023). Treating $1000 as the upper bound gave a midpoint of $800 for the “$600 and over” category.
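The midpoint and median-category logic described above can be sketched in plain Python. This assumes category labels in the “$200 - $299” / “$600 and over” form, counts supplied in ascending price order, and any non-price categories (such as “not stated”) already removed; the function names and the OPEN_ENDED_UPPER constant are hypothetical, not taken from the project code.

```python
from __future__ import annotations

import re

# Assumed upper bound for the open-ended top category (the ~$1000 figure above)
OPEN_ENDED_UPPER = 1000


def category_midpoint(label: str) -> float:
    """Midpoint of a label such as "$200 - $299" or "$600 and over"."""
    numbers = [int(n) for n in re.findall(r"\d+", label.replace(",", ""))]
    if "and over" in label:
        # Open-ended category: pair the lower bound with the assumed upper bound
        return (numbers[0] + OPEN_ENDED_UPPER) / 2
    low, high = numbers
    return (low + high) / 2


def median_category(counts: dict[str, int]) -> str | None:
    """Return the category containing the median household.

    counts must be ordered from cheapest to most expensive category.
    Returns None when there are no households at all.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    cumulative = 0
    for label, count in counts.items():
        cumulative += count
        # The median class is the first whose cumulative count reaches half the total
        if cumulative >= total / 2:
            return label
    return None  # not reached when total > 0


counts = {"$100 - $199": 3, "$200 - $299": 12, "$300 - $399": 20, "$600 and over": 6}
print(category_midpoint("$200 - $299"))  # 249.5
print(category_midpoint("$600 and over"))  # 800.0
print(median_category(counts))  # $300 - $399
```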

Choosing a formula for the relative counts and prices (or ratios) was an interesting part of the process, as I needed to consider how zero values would be treated. My initial instinct was to take the rental count (for example) and divide it by the Airbnb count, but in cases where there are no Airbnb listings this would result in a division by zero error. So instead I came up with the following approach:

If both variables are equal, ratio = 0.
If the rental variable is less than the Airbnb variable, ratio = -(Airbnb variable / rental variable), where a rental variable of 0 is treated as 1.
If the rental variable is greater than the Airbnb variable, ratio = rental variable / Airbnb variable, where an Airbnb variable of 0 is treated as 1.
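The three rules above translate almost directly into a small function. This is a sketch rather than the project's actual code, and relative_ratio is a hypothetical name; it works on a single pair of values, but the same logic could be applied row by row to a dataframe.

```python
def relative_ratio(rental: float, airbnb: float) -> float:
    """Signed ratio: positive when rentals dominate, negative when Airbnbs dominate, 0 when equal."""
    if rental == airbnb:
        return 0.0
    if rental < airbnb:
        # A zero rental value is treated as 1 to avoid dividing by zero
        return -airbnb / (rental if rental != 0 else 1)
    # A zero Airbnb value is treated as 1 to avoid dividing by zero
    return rental / (airbnb if airbnb != 0 else 1)


print(relative_ratio(6, 1))  # 6.0
print(relative_ratio(1, 2))  # -2.0
print(relative_ratio(0, 4))  # -4.0
print(relative_ratio(5, 5))  # 0.0
```

One consequence of this design is that, for whole-number counts, the ratio jumps straight from 0 to 1 or -1 (or beyond), with no values strictly in between.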

This results in a ratio where, if we consider more rental properties and fewer Airbnbs to be a good thing, then a bigger value is better. For example, a ratio of 6 means there are six times as many rental properties as Airbnbs. Meanwhile, a ratio of -2 means there are twice as many Airbnbs as rental properties. I decided to keep the same logic for comparing the prices – not because I think that higher rental prices are a positive thing, but because Airbnb prices being higher than rental prices is what is threatening the rental market.

I ran into a couple of hurdles while using the Plotly Express library to create my maps. Firstly, plotting the maps required excessive amounts of memory, sometimes to the point of raising a memory error. I mitigated this by using the TopoJSON library to simplify the geometry (essentially reducing the number of points that have to be plotted) and by deleting the map objects once the maps had been written to HTML files (as they otherwise continued to take up memory).
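To illustrate what simplifying the geometry means, here is a plain-Python sketch of the Douglas-Peucker algorithm, a common line-simplification method that keeps only the vertices deviating from a straight line by more than a chosen tolerance. It is a simplified stand-in rather than the TopoJSON library's own code; one important difference is that TopoJSON works on a topology, so borders shared by neighbouring SA2s are simplified once and stay aligned, whereas this sketch simplifies each line independently and could open small gaps between polygons.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _perpendicular_distance(p: Point, start: Point, end: Point) -> float:
    """Distance from p to the infinite line through start and end."""
    (x, y), (x1, y1), (x2, y2) = p, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate segment (e.g. a closed ring): fall back to point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Simplify a polyline, keeping only vertices further than tolerance from the simplified line."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior vertex furthest from the line between the endpoints
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Every interior vertex is close enough to drop
        return [start, end]
    # Otherwise keep that vertex and simplify each half recursively
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the shared vertex


zigzag = [(0, 0), (1, 0), (2, 2), (3, 0), (4, 0)]
print(douglas_peucker(zigzag, 1.0))  # [(0, 0), (2, 2), (4, 0)]
```

On the memory side, the other half of the fix amounts to dropping the reference once a figure has been saved, e.g. calling fig.write_html(path) and then del fig before building the next map.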

These strategies helped, but I was still struggling to plot the rental price map in particular, as its categorical structure meant that it had a discrete colour scale instead of a continuous one. After a lot of research, I learned that the animation_frame argument, which I was using to split map layers by number of bedrooms, does not work correctly with discrete data when not all of the categories are present in the first layer (Plotly, n.d.). This was not too much of an issue, since I could easily convert the scale to a continuous one, which also made it consistent with the other maps.
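A quick way to check whether a dataset will run into this caveat is to compare the categories in the first animation frame with those across all frames. The sketch below does that for rows stored as dictionaries. The key names are hypothetical, and it assumes the first frame is the first frame value encountered in row order, which as far as I know matches Plotly Express's default ordering unless frame order is set explicitly (for example through category_orders).

```python
from typing import Dict, Hashable, Iterable, Set

_UNSET = object()  # sentinel so that any frame value, even None, can be the first frame


def categories_missing_from_first_frame(
    rows: Iterable[Dict[str, Hashable]], frame_key: str, category_key: str
) -> Set[Hashable]:
    """Return categories that appear in some animation frame but not in the first one."""
    first_frame = _UNSET
    first_categories: Set[Hashable] = set()
    all_categories: Set[Hashable] = set()
    for row in rows:
        frame = row[frame_key]
        if first_frame is _UNSET:
            first_frame = frame  # the first frame value seen in row order
        if frame == first_frame:
            first_categories.add(row[category_key])
        all_categories.add(row[category_key])
    return all_categories - first_categories


rows = [
    {"bedrooms": "1", "median_rent": "$200 - $299"},
    {"bedrooms": "1", "median_rent": "$300 - $399"},
    {"bedrooms": "2", "median_rent": "$300 - $399"},
    {"bedrooms": "2", "median_rent": "$600 and over"},
]
print(categories_missing_from_first_frame(rows, "bedrooms", "median_rent"))  # {'$600 and over'}
```

A non-empty result signals the situation described above, where a continuous scale is the safer choice.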

However, using a continuous scale can be problematic in cases where there are NA/NaN values. For the price maps, SA2s which did not have any rental properties/Airbnbs would receive a zero, but I did not want this to be misinterpreted as their median price being $0. To solve this, I manually defined the colour scale to start off with a “discrete” NA category and then go on as a continuous scale. I used a similar technique to define the midpoint for the ratio maps.
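Plotly colour scales are lists of [position, colour] pairs with positions running from 0 to 1, and repeating a position with two different colours creates a hard edge. The sketch below uses that to put a flat NA band at the bottom of an otherwise continuous scale. It assumes missing values are coded as the minimum of the colour range (0 for the price maps) and that every real value lies above na_upper; the function name, colours and threshold are illustrative.

```python
from typing import List, Sequence, Union

Stop = List[Union[float, str]]


def colourscale_with_na(
    vmin: float, vmax: float, na_upper: float, colours: Sequence[str], na_colour: str = "lightgrey"
) -> List[Stop]:
    """Build a colourscale: a flat na_colour band for values up to na_upper, continuous above it."""
    if len(colours) < 2:
        raise ValueError("need at least two colours for the continuous part")
    cut = (na_upper - vmin) / (vmax - vmin)  # where the NA band ends, on the 0-1 scale
    scale: List[Stop] = [[0.0, na_colour], [cut, na_colour]]
    steps = len(colours) - 1
    for i, colour in enumerate(colours):
        # Spread the continuous colours evenly between the cut and the top of the scale,
        # forcing the last stop to exactly 1.0 as Plotly expects
        position = 1.0 if i == steps else cut + (1.0 - cut) * i / steps
        scale.append([position, colour])
    return scale


scale = colourscale_with_na(0, 1000, 50, ["#fde725", "#21918c", "#440154"])
print(scale)
```

The result would then be passed to something like px.choropleth(..., color_continuous_scale=scale, range_color=(0, 1000)) so that the 0 to 1 positions line up with the intended value range. For the ratio maps, Plotly Express also offers a color_continuous_midpoint argument as an alternative to placing the midpoint by hand.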

How to use the program

When using the program with the included datasets, all the user needs to do is run main() and follow the prompts to enter the relevant filenames, the city, and a filename prefix of their choice; the program then writes an HTML file for each map type. If desired, they can comment out any map types they do not want.

The user can run the program with other NZ.Stat or Inside Airbnb datasets if desired; they will just need to edit the global constants to conform with the new datasets. This flexibility means that the program will work for other cities and/or time periods.

Things that went well

I am proud of being able to complete the whole project in Python, as I originally thought that I would need to follow in the footsteps of students who did similar projects and exported their data to ArcGIS for mapping.
Although I was looking forward to learning how to work with APIs, I am glad that Trade Me denied my application, because my back-up plan ended up being better than the original plan.
I was able to achieve my minimum viable product plus a few extra features (splitting layers by bedrooms, extra maps for count and price, flexible functionality).
I am happy that this project made me find a workaround for importing NZ.Stat data, and I am interested in developing that further to work with a wider range of datasets.

Challenges

Getting denied by Trade Me was a bit stressful due to the sudden change of plans, added time pressure and so on, but as I have already discussed it worked out well in the end.
Exporting the NZ.Stat data was more frustrating than it should have been, but I got there eventually.
Deciding how to calculate the ratios was an interesting challenge. I chose to go with something that was both simple to calculate and easy to understand and compare, but I do wonder whether I should have used some form of nonlinear scale to better capture differences.
The memory usage issue when making maps was somewhat unexpected as I had never encountered such a problem when making maps in R or ArcGIS.
Not being able to use discrete categories with map animations was frustrating, as the limitation was not widely documented, which is surprising given how problematic it can be.
Perhaps the most important challenge was dealing with the quality of the Stats NZ rental data. As I mentioned earlier, this data is from the 2018 Census, which was notoriously plagued with data quality issues. Additionally, my decision to use data at the smallest areal unit possible meant that SA2s with small (but non-zero) counts in particular categories had those counts censored for confidentiality reasons. So while my Airbnb data was complete, some SA2s appeared to have no rental properties in certain categories when this was not actually the case.

Future work

Future edits or additions to this program could include:

Extending the NZ.Stat import functionality.
Exploring different ratio formulas as mentioned above.
Doing statistical analysis to complement the map visualisations.
Running datasets for the entire country using larger areal units (e.g. territorial authorities). This should get around the confidentiality issue mentioned above.
Running comparisons over time, e.g. by downloading the newest dataset from Inside Airbnb in a year’s time to see if the (relative) counts and prices have increased.

References

Campbell, M., McNair, H., Mackay, M., & Perkins, H. C. (2019). Disrupting the regional housing market: Airbnb in New Zealand. Regional Studies, Regional Science, 6(1), 139-142. https://doi.org/10.1080/21681376.2019.1588156

Inside Airbnb (2023, September 2). Get the data. Retrieved September 22, 2023, from http://insideairbnb.com/get-the-data

Plotly (n.d.). Intro to animations in Python. Retrieved October 20, 2023, from https://plotly.com/python/animations/#current-animation-limitations-and-caveats

Stats NZ–Tatauranga Aotearoa (2018a). Weekly rent paid by household, by number of bedrooms, 2006, 2013, and 2018 Censuses. NZ.Stat. Retrieved October 9, 2023, from https://nzdotstat.stats.govt.nz/wbos/Index.aspx

Stats NZ–Tatauranga Aotearoa (2018b). Statistical Area 2 Higher Geographies 2018 (generalised). Stats NZ Geographic Data Service. Retrieved September 22, 2023, from https://datafinder.stats.govt.nz/layer/95065-statistical-area-2-higher-geographies-2018-generalised/

Tenancy Services (2023, October). Market rent. Retrieved October 20, 2023, from https://www.tenancy.govt.nz/rent-bond-and-bills/market-rent/