Description

This is a short demo of web scraping, multithreading, and some Google APIs, using Kijiji as the website of choice. Keep in mind that you shouldn't blast a website with requests: you are likely to get timed out, if not banned outright.

Features

  • Filters listings by price range and location
  • Leaves out duplicates
  • Checks over 2000 listings in under 15 minutes
  • Saves the listings to a spreadsheet on Google Drive
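
As a rough illustration of leaving out duplicates, the sketch below keeps a set of keys already seen, so each listing is checked once rather than compared against every other listing. The `drop_duplicates` function, the `title` and `price` fields, and the choice of key are assumptions made for this example, not the project's actual code.

```python
def drop_duplicates(listings):
    """Return listings with repeats removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for listing in listings:
        # Hypothetical identity key: normalized title plus price.
        key = (listing["title"].strip().lower(), listing["price"])
        if key not in seen:  # set membership is O(1) on average
            seen.add(key)
            unique.append(listing)
    return unique


# Made-up sample data for illustration only.
sample = [
    {"title": "Road bike", "price": 250},
    {"title": "road bike ", "price": 250},
    {"title": "Desk", "price": 40},
]
unique_sample = drop_duplicates(sample)
```

Which fields make two listings "the same" is a judgment call; a listing ID or URL would likely be a more reliable key if the scraped data includes one.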

Project Difficulties

  • Badly formatted data.
  • Run time was getting out of hand after a couple of scrapes of the website.
  • Speed issues writing the data to a Google spreadsheet on Google Drive.
  • Space issues locally when working with the data.

My Solutions

  • Auto retry with exponential backoff to make sure the data is more complete.
  • Changed the comparison-checking algorithm to run in linear time instead of quadratic time.
  • Implemented multithreading to speed up the run time while checking for similarities.
  • Used a Google Cloud SQL instance to store data for the duration of this project.
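
The retry and multithreading ideas above could look something like the following minimal sketch. The names `retry_with_backoff` and `fetch_all`, the default delays, and the worker count are all assumptions for illustration; the project's real implementation may differ. A small thread pool is used on purpose, in keeping with the warning about not flooding a website with requests.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt (plus jitter), then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:  # a real scraper would likely catch only network errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Up to 10% random jitter so parallel workers don't retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


def fetch_all(fetch_page, page_numbers, max_workers=4):
    """Fetch pages concurrently; each page is retried independently.

    fetch_page is a hypothetical callable that takes a page number and
    returns that page's listings. Results come back in input order.
    """
    def fetch_one(n):
        return retry_with_backoff(lambda: fetch_page(n))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, page_numbers))
```

Threads suit this kind of work because most of the time is spent waiting on the network rather than computing.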