Description
This is a short demo of web scraping, multithreading, and some Google APIs, using Kijiji as the website of choice. Keep in mind that you shouldn't blast a website with requests: you are likely to get a timeout, if not a full ban.
![](images/scrapper.gif)
Features
- Filters listings by price range and location
- Leaves out duplicates
- Checks over 2,000 listings in under 15 minutes
- Saves the listings to a spreadsheet on Google Drive
Project Difficulties
- Badly formatted data.
- Run time grew out of hand after a couple of scrapes of the website.
- Speed issues writing the data to a Google spreadsheet on Google Drive.
- Space issues locally when working with the data.
My Solutions
- Auto retry with exponential backoff, so that the scraped data is more complete.
- Changed the comparison-checking algorithm from quadratic time to linear time.
- Implemented multithreading to speed up the run time while checking for similarities.
- Used a Google Cloud SQL instance to store data for the duration of this project.
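
The first three solutions can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names (`retry_with_backoff`, `drop_duplicates`, `fetch_all`), the listing fields `title` and `price`, and the default delays are all hypothetical. The thread pool is applied to fetching pages here, which is an assumption; the list above describes using threads during the similarity check.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def retry_with_backoff(func, retries=4, base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(retries):
        try:
            return func()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def drop_duplicates(listings, key=lambda item: (item["title"], item["price"])):
    """Keep the first listing per key.

    A single pass with a set lookup is linear on average, instead of
    comparing every listing against every other one (quadratic).
    The (title, price) key is a hypothetical notion of "duplicate".
    """
    seen = set()
    unique = []
    for item in listings:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique


def fetch_all(urls, fetch, workers=8):
    """Fetch pages concurrently; each fetch is retried with backoff.

    `fetch` is any callable that takes a URL and returns the page data.
    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: retry_with_backoff(lambda: fetch(u)), urls))
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library. A small `workers` value, together with the backoff delay, also helps avoid the timeouts and bans mentioned in the description.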