Chicago Real Estate Web Scraping and Analysis

Project Overview

This project involved scraping Chicago real estate listings from online sources and analyzing the resulting data. A Python script handled the scraping process and compiled the dataset, while Power BI Desktop was used to clean the data, perform analysis, and build visualizations. The goal was to identify patterns in rental prices, assess neighborhood livability, and explore the factors affecting pricing across Chicago ZIP codes.

Key libraries used: BeautifulSoup, Selenium, and Pandas.

Source Considerations

To begin, I inspected listings from different sources to see what information was consistently provided about the rental units. I also checked each site's robots.txt to make sure the data could be collected and used for educational purposes. Considering what was most often provided and what might offer interesting insight into the state of Chicago's real estate market, I selected these attributes for the dataset:

Web Scraping

Initially, I used the Python library BeautifulSoup for scraping, but most listings were loaded dynamically and did not appear in the raw HTML source. Given that constraint, I switched to Selenium and implemented grid-based scraping: the script iterates through map coordinates to fetch the relevant listing pages, then extracts the dataset features from each page.
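
The snippet below is only a minimal sketch of what such a grid-based Selenium loop could look like; the listing site, URL parameters, and CSS selectors are placeholders, not the actual ones used in this project.

```python
import time
from itertools import product

from selenium import webdriver
from selenium.webdriver.common.by import By

# Rough bounding box for Chicago, split into a coarse grid of map tiles.
LAT_RANGE = (41.64, 42.02)
LON_RANGE = (-87.94, -87.52)
GRID_STEPS = 8

# Placeholder URL template -- the real site and its query parameters will differ.
SEARCH_URL = (
    "https://example-listings.com/search"
    "?neLat={ne_lat}&neLon={ne_lon}&swLat={sw_lat}&swLon={sw_lon}"
)


def grid_cells(lat_range, lon_range, steps):
    """Yield (sw_lat, sw_lon, ne_lat, ne_lon) tuples covering the bounding box."""
    lat_step = (lat_range[1] - lat_range[0]) / steps
    lon_step = (lon_range[1] - lon_range[0]) / steps
    for i, j in product(range(steps), range(steps)):
        sw_lat = lat_range[0] + i * lat_step
        sw_lon = lon_range[0] + j * lon_step
        yield sw_lat, sw_lon, sw_lat + lat_step, sw_lon + lon_step


def scrape_cell(driver, sw_lat, sw_lon, ne_lat, ne_lon):
    """Load one map tile's results page and collect listing links (selector is hypothetical)."""
    driver.get(SEARCH_URL.format(ne_lat=ne_lat, ne_lon=ne_lon, sw_lat=sw_lat, sw_lon=sw_lon))
    time.sleep(3)  # give the dynamically loaded results time to render
    cards = driver.find_elements(By.CSS_SELECTOR, "a.listing-card")
    return [card.get_attribute("href") for card in cards]


if __name__ == "__main__":
    driver = webdriver.Chrome()
    try:
        listing_urls = set()
        for cell in grid_cells(LAT_RANGE, LON_RANGE, GRID_STEPS):
            listing_urls.update(scrape_cell(driver, *cell))
        print(f"Collected {len(listing_urls)} unique listing URLs")
    finally:
        driver.quit()
```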

Below is a summary of key statistical measures for the final dataset’s numerical columns, generated using the Pandas describe() function.
[Table: df.describe() output for the dataset's numerical columns]
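
For reference, producing that summary from the compiled dataset is a one-liner in Pandas; the CSV filename below is a placeholder for the scraper's output file.

```python
import pandas as pd

# Hypothetical output file from the scraping step.
df = pd.read_csv("chicago_listings.csv")

# Count, mean, std, min, quartiles, and max for the numeric columns.
print(df.describe())
```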

Data Mining & Analysis

After gathering the data, I processed and analyzed it using Power BI. Key analyses included:

Data Visualization in Power BI

The data was structured into an interactive dashboard, allowing users to filter and explore various attributes.

Key Findings

Code and Implementation

The source code for this project is available on GitHub. The following files are included:

Future Work

Next steps could include adding more features to the dataset, such as unit type (condo, apartment, house) or popular amenities (pets allowed, gym, private parking). More cities and states could easily be incorporated using the same scraping scripts.

Project Takeaways

Before this project, I had never applied web scraping outside of very simple academic work. Now I know how to use libraries that can scrape dynamically loaded content rather than only the static HTML source. I also picked up a few techniques that reduce the likelihood of triggering anti-bot measures or CAPTCHAs.
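
As an illustration, two common techniques of this kind are randomized delays between requests and a realistic user-agent string. The sketch below shows both with Selenium; it is not necessarily the exact approach used in this project.

```python
import random
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Example desktop user-agent string; Selenium's default is easier to flag as a bot.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

options = Options()
options.add_argument(f"--user-agent={USER_AGENT}")
driver = webdriver.Chrome(options=options)


def polite_get(driver, url, min_wait=2.0, max_wait=6.0):
    """Load a page, then pause for a random interval to mimic human browsing."""
    driver.get(url)
    time.sleep(random.uniform(min_wait, max_wait))
```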

Outside of Python, this project was my first time using Power BI. I've used Tableau before, and some of that knowledge carried over to Power BI. I gained experience using DAX and Power Query to manipulate the data and build the visualizations I wanted.