Real-Time Game Frame Classifier

Project Overview

For this project, I created and trained a Convolutional Neural Network (CNN) using PyTorch to classify the tone of video game screenshots. Initially, the model was designed for individual image inference, but I later added real-time inference functionality, enabling predictions through an overlay on live gameplay.

Key libraries used: PyTorch, Torchvision, OpenCV, MSS, PIL, and PyGetWindow.

Data Sources

The dataset consists of approximately 900 PNG images with a source resolution of 3440x1440, captured from gameplay recordings across 16 different video games.

Images are categorized into three tone-based directories: "Horror," "Action," and "Scenic." The classification criteria are as follows:

Data Preparation

Using Torchvision’s transforms package, I applied various transformations to improve model generalizability. Initially, flip and crop transformations were not included, but their addition improved performance on unseen games.

Transformations applied:

After applying transformations, I split the dataset into training (80%) and test (20%) sets and defined DataLoaders with a batch size of 32.

Model Definition

The model consists of three convolutional layers, each followed by a pooling layer, and three fully connected layers. The expected input shape is [batch_size, 3, 224, 224], where 3 is the number of RGB channels.
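An architecture of this shape could be sketched as follows. The class name ToneCNN, the channel widths (16, 32, 64), the 3x3 kernels, the use of max pooling, and the hidden layer sizes are all assumptions chosen for illustration; the text only fixes the layer counts, the input shape, and the three output classes.

```python
import torch
from torch import nn


class ToneCNN(nn.Module):
    """Three conv + pool stages followed by three fully connected layers."""

    def __init__(self, num_classes=3):
        super().__init__()
        # Each stage keeps the spatial size (padding=1), then halves it with pooling:
        # 224 -> 112 -> 56 -> 28.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # [batch, 64, 28, 28] -> [batch, 50176]
            nn.Linear(64 * 28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),        # raw logits, one per tone class
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

The final layer returns raw logits rather than probabilities, since PyTorch's cross-entropy loss applies the softmax internally.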

Model Summary

The summary above illustrates transformations applied to a sample input.
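A per-layer shape listing of this kind can be reproduced with forward hooks. The helper below is a sketch: trace_shapes is a hypothetical name, and the small Sequential model is a stand-in for demonstration, not the project's network.

```python
import torch
from torch import nn


def trace_shapes(model, sample):
    """Run one forward pass and record the output shape of each direct child layer."""
    rows, handles = [], []
    for name, layer in model.named_children():
        # Bind `name` as a default argument so each hook keeps its own layer name.
        def hook(module, inputs, output, name=name):
            rows.append((name, type(module).__name__, tuple(output.shape)))
        handles.append(layer.register_forward_hook(hook))

    with torch.no_grad():
        model(sample)

    for handle in handles:
        handle.remove()  # leave the model unmodified after tracing
    return rows


# Stand-in model, used only to demonstrate the helper.
demo = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 112 * 112, 3),
)
for name, kind, shape in trace_shapes(demo, torch.zeros(1, 3, 224, 224)):
    print(f"{name:>2}  {kind:<10} {shape}")
```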

Training

Training for five epochs gave the best results. I tracked cross-entropy loss and accuracy throughout training, using the Adam optimizer to update the weights and zeroing the gradients between steps.
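The loop described here might look roughly like the sketch below. The function name train, the learning rate of 1e-3, and the per-epoch (loss, accuracy) history format are assumptions; the five epochs, cross-entropy loss, and Adam come from the text.

```python
import torch
from torch import nn


def train(model, loader, epochs=5, lr=1e-3, device="cpu"):
    """Train with Adam and cross-entropy; return a list of (mean loss, accuracy) per epoch."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # lr is an assumed value
    history = []

    for _ in range(epochs):
        model.train()
        total_loss, correct, seen = 0.0, 0, 0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()           # clear gradients left over from the last step
            logits = model(images)
            loss = criterion(logits, labels)
            loss.backward()                 # compute gradients
            optimizer.step()                # Adam weight update

            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)

        history.append((total_loss / seen, correct / seen))
    return history
```

The returned history is the kind of data accuracy and loss curves can be plotted from.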

Accuracy and loss plots:

[Figures: Accuracy Plot, Loss Plot]

Live Inference

Live inference uses OpenCV, MSS, PIL, and PyGetWindow to classify frames in real time. The gameplay window is captured, preprocessed, and classified, and an overlay displays the result.

Silent Hill 3 – Correctly identifies horror tone.

Dragon Ball Sparking! Zero – Some action scenes misclassified as scenic.

Garry’s Mod – Inconsistent classification of bright environments.

Future Work

Next steps include tuning CNN parameters, experimenting with different pooling methods, and increasing dataset diversity. I also plan to explore deeper architectures leveraging my available hardware.

Code and Implementation

The source code for this project is available on GitHub.

Project Takeaways

This project was my first hands-on experience with computer vision. Implementing my own CNN deepened my understanding of data preprocessing, model architecture, and real-time inference optimization.