For this project, I created and trained a Convolutional Neural Network (CNN) using PyTorch to classify the tone of video game screenshots. Initially, the model was designed for individual image inference, but I later added real-time inference functionality, enabling predictions through an overlay on live gameplay.
Key libraries used: PyTorch, Torchvision, OpenCV, MSS, PIL, and PyGetWindow.
The dataset consists of approximately 900 PNG images with a source resolution of 3440x1440, captured from gameplay recordings across 16 different video games.
Images are categorized into three tone-based directories: "Horror," "Action," and "Scenic." The classification criteria are as follows:
Using Torchvision’s transforms package, I applied various transformations to improve model generalizability. Initially, flip and crop transformations were not included, but their addition improved performance on unseen games.
Transformations applied:
After applying transformations, I split the dataset into training (80%) and test (20%) sets and defined DataLoaders with a batch size of 32.
The model consists of three convolutional layers, three fully connected layers, and pooling layers after each convolution. The expected input shape is [batch_size, 3, 224, 224], where 3 represents the RGB channels.
The summary above illustrates transformations applied to a sample input.
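As a rough sketch of an architecture matching that description, the following module has three convolution-plus-pooling stages followed by three fully connected layers. The class name `ToneCNN`, the channel counts, kernel sizes, hidden layer widths, and the choice of ReLU and max pooling are all assumptions made for illustration. Only the layer counts, the three output classes, and the input shape come from the description.

```python
import torch
from torch import nn


class ToneCNN(nn.Module):
    """Hypothetical 3-conv / 3-FC classifier for 224x224 RGB inputs."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Each stage: convolution -> ReLU -> 2x2 max pool (halves height and width).
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 224 -> 112
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 112 -> 56
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 56 -> 28
        )
        # Three fully connected layers ending in one logit per tone class.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = ToneCNN()
sample = torch.randn(2, 3, 224, 224)    # [batch_size, 3, 224, 224]
feature_maps = model.features(sample)   # shape after the three conv/pool stages
logits = model(sample)                  # one row of class scores per image
print(feature_maps.shape, logits.shape)
```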
Training for five epochs gave the best results. I tracked cross-entropy loss and accuracy throughout, using the Adam optimizer to update the weights and zeroing the gradients at each step.
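A training loop along these lines might look like the sketch below. The function name `train`, the learning rate, and the toy model and random data used to exercise it are assumptions. Only the five epochs, cross-entropy loss, Adam, and per-epoch loss and accuracy tracking come from the description.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, epochs=5, lr=1e-3):
    """Train with Adam and cross-entropy; return (loss, accuracy) per epoch."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # lr is an assumed value
    history = []
    model.train()
    for _ in range(epochs):
        total_loss, correct, seen = 0.0, 0, 0
        for images, labels in loader:
            optimizer.zero_grad()             # clear gradients from the previous step
            logits = model(images)
            loss = criterion(logits, labels)
            loss.backward()                   # backpropagate
            optimizer.step()                  # Adam weight update
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
        history.append((total_loss / seen, correct / seen))
    return history


# Toy stand-ins so the loop runs quickly without the real dataset or CNN.
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(64, 3, 8, 8), torch.randint(0, 3, (64,)))
toy_loader = DataLoader(toy_data, batch_size=32, shuffle=True)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3))

history = train(toy_model, toy_loader, epochs=5)
for epoch, (loss, acc) in enumerate(history, start=1):
    print(f"epoch {epoch}: loss={loss:.4f} acc={acc:.3f}")
```

The returned `history` list is the kind of data that loss and accuracy curves can be plotted from.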
Accuracy and loss plots:
Using OpenCV, MSS, PIL, and PyGetWindow, I capture the gameplay window, preprocess each frame, and classify it in real time, with an overlay displaying the predicted tone.
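The capture-and-classify loop could be sketched as below. This is an illustration under several assumptions: the helper names (`frame_to_tensor`, `classify_frame`, `run_overlay`) are hypothetical, the frame is resized with PyTorch rather than PIL, the prediction is drawn on an OpenCV preview window rather than a true transparent overlay, and the class order assumes labels were indexed alphabetically by directory name. The preprocessing must match whatever was used in training, which may differ from what is shown here.

```python
import numpy as np
import torch
import torch.nn.functional as F
from torch import nn

# Assumed label order (alphabetical by directory name).
CLASS_NAMES = ["Action", "Horror", "Scenic"]


def frame_to_tensor(frame_bgra: np.ndarray) -> torch.Tensor:
    """Convert an HxWx4 BGRA uint8 frame into a [1, 3, 224, 224] float tensor."""
    rgb = frame_bgra[:, :, [2, 1, 0]]                       # BGRA -> RGB (copy)
    x = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    return F.interpolate(x.unsqueeze(0), size=(224, 224),
                         mode="bilinear", align_corners=False)


def classify_frame(model: nn.Module, frame_bgra: np.ndarray) -> str:
    """Return the predicted tone label for a single captured frame."""
    model.eval()
    with torch.no_grad():
        logits = model(frame_to_tensor(frame_bgra))
    return CLASS_NAMES[int(logits.argmax(dim=1))]


def run_overlay(model: nn.Module, window_title: str) -> None:
    """Capture a window by title and show predictions until 'q' is pressed."""
    # Imported here so the helpers above work without a display.
    import cv2
    import mss
    import pygetwindow as gw

    win = gw.getWindowsWithTitle(window_title)[0]
    region = {"left": win.left, "top": win.top,
              "width": win.width, "height": win.height}
    with mss.mss() as sct:
        while True:
            frame = np.array(sct.grab(region))              # BGRA screenshot
            label = classify_frame(model, frame)
            preview = np.ascontiguousarray(frame[:, :, :3])  # BGR for OpenCV
            cv2.putText(preview, label, (20, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
            cv2.imshow("Tone overlay", preview)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    cv2.destroyAllWindows()


# Offline check with a random frame and a tiny stand-in model (both hypothetical).
dummy_model = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 3))
fake_frame = np.random.randint(0, 256, size=(144, 344, 4), dtype=np.uint8)
tensor = frame_to_tensor(fake_frame)
label = classify_frame(dummy_model, fake_frame)
print(tensor.shape, label)
```

With a trained model, calling `run_overlay(model, "<game window title>")` would start the live loop. Averaging predictions over several consecutive frames is one way to reduce flicker in the displayed label.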
Silent Hill 3 – Correctly identifies horror tone.
Dragon Ball Sparking! Zero – Some action scenes misclassified as scenic.
Garry’s Mod – Inconsistent classification of bright environments.
Next steps include tuning CNN parameters, experimenting with different pooling methods, and increasing dataset diversity. I also plan to explore deeper architectures leveraging my available hardware.
The source code for this project is available on GitHub.
This project was my first hands-on experience with computer vision. Implementing my own CNN deepened my understanding of data preprocessing, model architecture, and real-time inference optimization.