Berkeley Artificial Intelligence Research Lab (BAIR)
Predicting Social Media Addiction
using Ensemble and Deep Learning
Project Overview
This project explored how deep learning models can be used to identify patterns associated with social media addiction from behavioral and usage data. Working with real-world datasets, I examined how engagement signals correlate with indicators of compulsive use. Rather than treating prediction as an endpoint, the project emphasized careful evaluation and interpretability to understand what models were learning and where their predictions became unreliable.
Methods and Work
Preprocessed social media usage data to extract features relevant to engagement and addiction-related patterns.
Designed and trained neural network models to predict addiction-related risk indicators.
Evaluated Logistic Regression, Random Forest, and Multi-Layer Perceptron models using appropriate classification metrics, including precision–recall analysis.
Analyzed model behavior to identify limitations, bias, and sensitivity to data quality.
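The comparison step above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the engagement features and "risk" labels are synthetic placeholders, and the three model types match only those named in the write-up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
n = 600
# Hypothetical engagement features: e.g. daily minutes, session count, night-use ratio
X = rng.normal(size=(n, 3))
# Synthetic stand-in for an addiction-risk label, correlated with the first two features
y = ((X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "logreg": LogisticRegression(),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Precision-recall analysis, as used in the project's evaluation
    print(name, precision_score(y_te, pred), recall_score(y_te, pred))
```

Reporting precision and recall side by side, rather than accuracy alone, is what exposes asymmetric errors between false positives and false negatives.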
Outcomes
Increased model accuracy from 96% to 99% after refining neural network architecture and feature selection.
Found that strong performance metrics did not always translate to stable predictions across different data splits and user groups.
Observed that predictive outcomes were highly sensitive to how addiction-related labels were defined, encoding assumptions about human behavior that were not visible in aggregate metrics.
Developed a critical perspective on the limits of using predictive models to represent complex social and psychological phenomena.
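The stability finding above can be probed with a simple cross-split check: score the same model across several random folds and inspect the spread, not just the mean. This is a hedged sketch on synthetic data, not the project's actual experiment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + rng.normal(scale=1.0, size=400) > 0).astype(int)

scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
# A wide spread across folds signals unstable predictions even when the mean is high
print("mean=%.3f spread=%.3f" % (scores.mean(), scores.max() - scores.min()))
```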
Confusion matrix for the Random Forest model, highlighting asymmetric errors and the limits of high aggregate accuracy.
Identifying AI-Generated Media using Deep Learning
Project Overview
This project focused on identifying AI-generated and manipulated media using deep learning models trained on real-world social media data. Working with datasets sourced from platforms such as Instagram and TikTok, I examined how subtle visual and distributional artifacts distinguish synthetic media from authentic content. Rather than treating detection as a binary task, the project emphasized careful evaluation to understand where models succeed, where they fail, and how confidence degrades under noisy conditions.
Methods and Work
Preprocessed visual and metadata features to account for platform-specific compression and filtering.
Built Python-based Multi-Layer Perceptron classification models to detect AI-generated and filtered media using real-world Instagram and TikTok datasets.
Assessed model performance using precision–recall analysis and confusion matrices.
Analyzed how dataset quality and labeling noise influence robustness and generalization.
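The steps above can be outlined as a scaled MLP over pre-extracted features, evaluated with a confusion matrix. The features and labels here are synthetic stand-ins for illustration only; the real pipeline worked with Instagram and TikTok data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(2)
n = 500
# Hypothetical features: e.g. compression-artifact score, color statistics, metadata flags
X = rng.normal(size=(n, 4))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.7, size=n) > 0).astype(int)  # 1 = synthetic media

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
# Scaling before the MLP accounts for features on very different ranges
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
clf.fit(X_tr, y_tr)
cm = confusion_matrix(y_te, clf.predict(X_te))
print(cm)  # rows: true class, columns: predicted class
```

Reading the off-diagonal cells of the matrix separately is what reveals whether authentic filtered content is being misclassified as AI-generated, or the reverse.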
Outcomes
Improved classification accuracy from 92% to 97% through iterative model and feature refinement.
Found that model confidence degraded significantly when trained on noisier or inconsistently labeled datasets, even when accuracy appeared high.
Observed recurring failure cases where AI-generated media closely resembled heavily filtered authentic content, complicating binary detection.
Developed an understanding of how detection systems can project false certainty in real-world moderation and verification contexts.
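The label-noise observation above can be reproduced in outline: flip a fraction of training labels and compare the model's mean predicted confidence on held-out data. Entirely synthetic and hypothetical; it illustrates the method, not the project's actual results.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def mean_confidence(noise_rate):
    """Train on labels with a given flip rate; return mean top-class probability."""
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise_rate
    y_noisy[flip] = 1 - y_noisy[flip]
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(X_tr, y_noisy)
    return clf.predict_proba(X_te).max(axis=1).mean()

for rate in (0.0, 0.2, 0.4):
    print(rate, mean_confidence(rate))
```

Tracking confidence alongside accuracy is what distinguishes a model that is merely right from one that is right for stable reasons.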
Reflection
Across these projects, I began to notice how easily strong performance metrics can mask deeper weaknesses in a system. In both detection and behavioral prediction models, assumptions about data quality, labeling, and evaluation unintentionally shape what the system appears to understand. I learned that models can produce confident outputs while reinforcing bias, often obscuring who is affected when errors occur. Because of this, my focus shifted from improving accuracy alone to treating evaluation and interpretability as central design concerns. I now approach applied machine learning with more care, paying closer attention to how technical decisions influence trust and understanding in real-world contexts.