Discussion

Project Plans

Instructor Feedback: “Interesting choice of topic. However, you cannot (and I repeat - cannot) just use a single subreddit. You must figure out a way of finding and including data from across the corpus leveraging NLP tools.”
Instructor Feedback: “You also must include external data.”

We expanded our dataset from a single subreddit to three: AmItheAsshole, AmIOverreacting, and AskReddit, which represent different kinds of advice- and opinion-seeking behavior. This allowed for broader analysis and comparisons across communities.
We incorporated external data by adding the Dear Abby advice column dataset. This let us compare modern, community-driven advice on Reddit with a traditional newspaper advice column.

Peer Feedback: “The EDA should have more structured goals to guide the analysis.”
Peer Feedback: “Make sure your EDA focuses on telling a coherent story rather than just exploring random trends.”
Instructor Feedback: “Highlight how Reddit’s popularity compares to Dear Abby and emphasize the dynamics of newer subreddits like AmIOverreacting.”

We refined our EDA goals around specific questions: when people post (seasonal and hourly patterns), how popular each subreddit is, and how users engage with posts (engagement metrics).
We structured the analysis around two threads: the comparison between Reddit and Dear Abby, and the dynamics of newer subreddits like AmIOverreacting. This gave the EDA a more coherent narrative and made its insights more actionable.
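A minimal sketch of the hourly and seasonal counts, assuming each Reddit post carries a Unix `created_utc` timestamp (the field name Reddit's API uses) and defining seasons meteorologically for the Northern Hemisphere. The helper name `posting_patterns`, the season mapping, and the sample timestamps are illustrative assumptions, not our exact pipeline. Because the timestamps are UTC, the hours below are UTC rather than each poster's local time.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumed season definition: meteorological seasons, Northern Hemisphere.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def posting_patterns(created_utc):
    """Count posts per UTC hour (0-23) and per season from Unix timestamps."""
    hours, seasons = Counter(), Counter()
    for ts in created_utc:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        hours[dt.hour] += 1
        seasons[SEASONS[dt.month]] += 1
    return hours, seasons


# Hypothetical timestamps standing in for real post metadata.
hours, seasons = posting_patterns([0, 18000, 15685200, 1700000000])
print(sorted(hours.items()))
print(dict(seasons))
```

Grouping by UTC hour keeps the sketch simple; converting to a target time zone first would shift the hourly peaks accordingly.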

Instructor Feedback: “Don’t just stick to one topic modeling method. Experiment with multiple approaches to find what works best for your dataset.”
Peer Feedback: “Your topic modeling results could benefit from more detailed comparisons across subreddits.”
Peer Feedback: “Try to align the topic modeling findings with user sentiment and judgment labels for a richer analysis.”

We tested multiple approaches to extracting topics, including TF-IDF term weighting, Latent Semantic Analysis (LSA), and Non-negative Matrix Factorization (NMF), and ultimately selected NMF for its superior interpretability and performance.
We included detailed comparisons of topic distributions across sources, highlighting the thematic differences between AskReddit, AmItheAsshole, and Dear Abby.
We integrated sentiment and judgment labels into the topic analysis, which let us examine how sentiment and community verdicts vary by topic.
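To make the TF-IDF to NMF pipeline concrete, here is a minimal, dependency-free sketch, not the project's actual code. It builds an L2-normalized TF-IDF matrix with a smoothed IDF similar to scikit-learn's default, then factors it as V ≈ WH with W, H ≥ 0 using Lee and Seung's multiplicative updates for the Frobenius loss. Rows of H are topics, and their highest-weighted terms serve as topic labels. The stop-word list, toy posts, and function names are illustrative assumptions; in practice a library implementation such as scikit-learn's `TfidfVectorizer` and `NMF` would be the usual choice.

```python
import math
import random
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not the project's actual list).
STOP_WORDS = {"a", "an", "and", "at", "i", "is", "my", "of", "over", "the", "to", "was"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return (rows, vocab): L2-normalized TF-IDF rows, one per document.

    Uses smoothed IDF: idf = ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = [math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in vocab]
    rows = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for t, count in Counter(toks).items():
            row[index[t]] = count * idf[index[t]]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows, vocab


def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-10):
    """Factor V ~= W @ H with W, H >= 0 via multiplicative updates (Frobenius loss).

    eps is added to numerator and denominator, which keeps entries positive
    and (in exact arithmetic) never increases the reconstruction error.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * (num[a][j] + eps) / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * (num[i][a] + eps) / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def reconstruction_error(V, W, H):
    """Frobenius norm of V - W @ H."""
    WH = matmul(W, H)
    return math.sqrt(sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)))


def top_terms(H, vocab, n=3):
    """The n highest-weighted terms of each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]] for row in H]


# Hypothetical toy posts with two obvious themes.
posts = [
    "my boyfriend skipped the wedding",
    "family fight at the wedding over my boyfriend",
    "roommate refuses to pay rent",
    "roommate left the apartment and rent unpaid",
]
V, vocab = tfidf_matrix(posts)
W, H = nmf(V, k=2)
print(top_terms(H, vocab))
```

Because every weight in W and H is non-negative, each post is modeled as an additive mix of topics. This is one reason NMF topics tend to be easier to read than LSA's components, which can mix positive and negative term weights.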

Instructor Feedback: “Focus on explaining your models’ performance and feature importance to provide actionable insights.”
Peer Feedback: “Consider balancing your data and addressing any ambiguous labels to improve model performance.”
Peer Feedback: “It might be interesting to explore how sentiment scores and topics contribute to predicting community judgments.”

We addressed ambiguous labels (e.g., ESH, “Everyone Sucks Here,” in AITA and “Unclear” in AIO) by removing those posts, giving the models clearer and more reliable training data.
We conducted a feature importance analysis to show how much sentiment scores, topics, and engagement metrics contribute to model predictions.
We reported performance metrics such as precision, recall, and AUC to communicate each model’s strengths and weaknesses.
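The label cleanup and evaluation steps can be sketched in plain Python. This is an illustration under stated assumptions rather than the project's code: posts are assumed to be dicts with a verdict `label` and a model probability `score`, YTA is treated as the positive class, a 0.5 decision threshold is assumed, and only the two ambiguous labels named in the text (ESH and “Unclear”) are dropped. AUC is computed directly from its definition, the probability that a randomly chosen positive post scores higher than a randomly chosen negative one, with ties counted as one half.

```python
# Labels the text identifies as ambiguous; other verdicts (e.g. NAH, INFO) are not handled here.
AMBIGUOUS_LABELS = {"ESH", "Unclear"}


def drop_ambiguous(posts, label_key="label"):
    """Keep only posts whose label is not in AMBIGUOUS_LABELS."""
    return [p for p in posts if p[label_key] not in AMBIGUOUS_LABELS]


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC = P(random positive outscores random negative); ties count 0.5. O(P*N)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical posts with model scores (probability of YTA).
posts = [
    {"label": "YTA", "score": 0.9},
    {"label": "NTA", "score": 0.2},
    {"label": "ESH", "score": 0.6},
    {"label": "YTA", "score": 0.4},
    {"label": "NTA", "score": 0.7},
]
clean = drop_ambiguous(posts)
y_true = [1 if p["label"] == "YTA" else 0 for p in clean]
scores = [p["score"] for p in clean]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
print(precision_recall(y_true, y_pred), roc_auc(y_true, scores))  # prints (0.5, 0.5) 0.75
```

Precision and recall depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why reporting both gives a fuller picture of a model's strengths and weaknesses.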

Instructor Feedback: “Ensure that your website is easy to navigate and that results are presented clearly and visually.”
Peer Feedback: “Consider adding more detailed captions and context for graphs and tables to make the results self-explanatory.”
Peer Feedback: “Highlight key takeaways from each section to help viewers quickly understand the main points.”

The website was structured into clearly defined sections (e.g., EDA, NLP, ML, Discussion), with intuitive navigation and a consistent format.
Detailed captions and explanations were added to all graphs and tables, ensuring they could be understood independently.
Key takeaways were summarized at the end of each section to reinforce the main points and improve the user experience.