The Hot Hand: A Fallacy or Phenomenon?

Introduction

Within the world of sports, the terms “hot hand” and being “in the zone” are frequently used to describe a streak of exceptional performance by a player. The term “hot hand” originated in basketball and suggests that a player who has made several successive shots is more likely to make the next one than the same player would be after missing the last few. For example, a basketball player may make an exceptional number of shots in a short period of time, as Klay Thompson did when he scored 37 points in a single quarter. Another example is Joe DiMaggio’s 56-game hitting streak in 1941, one of the most incredible hitting runs in baseball history. Ordinarily, these types of “runs” or “streaks” are rare. There is “a belief that the performance of a player during this particular period is significantly better than expected on the basis of the player’s overall record” 1. This heightened performance is often attributed to the player’s increased confidence.

This belief is shared by a majority of players, coaches, and fans, yet there is little statistical evidence to support it. In fact, a majority of studies suggest that the “hot hand” is a fallacy and advise coaches not to consider it when selecting plays. Many psychologists and statisticians have studied the “hot hand” phenomenon, and they continue to debate it today.

While the term “hot hand” may be most commonly associated with the world of sports, “studies have been done in other academic fields outside the sports domain, such as economics and cognitive science” 2. Through this project, I plan to explore the mystery of streaks both within and outside the world of sports.

Prior Research

The initial investigation into this topic @gilovich1985hot was published in 1985. It analyzed assorted data, including professional basketball field goal data from the 1980-1981 season, professional basketball free-throw data from the 1980-1982 seasons, and a controlled shooting experiment. While the study found that over 91% of fans agreed that a player has a better chance of making a shot after having just made his last two or three shots than he does after having just missed his last two or three shots, none of the data showed any evidence of this phenomenon. Instead, @gilovich1985hot argued that there is a widespread misperception of random sequences:

People’s intuitive conceptions of randomness depart systematically from the laws of chance. It appears that people expect the essential characteristics of a chance process to be represented not only globally in the entire sequences, but also locally, in each of its parts. For instance, people expect even short sequences of heads and tails to reflect the fairness of a coin and contain roughly 50% heads and 50% tails. This conception of chance has been described as a ‘belief in the law of small numbers’ according to which the law of large numbers applies to small samples as well. A locally representative sequence, however, deviates systematically from chance expectation: It contains too many alternations and not enough long runs.
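The claim about long runs can be checked with a small simulation. The sketch below is my own illustration, not part of the cited study: it flips a fair coin 20 times, repeats that 10,000 times, and records each sequence's longest run and alternation rate. The function names, the seed, and the sequence length are assumptions chosen for the example.

```python
import random

def longest_run(seq):
    """Length of the longest run of identical consecutive values."""
    best = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best

def alternation_rate(seq):
    """Fraction of adjacent pairs whose values differ."""
    changes = sum(a != b for a, b in zip(seq, seq[1:]))
    return changes / (len(seq) - 1)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
n_flips, n_sequences = 20, 10_000  # illustrative sizes, not from the study

runs, alts = [], []
for _ in range(n_sequences):
    flips = [rng.random() < 0.5 for _ in range(n_flips)]
    runs.append(longest_run(flips))
    alts.append(alternation_rate(flips))

share_long = sum(r >= 4 for r in runs) / n_sequences
mean_alt = sum(alts) / n_sequences
print(f"share of sequences with a run of 4 or more: {share_long:.2f}")
print(f"mean alternation rate: {mean_alt:.2f}")
```

For a fair coin, roughly three-quarters of 20-flip sequences contain a run of four or more identical outcomes, and adjacent flips differ only about half the time. A sequence that "looks random" to most people alternates more often than that.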

@bar2006twenty published a review and critique of 20 years of “hot hand” research in 2006. This paper reviewed the @gilovich1985hot study in addition to subsequent research that used data from various sports including basketball, baseball, golf, darts, tennis, bowling, and more. Baseball and basketball studies dominate the literature on this subject, yet the strongest support for the “hot hand” can be found in more individual sports such as horseshoe pitching and tennis.

Demonstrations of hot hands per se are rare and often weak, due to various reasons: using an unrealistic model and questionable data, setting questionable definitions for hot and cold players, relating streakiness to difficulty of task, combining and analyzing data of all players as a group, and other constraints related to the kind of sport studied.

In the end, this study found that the question remains unresolved.

The Debate

The debate surrounding the “hot hand” is multifaceted, encompassing methodological concerns, statistical intricacies, and divergent interpretations of empirical findings. Addressing the limitations of prior research and exploring alternative definitions of success are pivotal to unraveling this enduring mystery.

10 Questions to be Answered

  1. What data is available for my topic?
  2. What does current literature on the topic argue?
  3. How can I build off of the current research and approach the topic in a novel way?
  4. Should I limit the scope of my topic to sports or can I expand past that?
  5. How should I define success? (the concept of the hot hand is that success breeds success)
  6. What is the best way to visualize the data?
  7. Do athletes and/or the public believe in this phenomenon?
  8. Is there any evidence that the hot hand exists?
  9. Does the hot hand impact strategy of a game? Should it impact strategy?
  10. If the hot hand does exist, in what sport (or area outside of sports) is there the most evidence in support of the phenomenon?

Goals and Hypothesis

This project seeks evidence that the hot hand exists; the null hypothesis is that it does not. The exploration aims to contribute nuanced insights, bridge gaps in current understanding, and reevaluate the perennial debate surrounding the elusive “hot hand.”
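One way to test a null hypothesis like this is a permutation test. The sketch below is an illustration under my own assumptions, not the method used in this project: it measures the gap between the make rate after two straight makes and the make rate after two straight misses, then checks how often shuffled versions of the same shots produce a gap at least as large. The names `streak_gap` and `permutation_p_value` are hypothetical, and the 45% shooter is simulated, not real data.

```python
import random

def streak_gap(shots, k=2):
    """P(make | previous k made) minus P(make | previous k missed).

    Returns None if either condition never occurs in the sequence.
    """
    after_makes, after_misses = [], []
    for i in range(k, len(shots)):
        window = shots[i - k:i]
        if all(window):
            after_makes.append(shots[i])
        elif not any(window):
            after_misses.append(shots[i])
    if not after_makes or not after_misses:
        return None
    return (sum(after_makes) / len(after_makes)
            - sum(after_misses) / len(after_misses))

def permutation_p_value(shots, k=2, n_shuffles=2000, seed=0):
    """One-sided p-value: share of shuffles with a gap >= the observed gap."""
    observed = streak_gap(shots, k)
    if observed is None:
        raise ValueError("sequence has no qualifying streaks")
    rng = random.Random(seed)
    shuffled = list(shots)
    gaps = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # shuffling keeps the make rate, breaks the order
        gap = streak_gap(shuffled, k)
        if gap is not None:
            gaps.append(gap)
    at_least = sum(g >= observed for g in gaps)
    return observed, (at_least + 1) / (len(gaps) + 1)

# Simulated shooter with independent shots (an assumption for illustration).
rng = random.Random(1)
shots = [int(rng.random() < 0.45) for _ in range(200)]
gap, p_value = permutation_p_value(shots)
print(f"observed gap: {gap:+.3f}, permutation p-value: {p_value:.3f}")
```

Shuffling preserves the shooter's overall make rate while destroying any ordering, so the shuffled gaps show what the statistic looks like when the null hypothesis holds. A small p-value would count as evidence against the null; a large one would not.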

Conclusion

Introduction to the Hot Hand Phenomenon:

In the dynamic realm of sports, the concepts of a “hot hand” or being “in the zone” have become ubiquitous, often used to describe a player’s extraordinary performance streak. Originating in basketball, the term suggests that a player on a successful streak is more likely to continue their success. This belief, rooted in the increased confidence of the player, extends beyond basketball, resonating in baseball and other fields. Despite its prevalence, statistical evidence supporting the existence of the hot hand is scarce, with numerous studies suggesting it is a fallacy. This project embarks on an exploration within the sports domain to scrutinize the mystery of streaks.

Diverse Dataset Exploration:

Throughout the semester, I undertook a comprehensive exploration of the hot hand phenomenon, employing an array of datasets with varying complexities. The NCAA basketball data, scraped in R for the Villanova men’s basketball team, offered insights into shot outcomes. The baseball data, sourced from FanGraphs and also scraped in R, included both individual player data and detailed pitch-by-pitch information. Additionally, news text data was gathered using an API in Python. Each dataset required its own cleaning process, which laid the foundation for an in-depth Exploratory Data Analysis (EDA) and allowed me to refine my hypotheses and define the investigative path.

EDA Signals and Insights:

Promising signals emerged during the EDA phase, particularly in the individual player baseball data and the NCAA shot data. The former hinted that past hard-hit data could be used to predict future performance, pointing to potential autocorrelation or seasonality effects. This finding holds promise for refining future models. The NCAA shot data showed no strong correlation for any single lag variable but did show a descending trend across lags, suggesting that the immediately preceding shot has the most influence on the current outcome.
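The lag comparison described above can be sketched as follows. This is a minimal illustration, not the project's actual R or Python code: it correlates each shot outcome with the outcome one, two, and three shots earlier. The function names and the short shot sequence are hypothetical, and a real analysis would presumably compute lags within a single player and game rather than across one pooled list.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (var_x * var_y) ** 0.5

def lag_correlations(shots, max_lag=3):
    """Correlation between each shot and the shot `lag` attempts earlier."""
    return {
        lag: pearson(shots[lag:], shots[:-lag])
        for lag in range(1, max_lag + 1)
    }

# Made-up outcomes (1 = made, 0 = missed), for illustration only.
shots = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
for lag, r in lag_correlations(shots).items():
    print(f"lag {lag}: r = {r:+.3f}")
```

If a hot hand existed, the lag 1 correlation would be positive and would be expected to fade at longer lags.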

Model Performance and Limitations:

The Naive Bayes model, designed to predict made or missed shots, failed to outperform random guessing, which is consistent with the null hypothesis that the hot hand does not exist. Further stages of analysis required adding numerical feature variables to the NCAA data, including shot value, score differential, and shooter field goal percentage.
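A comparison like this needs a reference point. The sketch below shows one common choice, a majority-class baseline that always predicts the most frequent training label, which is a slightly stricter benchmark than pure random guessing. It is an illustration only: the labels and the "model" predictions are invented, and the function names are my own.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority] * len(y_test))

# Hypothetical labels (1 = made, 0 = missed) and hypothetical model output.
y_train = [0, 1, 0, 0, 1, 0, 1, 0]
y_test = [0, 0, 1, 0, 1]
model_pred = [0, 1, 1, 0, 0]

baseline_acc = majority_baseline(y_train, y_test)
model_acc = accuracy(y_test, model_pred)
print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:    {model_acc:.2f}")
```

A classifier that cannot beat this baseline has learned nothing useful from its features, which is the sense in which a shot model built on lag variables would fail to support the hot hand.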

While I was successful in reducing dimensions (four variables accounted for over 75% of overall variance), the various clustering methods (KMeans, DBSCAN, and Hierarchical Clustering) failed to conclusively establish the presence of discernible clusters in the shot data. Decision trees and random forests, though they outperformed the Naive Bayes classifier, relied primarily on shot value rather than the lag variable, reaffirming the lag variable’s lower predictive power.

Looking Ahead:

I recognize the limitations of relying predominantly on NCAA shot data to evaluate the hot hand. I plan to revisit this topic at a later date, armed with a deeper understanding of time series data and an expanded arsenal of machine learning algorithms. Future analyses may explore this topic using alternative definitions of success, such as launch speeds and angles in baseball. For now, my findings are consistent with prior research suggesting that the hot hand is a fallacy.

Extra Joke

What do you get when you cross a pirate with a data scientist?



Someone who specializes in Rrrr.

Thank you for taking the time to explore my project, and kudos for enduring all the humor along the way!

Footnotes

  1. @gilovich1985hot

  2. @bar2006twenty