Forecasting and Prediction

The Forecasting Proficiency Test

The Forecasting Proficiency Test (FPT) project aims to synthesize the state of the art in research on forecasting ability, from talent spotting in crowds to the identification of Superforecasters, into a general test of forecasting skill. Numerous studies over the past decade and a half have examined human forecasting judgment, and convergent results across these studies suggest that some people are consistently more accurate in their forecasts than others. However, the psychometric evaluation of what makes a good individual forecaster has been only one element of these studies; our latest work puts it at center stage. While the ultimate goal of this project is test development, we are trying to answer several crucial questions along the way:

  • Can actual forecasting tasks and standalone cognitive assessment tasks provide complementary information about forecasting ability, or is one source of information superior to the other? (One way to frame this comparison is sketched after this list.)
  • If forecasting tasks are to be a part of the test, is it possible to generate test items that involve real forecasts that have stable and interpretable item properties?
  • If standalone cognitive tasks are to be part of the test, how can we maximize their predictive utility? Can we use different types of tasks to form a theoretical model of what makes a good forecaster?
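
On the first question, one natural framing is incremental validity: does a model that combines past forecasting accuracy with cognitive task scores predict future forecasting accuracy better than either source alone? The sketch below (Python, using simulated data and hypothetical variable names such as fluid_iq and numeracy) illustrates the comparison rather than a finished analysis plan.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300

# Simulated forecaster-level data (hypothetical variables, for illustration only).
skill = rng.normal(size=n)  # latent forecasting ability
df = pd.DataFrame({
    "past_accuracy":   skill + rng.normal(scale=0.8, size=n),  # accuracy on resolved items
    "fluid_iq":        0.5 * skill + rng.normal(size=n),       # standalone cognitive task
    "numeracy":        0.5 * skill + rng.normal(size=n),       # standalone cognitive task
    "future_accuracy": skill + rng.normal(scale=0.8, size=n),  # accuracy on held-out items
})

def cv_r2(predictors):
    """Cross-validated R^2 of a linear model predicting future accuracy."""
    return cross_val_score(LinearRegression(), df[predictors], df["future_accuracy"],
                           cv=5, scoring="r2").mean()

print("forecasting history only:", round(cv_r2(["past_accuracy"]), 3))
print("cognitive tasks only:    ", round(cv_r2(["fluid_iq", "numeracy"]), 3))
print("both sources combined:   ", round(cv_r2(["past_accuracy", "fluid_iq", "numeracy"]), 3))
# If the combined model outperforms both single-source models, the two sources
# of information are complementary rather than redundant.
```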

Real-time Accuracy Scoring

A fundamental problem in using forecasting judgments to conduct psychometric evaluation of individuals is the time lag between when a forecast is made and when the ground truth becomes available. Unlike a problem on a math test, the accuracy of a forecast cannot be scored right away. Even though a forecaster’s past accuracy is a well-established basis for evaluating their ability, this scoring delay makes it impractical in many applied settings. Our recent work has shown that a way around this problem is to evaluate forecasters by comparing their predictions to the wisdom of the crowd, which is immediately available, rather than to the ground truth, which is not. We are now looking at ways to extend this result to other types of forecasts, and to determine how best to use this intersubjective evaluation method in concert with other data to optimize forecast aggregation.
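
As a rough illustration of the general idea (not the specific intersubjective measures developed in our work), each forecaster can be scored by how closely their probabilities track the leave-one-out crowd mean on questions whose outcomes are still unknown:

```python
import numpy as np

def proxy_scores(probs):
    """
    Score forecasters against the crowd rather than the (not yet known) ground truth.

    probs: array of shape (n_forecasters, n_questions) holding each forecaster's
           probability for the focal outcome of each binary question.
    Returns one proxy score per forecaster: the mean squared distance between their
    forecasts and the leave-one-out crowd mean (lower = closer to the crowd).
    """
    probs = np.asarray(probs, dtype=float)
    n = probs.shape[0]
    loo_mean = (probs.sum(axis=0, keepdims=True) - probs) / (n - 1)
    return ((probs - loo_mean) ** 2).mean(axis=1)

# Example: four forecasters, three unresolved questions.
p = [[0.70, 0.20, 0.55],
     [0.65, 0.25, 0.60],
     [0.10, 0.90, 0.05],   # far from the crowd consensus
     [0.72, 0.18, 0.50]]
print(proxy_scores(p))      # the third forecaster receives the worst (largest) score
```

Such proxy scores are available the moment forecasts are submitted, which is what makes them attractive for real-time evaluation and for weighting forecasters during aggregation.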

Forecasting in Teams

The principle of the wisdom of crowds is a cornerstone of judgmental forecasting: the collective predictions of large groups tend to be more accurate than those of most of their individual members. But what about groups that work together? Results from forecasting tournaments provide evidence that forecasters working in teams outperform those who work alone, but few studies have examined this phenomenon in isolation. We are currently running a series of studies to do just that: assemble forecasting teams of varying sizes and backgrounds and study what makes an optimal forecasting team.
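
The statistical core of the principle is easy to demonstrate: because common accuracy measures such as the Brier score are convex, the averaged forecast can never score worse than the average of the individual scores, and in practice it beats most individuals. The toy simulation below (with an assumed noise model, purely for illustration) makes the point:

```python
import numpy as np

rng = np.random.default_rng(1)
n_forecasters, n_events = 50, 200

# Simulated binary events and noisy individual probability forecasts.
event_prob = rng.random(n_events)                          # latent event probabilities
outcomes = (rng.random(n_events) < event_prob).astype(float)
forecasts = np.clip(event_prob + rng.normal(scale=0.25, size=(n_forecasters, n_events)),
                    0.01, 0.99)

def brier(p, y):
    """Mean Brier score (lower is better)."""
    return np.mean((p - y) ** 2)

individual = np.array([brier(f, outcomes) for f in forecasts])
crowd = brier(forecasts.mean(axis=0), outcomes)            # simple unweighted average

print("crowd (average) Brier:", round(crowd, 3))
print("median individual:    ", round(float(np.median(individual)), 3))
print("individuals beaten by the crowd:", int((individual > crowd).sum()), "of", n_forecasters)
```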

Probabilistic Belief Elicitation

What is the best way to elicit probability judgments? Is it best to ask for numeric probabilities? To let people use sliders or bars in lieu of numbers? What about more complex graphical interfaces that interpolate continuous distributions? At its core this is a human-computer interaction question, but there is more to it than that. It is also a question of judgment coherence. For example, should probability judgments be restricted to proper unitarity (i.e., should they be forced to sum to 1)? This can be burdensome for subjects on complex problems, and there is evidence that “coherentizing” incoherent forecasts after they are made is a reasonable way to improve judgments. Alternatively, can we provide automated methods that make it easier for subjects to submit coherent judgments? We are designing a variety of potential improvements to the probabilistic belief elicitation methods common in the literature, and plan to test them in a series of experiments. Our tests will focus both on the accuracy of the forecasts generated and on the comfort, ease of use, and satisfaction of users with each method.
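
As one concrete example of automated coherentization (a simple option for illustration, not necessarily the method our experiments will adopt), judgments over a set of mutually exclusive and exhaustive outcomes can be replaced by the nearest point on the probability simplex:

```python
import numpy as np

def coherentize(judgments):
    """
    Replace elicited probabilities for mutually exclusive, exhaustive outcomes with
    the closest (Euclidean) set of probabilities that are non-negative and sum to 1.
    Uses the standard sort-based projection onto the probability simplex.
    """
    v = np.asarray(judgments, dtype=float)
    u = np.sort(v)[::-1]                      # sorted in descending order
    css = np.cumsum(u)
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    shift = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + shift, 0.0)

# An incoherent set of judgments for three exclusive outcomes (sums to 1.25).
raw = [0.60, 0.40, 0.25]
print(coherentize(raw))        # approximately [0.517, 0.317, 0.167]
print(coherentize(raw).sum())  # 1.0
```

The projection preserves the ordering of the original judgments while guaranteeing additivity; whether subjects find such automated corrections acceptable is one of the questions our experiments are designed to address.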

Relevant Publications

Himmelstein, M., Atanasov, P., & Budescu, D. V. (2021). Forecasting forecaster accuracy: Contributions of past performance and individual differences. Judgment and Decision Making, 16(2), 323–362.

Himmelstein, M., Budescu, D. V., & Ho, E. H. (2023). The wisdom of many in few: Finding individuals who are as wise as the crowd. Journal of Experimental Psychology: General, 152(5), 1223–1244. https://doi.org/10.1037/xge0001340

Atanasov, P., & Himmelstein, M. (2023). Talent spotting in crowd prediction. In M. Seifert (Ed.), Judgment in Predictive Analytics (pp. 135–184). Springer International Publishing. https://doi.org/10.1007/978-3-031-30085-1_6

Himmelstein, M., Budescu, D. V., & Han, Y. (2023). The wisdom of timely crowds. In M. Seifert (Ed.), Judgment in Predictive Analytics (pp. 215–242). Springer International Publishing. https://doi.org/10.1007/978-3-031-30085-1_8

Ho, E., Himmelstein, M., & Budescu, D. V. (2024). A measure of probabilistic coherence to identify superior forecasters. International Journal of Forecasting. https://doi.org/10.1016/j.ijforecast.2024.02.005

Benjamin, D. M., Morstatter, F., Abbas, A. E., Abeliuk, A., Atanasov, P., Bennett, S., Beger, A., Birari, S., Budescu, D. V., Catasta, M., Ferrara, E., Haravitch, L., Himmelstein, M., … Galstyan, A. (2023). Hybrid forecasting of geopolitical events. AI Magazine, 44, 112–128. https://doi.org/10.1002/aaai.12085