Thank you for visiting the MLB Draft Predictor. The goal is to understand the relationship between college and MLB performance in order to model and predict value in the MLB amateur draft. Use the interactive tool below to make predictions, but first please read the other pages (in order from left to right) to better understand how this predictor was created and what the results mean.

Below, enter a player's NCAA D1 statistics on the left to see his projected MLB draft position and value on the right. The hitter and pitcher input fields are pre-loaded with stats from the 2010 and 2011 12th overall picks, respectively: Cincinnati Reds catcher Yasmani Grandal (formerly University of Miami) and Milwaukee Brewers pitcher Taylor Jungmann (formerly University of Texas).* MLB outputs are color-coded against a range of six current and former major leaguers to provide a point of reference. For example, the current output shows Grandal drafted higher than Payton, Papelbon, Kinsler, and Sipp and predicts that his major league value will be greater than all but Griffey and Kinsler.

Has this player been drafted?
Overall pick number:
Team's strength of schedule:
Team's total park factor:
Hitter or pitcher?
At bats:
Home runs:
Stolen bases:
Sacrifice flies:
Sacrifice hits:
Innings pitched:
At bats against:
Hits allowed:
Doubles allowed:
Triples allowed:
Home runs allowed:
Walks allowed:
Hit-by-Pitches allowed:
Sacrifice Flies allowed:
Sacrifice Hits allowed:
Runs allowed:
Earned Runs allowed:
Player               Overall pick number   WAR% of draft class
Your Player          12                    ???
Ken Griffey, Jr.     1st                   8.18%
Paul Maholm          8th                   2.59%
Jay Payton           29th                  2.29%
Jonathan Papelbon    114th                 4.39%
Ian Kinsler          496th                 6.15%
Tony Sipp            1333rd                0.69%


In certain situations, why does hitting more home runs (for example) lead to lower predicted values?

By themselves, home runs may have a positive association with pick number and WAR (see scatterplots and correlation table). However, in a multiple regression that takes other variables into account, a variable's coefficient may point in a counterintuitive direction so that the model as a whole explains as much variability as possible. For instance, in predicting a 2002-06 hitter's WAR%, players who hit for power but struggled to reach base did not do as well as players who reached base more consistently.
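The sign flip described above can be reproduced with a small sketch. Everything in it is an assumption made for illustration: the numbers are synthetic, not real player data, the `on_base` index and the "true" coefficients (3 and -1) are invented, and the helper functions are hypothetical rather than the code behind this predictor. It only shows how a variable can have a positive slope on its own and a negative coefficient once a correlated variable is included.

```python
def mean(xs):
    return sum(xs) / len(xs)

def simple_slope(x, y):
    """Slope of a one-variable least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_predictor_slopes(x1, x2, y):
    """Coefficients of a least-squares fit of y on x1 and x2 (with intercept)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Synthetic data (assumption): power hitters also tend to reach base more.
home_runs = [2, 4, 6, 8, 10, 12, 14, 16]
offsets = [0.5, -1, 1.5, -0.5, 1, -1.5, 0.5, -0.5]
on_base = [hr + e for hr, e in zip(home_runs, offsets)]

# Invented "true" relationship: value rises with on-base skill and,
# holding on-base skill fixed, falls slightly with home runs.
value = [3 * ob - 1 * hr for ob, hr in zip(on_base, home_runs)]

hr_alone = simple_slope(home_runs, value)
hr_partial, ob_partial = two_predictor_slopes(home_runs, on_base, value)

print("home runs alone:", hr_alone)
print("home runs, holding on-base fixed:", hr_partial)
print("on-base, holding home runs fixed:", ob_partial)
```

In this toy data, home runs look helpful on their own only because they travel with on-base skill. Once on-base skill is in the model, the home run coefficient describes what extra power means for a hitter at a fixed on-base level.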

In certain situations, why does strength of schedule (for example) have no effect on predicted values?

Again, such variables may be related to pick number and WAR on their own. However, they may have no significant effect once other variables that explain the overall relationship much better are taken into account, and so they are excluded from the multiple regression.

What predictions are most likely to be accurate?

Compare the R-squared values of the nine different regressions to see which scenarios (overall pick number vs. WAR% of draft class, hitters vs. SP vs. RP, overall pick number known vs. unknown) have had the most accurate models on historical data. Within each regression, look at the scatterplot of residuals vs. predicted values to see which players the models got right and wrong. Regressions in which the players are clustered more vertically (instead of horizontally) around the residuals line are more likely to pull extremely high or low inputs toward average outputs in order to minimize error. Keep in mind that because overall pick number and WAR% of draft class have non-linear relationships with the inputs, and logs are used, an extremely high pick number or WAR% output requires a non-linear increase in inputs. And of course, actual season inputs from 2002-06 that meet the playing time requirements (50 AB and 15 IP) will provide the most realistic outputs.
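To make the point about logs concrete, here is a minimal sketch that assumes the model predicts the natural log of overall pick number from a single combined input score. The intercept `A`, the slope `B`, the idea of one "score", and both function names are hypothetical stand-ins rather than the fitted values or code behind this predictor. Under that assumption, each equal step in the input halves the predicted pick number, so moving from pick 2 to pick 1 takes as much additional input as moving from pick 100 to pick 50.

```python
import math

# Hypothetical coefficients (assumptions for illustration only):
# a score of 0 maps to pick 1500, and each point of score lowers log(pick) by B.
A = math.log(1500.0)
B = 0.05

def predicted_pick(score):
    """Back-transform the log-scale prediction to an overall pick number."""
    return math.exp(A - B * score)

def score_needed(pick):
    """Input score required for the model to predict a given pick number."""
    return (A - math.log(pick)) / B

step_100_to_50 = score_needed(50) - score_needed(100)
step_2_to_1 = score_needed(1) - score_needed(2)

print("extra score to move from pick 100 to 50:", step_100_to_50)
print("extra score to move from pick 2 to 1:", step_2_to_1)
print("pick predicted at the score needed for 12th:", predicted_pick(score_needed(12)))
```

Measured per draft slot, the second step costs fifty times as much input as the first, which is one way an extreme output can require a disproportionate change in inputs.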