What's a good way to derive speed ratings using the Smartform Racing Database?


I would like to derive speed ratings for horses in the Smartform Racing Database. By speed rating, I mean something akin to the AVESPRAT variable discussed in Ruth N. Bolton and Randall G. Chapman, "Searching for Positive Returns at the Track: A Multinomial Logit Model", Management Science, Vol. 32, No. 8 (August 1986), p. 1040. AVESPRAT is each horse's average speed rating over its last four races. Quoting from the paper, "A speed rating for a horse compares its time in a race with the track record for that distance.  The track record is assigned a value of 100 and a point is deducted for each one-fifth of a second that the horse's time is below that mark.  The horse's raw speed is then adjusted by a factor to equate the track records at the various tracks used in this study, to attempt to account for differences in tracks.  Thus, speed rating has been transformed to be comparable across tracks."

My initial ideas are:

  1. To compute the "track record", I thought I would find the minimum winning_time_secs grouped by course, race_type, distance_yards and going. Since this level of grouping may be too granular, I thought I would collapse the groups along the going dimension by applying some sort of average adjustment up or down to winning_time_secs.
  2. The database doesn't have the finishing times of the other (non-winning) horses, so I'm going to estimate them by converting distance_behind_winner into a distance and dividing it by the winner's implied average speed. I suppose I would make the same average adjustments up or down according to the going.
  3. These two steps should give me the horses' raw speeds. I haven't yet thought through how I would normalise across different distances, race types or courses.
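A minimal Python sketch of the Bolton and Chapman style rating described above, under stated assumptions: track records come from the grouping in step 1 with going already collapsed, a length is taken as roughly 8 feet (8/3 yards), and race rows are plain dictionaries using Smartform-style field names. The function names are hypothetical, and the paper's between-track adjustment factor is left out.

```python
from statistics import mean

# Assumption: one length is roughly 8 feet (8/3 yards); this constant is illustrative.
YARDS_PER_LENGTH = 8 / 3


def track_records(races):
    """Fastest winning time per (course, race_type, distance_yards) group."""
    records = {}
    for r in races:
        key = (r["course"], r["race_type"], r["distance_yards"])
        t = r["winning_time_secs"]
        if key not in records or t < records[key]:
            records[key] = t
    return records


def estimated_finish_time(winning_time_secs, distance_yards, lengths_behind):
    """Estimate a beaten horse's time: extra yards divided by the winner's average speed."""
    winner_speed = distance_yards / winning_time_secs  # yards per second
    return winning_time_secs + lengths_behind * YARDS_PER_LENGTH / winner_speed


def raw_speed_rating(time_secs, record_secs):
    """100 at the track record, minus one point per fifth of a second slower."""
    return 100 - (time_secs - record_secs) / 0.2


def avesprat(ratings, last_n=4):
    """Mean of the most recent `last_n` ratings (list ordered oldest to newest)."""
    return mean(ratings[-last_n:])
```

For example, if a winner covers 1,000 yards in 60.0 seconds, a horse beaten 3 lengths is estimated at about 60.48 seconds, which rates about 92.6 against a 59.0 second record.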

Does this sound like the bones of a sensible approach?  Is there a better, more straightforward method?

asked May 15, 2011 in Smartform by gillpa Handicapper (730 points)
edited May 15, 2011 by gillpa

2 Answers

 
Best answer
I am a big fan of speed, so I would certainly suggest that it is something you want to look at in database handicapping. However, you need to bear two things in mind when reading papers by Chapman, Bolton, Ziemba and others: they are academic, and they are often American.

Because they are academic, they don't take into account the realities of betting; and because they are American, they work with very different data when creating their figures.

If you haven't read it yet, I would suggest Nick Mordin's book Mordin On Time for creating speed figures in the UK. It is old now, but it is a very good introduction to compiling speed figures by hand for UK racing.

When you are building automated speed figures you need to take this step first:

Make sure the finish times are accurate.

In the UK you will not always get accurate finish times, so you need to filter your data to make sure no runners are recorded with supersonic times that throw off your figures.
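One way such a filter might look, sketched in Python under assumptions: a race is rejected if its winning time is missing, if it implies an average speed above a ceiling, or if it is more than a set fraction faster or slower than the standard time. The function name and all thresholds are hypothetical placeholders to tune against your own data.

```python
# Hypothetical ceiling on average race speed (20 yards/sec is about 41 mph).
MAX_YARDS_PER_SEC = 20.0


def plausible_time(winning_time_secs, standard_time_secs, distance_yards,
                   max_fast=0.05, max_slow=0.25):
    """Return True if a winning time passes some basic sanity checks.

    max_fast / max_slow are placeholder tolerances, as fractions of the standard time.
    """
    if not winning_time_secs or not standard_time_secs:
        return False  # missing or zero times cannot be rated
    if distance_yards / winning_time_secs > MAX_YARDS_PER_SEC:
        return False  # implausibly ("supersonically") fast
    ratio = winning_time_secs / standard_time_secs
    return 1 - max_fast <= ratio <= 1 + max_slow
```

Applied to a list of race dictionaries, the filter is a one-liner: `clean = [r for r in races if plausible_time(r["winning_time_secs"], r["standard_time_secs"], r["distance_yards"])]`.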

The standard times are what will make your figures work or fail. If you are using the official ones, I would suggest running everything through a check to make sure that what you are getting is accurate; the first step in any data analysis is to run your data through integrity checks before analysing it. Ideally you should create your own standard times. There are numerous ways of doing this, some better than others, but all of them involve a lot of work, so if you don't want to take that on at the start, I would suggest using the ones available.
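As one simple illustration of building your own standard times (not necessarily one of the better methods), the sketch below takes the median winning time for each course and distance over races run on selected goings. The going labels are assumptions about how the going field is spelled, the function name is hypothetical, and a real standard would also need adjusting for the class of race, which this ignores. Running the races through an integrity filter first is advisable.

```python
from collections import defaultdict
from statistics import median


def standard_times(races, goings=("Good", "Good To Firm")):
    """Median winning time per (course, distance_yards), using only the given goings.

    The going labels are assumed spellings; check them against your own data.
    """
    times = defaultdict(list)
    for r in races:
        if r["going"] in goings and r["winning_time_secs"]:
            times[(r["course"], r["distance_yards"])].append(r["winning_time_secs"])
    return {key: median(ts) for key, ts in times.items()}
```

The median is used rather than the mean so that a single mistimed race has less pull on the standard.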

You then compare your races with the standard times, which lets you calculate the going allowance. The going allowance brings the finish time back into line with the standard, whatever the going. The traditional way in the UK is then to calculate the winner's speed rating and derive the other horses' ratings from the winner's, based on the distance they were beaten. Personally, I estimate the other horses' finish times and calculate their ratings from those.
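The traditional approach can be sketched in Python as follows, under assumptions: the going allowance is the mean, over a meeting's races, of seconds per furlong slower than standard; the rating scale (100 at standard, one point per fifth of a second) and the points-per-length figure are hypothetical placeholders rather than any published scale. All function names are illustrative.

```python
from statistics import mean

FURLONG_YARDS = 220


def going_allowance(meeting_races):
    """Mean seconds per furlong by which a meeting's winners ran slower than standard.

    A negative value means the ground was riding faster than standard.
    """
    return mean(
        (r["winning_time_secs"] - r["standard_time_secs"])
        / (r["distance_yards"] / FURLONG_YARDS)
        for r in meeting_races
    )


def corrected_time(winning_time_secs, distance_yards, ga_per_furlong):
    """Winning time with the going allowance removed."""
    return winning_time_secs - ga_per_furlong * distance_yards / FURLONG_YARDS


def winner_rating(corrected_secs, standard_time_secs, base=100.0, secs_per_point=0.2):
    """Hypothetical scale: `base` at standard, one point per `secs_per_point` faster."""
    return base + (standard_time_secs - corrected_secs) / secs_per_point


def beaten_rating(winner_rating_value, lengths_behind, points_per_length):
    """Derive a beaten horse's figure from the winner's, based on lengths beaten."""
    return winner_rating_value - lengths_behind * points_per_length
```

In practice you would drop outlier races (for example, slowly run ones) before averaging, since a single unrepresentative time can skew the allowance for the whole meeting.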

That is the basics. There are also other methods that are more scientific than the traditional one, but all of them require some manual tweaking. To do it completely automatically, you need a significant number of integrity checks along the way to make sure mistakes aren't being made, as these can be very costly.
answered May 16, 2011 by raceadvisor Handicapper (630 points)

I agree with raceadvisor that the best book for acquiring the basics of compiling your own speed ratings for racing in the UK and Ireland is Mordin On Time.

Once you have that or any other method established, you'll want to use a certain subset of fields in Smartform to apply it (or an adapted methodology of your own). Besides winning_time_secs, the key fields are distance_behind_winner, going, weight_pounds, distance_yards and standard_time_secs.

Standard times are themselves a subjective measure, but they are a critical element in compiling speed ratings. Smartform generally gives one for every race, or you can calculate your own, which Mordin touches upon. The distance_behind_winner field gives each horse's number of lengths behind the winner in each race, which is critical for calculating a time or rating for the beaten horses. Mordin presents various methods (which of course you can adapt) for rating beaten horses according to the distance by which they were beaten. Generally, a distance beaten translates into a certain time difference, depending on the distance of the race, the going and so on. Below is a query from the Smartform command line that gives you the full subset of raw data you are likely to work with:
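As a sketch of turning lengths beaten into a time difference that depends on the going, the Python below looks up a lengths-per-second figure by a simplified going label and declines to rate horses beaten too far to be trusted. The table values, the 20-length cap and the function name are placeholder assumptions, not Mordin's or Smartform's figures; a fuller version would also vary the figure by race distance.

```python
# Placeholder lengths-per-second figures keyed by a simplified going label.
# Illustrative assumptions only; calibrate them against your own data.
LENGTHS_PER_SECOND = {"firm": 6.5, "good": 6.0, "soft": 5.5, "heavy": 5.0}


def seconds_behind(lengths_behind, going, max_lengths=20.0):
    """Convert lengths beaten into seconds, or None if the horse is too far back to rate."""
    if lengths_behind is None or lengths_behind > max_lengths:
        return None
    return lengths_behind / LENGTHS_PER_SECOND[going.lower()]
```

With these placeholder values, `seconds_behind(3, "Good")` gives 0.5 seconds.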

>select scheduled_time, course, going, weight_pounds, winning_time_secs, standard_time_secs, distance_yards, name, distance_behind_winner, finish_position from historic_races join historic_runners using (race_id) where race_id=291567 order by distance_behind_winner;

In the query above, the race_id condition in the WHERE clause selects a single race, which happens to be the 2010 Epsom Derby, but it can of course be adapted for any race, day or date range.

 

answered May 16, 2011 by colin Frankel (19,280 points)
edited May 18, 2011 by colin
...