Any tips for handling missing overseas form?

0 votes
Overseas (non-UK, non-IRE) form is not included in the data (as far as I can tell).

Are there any tips for factoring this into building models? At the moment we have gaps in form.

Presumably it's only really going to affect Group 1 races and similarly high-profile events, but these races are the most interesting ones!

It would be nice to know if anyone has a more graceful way of handling missing overseas form?

And are there any plans to include overseas form? I know it will negatively impact data integrity, but I think it would be nice to have the option to exclude such data rather than have the choice made for us.

asked Jun 12, 2021 by micjrc Plater (170 points)

1 Answer

0 votes
Best answer

Hi - a few points here:

As you say, overseas runners with no form in this country are relatively rare. However, when they do occur, there are a number of common form elements that can often be used: in particular, trainer form in this country, sire and dam_sire form, jockey form and so on. Usually there is also a form string representing the places achieved in recent races, albeit not necessarily in this country, as well as the number of days since the previous run. All of these elements are generally significant in model building, although the situation is not optimal for these particular runners.
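As a rough illustration of the idea above, here is a minimal Python sketch of one way to build model features for a runner whose previous race details are unavailable. All names here (Runner, build_features, last_race_rating and the strike-rate fields) are hypothetical and are not Smartform field names. The fallback rule (substitute a field-level value and add a missing flag) is just one common approach, not a recommendation specific to Smartform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Runner:
    # Hypothetical fields for illustration only; not the Smartform schema.
    name: str
    trainer_strike_rate: float   # trainer's win rate in UK/IRE races
    jockey_strike_rate: float    # jockey's win rate in UK/IRE races
    sire_strike_rate: float      # win rate of the sire's progeny
    days_since_ran: int
    last_race_rating: Optional[float]  # None when the last run was overseas


def build_features(runner: Runner, fallback_rating: float) -> dict:
    """Return a feature dict, filling a missing rating and flagging it."""
    has_rating = runner.last_race_rating is not None
    return {
        "trainer_strike_rate": runner.trainer_strike_rate,
        "jockey_strike_rate": runner.jockey_strike_rate,
        "sire_strike_rate": runner.sire_strike_rate,
        "days_since_ran": runner.days_since_ran,
        # Use the fallback (e.g. the mean rating of the field) when missing.
        "last_race_rating": runner.last_race_rating if has_rating else fallback_rating,
        # Explicit flag so the model can learn that the value was imputed.
        "rating_missing": 0 if has_rating else 1,
    }
```

The explicit rating_missing flag lets a model treat imputed values differently from observed ones, rather than silently mixing the two.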

Taking the example of the US runner Kaufymaker on the first day of Royal Ascot 2021, we have the following:


Trainer: Wesley Ward Owner: Mr Gregory Kaufman Ridden by: John Velazquez Cloth number: 17 Stall number: 3

Age: 2 (b. 2019) Colour: ch Gender: f Weight: 8-12 Bred: USA

Dam: Heaven's Touch (b. 2010) Sire: Jimmy Creed (b. 2009) Dam Sire: Montbrook (b. 1990)

Form: 1 (FlatSix) Forecast price: 5/2 Days since ran: 61

Thus the only form missing is the detail of the race that Kaufymaker won when running previously at Keeneland. The other form elements are all usable.
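Even without the underlying race details, the form string itself can be turned into simple features. The sketch below is an assumption-laden example: it treats each character as one run, reads "1" as a win and "1", "2" or "3" as a place, and skips "-" and "/" as separators, following the usual racecard convention. The function name form_summary is hypothetical, and the exact encoding of the form field in your data should be checked before relying on this.

```python
def form_summary(form: str) -> dict:
    """Summarise a form string such as "1" or "21-3" into simple counts.

    Assumed convention: one character per run, "-" and "/" are
    season or gap separators, and letters (e.g. P, F, U) are
    non-completions that still count as runs.
    """
    runs = wins = places = 0
    for ch in form:
        if ch in "-/" or ch.isspace():
            continue  # separator, not a run
        runs += 1
        if ch == "1":
            wins += 1
        if ch in "123":
            places += 1
    return {"runs": runs, "wins": wins, "places": places}
```

For the form shown above, form_summary("1") gives one run, one win and one place, which can be used alongside the days-since-ran value of 61.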

On the question of including race form from other countries, this would mean licensing data from those countries. All Smartform data for racing in the UK and Ireland is licensed from official sources for the personal use of Smartform subscribers. However, it's simply not viable to do the same for other countries given limited demand, as the cost can approach six figures or more for full history and ongoing updates, depending on usage. An alternative would be to webscrape data from overseas racing sites, but this is not something we would advocate, as it is usually outside terms of use and may contravene local licences.

answered Jun 14, 2021 by colin Frankel (17,950 points)
selected Jun 15, 2021 by colin