Missing BSP data

0 votes

For many races (numbers below), some BSP data seems to be missing. Is there a bug in the database?

For example, this race is missing two BSP values (shown as BSP:0.00):

[784341] 2018-02-04 15:10:00 Musselburgh
   [1687299] #None BSP:40.79 'Tantamount' 'Tantamount'
   [2169210] #1 BSP:6.17 'Calett Mad' 'Calett Mad'
   [2131568] #2 BSP:0.00 'Cresswell Legend' 'Cresswell Legend'
   [2128031] #3 BSP:0.00 'Connetable' 'Connetable'
   [2194311] #4 BSP:12.50 'Stamp Your Feet' 'Stamp Your Feet'
   [2131576] #5 BSP:7.00 'Jump For Dough' 'Jump For Dough'
   [1835387] #6 BSP:38.00 'Knock House' 'Knock House'
   [1759745] #7 BSP:18.04 'Arthurs Secret' 'Arthurs Secret'
   [2112605] #8 BSP:17.50 'Chain Gang' 'Chain Gang'
   [1699005] #9 BSP:60.00 'Beeves' 'Beeves'
   [2125240] #10 BSP:25.00 'Some Kinda Lama' 'Some Kinda Lama'

This race is also missing two values (the first number on each line is the ID, if you want to verify):

[784343] 2018-02-04 16:20:00 Musselburgh
   [2278696] #1 BSP:0.00 'Seddon' 'Seddon'
   [2215231] #2 BSP:0.00 'Chanceanotherfive' 'Chanceanotherfive'
   [2334967] #3 BSP:11.50 'Enlighten' 'Enlighten'
   [2318595] #4 BSP:4.11 'Chateau Marmont' 'Chateau Marmont'
   [2334966] #5 BSP:45.54 'Cpm Flyer' 'Cpm Flyer'
   [2293252] #6 BSP:36.00 'Timesawaiting' 'Timesawaiting'
   [2277759] #7 BSP:4.71 'Theatre Legend' 'Theatre Legend'
   [2334968] #8 BSP:21.00 'Parker' 'Parker'

If we exclude all potential non-runners (horses with no finishing position, which also excludes withdrawn horses), the number of zero BSP values is:

select count(*)
from historic_betfair_win_prices
join historic_runners on sf_runner_id = runner_id and sf_race_id = race_id
where bsp = 0 and finish_position is not null;
+----------+
| count(*) |
+----------+
|     8645 |
+----------+

The date range is wide, so this looks like a systematic error:

select min(date), max(date)
from historic_betfair_win_prices
join historic_runners on sf_runner_id = runner_id and sf_race_id = race_id
where bsp = 0 and finish_position is not null;

+------------+------------+
| min(date)  | max(date)  |
+------------+------------+
| 2009-07-19 | 2018-02-11 |
+------------+------------+
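A min/max range cannot show whether the gaps are spread evenly or concentrated in a few periods, so a per-year breakdown may be more telling. The following is a minimal sketch only: it uses an in-memory SQLite database with made-up rows as a stand-in for the real MySQL tables, and the assignment of columns to tables (for example, that `date` and `bsp` live in `historic_betfair_win_prices`) is an assumption based on the queries above.

```python
import sqlite3

# In-memory stand-in for the two tables queried above. Which table owns which
# column is an assumption, and the rows are made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE historic_runners (
        race_id INTEGER, runner_id INTEGER, finish_position INTEGER
    );
    CREATE TABLE historic_betfair_win_prices (
        sf_race_id INTEGER, sf_runner_id INTEGER, date TEXT, bsp REAL
    );
    INSERT INTO historic_runners VALUES
        (1, 10, 1), (1, 11, 2), (2, 20, NULL), (3, 30, 1);
    INSERT INTO historic_betfair_win_prices VALUES
        (1, 10, '2017-03-01', 0.0),
        (1, 11, '2017-03-01', 5.2),
        (2, 20, '2018-02-04', 0.0),
        (3, 30, '2018-02-04', 0.0);
""")

# Zero-BSP finishers per year, next to the total number of finishers that year.
QUERY = """
    SELECT substr(p.date, 1, 4) AS year,
           SUM(CASE WHEN p.bsp = 0 THEN 1 ELSE 0 END) AS zero_bsp,
           COUNT(*) AS finishers
    FROM historic_betfair_win_prices AS p
    JOIN historic_runners AS r
      ON p.sf_runner_id = r.runner_id AND p.sf_race_id = r.race_id
    WHERE r.finish_position IS NOT NULL
    GROUP BY year
    ORDER BY year
"""
rows = conn.execute(QUERY).fetchall()
for year, zero_bsp, finishers in rows:
    print(year, zero_bsp, finishers)
```

On MySQL the same idea would presumably use `YEAR(date)` in place of `substr(...)`. A year where `zero_bsp` is a large share of `finishers` points to a gap in the source rather than a scattering of isolated errors.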

So the question is: why are the values missing, and is there a way to get them into the database? With values missing, the BSP data is more dangerous than useful, because betting results calculated from it do not reflect reality.

-------------

edit: I checked the actual CSV files from Betfair (I previously parsed the data myself) and the BSP values are missing there too. As there are so many missing BSP values and it's their own data, I assume this is a deliberate attempt to make it harder to create winning systems. Really annoying to be honest, but Betfair is annoying as hell....

----------

edit 2: (Note this data has subsequently been updated at source and therefore in Smartform)

asked Feb 13, 2018 by mikkom Plater (170 points)
edited Apr 22, 2018 by colin

2 Answers

+1 vote
If you are building a predictive model, then removing the rows with missing BSP would have next to no impact, since you still have a decent-sized sample to work from.
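As a rough illustration of removing those rows, the sketch below treats a BSP of 0.00 as "missing" and skips the runner instead of settling a bet at that price. Everything in it is hypothetical: the `Runner` type, the runner names, the prices and the results are invented, and commission is ignored.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Runner(NamedTuple):
    name: str
    bsp: float   # 0.0 is the placeholder seen above for a missing BSP
    won: bool


def clean_bsp(bsp: float) -> Optional[float]:
    """Map the 0.00 placeholder to None so it cannot be used as a price."""
    return bsp if bsp > 0 else None


def level_stakes_profit(runners: Iterable[Runner], stake: float = 1.0) -> Tuple[float, int]:
    """Back every runner at BSP; return (profit, number of runners skipped)."""
    profit = 0.0
    skipped = 0
    for runner in runners:
        price = clean_bsp(runner.bsp)
        if price is None:
            skipped += 1          # no usable price: leave the bet out entirely
            continue
        profit += stake * (price - 1.0) if runner.won else -stake
    return profit, skipped


# Made-up sample: one winner with a price, one winner without, one loser.
sample = [
    Runner("A", 6.0, True),
    Runner("B", 0.0, True),
    Runner("C", 12.5, False),
]
result = level_stakes_profit(sample)
print(result)
```

Reporting the skipped count alongside the profit makes it easy to see how much of a backtest rests on incomplete data.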
answered Feb 16, 2018 by PunterBot Handicapper (830 points)
0 votes
Hi,

What you have highlighted is that Betfair's site of BSP CSV files has not been kept fully up to date (so far) in 2018. If you check these CSV files you will usually find that the same values are missing there, and that these correspond to the missing values in the tables. The CSV site has been going since 2008 and, having reviewed it regularly, we have noticed that there are often big gaps in recent BSP history that are (often but not always) filled in at a later date. However, if you see any gaps in the database tables that do not correspond with the CSV data, let us know and we will fix them. Please send these to the Smartform contact email rather than Q&A.
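One way to look for gaps that exist only in the database is to compare the two sources key by key. This is a sketch under stated assumptions: the CSV column names (`EVENT_ID`, `SELECTION_ID`, `BSP`) and the idea that database rows can be keyed by the same pair of IDs are assumptions, and the sample data is made up.

```python
import csv
import io


def gaps_only_in_database(csv_text, db_rows):
    """Return keys whose BSP is missing in the database but present in the CSV.

    csv_text: CSV content with (assumed) columns EVENT_ID, SELECTION_ID, BSP.
    db_rows:  dict mapping (event_id, selection_id) to the stored BSP, where
              0 stands for a missing value.
    """
    csv_bsp = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row["BSP"].strip()
        # An empty BSP cell in the CSV is treated as missing (0.0).
        csv_bsp[(row["EVENT_ID"], row["SELECTION_ID"])] = float(value) if value else 0.0
    return sorted(
        key for key, bsp in db_rows.items()
        if bsp == 0 and csv_bsp.get(key, 0.0) > 0
    )


# Made-up sample: selection 2 is missing in both sources, selection 1 only in the database.
sample_csv = "EVENT_ID,SELECTION_ID,BSP\n100,1,4.5\n100,2,\n100,3,7.0\n"
sample_db = {("100", "1"): 0.0, ("100", "2"): 0.0, ("100", "3"): 7.0}
mismatches = gaps_only_in_database(sample_csv, sample_db)
print(mismatches)
```

Only the keys returned here would be worth reporting; values missing from both sources are gaps at the origin.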

Please also note that we provide this service for free and purely for members' convenience, so that BSP data can easily be joined to the Smartform data without members having to do it themselves (I think you mention in your post that you used to parse this data yourself). As we do not control the data source, we depend on the same CSV sources being kept up to date, and so we can make no guarantees about the provision of this data. Nevertheless, as and when the recent data becomes publicly available, we make every effort to update it. We have also noticed that this has started to happen for some dates in 2018, so we will be going back and updating the database tables accordingly.

It is also worth noting, as DataScientist answered, that across the data as a whole, short-term gaps in BSP data are not statistically significant for building models, or even for testing systems over more than a couple of months.
answered Feb 21, 2018 by colin Frankel (19,280 points)
...