0 votes

I'm trying to return jockey wins/runs for every row in the historical_runners_beta table, on a rolling basis. The idea is that eventually I will insert them into their own table and use a join. Here is the terrible query I've come up with:

select
   race.scheduled_time,
   rnr.jockey_id,
   rnr.jockey_name,
   (
      select
         count(*)
      from
         historic_runners_beta past_rnr
         join
            historic_races_beta past_race using (race_id)
      where
         past_race.off_time < race.off_time
         and datediff(race.off_time, past_race.off_time) <= 14
         and rnr.jockey_id = past_rnr.jockey_id
   )
   as runs_14_day,
   (
      select
         sum(if(past_rnr.finish_position = 1, 1, 0))
      from
         historic_runners_beta past_rnr
         join
            historic_races_beta past_race using (race_id)
      where
         past_race.off_time < race.off_time
         and datediff(race.off_time, past_race.off_time) <= 14
         and rnr.jockey_id = past_rnr.jockey_id
   )
   as wins_14_day  -- same for 1 year
from
   historic_runners_beta rnr
   join
      historic_races_beta race using (race_id)
where
   race.off_time >= '2004-01-01'   -- also need to filter non-runners/reserve/unknown jockeys
   limit 10    -- temp limit since query takes so long

As you can imagine, the query takes an absolute age, even after adding an index to the jockey_id and off_time columns. I tried to achieve the same result using a window function but to no avail. Does anyone have any suggestions for a better way to achieve this? Thanks

in Smartform by bkw1491 Plater (160 points)

2 Answers

+1 vote
 
Best answer

Hi, give this a try, it should give you some ideas. Here is a method to get 7, 14 and 30day stats for jockeys with rides today.

I prefer to use sub tables and then join together rather than one big query

  • Create a table of distinct jockey_id and jockey_name with rides today
  • Create table historic data for jockeys (I restricted it to 3 years)
  • Create tables of stats for various periods, here is create for 7, 14 and 30days
  • Join the stats tables to the original table of jockeys with rides today

-- create a table of all jockeys with rides today

drop table if exists jockeys_today;

Create  table jockeys_today

select distinct  jockey_id  , jockey_name FROM daily_runners hru

JOIN daily_races hra USING (race_id)

WHERE meeting_date = CURDATE();

-- create a subset table (last 3 years) with  the table of jockeys with a ride today with historical data

drop table if exists jockeys_alldata;

Create  table jockeys_alldata

SELECT hra.meeting_date, hru.jockey_name , hru.jockey_id,      hru.finish_position

FROM jockeys_today jtd

INNER JOIN historic_runners hru using(jockey_id)

INNER JOIN historic_races hra USING (race_id)

WHERE meeting_date >= ADDDATE(CURDATE(), INTERVAL -3 Year)

and ifnull(in_race_comment,'None') <> 'Withdrawn'

and ifnull((unfinished),'None') <> 'Non-Runner'

and starting_price_decimal IS NOT NULL;

-- create a subset table (last 7 days) of jockeys_alldata

drop table if exists jockeys_7d;

Create  table jockeys_7d

SELECT jockey_name as jockey, jockey_id, COUNT(*) as Runs,

SUM(CASE WHEN finish_position = 1 THEN 1 ELSE 0 END) as Wins,

ROUND((SUM(CASE WHEN finish_position = 1 THEN 1 ELSE 0 END) / COUNT(*)) * 100) as Perc

FROM jockeys_alldata

WHERE meeting_date >= ADDDATE(CURDATE(), INTERVAL -7 DAY)

GROUP BY jockey_name, jockey_id;

-- create a subset table (last 14 days) of  jockeys_alldata

drop table if exists jockeys_14d;

Create  table jockeys_14d

SELECT jockey_name as jockey, jockey_id, COUNT(*) as Runs,

SUM(CASE WHEN finish_position = 1 THEN 1 ELSE 0 END) as Wins,

ROUND((SUM(CASE WHEN finish_position = 1 THEN 1 ELSE 0 END) / COUNT(*)) * 100) as Perc

FROM jockeys_alldata

WHERE meeting_date >= ADDDATE(CURDATE(), INTERVAL -14 DAY)

GROUP BY jockey_name, jockey_id;

-- create a subset table (last 30 days) of  jockeys_alldata

drop table if exists jockeys_30d;

Create  table jockeys_30d

SELECT jockey_name as jockey, jockey_id, COUNT(*) as Runs,

SUM(CASE WHEN finish_position = 1 THEN 1 ELSE 0 END) as Wins,

ROUND((SUM(CASE WHEN finish_position = 1 THEN 1 ELSE 0 END) / COUNT(*)) * 100) as Perc

FROM jockeys_alldata

WHERE meeting_date >= ADDDATE(CURDATE(), INTERVAL -30 DAY)

GROUP BY jockey_name, jockey_id;

-- Join the table of todays jockeys with the 7d, 14d and 30d tables, sort by 7d perc descending

Select jockey_id , jockey_name,

ifnull(j7d.Runs,0) as Runs7d,  ifnull(j7d.Wins,0) as Wins7d, ifnull(j7d.Perc,0) as Perc7d,

ifnull(j14d.Runs,0) as Runs14d,  ifnull(j14d.Wins,0) as Wins14d, ifnull(j14d.Perc,0) as Perc14d,

ifnull(j30d.Runs,0) as Runs30d,  ifnull(j30d.Wins,0) as Wins30d, ifnull(j30d.Perc,0) as Perc30d

from jockeys_today jtd

LEFT OUTER JOIN  jockeys_7d j7d using(jockey_id)

LEFT OUTER JOIN  jockeys_14d j14d using(jockey_id)

LEFT OUTER JOIN  jockeys_30d j30d using(jockey_id)

order by j7d.Perc desc;

by Nick14 Handicapper (640 points)
selected by bkw1491
Thanks for the in-depth answer Nick. I think I'll take a similar approach- I'll try creating a temporary table with all runs in the database for a particular jockey, then query that table using a window function to get the runs/wins over each time period. That's also given me a few ideas to better trap non-runners. Thanks for the help
0 votes
You may find the approach to SR calculation in this blog post helpful:

https://blog.betwise.net/2022/05/20/trainers-intentions-jockey-bookings/

I'm not sure if that's faster than the method you give above, but it's certainly fast enough for my purposes (circa 30-60s) because I build the resulting table daily. If you need something per click, I advise a pre-computed temporary table that you regen daily so that the query resulting from a mouse click is a simple a possible. You can then set up a stored procedure and scheduled/cron job to run it daily after the database update has completed.
by hyoung Novice (410 points)
...