How to capture a data subset

0 votes
When I run this query to get information on Mark Johnston's horses in say the last 30 days, I get 12 results over three days (up until 21-Aug-22):

SELECT meeting_date, race_type, trainer_id, finish_position, starting_price_decimal
from historic_runners, historic_races
where historic_runners.race_id = historic_races.race_id
and meeting_date >= "2022-08-19" and race_type = "Flat" and trainer_id = 1883 and starting_price_decimal > 0

Can you please help me where I might be going wrong?

Thanks in advance.

PS - I use two tables because I have other variables in the SELECT statement that I also want to retrieve at the same time.
asked Sep 19, 2022 by jnestor3 Novice (360 points)
edited Sep 19, 2022 by jnestor3

1 Answer

0 votes
You're not going wrong in so far as the query construction is concerned.  However, since trainers have been allowed to start adding spouses and children etc to their trainer_name designation, there have been issues with various source data providers tracking these changes correctly in so far as using a consistent ID number for all trainer name combinations past and present.  In this case it seems you have uncovered one such bug.  In these cases it's necessary to check the name of the trainer to be sure.

Thus:

SELECT meeting_date, race_type, trainer_id, trainer_name, finish_position, starting_price_decimal from historic_runners join historic_races using (race_id) where meeting_date >= "2022-08-19" and race_type = "Flat" and trainer_name regexp "Johnston" and starting_price_decimal > 0;

returns all horses trained by the combination.

You can see the trainer_id is NULL since August 24th which is why the query did not return all results.

If we check daily_races we can see why the issue may have occurred:

SELECT meeting_date, race_type, trainer_id, trainer_name from daily_runners join daily_races using (race_id) where meeting_date >= "2022-08-19" and race_type = "Flat" and trainer_name regexp "Johnston";

If you look at the name of this combination since August 24th it has been abbreviated to C&M Johnston instead of Charlie & Mark Johnston.  The correct ID has been applied with the declarations but not with the results.  We've reported the issue to our source data providers and a fix should be implemented shortly - however, it's as well to be aware of these issues (trainer names changing for the same training entity) since it's possible that discrepancies are introduced to trainer IDs when there are changes.  From time to time we do our own clean ups and apply these to the "beta" product and you can expect the ID issue in this particular case to be fixed directly in the data in a future download.  In the vast majority of cases, all IDs are correctly matched to varying combinations of trainer naming entities, despite the recent changes to trainer representations in the racecard, see for example:

SELECT distinct trainer_id, trainer_name from historic_runners join historic_races using (race_id) where race_type = "Flat" and trainer_name regexp "Gosden";

However, if you are looking  to be precise in analysis of all runners on one trainer, it pays to look at varying trainer name combinations directly, to be sure that all combinations are in fact captured.
answered Sep 20, 2022 by coltest Listed class (2,780 points)
edited Sep 20, 2022 by coltest
Much appreciated. Thank you.
...