Detailed work on correcting database inconsistencies

0 votes
I have been doing some more detailed work on correcting database inconsistencies.  Because I made a couple of errors and had to reload the database from scratch each time, I thought it best to try to write a script to automate the corrections. The good news is that 536 errors in historic_races can be scripted. See the # comments in the script below for details.  The bad news is that 458 races remain to be corrected all of which look to be Lingfield mixed meetings i.e. meetings with a mix of Flat and All Weather Flat races.  The only solution I can see is painstakingly going through each individual race, cross referencing with an alternative data source, and categorising each as wither Flat or All Weather Flat, setting race_type, race_type_id, all_weather and going appropriately.  My plan os to build a spreadsheet, row per race, fill in the details, save as a CSV, and write a script to import the corrections into the database.  But I'm asking here first because it will be quite a lengthy manual task, in case there is a better way.

Please feel free to run the script in your own databases, although you may wish to hold fire until I have the full 100% solution that handles Lingfield mixed meetings.  And it goes without saying it would be great if my work could be reviewed by the Betwise team and perhaps the master copy of the database corrected.

USE smartform;

SET SQL_SAFE_UPDATES = 0;
# Correct flat races incorrectly tagged as all weather (count=24 on 17 July 2016)
UPDATE historic_races
    SET race_type = "Flat", race_type_id = 12, all_weather = 0
    WHERE race_type_id = 15
    AND going NOT IN ("Slow", "Standard", "Standard to Fast", "Standard to Slow")
    AND course NOT IN ("Dundalk", "Kempton", "Lingfield", "Southwell", "Wolverhampton");
# Correct all weather races incorrectly tagged with flat going (count=412 on 17 July 2016)
UPDATE historic_races
    SET going = "Standard", all_weather = 1
    WHERE race_type_id = 15
    AND going NOT IN ("Slow", "Standard", "Standard to Fast", "Standard to Slow");
# Correct all weather races (excluding Lingfield) incorrectly tagged as flat (count=180 om 17 July 2016)
UPDATE historic_races
    SET race_type = "All Weather Flat", race_type_id = 15, all_weather = 1
    WHERE race_type_id = 12
    AND going IN ("Slow", "Standard", "Standard to Fast", "Standard to Slow")
    AND course IN ("Dundalk", "Great_Leighs", "Kempton", "Laytown", "Newcastle", "Southwell", "Wolverhampton");
# Correct Lingfield races that are incorrectly tagged.  Note Lingfield often holds meetings with a combination of flat and all weather flat races
# Argh cannot do these without correct data on the individual races.
# There are 458 Lingfirld races with race_type_id = 12 and going = Standard.
# Some of these should be Flat, some should be All Weather Flat. Original confusion looks to have been mixed meetings.
# I looked at the all_weather flag - sometimes right sometimes wrong seemingly with no reliable pattern.
SET SQL_SAFE_UPDATES = 1;

If after running the script you wish to list all remaining races that need work - those Lingfield mixed meeting races, try this:

select * from historic_races where race_type_id = 12
            AND going  IN ("Slow", "Standard", "Standard to Fast", "Standard to Slow")
            AND course NOT IN ("Dundalk", "Great_Leighs", "Kempton", "Laytown", "Newcastle", "Southwell", "Wolverhampton");

If you run this select without having first run the script above you may get more than 458 races - not 100% sure.

This cures the oddities previously mentioned with all weather races at Nottingham, Brighton, Newbury, etc of which there should be none.

After sorting this will come Newmarket splitting (July & Rowley) and Ascot merging (normal and Royal).  Not sure yet whether to do that with database views (will slow down queries using them) or just script patching up the database as per above.  All will become clear when I dig into the details ...
asked Jul 22, 2016 by hyoung Novice (410 points)

1 Answer

0 votes
Its hard to work it out of course without knowing what the actual fact for each race is
 
BUT
 
If we assume that an all weather Ling races should have this data 
d.race_type = 'Flat
d.track_type = 'Polytrack' 
d.advanced_going = 'Standard'
h.race_type_id = 15 
h.race_type = 'All Weather Flat' 
h.going = 'Standard' 
h.all_weather = 1
 
This will show all those who aren’t
 
select race_id , d.course,d.race_type,d.track_type,d.advanced_going,
h.course, h.race_type_id,h.race_type,h.going,h.all_weather
from daily_races d join  historic_races h using (race_id)
where ( d.course = 'Lingfield'  or h.course = 'Lingfield')
and race_type_id not in(1,13,14)
and
(d.race_type <> 'Flat' or d.track_type <> 'Polytrack' or d.advanced_going <> 'Standard'
or h.race_type_id <> 15 or h.race_type <> 'All Weather Flat' or h.going <> 'Standard' or all_weather <> 1) ;
 
Then we have to start adding some more where conditions
 
Like this, gives us 568 which don’t conform 
 
select race_id , d.course,d.race_type,d.track_type,d.advanced_going,
h.course, h.race_type_id,h.race_type,h.going,h.all_weather
from daily_races d join  historic_races h using (race_id)
where ( d.course = 'Lingfield'  or h.course = 'Lingfield')
and race_type_id not in(1,13,14)
and
(d.race_type <> 'Flat' or d.track_type <> 'Polytrack' or d.advanced_going <> 'Standard'
or h.race_type_id <> 15 or h.race_type <> 'All Weather Flat' or h.going <> 'Standard' or all_weather <> 1)
and  d.track_type = 'Polytrack';
 
answered Jul 27, 2016 by Nick14 Handicapper (640 points)
You're joining daily_races and historic_races? Surely that only works for the days since you started running the database because you'll only have collected daily_races entries since you started the database rather than historic_races which goes back to the year dot?  I think I have to laboriously go through hundreds of races and build a table of a/w vs flat from an alternative data source then use that to correct my copy of the database, if I want the data 100% which I do.  Then I should keep that correction table updated with new mixed meetings so I can fully automate the correction in the event that I ever need to reload the database.
...