Has anyone noticed data integrity issues in the database?

0 votes
I've just started using the Smartform database and am starting simple by looking at winning times. There seem to be many instances where the values are just fantasy. For example: 'SELECT * FROM historic_races WHERE distance_yards = 1100 AND winning_time_secs > 80'. I'd say there are few (if any) real races where such a short distance has been run in such a slow time, and checking some random rows from those results against the Racing Post confirms this.
asked May 25, 2013 in Smartform by greggers Novice (200 points)
edited Jun 3, 2013 by colin

1 Answer

0 votes
mysql> SELECT count(race_id) FROM historic_races WHERE distance_yards = 1100 AND winning_time_secs > 80;
+----------------+
| count(race_id) |
+----------------+
|             47 |
+----------------+
1 row in set (0.21 sec)
 
mysql> SELECT count(race_id) FROM historic_races WHERE distance_yards = 1100 AND winning_time_secs < 80;
+----------------+
| count(race_id) |
+----------------+
|           7637 |
+----------------+
1 row in set (0.29 sec)
 
There are some anomalies in these winning times (mostly from 2008/09), but they are a small fraction of the data: 47 of the 7,684 races counted above, or about 0.6%. Nonetheless, we'll be looking to clean up these and other exceptions/inaccuracies with a data refresh later in the year. All the data will be validated against original sources first.
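
In the meantime, here is a rough sketch of how suspect times could be screened out without a separate query for each distance, by checking the implied speed in yards per second. It uses only the columns that appear in the queries above (race_id, distance_yards, winning_time_secs). The speed bounds, the function names and the sample rows are all made up for illustration, and the SQL is written for an in-memory SQLite table so that it runs on its own; it has not been run against the real Smartform MySQL schema.

```python
import sqlite3

# Illustrative speed bounds in yards per second. Both values are assumptions
# and would need tuning (for example by race type and going).
MIN_YPS = 10.0
MAX_YPS = 22.0

# Flag rows with a missing or non-positive time, or an implied speed outside
# the band. Column names are taken from the queries in this thread.
SUSPECT_SQL = """
SELECT race_id,
       distance_yards,
       winning_time_secs,
       distance_yards * 1.0 / winning_time_secs AS yards_per_sec
FROM historic_races
WHERE winning_time_secs IS NULL
   OR winning_time_secs <= 0
   OR distance_yards * 1.0 / winning_time_secs NOT BETWEEN ? AND ?
ORDER BY race_id
"""


def find_suspect_times(conn, min_yps=MIN_YPS, max_yps=MAX_YPS):
    """Return rows whose winning time looks implausible for the distance."""
    return conn.execute(SUSPECT_SQL, (min_yps, max_yps)).fetchall()


def demo():
    """Build a tiny in-memory table of made-up rows and screen it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE historic_races ("
        "race_id INTEGER PRIMARY KEY, "
        "distance_yards INTEGER, "
        "winning_time_secs REAL)"
    )
    conn.executemany(
        "INSERT INTO historic_races VALUES (?, ?, ?)",
        [
            (1, 1100, 61.2),   # about 18.0 yd/s, inside the band
            (2, 1100, 120.0),  # about 9.2 yd/s, too slow
            (3, 1100, 0.0),    # zero time, clearly bad
            (4, 3520, 215.0),  # about 16.4 yd/s, inside the band
            (5, 1100, 30.0),   # about 36.7 yd/s, too fast
        ],
    )
    return find_suspect_times(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

With the made-up rows above, this flags races 2, 3 and 5. A single speed band is crude, so anything it flags is a candidate for checking against the original source, not proof of an error.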

answered May 25, 2013 by colin Frankel (19,280 points)
...