0 votes
I've built a predictive model using machine learning but the data I have used does have some gaps in it so I am considering purchasing the historical data.  Can I load the data into R using an ODBC connection?

My model also produces a set of tables for the daily predictions, am I able to write these back to the database?

Also will I be able to use this data on Amazon Web Services so I can scale things up?

thanks
in Smartform by PunterBot Handicapper (830 points)

1 Answer

0 votes
 
Best answer

Hi,

Yes - R is a great tool for preprocessing Smartform data and building models etc.  

Regarding using R and Amazon etc, from a license point of view this is fine as long as the data is for your personal use and there is no data transfer or resale of data.  You can also make 2 copies of the database (eg. locally and on your own AWS instance) provided that they are both under your control / only accessible by yourself.

From a techical point of view, using RMySQL is probably the best library to use, in order to create an R dataframe that you can manipulate further.  Here's the start of a recent script as an example:

 

#load library

library(RMySQL) 
 
#clean up any open connections
all_cons <- dbListConnections(MySQL()) 
for(con in all_cons) {dbDisconnect(con)} 
 
con <- dbConnect(RMySQL::MySQL(), dbname = "smartform", user="smartform")
 
horse_query  <- paste("select name, race_id, runner_id, race_type_id, race_type, meeting_date, course, distance_yards, distance_behind_winner, starting_price, starting_price_decimal, prize_money, class, trainer_name, jockey_name, jockey_id, trainer_id, finish_position, unfinished, coalesce(finish_position, unfinished) AS 'result', num_runners, going, standard_time_secs, winning_time_secs, stall_number, tack_hood, tack_visor, tack_blinkers, tack_eye_shield, tack_eye_cover,tack_cheek_piece, tack_pacifiers, tack_tongue_strap, coalesce(tack_visor, tack_cheek_piece, tack_eye_cover, tack_eye_shield,  tack_hood, tack_blinkers) AS 'headgear', historic_runners.private_handicap, historic_runners.private_handicap_type, from historic_runners join historic_races using (race_id) order by meeting_date DESC;", sep="'")
 
smartform <- as.data.frame(dbGetQuery(con, horse_query))

 

by colin Frankel (19.7k points)
Thanks for the quick reply.
Yes it's all for my own use, I use AWS mainly because the data I have at the moment is 200M+ rows and it's cheap.
...