Sentiment Analysis - Good sources

0 votes
I want to do some sentiment analysis to try and capture things on the day that my current models aren’t aware of.

Right now I form predictions around 11am, once I’ve got the non-runners and the current prices. I’d also like to capture other information that may indicate a horse isn’t looking great, or that there is a really strong favourite or runner that’s not reflected in the stats.

Are there any good sources that I can get the info from? I know there are a ton of places to look, but I want something that’s generally reliable so I can factor it into my models.

I’m going to be doing this in R which I don’t think poses any limitations on it.
asked May 11, 2018 by DataScientist Handicapper (530 points)

1 Answer

0 votes
Probably the best sentiment analysis is the market before the race: a horse that looks well, doesn't look well or is misbehaving is reflected in the market before the off time, and of course new or strong favourites show up in the price. It's possible to extract prices from any Betfair market at a fairly high frequency, as was the subject of Automatic Exchange Betting (although this book is now out of date regarding the current Betfair API). Still, you can use any number of libraries to do this, including the R library by Phill Clarke (Betwise blogger), as mentioned here: https://answers.betwise.net/524/abettor-api-ng-package-for-r

Any marked drift in price from the opening price to the off time, or likewise a marked contraction in price, is often significant.

Of course, you have to programmatically determine what is significant as opposed to "noise". That's a challenge, since it will often depend on the whole market rather than one runner, i.e. a significant market move for one horse will cause others to drift, which does not necessarily indicate negative sentiment against any of the drifters.
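
One possible way to separate a genuine move from knock-on drift is to measure each runner's price move relative to the typical move in the same market. The following is a minimal Python sketch of that idea, not a tested method: it assumes decimal odds, and the function names, runner labels and the 0.10 threshold are all illustrative choices rather than anything from Betfair or an existing library. The same logic ports directly to R.

```python
import math
from statistics import median


def relative_moves(opening, latest):
    """Log price move per runner, centred on the field's median move.

    Positive = shortened more than the typical runner (support);
    negative = drifted more than the typical runner.
    """
    # Compare only runners present in both snapshots (drops late non-runners).
    common = sorted(opening.keys() & latest.keys())
    raw = {r: math.log(opening[r] / latest[r]) for r in common}
    centre = median(raw.values())
    return {r: raw[r] - centre for r in common}


def flag_moves(opening, latest, threshold=0.10):
    """Keep runners whose centred move exceeds the threshold (an arbitrary value)."""
    moves = relative_moves(opening, latest)
    return {r: m for r, m in moves.items() if abs(m) > threshold}


# Hypothetical decimal odds: A is backed, the rest drift only as a knock-on.
opening = {"A": 3.0, "B": 4.0, "C": 6.0, "D": 8.0}
latest = {"A": 2.2, "B": 4.6, "C": 6.8, "D": 9.0}
flagged = flag_moves(opening, latest)  # only A is flagged

# Same market, but D also drifts well beyond the rest of the field.
latest_with_drifter = {"A": 2.2, "B": 4.6, "C": 6.8, "D": 12.0}
flagged_with_drifter = flag_moves(opening, latest_with_drifter)  # A and D
```

Centring on the median means that when one horse is backed and every other runner eases by a similar proportion, only the backed horse stands out. The threshold would need calibrating against historical price data.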

But live price analysis is definitely the way to go.  

A cruder but perhaps more useful tool would be to look at changes in bookie prices before the off - there are fewer movements in price here, so less noise to worry about.
answered May 15, 2018 by colin Guru (12,370 points)
I guess I was hoping for someone on Twitter, or something I can get online, so I can do some sentiment analysis and add it as an additional step in my selection process. I am trying to narrow down the number of races in my pool to something more manageable. I can pick a lot of winners in a day from the total number of races, but I am fairly sure some of the races are a dead loss and should be avoided; I just need some kind of flag to help indicate them.

I've already got a process to get the Betfair prices, so I will give this a try and also try the bookmakers' prices.
The live news feeds per race meeting from the Racing Post are the only other source I'd suggest if you want to process text news.  

Though personally I'd prefer price movements.

What do you have in mind in suggesting eliminating races that are a dead loss?  It's easy enough of course to steer clear of certain race types (maidens, sellers, etc).
I currently have a model to predict the probability of each horse winning its race, so from that I can essentially reduce the total number of horses from, say, 300 down to 30, i.e. the top horse in each race.
Then I want to apply a second-stage algorithm, maybe a random forest, that will use things I don't include in the model (e.g. sentiment analysis) to narrow the 30 down to a select few.
Picking the 'top 2' each day is producing around 30 points profit per month, but I know I pick a lot more winners in a day: a bad day is 5, a good day has been 18.
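
The two-stage selection described above could be structured roughly as follows. This is a minimal Python sketch under stated assumptions (the models in this thread are in R, and the structure carries over): `Runner`, `top_pick_per_race`, `shortlist`, the `move` feature and the example rule are hypothetical names introduced for illustration, and the simple rule stands in for whatever second-stage classifier (such as a random forest) is eventually trained.

```python
from typing import NamedTuple


class Runner(NamedTuple):
    race_id: str
    horse: str
    win_prob: float  # output of the first-stage model


def top_pick_per_race(runners):
    """First stage: the highest-probability runner in each race, keyed by race_id."""
    best = {}
    for r in runners:
        if r.race_id not in best or r.win_prob > best[r.race_id].win_prob:
            best[r.race_id] = r
    return best


def shortlist(picks, late_features, keep_if):
    """Second stage: keep picks for which keep_if(pick, features) is true.

    late_features maps horse name to on-the-day data not used by the first
    model; horses with no late data get an empty dict.
    """
    return [p for p in picks.values() if keep_if(p, late_features.get(p.horse, {}))]


# Made-up data for two races.
runners = [
    Runner("R1", "Alpha", 0.35),
    Runner("R1", "Bravo", 0.25),
    Runner("R2", "Charlie", 0.20),
    Runner("R2", "Delta", 0.40),
]
picks = top_pick_per_race(runners)

# Hypothetical late feature: a price-move score, negative meaning a drift.
late_features = {"Alpha": {"move": 0.30}, "Delta": {"move": -0.25}}
final = shortlist(picks, late_features, lambda p, f: f.get("move", 0.0) > -0.10)
```

Keeping the second stage behind a single `keep_if` function means the rule can later be swapped for a trained model without changing the first-stage selection.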


I like the idea of using the price though. At the minute I select my 'top 2' at 11am and email them to myself, but ideally I'd like to wait until just before the race and then make an assessment factoring in the price. The assessment has to be done programmatically, though, so it has to be data driven.
...