Prediction
- find and use a prediction framework for online analysis
- select a few events to predict and identify features good for prediction
- make real time predictions
- try to maximize prediction accuracy (find out what's possible)
"Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/ml/weka/
"MOA is an open source framework for data stream mining. It includes a collection of machine learning algorithms (classification, regression, and clustering) and tools for evaluation. Related to the WEKA project, MOA is also written in Java, while scaling to more demanding problems." http://moa.cms.waikato.ac.nz/
Will the next pass of the player with the ball fail or succeed?
- pass successful
- pass missed
For this prediction only the current state of the game is used.
Will the attack of the team result in ball loss, ball out of bounds or shot on goal?
- shot on goal
- ball loss
- ball out of bounds
This prediction is more complex. In addition to the current state of the game, events since the last prediction event are taken into account (e.g. the number of passes during an attack).
The code structure is designed for easy integration of different classifiers and predictions.
Basic model to understand the structure:
###Prophet
The prophet updates all predictors. It is called every second of playtime from the statistics project. New predictors can be added here.
###Learner
The learner encapsulates a classifier which can be trained and which makes the predictions.
###Predictor
The predictor checks whether a prediction event has occurred and, if so, trains the learner. Otherwise it makes a prediction. The predictor can use any available learner.
###PredictionInstance
The prediction instance class creates and handles a prediction instance for prediction or training purposes. Attributes (features) for the prediction are defined here.
###Statistics
The statistics project asks the prophet to update itself and provides the game information for prediction instance creation.
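The interplay of these components can be illustrated with a minimal Java sketch. All interface, class and method names below (`Learner.train`, `Learner.predict`, `Predictor.update`, `Prophet.addPredictor`, `MajorityLearner`) are hypothetical and chosen for illustration only; they are not taken from the project code. As a simplification, the caller passes the observed outcome (or `null`) instead of the predictor detecting the event itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical learner: wraps a classifier that can be trained and queried. */
interface Learner {
    void train(double[] features, String label);
    String predict(double[] features);
}

/** Placeholder learner that always predicts the most frequent label seen so far. */
class MajorityLearner implements Learner {
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void train(double[] features, String label) {
        counts.merge(label, 1, Integer::sum);
    }

    @Override
    public String predict(double[] features) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best; // null until the first training example arrives
    }
}

/** Hypothetical predictor: trains when an outcome is known, otherwise predicts. */
class Predictor {
    private final Learner learner;
    private String lastPrediction;

    Predictor(Learner learner) {
        this.learner = learner;
    }

    /** observedLabel is the outcome of a prediction event, or null if none occurred. */
    void update(double[] features, String observedLabel) {
        if (observedLabel != null) {
            learner.train(features, observedLabel);
        } else {
            lastPrediction = learner.predict(features);
        }
    }

    String lastPrediction() {
        return lastPrediction;
    }
}

/** Hypothetical prophet: forwards every update to all registered predictors. */
class Prophet {
    private final List<Predictor> predictors = new ArrayList<>();

    void addPredictor(Predictor predictor) {
        predictors.add(predictor);
    }

    void update(double[] features, String observedLabel) {
        for (Predictor p : predictors) {
            p.update(features, observedLabel);
        }
    }
}
```

Because the predictor only depends on the `Learner` interface, a different classifier can be swapped in without touching the prophet or the predictor.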
####TEAMMATE_IN_AREA
Number of team mates within a 20-meter circle.
####OPPONENT_IN_AREA
Number of opponents within a 20-meter circle.
####PLAYER_PASS_RATE
Ratio of a player's successful passes to all of the player's passes, in percent.
(playerPassesSuccessful / (playerPassesSuccessful + playerPassesMissed)) * 100
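A minimal Java sketch of this formula follows. It assumes the two counters are integers, in which case the division must be done in floating point to avoid truncation to 0. Returning 0.0 for a player without any passes is an assumption; the class name `PassRate` is hypothetical.

```java
class PassRate {
    /**
     * Percentage of successful passes of a player.
     * Returns 0.0 when the player has not passed yet (assumption).
     */
    static double playerPassRate(int playerPassesSuccessful, int playerPassesMissed) {
        int total = playerPassesSuccessful + playerPassesMissed;
        if (total == 0) {
            return 0.0; // avoid division by zero
        }
        // Cast to double so the division is not an integer division.
        return (playerPassesSuccessful / (double) total) * 100.0;
    }
}
```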
####PLAYER_BALLCONTACT
Sum of ball contacts of a player.
####LAST_PLAYER_ID
ID of the player who made the pass. (unique for every game)
####CURRENT_PLAYER_ID
ID of the player who received the pass. (unique for every game)
####DISTANCE_TO_NEAREST_TEAMMATE
Distance to nearest team mate.
####DISTANCE_TO_NEAREST_OPPONENT
Distance to nearest opponent.
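The area counts and the nearest-distance features can both be derived from player positions. The sketch below assumes positions are given as `{x, y}` pairs in meters and uses the Euclidean distance. Whether the 20 meters refer to the radius or the diameter of the circle is not stated, so the radius is passed as a parameter. The class and method names are hypothetical.

```java
class Proximity {
    /** Euclidean distance between two points on the pitch. */
    static double distance(double x1, double y1, double x2, double y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    /** Counts how many of the given {x, y} positions lie within radius of (x, y). */
    static int countInArea(double x, double y, double[][] others, double radius) {
        int count = 0;
        for (double[] other : others) {
            if (distance(x, y, other[0], other[1]) <= radius) {
                count++;
            }
        }
        return count;
    }

    /** Distance to the closest of the given positions, or infinity if there are none. */
    static double distanceToNearest(double x, double y, double[][] others) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] other : others) {
            best = Math.min(best, distance(x, y, other[0], other[1]));
        }
        return best;
    }
}
```

Calling the same two methods once with the team mates' positions and once with the opponents' positions yields all four features.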
####CURRENT_PLAYER_X
X position of a player.
####CURRENT_PLAYER_Y
Y position of a player.
####CURRENT_PLAYER_DISTANCE
Accumulated run distance of a player.
####ATTRIBUTE_AREA
The area a player is in: own area, middle area or opponent's area.
####ATTRIBUTE_PASS_COUNT
Number of passes that occurred during the attack.
####ATTRIBUTE_AVERAGE_VELOCITY
Average velocity of the ball in the opponent's direction over the last few seconds.
averageOfAll(abs(currentBallYPosition - lastBallYPosition) / (currentGameTime - lastGameTime))
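The formula can be sketched in Java as follows. The sketch assumes the ball samples of the last seconds are available as two parallel arrays of game times and Y positions, ordered by time; samples with a non-positive time difference are skipped. Because of the absolute value, the result is a speed along the Y axis and does not distinguish the direction of movement. The class name `BallVelocity` is hypothetical.

```java
class BallVelocity {
    /**
     * Averages abs(deltaY) / deltaTime over all consecutive sample pairs.
     * Returns 0.0 if fewer than two usable samples exist (assumption).
     */
    static double averageVelocity(double[] gameTimes, double[] ballYPositions) {
        double sum = 0.0;
        int pairs = 0;
        for (int i = 1; i < gameTimes.length; i++) {
            double deltaTime = gameTimes[i] - gameTimes[i - 1];
            if (deltaTime <= 0.0) {
                continue; // skip duplicate or out-of-order samples
            }
            sum += Math.abs(ballYPositions[i] - ballYPositions[i - 1]) / deltaTime;
            pairs++;
        }
        return pairs == 0 ? 0.0 : sum / pairs;
    }
}
```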
In pattern recognition, the k-nearest neighbor algorithm is a method for classifying objects based on closest training examples in the feature space.
IBk is a lazy classifier in Weka that is based on KNN. The advantage of the IBk classifier in comparison to the KNN classifier in Weka is the method updateClassifier(Instance instance), which updates the classifier without creating a new model.
More information (class description)
- KNN: the number of nearest neighbors to use for prediction
- Cross Validation: the test technique to select the best k value during training (adaptive KNN)
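To illustrate why a lazy k-nearest-neighbor classifier can be updated without rebuilding a model, here is a minimal plain-Java sketch. It is not Weka's IBk implementation: the class `SimpleKnn` and its methods are hypothetical, k is fixed (no cross-validation), and ties are resolved in favor of the label that reaches the highest vote count first when neighbors are visited from nearest to farthest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SimpleKnn {
    private final int k;
    private final List<double[]> features = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    SimpleKnn(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be >= 1");
        }
        this.k = k;
    }

    /** Lazy learning: a new example is only stored, nothing is rebuilt. */
    void update(double[] x, String label) {
        features.add(x.clone());
        labels.add(label);
    }

    /** Majority vote among the k nearest stored examples; null if nothing is stored. */
    String classify(double[] x) {
        if (features.isEmpty()) {
            return null;
        }
        Integer[] order = new Integer[features.size()];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Linear search: sort all stored examples by distance to the query.
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> squaredDistance(x, features.get(i))));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (int j = 0; j < Math.min(k, order.length); j++) {
            String label = labels.get(order[j]);
            int v = votes.merge(label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = label;
            }
        }
        return best;
    }

    /** Squared Euclidean distance; it preserves the neighbor ordering. */
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Since all work happens at query time, the cost of `classify` grows with the number of stored examples, which is one reason to keep the instance set bounded in long games.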
The predictions are periodically sent to the visualization project for display.
For testing with Weka and MOA an ARFF file can be created at the end of the game. To enable the creation of the ARFF file a boolean has to be set to true in predictions.Utils:
public static final boolean ARFF_WRITING_MODE = true;
The ARFF file will be created in a logs folder in the predictions project root folder.
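For orientation, the following sketch shows the general layout of an ARFF file (header with `@relation` and `@attribute` lines, then a `@data` section with comma-separated rows). It is not the project's writer: the class `ArffSketch`, the relation name and the chosen attribute subset are illustrative assumptions, and nominal labels are written without spaces to avoid quoting.

```java
import java.util.List;

class ArffSketch {
    /** Builds ARFF text with numeric attributes and one nominal class attribute. */
    static String toArff(String relation, String[] numericAttributes,
                         String[] classLabels, List<double[]> rows, List<String> labels) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String attribute : numericAttributes) {
            sb.append("@attribute ").append(attribute).append(" numeric\n");
        }
        // The nominal class attribute lists all allowed values in braces.
        sb.append("@attribute class {").append(String.join(",", classLabels)).append("}\n\n");
        sb.append("@data\n");
        for (int i = 0; i < rows.size(); i++) {
            for (double value : rows.get(i)) {
                sb.append(value).append(',');
            }
            sb.append(labels.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

A file produced this way can be opened in the Weka Explorer to try other classifiers offline.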
Best results were achieved with the IBk classifier. Tests have shown that using many features, including less promising ones, improves the accuracy. We therefore used more than ten features for both predictions.
Accuracy (average of whole game): 85%*
Distribution: 73% pass successful, 27% pass missed
Number of instances: 827
Parameters: IBk classifier, KNN adaptive, Linear Nearest Neighbor Search with Euclidean Distance
* In the project a lower accuracy is measured, because a fixed KNN value is used for better visualisation possibilities.
Development of accuracy depending on increasing training set size:
The accuracy starts at a high level, since most passes succeed at the beginning and therefore classification is easy. The overall high level of accuracy can be explained by the same reason: 73% of all passes are successful. At the end a slightly decreasing accuracy is noticeable.
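How the accuracy curve was computed is not described here. One common way to track accuracy as the training set grows is to compare each prediction with the outcome observed afterwards and to keep a running percentage. The sketch below assumes that approach; the class `RunningAccuracy` and its methods are hypothetical names.

```java
class RunningAccuracy {
    private int total;
    private int correct;

    /** Records one prediction together with the outcome observed afterwards. */
    void record(String predicted, String actual) {
        total++;
        if (predicted != null && predicted.equals(actual)) {
            correct++;
        }
    }

    /** Accuracy so far in percent; 0.0 before the first record. */
    double percent() {
        return total == 0 ? 0.0 : 100.0 * correct / total;
    }
}
```

A classifier that always answers "pass successful" would already reach about 73% on the distribution above, so the 85% should be read against that baseline.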
Accuracy (average of whole game): 84%
Distribution: 69% ball loss, 24% ball out of bounds, 7% shot on goal
Number of instances: 332
Parameters: IBk classifier, KNN = 8, Linear Nearest Neighbor Search with Euclidean Distance
Development of accuracy depending on increasing training set size (lower accuracy because a different prediction method was used):
The accuracy increases during the whole game.
With the sensor data and the statistics data we were able to get a high accuracy for predictions. Whether the good accuracy values hold for other football games has yet to be tested. A big problem will be the different playing styles of other players and teams. The pass success prediction has also shown that the accuracy can decrease after a while. In that case the instance set size has to be limited or other classifiers have to be used.
The wish to predict goals could not be fulfilled, because goal events are too rare to train a classifier properly. But with frequently occurring events like player passes, good accuracies were achieved after only a few minutes of playtime. The attack result prediction improved during the whole game.
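Limiting the instance set size can be sketched as a sliding window that drops the oldest training instance once a maximum is reached. The class `InstanceWindow` below is a hypothetical illustration in plain Java, not project code; Weka's IBk also offers a window size option that may serve the same purpose.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class InstanceWindow<T> {
    private final int maxSize;
    private final Deque<T> window = new ArrayDeque<>();

    InstanceWindow(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
    }

    /** Adds an instance and evicts the oldest one if the window is full. */
    void add(T instance) {
        if (window.size() == maxSize) {
            window.removeFirst();
        }
        window.addLast(instance);
    }

    int size() {
        return window.size();
    }

    /** Oldest instance still kept, or null if the window is empty. */
    T oldest() {
        return window.peekFirst();
    }
}
```

With such a window the classifier only sees recent play, which may help when playing styles change during a game.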