Friday, March 30, 2012

Predicting player win over a period of time

I would like to create a simple regression equation to predict a player's win on their next trip. As a test, I tried to build the model using a linear regression tree on two players. The result is a single node (as expected), but it shows only a coefficient instead of a regression equation. I can do the math by hand to get a regression equation and a predicted value for the next trip for each player.

The dataset I used for a simple test is:

Trip #    Player    Win
1         1001      1,250
1         1002      50
2         1001      1,450
2         1002      75
3         1001      1,600
3         1002      100
4         1001      2,000
4         1002      175
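The by-hand calculation mentioned above can be sketched as an ordinary least-squares fit of Win against Trip #, done separately for each player. This is only an illustration of the arithmetic, not what the mining algorithms do internally; the function names (`fit_line`, `fit_per_player`, `predict`) are made up for this example.

```python
from collections import defaultdict

# (trip, player, win) rows copied from the test dataset above
ROWS = [
    (1, 1001, 1250), (1, 1002, 50),
    (2, 1001, 1450), (2, 1002, 75),
    (3, 1001, 1600), (3, 1002, 100),
    (4, 1001, 2000), (4, 1002, 175),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_per_player(rows):
    """Fit one line per player, so no player influences another."""
    by_player = defaultdict(list)
    for trip, player, win in rows:
        by_player[player].append((trip, win))
    models = {}
    for player, points in by_player.items():
        xs = [trip for trip, _ in points]
        ys = [win for _, win in points]
        models[player] = fit_line(xs, ys)
    return models


def predict(model, trip):
    """Evaluate a fitted (slope, intercept) pair at a given trip number."""
    slope, intercept = model
    return intercept + slope * trip


if __name__ == "__main__":
    for player, model in sorted(fit_per_player(ROWS).items()):
        slope, intercept = model
        print(f"Player {player}: Win = {intercept:.1f} + {slope:.1f} * Trip; "
              f"trip 5 -> {predict(model, 5):.1f}")
```

For the two test players this gives Win = 975 + 240 * Trip for player 1001 and Win = 0 + 40 * Trip for player 1002, so the predicted wins for trip 5 are 2,175 and 200.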

I also tried to predict next trip worth using a forecasting model. I was able to process the model, but I was not able to browse the model content in the viewer.

Ultimately, I want to predict next trip worth for individual players off of a cube. The cube has about 1.5-6M records (multiple records per player), depending on the data source.

FYI: I have created a working linear regression model and a forecasting model off of a cube, so I think I am setting it up correctly.

Can you describe how the mining structure and model columns for the linear regression are set up? More specifically:

1. The datatype for each column in the structure

2. The attribute (Predict, PredictOnly, Input, Ignore) set on each model column

Thanks

Shuvro

|||

The variables are all set with a numeric datatype in the table.

The attribute for each column:

actual -- continuous, predict

trip -- continuous, input

win -- continuous, predict

player account number -- discrete, key

Basically, I want to predict worth at the player level (return a regression equation for each player) over a large list of players. I am not trying to get one regression equation for the whole universe of players.

In the time series model, is there a limit to the number of unique cases that can be input into the model?

Thanks so much for your help.

|||

You can have pretty much any number of series in a time series model, but I'm not sure it will help you in this circumstance.

In particular, you likely don't want to cross-predict between players. For example, you might start with a model such as

CREATE MINING MODEL PlayerModel
(
Player TEXT KEY,
Trip LONG KEY TIME,
Win LONG CONTINUOUS PREDICT,
Actual LONG CONTINUOUS PREDICT
) using Microsoft_Time_Series

This would create a time series model that contains models for each player for Win and Actual. However, since the columns are marked PREDICT rather than PREDICT_ONLY, Player A's Win values could influence Player B's Actual values if the data happened to line up that way.

You would think that you could simply make a model like this

CREATE MINING MODEL PlayerModel
(
Player TEXT KEY,
Trip LONG KEY TIME,
Win LONG CONTINUOUS PREDICT_ONLY,
Actual LONG CONTINUOUS PREDICT_ONLY
) using Microsoft_Time_Series

This causes each time series to be based only on its own values and not on those of other series, so cross-prediction is not an issue. However, you probably want Actual to influence Win for an individual player, just not for other players. The modeling approach for this situation is to make a separate model for each player.

Similar things would happen if you tried other regression models such as decision trees, logistic, or linear regression. Your model structure would look like

CREATE MINING MODEL PlayerModel
(
Trip LONG KEY,
Players TABLE
(
Player TEXT KEY,
Actual LONG CONTINUOUS PREDICT_ONLY,
Win LONG CONTINUOUS PREDICT_ONLY
)
) using Microsoft_Linear_Regression

This model wouldn't do anything (it would probably return an error) because there are no inputs whatsoever. If you changed a column to Predict (or just left it as Input), the values for one player could influence the regressions for other players. Again, the solution is to create a separate model for each player.
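With a large list of players, writing one model definition per player by hand is impractical, so the statements could be generated. The sketch below only builds the DMX text for a per-player time series model; the model naming scheme (`PlayerModel_<id>`) and the helper name are assumptions made for illustration, and training each model on only that player's rows is left out.

```python
def per_player_model_dmx(player_id):
    """Build a CREATE MINING MODEL statement for a single player.

    The Player column is dropped because each model holds one player's
    series; Win and Actual stay PREDICT so that they can influence each
    other within that player only.
    """
    # Hypothetical naming scheme, not something prescribed by the product
    model_name = f"PlayerModel_{player_id}"
    return "\n".join([
        f"CREATE MINING MODEL [{model_name}]",
        "(",
        "Trip LONG KEY TIME,",
        "Win LONG CONTINUOUS PREDICT,",
        "Actual LONG CONTINUOUS PREDICT",
        ") USING Microsoft_Time_Series",
    ])


if __name__ == "__main__":
    # Player IDs taken from the test dataset in the question
    for player_id in (1001, 1002):
        print(per_player_model_dmx(player_id))
        print()
```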
