Predicting League Position or Squad Cost Revised

Beksl

Sell All The Youngsters
I'll have some spare time in the coming weeks and decided to do a side project and evaluate how informative Squad Cost (and other variables) really are in predicting final league position.

First I need to construct a data set from which I'll do my analysis. Obviously the target variable is league position, but what I need from you AMers is suggestions on which other attributes/variables to include. Squad cost and wage bill are the obvious ones, but I need as many attributes/variables as possible (the data must be available, mind you). Was thinking something along the lines of number of passes, clean sheets, possession %, goals scored etc. (data from the 2017/2018 season). Maybe even some more advanced stats like xG, I'm open to suggestions.

Then I'll evaluate all the attributes using information theory, computing parameters like Entropy, Gini Index, Information Gain, Gain Ratio, Gini Decrease etc. This will tell me which of the attributes is the most informative, i.e. which one carries the most information about the target variable.
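For anyone who wants to see what these measures look like in practice, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the data is made up, squad cost is assumed to be already binned into "high"/"low" (continuous attributes would need binning first), league position is reduced to top/bottom half, and the function names are hypothetical rather than taken from any library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index (impurity) of a list of discrete labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature itself (split info)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Made-up toy data: 8 teams, squad cost binned by hand, target is league half.
squad_cost = ["high", "high", "high", "low", "low", "low", "low", "low"]
league_half = ["top", "top", "top", "top", "bottom", "bottom", "bottom", "bottom"]

print("entropy of target:", entropy(league_half))
print("gini of target:", gini(league_half))
print("information gain of squad cost:", information_gain(squad_cost, league_half))
print("gain ratio of squad cost:", gain_ratio(squad_cost, league_half))
```

The attribute with the highest information gain (or gain ratio, which penalises attributes with many distinct values) would be ranked as the most informative one.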

Then I'll train different predictive models/classifiers to predict the target category (league position). I'll use different algorithms to see which gives me the highest classification accuracy and then test the model(s) on new/test data (from the 18/19 season) to see how accurate my classifiers really are.
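As a rough sketch of that train-on-one-season, test-on-the-next workflow, the snippet below fits the simplest classifier I can think of (a one-attribute threshold rule, sometimes called a decision stump) on one set of teams and scores it on another. The numbers are made up, the target is simplified to top/bottom half, and `fit_stump`, `predict_stump` and `accuracy` are hypothetical helper names, not the algorithms that will actually be used.

```python
def accuracy(predicted, actual):
    """Share of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def predict_stump(threshold, xs, high_label="top", low_label="bottom"):
    """Predict high_label for every value at or above the threshold."""
    return [high_label if x >= threshold else low_label for x in xs]


def fit_stump(xs, ys, high_label="top", low_label="bottom"):
    """Pick the threshold on one numeric attribute that maximises training accuracy."""
    best_threshold, best_accuracy = None, -1.0
    for t in sorted(set(xs)):
        acc = accuracy(predict_stump(t, xs, high_label, low_label), ys)
        if acc > best_accuracy:
            best_threshold, best_accuracy = t, acc
    return best_threshold


# Made-up squad costs (in millions) for a "training" season and a "test" season.
train_cost = [400, 350, 300, 120, 90, 60]
train_half = ["top", "top", "top", "bottom", "bottom", "bottom"]
test_cost = [380, 310, 200, 80]
test_half = ["top", "top", "top", "bottom"]

threshold = fit_stump(train_cost, train_half)
test_accuracy = accuracy(predict_stump(threshold, test_cost), test_half)
print("learned threshold:", threshold)
print("accuracy on the held-out season:", test_accuracy)
```

The point of scoring on a separate season is that training accuracy alone says little about how the rule holds up on teams it has not seen.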

Attributes/Variables I'll definitely use:
  • League position
  • Squad cost
  • Wage bill

I'll post the results in the coming weeks.
 

krengon

One Arsène Wenger
Trusted ⭐
Sounds interesting, looking forward to the results..

Maybe add something about injuries/health? like how many players played over 25-30 games for a team and if those who had a lot of significant injuries did worse relative to their squad cost/wage bill.
 

Aevi

Hale End FC
Moderator
To build on that, it'd be interesting to see it with some metric of how consistently a certain team was put out. Leicester basically played the same team week in week out in 2015/16 because of their lack of injuries. That was a big help in their success.
 

Rain Dance

Established Member
Trusted ⭐
well... we are currently top of the table !

I don't believe I am gonna write this... but I'll take 4th.....
 

BigPoppaPump

Reeling from Laca & Kos nightmares
I'll have some spare time in the coming weeks and decided to do a side project and evaluate how informative Squad Cost (and other variables) really are in predicting final league position.

There's much more productive things you can do with your unemployment.
 

idan

Member
Two seasons will not be enough, especially when you split them into learning and testing..

- Average time that a player is in a club.

- Average age of players.

- The number of injuries in the season
 

Beksl

Sell All The Youngsters
Two seasons will not be enough, especially when you split them into learning and testing..

- Average time that a player is in a club.

- Average age of players.

- The number of injuries in the season

Yeah that’s my concern as well.

I’m thinking of including some other economic metrics, like net spend, revenue streams etc.

What’s the best source for injuries, physio room?
 

Fallout

Active Member
- lag of league position (e.g. 1 year or maybe 3 year avg. -- promoted teams would start at 21+)
- re: injuries, i believe cumulative days lost to injury is on the internet somewhere and is the variable i would use
- number of unique player appearances throughout season (or std. dev. of total appearances across players in the first team)
- length of current manager's tenure at season's end (alternatively, number of managers over previous 3 yrs)
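A quick sketch of how two of these could be computed, using only the standard library. The club names, positions and appearance counts are made up, and `lagged_position` is a hypothetical helper; the default of 21 for seasons spent outside the league follows the "promoted teams would start at 21+" idea above.

```python
from statistics import mean, pstdev


def lagged_position(positions_by_season, club, seasons, promoted_default=21):
    """Average finishing position over earlier seasons.

    Seasons in which the club was not in the league count as promoted_default.
    """
    return mean(positions_by_season[s].get(club, promoted_default) for s in seasons)


# Made-up finishing positions for three earlier seasons.
positions = {
    "2014/15": {"Club A": 2, "Club B": 10},
    "2015/16": {"Club A": 5, "Club B": 12},
    "2016/17": {"Club A": 6, "Club B": 18, "Club C": 15},
}
previous = ["2014/15", "2015/16", "2016/17"]

print("Club A, 3-year lag:", lagged_position(positions, "Club A", previous))
print("Club C, 3-year lag:", lagged_position(positions, "Club C", previous))

# Made-up league appearances per first-team player for two squads.
settled_side = [38, 38, 37, 36, 36, 35, 35, 34, 33, 33, 32]
rotated_side = [38, 30, 28, 25, 22, 20, 18, 15, 12, 10, 8]

print("spread, settled side:", pstdev(settled_side))
print("spread, rotated side:", pstdev(rotated_side))
```

A low spread means appearances were shared fairly evenly among the listed players, so the list of players you feed in (whole squad or regular starters only) changes what the number means.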

i hope u go as far back as possible time-wise and collect as much data as you can on off-the-pitch variables (e.g. finances, injuries) because those are the underlying factors that play into goals, clean sheets, etc. that you previously mentioned.

edit: i would also use points accumulated as the outcome variable since it's more refined, and if u wanna make the variable relative, points earned by club divided by points earned among all clubs in league
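The relative version is just each club's points divided by the league total. A tiny sketch with made-up numbers (three hypothetical clubs instead of twenty, and `points_share` is a hypothetical helper name):

```python
def points_share(points_by_club):
    """Each club's points as a fraction of all points earned in the league."""
    total = sum(points_by_club.values())
    return {club: pts / total for club, pts in points_by_club.items()}


# Made-up final points totals.
final_points = {"Club A": 90, "Club B": 60, "Club C": 30}
shares = points_share(final_points)

for club, share in shares.items():
    print(club, round(share, 3))
```

Because the total number of points handed out differs from season to season (a draw awards two points in total, a win three), a share like this makes seasons a bit more comparable than raw points.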
 

Country: Iceland
Great contribution mate! I only tried the linear regression model that I learned in a statistics and probability for engineering course, using squad cost. But as I lack programming skills I couldn't dream of doing the stuff you describe in your original post!
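For reference, a simple linear regression on squad cost does not need much programming. Here is a minimal least-squares sketch in plain Python; the squad costs and points are made up (and deliberately sit on a straight line), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: squad cost in millions and points at the end of the season.
squad_cost = [100, 200, 300, 400]
points = [40, 50, 60, 70]

slope, intercept = fit_line(squad_cost, points)
print("points per extra million:", slope)
print("predicted points for a 250m squad:", intercept + slope * 250)
```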

Interesting parameters you could try :
Injury record
Manager win ratios

Silly parameters you could try :
Full backs assists count
Manager trophies count
 

SingmeasongSong

Right Sometimes
Nice one !

Too time-consuming for me to work on such a project, so it's very nice that someone makes the effort.

- Average player height or particularly for CBs
- Average distance covered
- Speed/Acceleration of players in some way
- How attacks over the season have been distributed by left, middle, right which kinda correlates with quality along the whole field
- Correlated to the above, how goals are distributed among all players -> (maybe) being too reliant on 1-2 players is bad

What I'd really like to have shown analytically once is how much having fast and tricky wingers on both sides makes your team so much more dangerous.

Basically on a chart why we are so ****ing bad there :lol:
 
Before you mine your data, maybe we can use Football Manager and just simulate some data to test such an approach. It would be easier to see if the code works.

Nice idea btw, but that could be quite time-consuming.
 

Alkane

Active Member
Cool idea, but two seasons of data seems too small for a time series analysis. Perhaps you could use monthly data for league position throughout each season; however, this would require a more complex model.
 

Mo Britain

Doom Monger
Great idea but I'll save you a lot of time by guaranteeing we will finish 6th. Although this may be revised downwards at the end of the transfer window.
 

Beksl

Sell All The Youngsters
@Beksl Did you ever get around to this?

I've managed to make a spreadsheet with the most promising attributes/variables from the 2017/18 season and did some preliminary analysis computing Entropy, Gini Index etc. The problem I have is that I'm trying to use data mining approaches on samples that are really small and unstable (20 teams, and there are always three different teams next season).

I'll probably get back to this at the beginning of next year when I'll have some more time.
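One common way to squeeze more out of a sample of only 20 teams is leave-one-out cross-validation: train on 19 teams, predict the one left out, and repeat for every team. This is a general technique and not something already done in this thread, so treat the snippet as a sketch under assumptions: the data is made up, the target is simplified to top/bottom half, and the nearest-neighbour rule is only a stand-in classifier with hypothetical function names.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Return the label of the training example whose value is closest to x."""
    closest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[closest]


def leave_one_out_accuracy(xs, ys):
    """Hold out each example in turn, predict it from the rest, and average the hits."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if nearest_neighbour_predict(rest_x, rest_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)


# Made-up squad costs (in millions) and league half for 8 teams.
squad_cost = [410, 350, 300, 240, 120, 90, 60, 40]
league_half = ["top", "top", "top", "bottom", "bottom", "bottom", "bottom", "bottom"]

loo_accuracy = leave_one_out_accuracy(squad_cost, league_half)
print("leave-one-out accuracy:", loo_accuracy)
```

It does not fix the problem of promoted and relegated teams changing the sample every year, but every team gets used for both training and testing, which helps when there are so few rows.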
 