Predicting League Position or Squad Cost Revised

Discussion in 'Arsenal Talk' started by Beksl, Jun 13, 2019.

  1. Beksl

    Beksl Sell All The Youngsters

    I'll have some spare time in the coming weeks and decided to do a side project and evaluate how informative Squad Cost (and other variables) really are in predicting final league position.

    First I need to construct a data set for the analysis. Obviously the target variable is league position, but what I need from you AMers is suggestions on which other attributes/variables to include. Squad cost and wage bill are the obvious ones, but I need as many attributes/variables as possible (the data must be available, mind you). I was thinking of something along the lines of number of passes, clean sheets, possession %, goals scored etc. (data from the 2017/18 season). Maybe even some more advanced stats like xG; I'm open to suggestions.

    Then I'll evaluate all the attributes using information theory, computing measures like Entropy, Gini Index, Information Gain, Gain Ratio, Gini Decrease etc. This will tell me which of the attributes are the most informative, i.e. which ones reduce the uncertainty about the target variable the most.
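
    To make those measures concrete, here is a minimal sketch of entropy, Gini index, information gain and gain ratio for categorical attributes. The function names, the banding of attributes into "high"/"mid"/"low" and all the data are assumptions made up for illustration, not real club figures and not necessarily how the final analysis will be done.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index (impurity) of a list of categorical labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0
    return information_gain(feature_values, labels) / split_info


# Made-up toy data: six hypothetical clubs, attributes binned into bands.
finish = ["top6", "top6", "mid", "mid", "bottom", "bottom"]
squad_cost_band = ["high", "high", "mid", "mid", "low", "low"]
possession_band = ["high", "low", "high", "low", "high", "low"]

for name, feature in [("squad cost", squad_cost_band), ("possession", possession_band)]:
    print(name, information_gain(feature, finish), gain_ratio(feature, finish))
```

    In this toy set the squad cost band separates the finishing bands perfectly, so it gets the full information gain, while the possession band tells you nothing. Real attributes are continuous, so they would need binning (or a threshold search) first.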

    Then I'll train different predictive models/classifiers to predict the target category (league position). I'll use different algorithms to see which gives me the highest classification accuracy and then test the model(s) on new/test data (from the 2018/19 season) to see how accurate my classifiers really are.
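
    As a rough sketch of that train-then-test workflow, the snippet below fits on one season and scores on the next. The post does not say which algorithms will be used, so a 1-nearest-neighbour classifier stands in here as the simplest possible example; the function names and every number are hypothetical.

```python
def nearest_neighbour_predict(train_X, train_y, x):
    """Return the label of the training row closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def accuracy(train_X, train_y, test_X, test_y):
    """Share of test rows whose predicted label matches the true label."""
    hits = sum(
        nearest_neighbour_predict(train_X, train_y, x) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


# Made-up rows: (squad cost, wage bill) in millions -> finishing band.
# "Training" season
train_X = [(800, 250), (700, 220), (300, 100), (250, 90), (100, 50), (90, 40)]
train_y = ["top6", "top6", "mid", "mid", "bottom", "bottom"]
# "Test" season
test_X = [(750, 240), (280, 95), (95, 45), (400, 130)]
test_y = ["top6", "mid", "bottom", "top6"]

acc = accuracy(train_X, train_y, test_X, test_y)
print(f"classification accuracy on the test season: {acc:.2f}")
```

    With real data the attributes should be put on a common scale first, otherwise the one with the biggest numbers dominates the distance.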

    Attributes/Variables I'll definitely use:
    • League position
    • Squad cost
    • Wage bill

    I'll post the results in the coming weeks.
    CJJ, samshere, truth_hurts and 6 others like this.
  2. krengon

    krengon One Arsene Wenger Trusted

    Sounds interesting, looking forward to the results..

    Maybe add something about injuries/health? Like how many players played over 25-30 games for a team, and whether teams that had a lot of significant injuries did worse relative to their squad cost/wage bill.
  3. Aevi

    Aevi Hale End FC Moderator

    To build on that, it'd be interesting to see it with some metric of how consistently a certain team was put out. Leicester basically played the same team week in week out in 2015/16 because of their lack of injuries. That was a big help in their success.
    Beksl, Mark Tobias and krengon like this.
  4. Mark Tobias

    Mark Tobias Mr. Agreeable

    Here we go again :rofl:

    No, in all honesty I'd be very keen to see the results.
  5. Rain Dance

    Rain Dance Well-Known Member Trusted

    well... we are currently top of the table !

    I don't believe I am gonna write this... but I'll take 4th.....
  6. OnlyOne

    OnlyOne Tier 1 Height

    Hurry up.
    Erlis and say yes like this.
  7. BigPoppaPump

    BigPoppaPump Could Never Wifey An Opp Thot

    There are much more productive things you can do with your unemployment.
    Erlis likes this.
  8. idan

    idan Member

    Two seasons will not be enough, especially when you split them into learning and testing..

    - Average time that a player has been at the club.
    - Average age of players.
    - The number of injuries in the season.
    Aevi and Beksl like this.
  9. Beksl

    Beksl Sell All The Youngsters

    It’s called practising and learning new things. You should try it.
    CJJ and SingmeasongSong like this.
  10. Beksl

    Beksl Sell All The Youngsters

    Yeah that’s my concern as well.

    I’m thinking of including some other economic metrics, like net spend, revenue streams etc.

    What’s the best source for injuries, physio room?
  11. Fallout

    Fallout Well-Known Member

    - lag of league position (e.g. 1 year or maybe 3 year avg. -- promoted teams would start at 21+)
    - re: injuries, i believe cumulative days lost to injury is on the internet somewhere and is the variable i would use
    - number of unique player appearances throughout season (or std. dev. of total appearances across players in the first team)
    - length of current manager's tenure at season's end (alternatively, number of managers over previous 3 yrs)
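
    The squad-rotation bullet above could be computed along these lines. This is only a sketch: the function name, the dict layout and the appearance counts are all made up for illustration, and it uses the population standard deviation across players who featured at least once.

```python
from statistics import pstdev


def rotation_metrics(appearances):
    """appearances: dict mapping player name -> league appearances in one season."""
    used = [n for n in appearances.values() if n > 0]
    return {
        "players_used": len(used),        # number of unique players who appeared
        "appearance_std": pstdev(used),   # spread of appearances across those players
    }


# Hypothetical squads with made-up appearance counts.
settled = {"A": 38, "B": 38, "C": 38, "D": 38}
rotated = {"A": 38, "B": 20, "C": 10, "D": 4, "E": 0}

print(rotation_metrics(settled))
print(rotation_metrics(rotated))
```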

    i hope u go as far back as possible time-wise and collect as much data as you can on off-the-pitch variables (e.g. finances, injuries) because those are the underlying factors that play into goals, clean sheets, etc. that you previously mentioned.

    edit: i would also use points accumulated as the outcome variable since it's more refined, and if u wanna make the variable relative, points earned by club divided by points earned among all clubs in league
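
    The relative version of the outcome variable is just each club's points divided by the league total. A quick sketch, with a hypothetical function name and a made-up four-club table:

```python
def points_share(points_by_club):
    """Map each club to its points divided by the total points earned in the league."""
    total = sum(points_by_club.values())
    return {club: pts / total for club, pts in points_by_club.items()}


# Made-up table, not real results.
table = {"Club A": 90, "Club B": 60, "Club C": 30, "Club D": 20}
shares = points_share(table)

for club, share in shares.items():
    print(f"{club}: {share:.3f}")
```

    The shares sum to 1 within a season, which makes seasons with different points totals easier to compare.
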
    Last edited: Jun 14, 2019
    Beksl and SingmeasongSong like this.
  12. hydrofluoric acid

    hydrofluoric acid Dishonest To His Federation

    Great contribution mate! I only tried the linear regression model that I learned in a statistics and probability for engineering course, using squad cost. But as I lack programming skills, I couldn't dream of doing the stuff you describe in your original post!
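
    For anyone who wants to try the same thing in code, a one-variable linear regression needs very little programming. Below is a minimal ordinary least squares sketch; the function name and the squad cost/points figures are made up for illustration and chosen to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: squad cost (in millions) against points at the end of the season.
squad_cost = [100, 200, 300, 400, 500]
points = [40, 50, 60, 70, 80]

slope, intercept = fit_line(squad_cost, points)
print(f"points ~ {slope:.3f} * squad_cost + {intercept:.1f}")
```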

    Interesting parameters you could try :
    Injury record
    Manager win ratios

    Silly parameters you could try :
    Full backs assists count
    Manager trophies count
    Beksl likes this.
  13. SingmeasongSong

    SingmeasongSong Rarely Right

    Nice one !

    Too time consuming for me to work on such a project, so it's very nice that someone is making the effort.

    - Average player height or particularly for CBs
    - Average distance covered
    - Speed/Acceleration of players in some way
    - How attacks over the season have been distributed by left, middle, right which kinda correlates with quality along the whole field
    - Related to the above, how goals are distributed among all players -> (maybe) being too reliant on 1-2 players is bad

    What I'd really like to see shown analytically for once is how much more dangerous having fast and tricky wingers on both sides makes your team.

    Basically on a chart why we are so ****ing bad there :lol:
  14. Before you mine your data, maybe we can use Football Manager and just simulate some data to test such an approach. It would be easier to see if the code works.

    Nice idea, btw, but that could be quite time consuming.
  15. Tourbillion

    Tourbillion Angry 24/7

    God you're a geek, but thanks!
  16. Alkane

    Alkane Well-Known Member

    Cool idea, but two seasons' data seems too small for a time series analysis. Perhaps you could use monthly league position data throughout each season; however, this would require a more complex model.
  17. Mo Britain

    Mo Britain Doom Monger

    Great idea but I'll save you a lot of time by guaranteeing we will finish 6th. Although this may be revised downwards at the end of the transfer window.
  18. American_Gooner

    American_Gooner Not actually American. Unless Di Marzio says so. Moderator

    @Beksl Did you ever get around to this?
  19. Manberg

    Manberg Predator

    What software do you use for such things? SAS?
  20. Beksl

    Beksl Sell All The Youngsters

    I've managed to make a spreadsheet with the most promising attributes/variables from the 2017/18 season and did some preliminary analysis computing Entropy, Gini Index etc. The problem I have is that I'm trying to use data mining approaches on samples that are really small and unstable (only 20 teams, and three of them are different every season).

    I'll probably get back to this at the beginning of next year when I'll have some more time.
    American_Gooner likes this.
