The Great Squad Cost Thread


STATS

Active Member
Thanks for all PMs :lol:

Here's the finished chart. The money spent on each squad plotted against finishing position. According to Excel the correlation is 74%. That means that three quarters of the teams in the EPL finished within 1 standard deviation of their squad cost position. Only one quarter of teams finished away from their determined position.

This means that money is the most important factor in determining where you will finish. But of course everybody knew that anyway. Nobody ever said it's the only factor.

The teams that broke free were West Brom and Burnley who did much better than expected. The worst were Man U and Sunderland who did worse than expected.

So there you have it. Anybody who said it was rubbish @Sammy1887 and @BBF are just talking through their arses :lol:

View attachment 1888
Sorry mate, but as a guy who has some background in statistics, that model fails on so many levels.
To begin with, it seems you have allowed an error margin of ±3 positions, or maybe ±4.
In the former case, if a team is predicted to finish 10th and it finishes anywhere from 7th to 13th, you still regard that as acceptable. That means that even without squad cost or any such measure, the probability of getting the position right is a massive 35%, just because of your insane error margin, which covers 7 positions out of 20.
In the latter case (±4), it covers 6th to 14th, which is 9 positions, so the probability of getting the position right is 45%.

The overall accuracy, or correlation as you called it, is 74%, which is next to meaningless when your error margin alone gives a hit probability of 35-45%. Meaning, more than half of your correlation may actually be down to the "error". If you could run the same model with a margin of ±1 or maybe ±2 and then give the correlation, that would probably be much more indicative of the success of your theory.
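
The 35% and 45% figures above can be checked with a few lines of R. This is only a sketch of the baseline argument: it assumes every finishing position is equally likely, and `window_hit_prob` is a made-up helper name, not something from the original chart.

```r
# Chance that a random finishing position lands within +/- k places of a
# predicted position, assuming all n positions are equally likely.
window_hit_prob <- function(predicted, k, n = 20) {
  lo <- max(1, predicted - k)  # the window is clipped at the top of the table
  hi <- min(n, predicted + k)  # ... and at the bottom
  (hi - lo + 1) / n
}

window_hit_prob(10, 3)  # 7 of 20 positions: 0.35
window_hit_prob(10, 4)  # 9 of 20 positions: 0.45

# Average over every predicted position (edge positions get smaller windows)
mean(sapply(1:20, window_hit_prob, k = 3))
```

The last line shows that the baseline is a little lower than 35% once the clipped windows at the top and bottom of the table are taken into account.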
 

Mark Tobias

Mr. Agreeable
You're my new favourite poster. Don't leave this thread too long.
 

RoadrunnerReloaded

Active Member

Arsène doesn't believe in the squad cost theory.

@Makingtrax will straighten him out next season when he shows up to matches wearing:

WXRiQYK.png
 

bingobob

A-M’s Resident Hunskelper
Trusted ⭐

Country: Scotland
I'd be inclined to leave no margin for error.
With the sample size being so small (twenty teams), teams should finish in line with their spend.

At the very most it should be plus or minus one place, to help account for historical spend, given that teams are likely to have a steady spend that rises incrementally as overall income increases, not massive spikes.
 

Makingtrax

Worships in the house of Wenger 🙏
Trusted ⭐

Country: England

Player:Saliba
It's a simple linear regression analysis that kids in school could do. There are no error bars on the data points. The correlation just sees which teams fit inside one standard deviation from the line, as always. It's not rocket science; it just shows that money matters. Just look at the points on the graph. They fit quite nicely.
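
For anyone who wants to check this, here is a minimal sketch that computes the correlation coefficient and, separately, the share of teams lying within one residual standard error of the fitted line. It assumes the squad-cost figures posted further down this thread are the same ones behind the Excel chart, which is not confirmed.

```r
# Squad costs by finishing position, copied from the table later in the thread
spot <- 1:20
squad_cost <- c(419, 239, 532, 310, 332, 628, 184, 158, 83, 70,
                171, 113, 104, 131, 83, 48, 81, 75, 54, 91)

r <- cor(spot, squad_cost)      # Pearson correlation coefficient
fit <- lm(squad_cost ~ spot)    # least-squares line
rse <- summary(fit)$sigma       # residual standard error of that line

# Share of teams whose squad cost is within one RSE of the fitted line
within_one <- mean(abs(resid(fit)) <= rse)

round(c(correlation = r, r_squared = r^2, share_within_one_rse = within_one), 2)
```

On this data the correlation comes out at about -0.74 and the share within one residual standard error at 0.75. The two numbers are close here, but they are different quantities calculated in different ways, so one should not be read as the other.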

Got you a few likes from the WOBS though, so credit for that. :lol:
 

Country: Iceland
Not a clue what that's about mate. I haven't assumed any error. It's a simple linear regression analysis that kids in school could do. You're over thinking it.

I'm actually doing a linear regression model for this season, and the error for position is 4.464 (mostly because of United :lol:). The confidence interval with 97.5% is 11.44.
 

Makingtrax

Worships in the house of Wenger 🙏
Trusted ⭐

Country: England

Player:Saliba
Yeah United really f*cked it up, had they finished first the correlation would be 80+%.
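
That claim can be tested roughly with the squad costs posted further down the thread. The sketch below assumes "United finishing first" means moving the 6th-placed squad (cost 628) to the top and shifting the original top five down one place each; the rest of the table is left alone.

```r
spot <- 1:20
squad_cost <- c(419, 239, 532, 310, 332, 628, 184, 158, 83, 70,
                171, 113, 104, 131, 83, 48, 81, 75, 54, 91)

# Hypothetical table: 6th-placed squad moved to 1st, top five pushed down a place
hypothetical <- c(squad_cost[6], squad_cost[1:5], squad_cost[7:20])

r_actual <- cor(spot, squad_cost)
r_hypothetical <- cor(spot, hypothetical)
round(c(actual = r_actual, united_first = r_hypothetical), 2)
```

Under that assumption the correlation moves from about -0.74 to about -0.81, which is in line with the "80+%" estimate.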
 

KROENKE SUCKS

Active Member
What Wenger says is true with regard to certain clubs like City. But we are also behind the likes of Real, Barca, Bayern, etc. These clubs don't have outside funding. The difference is they were smart enough to sign longer sponsorship deals soon after we signed our post-Emirates deals. By signing for double the duration they get twice as much money, since most of this money is paid up front. It only doesn't work out if sponsorship money balloons, which it has since United signed their new deal. Fortunately for Chelsea, they had a get-out clause for a price, enabling them to sign their new 60M/year deal, which is almost double what they were getting paid. We unfortunately weren't smart enough to negotiate a favourable get-out clause, which would be Gaz's fault.

Wenger's never going to say that Gazidis screwed up by signing a deal that was too short, not after he's convinced Stan to keep Özil and Sanchez and signed a new deal. That would be publicly airing dirty laundry. Arsène don't roll like that.
 

Country: Iceland
Yeah, there are also the teams that finished 9th and 10th, who only had squad costs of 83 and 70. The teams in 11th to 15th have higher squad costs than the teams in 9th and 10th.

The t values are about 7.74 for the intercept and -4.663 for the position finished in (Spot).

Pr(>|t|) values are 3.91e-07 *** for intercept and 0.000193 *** for the position finished. Three stars (or asterisks) represent a highly significant p-value. Consequently, a small p-value for the intercept and the slope indicates that we can reject the null hypothesis which allows us to conclude that there is a relationship between Position finished and Squad cost.
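
As a rough cross-check of those numbers, the slope's t value and p-value can be rebuilt by hand from the estimate and its standard error. The sketch below types the table in directly instead of reading the CSV; the variable names are just for illustration.

```r
spot <- 1:20
squad_cost <- c(419, 239, 532, 310, 332, 628, 184, 158, 83, 70,
                171, 113, 104, 131, 83, 48, 81, 75, 54, 91)
fit <- lm(squad_cost ~ spot)

coefs <- summary(fit)$coefficients
slope <- coefs["spot", "Estimate"]
se    <- coefs["spot", "Std. Error"]

t_slope <- slope / se                                     # t value = estimate / standard error
p_slope <- 2 * pt(-abs(t_slope), df = df.residual(fit))   # two-sided p-value, 18 df

round(c(slope = slope, std_error = se, t = t_slope), 3)
p_slope
confint(fit)["spot", ]   # 95% interval for the slope (2.5% and 97.5% bounds)
```

The standard error of the slope comes out at about 4.464 and the 97.5% bound at about -11.44, which looks like where the 4.464 and 11.44 quoted earlier in the thread come from, although that is a guess.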

Residual standard error: 115.1 on 18 degrees of freedom. Multiple R-squared: 0.5471, Adjusted R-squared: 0.522.

Finally, the F-statistic: 21.75 on 1 and 18 DF, p-value: 0.0001933. The F-statistic is a good indicator of whether there is a relationship between the predictor and the response variable. The further the F-statistic is above 1, the stronger the evidence.
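
With a single predictor, these summary figures are tied together: the F-statistic is the square of the slope's t value, and R-squared is the square of the correlation coefficient (the "74%" from the start of the thread, assuming the same data). A small sketch, with the table typed in directly:

```r
spot <- 1:20
squad_cost <- c(419, 239, 532, 310, 332, 628, 184, 158, 83, 70,
                171, 113, 104, 131, 83, 48, 81, 75, 54, 91)
fit <- lm(squad_cost ~ spot)
s <- summary(fit)

f_stat  <- unname(s$fstatistic["value"])
t_slope <- s$coefficients["spot", "t value"]
r       <- cor(spot, squad_cost)

c(F = f_stat, t_squared = t_slope^2)    # the same number twice
c(R2 = s$r.squared, r_squared = r^2)    # likewise
pf(f_stat, 1, 18, lower.tail = FALSE)   # p-value of the F test
```

So a correlation of about 0.74 means roughly 55% of the variation in squad cost lines up with finishing position in this model, not 74%.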

The data:
Spot Spending Squad Cost
1 20.74 419
2 25.67 239
3 151 532
4 4.25 310
5 87.25 332
6 117.17 628
7 23.04 184
8 -13.73 158
9 12.92 83
10 9.29 70
11 36.13 171
12 22.14 113
13 29.53 104
14 44.54 131
15 8.16 83
16 37.74 48
17 10.07 81
18 9.44 75
19 33.28 54
20 13.46 91
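
For anyone without the CSV file, the table above can be typed straight into R. This is a sketch: the column names follow what `read.csv()` would produce from the header (`Squad Cost` becomes `Squad.Cost`), and the file name in the commented-out line is the one the code below expects.

```r
squadcost2 <- data.frame(
  Spot = 1:20,
  Spending = c(20.74, 25.67, 151, 4.25, 87.25, 117.17, 23.04, -13.73, 12.92, 9.29,
               36.13, 22.14, 29.53, 44.54, 8.16, 37.74, 10.07, 9.44, 33.28, 13.46),
  Squad.Cost = c(419, 239, 532, 310, 332, 628, 184, 158, 83, 70,
                 171, 113, 104, 131, 83, 48, 81, 75, 54, 91)
)

# Optional: save it so that read.csv("squadcost2.csv") works as written
# write.csv(squadcost2, "squadcost2.csv", row.names = FALSE)

summary(squadcost2)
```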

The code
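# Plotting library
library(ggplot2)

# Load the table above; read.csv() turns the "Squad Cost" header into Squad.Cost
squadcost2 <- read.csv("squadcost2.csv", header = TRUE)
summary(squadcost2)

# Scatter plots of spending and squad cost against finishing position
ggplot(squadcost2, aes(x = Spot, y = Spending)) + geom_point()
ggplot(squadcost2, aes(x = Spot, y = Squad.Cost)) + geom_point()

# Simple linear regression of squad cost on finishing position
fit <- lm(Squad.Cost ~ Spot, data = squadcost2)
summary(fit)   # coefficients, t values, R-squared, F-statistic
confint(fit)   # 95% confidence intervals for intercept and slope

# Fitted line drawn over the data, without the confidence band
ggplot(squadcost2, aes(x = Spot, y = Squad.Cost)) + geom_point() + geom_smooth(method = "lm", se = FALSE)

# Residual diagnostics: residuals against position, then a normal Q-Q plot
fit_diag <- fortify(fit)
ggplot(fit_diag, aes(x = Spot, y = .resid)) + geom_point()
ggplot(fit_diag, aes(sample = .resid)) + stat_qq() + geom_abline(slope = sd(fit_diag$.resid), intercept = mean(fit_diag$.resid))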

And finally for fun:

What the model predicts a squad needs to cost in order to win the league:

predict(fit, newdata = data.frame(Spot=1))

The squad needs to cost: 393.0714 (in £m)
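
The 393.0714 figure is just intercept plus slope times 1. The sketch below reproduces it and also asks `predict()` for a 95% prediction interval, which gives a feel for how loose the estimate is with only twenty data points. The table is typed in directly here.

```r
squadcost2 <- data.frame(
  Spot = 1:20,
  Squad.Cost = c(419, 239, 532, 310, 332, 628, 184, 158, 83, 70,
                 171, 113, 104, 131, 83, 48, 81, 75, 54, 91)
)
fit <- lm(Squad.Cost ~ Spot, data = squadcost2)

# By hand: intercept + slope * 1
by_hand <- unname(coef(fit)["(Intercept)"] + coef(fit)["Spot"] * 1)
by_hand

# Same point estimate, plus lower and upper bounds of a 95% prediction interval
champion <- predict(fit, newdata = data.frame(Spot = 1), interval = "prediction")
champion
```

The `fit` column matches the value above; the `lwr` and `upr` columns show the range the model considers plausible for a single first-placed squad.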

I used squad costs from:

http://metro.co.uk/2016/10/04/manch...ore-than-twice-as-much-as-liverpools-6169214/
 

Makingtrax

Worships in the house of Wenger 🙏
Trusted ⭐

Country: England

Player:Saliba
So £393m gets you a shot at the title. But that's last season. The big guns are preparing to up the ante.

United aren't going to be happy with 6th after spending £650m. That is really underperforming. So their answer will be to throw even more money at the problem.

Next season you might need another £50m or so on top of that figure. :lol:
 

Country: Iceland
Yeah well this model is actually built with United finishing 6th with their insane squad cost. :lol:

I guess that, given Mourinho tends to have good 2nd seasons and will have 2x the squad cost the model predicts is needed to win the league, he should win it quite comfortably. :lol:
 

Makingtrax

Worships in the house of Wenger 🙏
Trusted ⭐

Country: England

Player:Saliba
Have you tried a non-linear model to see if you can get a better fit, now the data's in your machine?
 

Country: Iceland
No, not yet... Not sure how I'd do that. Currently away from my computer. Maybe I'll look into it when I have time.

I was also wondering what happens to the error and p-values if I just take United out of the data. Can't trust those bastards. :lol:
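
One possible way to try this in R, as a sketch only: fit a quadratic curve and a log-linear (exponential decay) curve with `lm()`, then compare the residual sum of squares on the original scale. The choice of these two shapes is an assumption; base R's `nls()` would be the tool for a general non-linear fit.

```r
squadcost2 <- data.frame(
  Spot = 1:20,
  Squad.Cost = c(419, 239, 532, 310, 332, 628, 184, 158, 83, 70,
                 171, 113, 104, 131, 83, 48, 81, 75, 54, 91)
)

linear    <- lm(Squad.Cost ~ Spot, data = squadcost2)
quadratic <- lm(Squad.Cost ~ poly(Spot, 2), data = squadcost2)   # adds a curvature term
log_lin   <- lm(log(Squad.Cost) ~ Spot, data = squadcost2)       # exponential decay in Spot

# Residual sum of squares, all measured on the original squad-cost scale
rss <- c(
  linear     = sum(resid(linear)^2),
  quadratic  = sum(resid(quadratic)^2),
  log_linear = sum((squadcost2$Squad.Cost - exp(fitted(log_lin)))^2)
)
rss

# Formal test of whether the curvature term earns its place
anova(linear, quadratic)
```

A lower residual sum of squares means a closer fit, but with twenty points an extra term will always fit at least as well, so the `anova()` p-value is the more honest guide.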
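
Dropping one team is a one-liner with `subset()`. The sketch below assumes United are the 6th-placed row (squad cost 628), as mentioned earlier in the thread, and compares the two fits side by side.

```r
squadcost2 <- data.frame(
  Spot = 1:20,
  Squad.Cost = c(419, 239, 532, 310, 332, 628, 184, 158, 83, 70,
                 171, 113, 104, 131, 83, 48, 81, 75, 54, 91)
)
no_united <- subset(squadcost2, Spot != 6)   # drop the 6th-placed squad

fit_all <- lm(Squad.Cost ~ Spot, data = squadcost2)
fit_wo  <- lm(Squad.Cost ~ Spot, data = no_united)

# Side-by-side comparison of the two fits
compare <- function(fit) {
  s <- summary(fit)
  c(slope     = unname(coef(fit)["Spot"]),
    resid_se  = s$sigma,
    r_squared = s$r.squared,
    p_slope   = s$coefficients["Spot", "Pr(>|t|)"])
}
rbind(all_teams = compare(fit_all), without_united = compare(fit_wo))
```

Bear in mind that removing the worst-fitting point after seeing the data will flatter any model, so the second row is best treated as a sensitivity check, not a better estimate.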
 

Makingtrax

Worships in the house of Wenger 🙏
Trusted ⭐

Country: England

Player:Saliba
Think you can use the nlinfit function in MATLAB if that's what you're using.
 

Country: Iceland
Don't have MATLAB on my new computer at the moment. I worked the data in RStudio. I will google it when I have it.

Maybe we will find the perfect model and make those famous 7 figures of money next season. :drool:
 
