xG map for Swansea City this season. Swans look legit y'all. pic.twitter.com/znZUgD2aDc
— Michael Caley (@MC_of_A) September 2, 2015
Now, Swansea sit 14th in the table, despite having faced one of the weaker schedules in the Premier League (in terms of opponent points per match). They have a -4 goal difference, and while their shots on target difference is better (+0.5 per game), their expected goals difference is now negative as well. So what happened?
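For readers unfamiliar with the schedule measure mentioned above, "opponent points per match" can be read as the average points-per-match record of the teams a side has faced. The snippet below is a minimal sketch under that assumption; the `schedule_strength` helper, the team names and the numbers are hypothetical, and it uses the simplification that each opponent's record includes its match against the team being rated.

```python
from typing import Dict, List


def schedule_strength(opponents: List[str],
                      points: Dict[str, int],
                      played: Dict[str, int]) -> float:
    """Mean points per match of the opponents a team has faced.

    Higher values mean a tougher schedule. Each opponent's record includes
    its match against the team being rated (a simplifying assumption).
    """
    if not opponents:
        raise ValueError("no opponents faced yet")
    # Points per match for each opponent, then the plain average.
    ppm = [points[team] / played[team] for team in opponents]
    return sum(ppm) / len(ppm)


if __name__ == "__main__":
    # Hypothetical table after three rounds; not real Premier League figures.
    points = {"Team A": 9, "Team B": 3}
    played = {"Team A": 3, "Team B": 3}
    print(schedule_strength(["Team A", "Team B"], points, played))  # 2.0
```

A plain average treats every opponent equally; weighting by home/away or excluding the head-to-head result would be reasonable refinements, but they are not what the post describes.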
For me, the answer becomes clear if we look at Swansea's first two matches. In the 52nd minute of the Chelsea match, Thibaut Courtois received a red card, reducing Chelsea to ten men. Chelsea had one shot on target the rest of the way; Swansea had five (including Gomis' penalty). Against Newcastle, Daryl Janmaat picked up a red card in the 42nd minute, reducing the Magpies to ten men. Newcastle had no shots on target following the dismissal; Swansea had three, including Ayew's insurance goal. In both games, the red card allowed Swansea to dominate proceedings in a way that was not happening while the opposing team had a full complement of players. This is hardly surprising, given that only one team this season has gone on to win after one of its players received a red card with more than 15 minutes remaining (Chelsea against West Brom, for those wondering). A man advantage is such a huge bonus that it becomes hard to take statistics from such games seriously (see City v. West Brom last year). Swansea's early-season stats undeniably got a huge boost from those two matches being included, which led to them being overrated early on (their dismantling of relegation-bound Sunderland didn't hurt either).
Should games with red cards be included in a team's statistics? Obviously, that depends on the question you are trying to answer. If you were making the case that Swansea were playing very well in the first four games, I absolutely think those games should be included. In both cases, the red cards were probably deserved, and Swansea did go on to perform at a high level in those games. If you are making the case that they were likely to continue to perform well (as the above prognosticators were), I think those games probably shouldn't be included (or at least not the stats following the red card), particularly in such a small sample. Red cards are relatively rare events: there were only 71 in 380 Premier League matches last term, fewer than one for every five matches, and even fewer of them actually had an impact on the game (because of the accumulation of yellows and general referee tendencies, red cards are more likely to be shown late in matches). Even if the red cards were earned by Swansea's play, the stats accumulated while they were playing 11 v 10 are probably not reflective of their true talent level. Moreover, as we saw with James McCarthy's tackle on Dimitri Payet this past week, deserved reds are not always given by refs, and there is no way for Swansea to control the referees (short of channeling Alex Ferguson).
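To make the "leave out the stats following the red card" option concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method behind any published numbers: the `Shot` and `Match` structures and their field names are hypothetical, stoppage time is ignored, and each match is simply cut at the first dismissal regardless of which side received it. It returns xG difference per 90 minutes, either over all minutes or over 11 v 11 minutes only, so the denominator shrinks along with the excluded shots.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shot:
    minute: int      # minute the shot was taken
    for_team: bool   # True if taken by the team being rated
    xg: float        # expected-goals value of the shot


@dataclass
class Match:
    shots: List[Shot] = field(default_factory=list)
    red_card_minute: Optional[int] = None  # first dismissal, either side (assumption)
    length: int = 90                       # stoppage time ignored for simplicity


def xg_diff_per_90(matches: List[Match], even_strength_only: bool = False) -> float:
    """xG for minus xG against, scaled to 90 minutes.

    With even_strength_only=True, shots at or after a red card are dropped
    and only the minutes before the dismissal count toward the denominator.
    """
    diff = 0.0
    minutes = 0
    for m in matches:
        cut = even_strength_only and m.red_card_minute is not None
        minutes += m.red_card_minute if cut else m.length
        for s in m.shots:
            if cut and s.minute >= m.red_card_minute:
                continue  # skip shots taken after the dismissal
            diff += s.xg if s.for_team else -s.xg
    return diff / minutes * 90 if minutes else 0.0


if __name__ == "__main__":
    # Toy, made-up data: one match with a 52nd-minute red card, one without.
    m1 = Match([Shot(10, True, 0.3), Shot(30, False, 0.2),
                Shot(60, True, 0.5), Shot(70, True, 0.4)], red_card_minute=52)
    m2 = Match([Shot(20, True, 0.1), Shot(50, False, 0.6)])
    print(round(xg_diff_per_90([m1, m2]), 3))                           # 0.25
    print(round(xg_diff_per_90([m1, m2], even_strength_only=True), 3))  # -0.254
```

In this toy example, the team looks positive over all minutes but negative once the man-advantage spell is removed, which is the kind of gap the argument above is about. Cutting at the first red card is the bluntest option; tracking each player-count state separately would keep more data at the cost of more bookkeeping.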
There's still a definite shortage of good data out there on football, and oftentimes the tendency is to use whatever we can get. Still, I think this example shows that it's just as important to know when to exclude data as it is to know when to use it.