Weekly reading: Chapters 6 & 7 of The Truthful Art

Chapters 6 and 7 give really useful basic knowledge about distributions. Concepts such as “mode”, “mean” and “median” are probably quite familiar to us from the mathematical side; however, the role they play in the process of exploring data is something we still need to learn more about.

Median and mean are just like bros: we often use them both in an analysis. I used to think that the mean, the average value, should be more unambiguous and objective than other statistics. Well, that is only 50% right. The median is actually a more “resistant statistic”, one that is not affected by extreme values (or “outliers”). The average salary at UNC Chapel Hill is a good example. As the book puts it, the thing we should care about is not how much people earn on average but how much the average person earns, which is definitely true. This example reminds me of an interesting thing. There is a famous entrepreneur in my country called Jack Ma (a person like Jeff Bezos) who graduated with an English education major from an ordinary university. The interesting thing is that the school puts the average post-graduation earnings of that major in its enrollment publicity, and the figure is much higher than what the graduates I know from this school earn. The value is true, but one extreme earner pulls the average up, so it cannot stand for most graduates of this major; well, that is kind of tricky.

A note for the mean: if we need the average score across groups of different sizes, we can use the weighted mean. The formula is:

Weighted mean = [(size of group 1 * mean score of group 1) + (size of group 2 * mean score of group 2) + etc.] / (size of group 1 + size of group 2 + etc.)
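To check that I understand the weighted mean, here is a minimal Python sketch. The function name and the two class sizes and scores are made up by me for illustration; they are not from the book.

```python
def weighted_mean(group_sizes, group_means):
    """Combine per-group means into one overall mean, weighting each by group size."""
    if len(group_sizes) != len(group_means):
        raise ValueError("need exactly one size per group mean")
    total_size = sum(group_sizes)
    if total_size == 0:
        raise ValueError("total group size must be positive")
    weighted_sum = sum(n * m for n, m in zip(group_sizes, group_means))
    return weighted_sum / total_size


# Hypothetical data: a class of 10 students averaging 90
# and a class of 30 students averaging 70.
sizes = [10, 30]
means = [90.0, 70.0]

overall = weighted_mean(sizes, means)
naive = sum(means) / len(means)  # ignores group sizes

print(overall)  # 75.0
print(naive)    # 80.0
```

The naive average of the two means treats both classes as equally large, so it overstates the overall score when the bigger class scored lower.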

Combined with Chapter 6, Chapter 7 gives a mind map for visualizing distributions. Most of the time, when we get a raw data set, we cannot tell anything special at a glance until we present it with different types of charts or maps. The chapter covers frequency charts along with the funnel plot (which avoids the side effect of population or sample size), the box plot, etc.

The rest of Chapter 7 spends a lot of ink introducing the box-and-whisker plot, which measures the spread of the data around the median using percentiles (25th, 50th, 75th). Specifically, the whiskers cover the range of scores that lie within 1.5 times the interquartile range (IQR) below the 25th percentile or above the 75th; anything farther out is treated as an outlier.
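The numbers behind a box plot can be sketched in a few lines of Python. This is only my own illustration, not the book's procedure: the data set is invented, the function name is hypothetical, and I assume the "inclusive" quartile method from the standard library (other tools compute quartiles slightly differently).

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other quartile definitions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical scores with one extreme value.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 40]
summary = box_plot_summary(scores)
print(summary["median"])                                # 6.0
print(summary["whisker_low"], summary["whisker_high"])  # 2 9
print(summary["outliers"])                              # [40]
```

Notice that the extreme score of 40 does not move the median or the box at all, which is the "resistant" behavior described earlier.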

A note for the box plot: before building the charts, we should know what “standard deviation”, “standard score” and “normal distribution” are. I will not go into detail; I will just put down the formula for the standard score.

Standard score (z-score of a raw score) = (raw score - mean) / standard deviation
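The standard score formula can be tried out with a small Python sketch. The scores are invented, the function name is my own, and I assume the population standard deviation here; with a sample standard deviation the z-scores would come out a little smaller.

```python
import statistics


def z_scores(values):
    """Convert raw scores to standard scores: (raw score - mean) / standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divide by N, not N - 1).
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mean) / sd for v in values]


# Hypothetical raw scores with mean 5 and population standard deviation 2.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
standardized = z_scores(raw)
print(standardized)
```

A z-score of 2 for the raw score 9 means that score sits two standard deviations above the mean.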

Last but not least, some tips for choosing between different charts.
•  When to choose a box plot: 1. You want to show the important things directly and vividly, like the spread or range of the data, since the histogram and strip plot are too detailed for some purposes and may obscure what matters; 2. You are not just analyzing one distribution but comparing several of them (e.g. the evaluation of schools in Brazil).
•  When to choose a histogram or strip plot: You want to stress outliers (particularly when the distribution is very pointy and has very flat tails).


Once we have figured out the norm, the essential nature of the data (its level, spread, and shape), we can take the next step: identifying deviations. We can “set aside one of the elements of the norm” to find them.
