One apparently confused redditor has made the following claims about the attractiveness assessments used in research into preferences:

https://cdn-images-1.medium.com/max/2000/0*aiEOj6bJOf5mZX_z.png Look at the male messaging curve.

Now again look at the woman's curve.

http://cdn.okcimg.com/blog/your_looks_and_inbox/Female-Messaging-Curve.png Why would men be messaging women they mostly find attractive while women seem to be messaging men they on average find unattractive?

Here's a break down of how this works:

Let's say there are 3 ice cream flavors: A B C, and subjects are to each rate them 1 - 5. And this happened:

Subject 1

A 1 B 3 C 5

Subject 2

A 5 B 3 C 1

Subject 3

A 1 B 5 C 1

Subject 4

A 1 B 5 C 3

So our results are:

5 1s 3 3s 3 5s

3 good flavors

8 less than good flavors

The subjects would be rating 80 percent of ice cream flavors less desirable. Yet they each still individually PREFER ice cream flavors that are on average rated as less than desirable by the group.

Black pillers along with LMSers deliberately ignore the messaging curve while pretending that women all have the same tastes and judge 80 percent of men as unattractive and so the 20 percent that remains must all be the same guys.

The messaging curve easily debunks that and reveals what's really happening.

The power of stats.

Side-stepping the utterly questionable (a.k.a. wrong) math and the implicit assumptions involved in interpreting the count of all sub-5 ratings on three ice cream flavors as the subjects rating "80 percent of (three!) ice cream flavors less desirable," let's focus on the crux of this post: the claim that the ratings are too "variegated" to be reliable.

First, I'll elaborate on something I mentioned here in response to this redditor's concerns. An excerpt:

The argument you're trying to make is that some subgroup or diffuse heterogeneity precludes any statistical analysis. Except that if this were true, then:

  1. the ratings of the different independent observers used in these studies would correlate too poorly for a single final rating (usually a central-tendency metric such as the mean) to be useful (this is measured by the alpha index, by the way)

By alpha index, I'm referring to Cronbach's α, a.k.a. tau-equivalent reliability, used here as a measure of inter-rater reliability. Nearly all research involving attractiveness ratings shows a Cronbach's α > 0.80, and often > 0.9 when ratings are limited to heterosexual raters evaluating opposite-sex targets. Hitsch 2006 and 2010 (in the sidebar) had a mixed-sex group of 100 different raters for their massive dataset, yielding 12 ratings per photo, with a Cronbach's α of 0.80. Here's a commonly used scheme for interpreting the value:

| Cronbach's alpha | Internal consistency |
|:--|:--|
| 0.9 ≤ α | Excellent |
| 0.8 ≤ α < 0.9 | Good |
| 0.7 ≤ α < 0.8 | Acceptable |
| 0.6 ≤ α < 0.7 | Questionable |
| 0.5 ≤ α < 0.6 | Poor |
| α < 0.5 | Unacceptable |
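For readers who want to compute α themselves, here is a minimal Python sketch of the classic formula α = k/(k−1) · (1 − Σ item variances / variance of totals), treating each rater as an "item"; the function name and data layout are my own, not from any of the studies cited here:

```python
def cronbach_alpha(ratings):
    """Cronbach's alpha for a matrix with one row per rated target
    and one column per rater (each rater treated as an "item")."""
    k = len(ratings[0])                        # number of raters

    def svar(xs):                              # sample variance (ddof = 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = sum(svar([row[j] for row in ratings]) for j in range(k))
    total_var = svar([sum(row) for row in ratings])
    return k / (k - 1) * (1 - item_vars / total_var)

# Four raters in perfect agreement on three targets:
print(round(cronbach_alpha([[5, 5, 5, 5], [3, 3, 3, 3], [1, 1, 1, 1]]), 6))  # -> 1.0
```

Feeding it the hypothetical dataset below (rows transposed so each row is one ice cream) returns α = 0 up to floating-point rounding.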

Which brings us to the heart of the matter:

What's the Cronbach's α of the neomancr hypothetical ratings dataset?

First, his data, re-presented in clearer table form:

| Rater | Ice cream A | Ice cream B | Ice cream C |
|:--|--:|--:|--:|
| Subject 1 | 1 | 3 | 5 |
| Subject 2 | 5 | 3 | 1 |
| Subject 3 | 1 | 5 | 1 |
| Subject 4 | 1 | 5 | 3 |

The next steps can be performed in your stats software of choice, or in Excel:

Anova: Two-Factor Without Replication

| SUMMARY | Count | Sum | Average | Variance |
|:--|--:|--:|--:|--:|
| Subject 1 | 3 | 9 | 3 | 4 |
| Subject 2 | 3 | 9 | 3 | 4 |
| Subject 3 | 3 | 7 | 2.333333 | 5.333333 |
| Subject 4 | 3 | 9 | 3 | 4 |
| Ice cream A | 4 | 8 | 2 | 4 |
| Ice cream B | 4 | 16 | 4 | 1.333333 |
| Ice cream C | 4 | 10 | 2.5 | 3.666667 |

| Source of Variation | SS | df | MS | F | P-value | F crit |
|:--|--:|--:|--:|--:|--:|--:|
| Rows | 1 | 3 | 0.333333 | 0.076923 | 0.970184 | 4.757063 |
| Columns | 8.666667 | 2 | 4.333333 | 1 | 0.421875 | 5.143253 |
| Error | 26 | 6 | 4.333333 | | | |
| Total | 35.66667 | 11 | | | | |

Cronbach's α = 0

The Cronbach's α of the neomancr dataset is ZERO.
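As a cross-check, α can be read straight off the ANOVA table via the identity α = 1 − MS_error / MS_columns, "Columns" being the between-flavors term here; a quick sketch using the MS values from the table:

```python
# Alpha from the two-factor ANOVA output above:
# alpha = 1 - MS_error / MS_columns ("Columns" = the rated flavors)
ms_columns = 4.333333   # MS for Columns (between ice cream flavors)
ms_error = 4.333333     # MS for Error (rater disagreement)
alpha = 1 - ms_error / ms_columns
print(alpha)  # -> 0.0
```

Because the rater-disagreement variance is exactly as large as the between-flavor variance, the reliability bottoms out at zero.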

Slightly more "variegated" than what actual studies show, eh?

Given there hasn't been a single study that I'm aware of with a Cronbach's α below 0.75 for looks ratings, we can probably rest assured that the hypothetical dataset neomancr envisioned, with such marked variation between raters, exists nowhere except his own imagination.

To see how Cronbach's α tracks how "variegated" the ratings are, compare the two additional cases below.


Case 2: Perfect agreement between raters:

| Rater | Ice cream A | Ice cream B | Ice cream C |
|:--|--:|--:|--:|
| Subject 1 | 5 | 3 | 1 |
| Subject 2 | 5 | 3 | 1 |
| Subject 3 | 5 | 3 | 1 |
| Subject 4 | 5 | 3 | 1 |

Anova: Two-Factor Without Replication

| SUMMARY | Count | Sum | Average | Variance |
|:--|--:|--:|--:|--:|
| Subject 1 | 3 | 9 | 3 | 4 |
| Subject 2 | 3 | 9 | 3 | 4 |
| Subject 3 | 3 | 9 | 3 | 4 |
| Subject 4 | 3 | 9 | 3 | 4 |
| Ice cream A | 4 | 20 | 5 | 0 |
| Ice cream B | 4 | 12 | 3 | 0 |
| Ice cream C | 4 | 4 | 1 | 0 |

| Source of Variation | SS | df | MS | F | P-value | F crit |
|:--|--:|--:|--:|--:|--:|--:|
| Rows | 0 | 3 | 0 | 65535 | #DIV/0! | 4.757063 |
| Columns | 32 | 2 | 16 | 65535 | #DIV/0! | 5.143253 |
| Error | 0 | 6 | 0 | | | |
| Total | 32 | 11 | | | | |

Cronbach's α = 1
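The degenerate F and P-value entries (65535, #DIV/0!) are just Excel's way of flagging a zero error variance; the same α identity used above still gives exactly 1:

```python
# Case 2: perfect agreement, so the Error term vanishes
ms_columns = 16.0   # MS for Columns in Case 2
ms_error = 0.0      # zero error variance: raters agree perfectly
alpha = 1 - ms_error / ms_columns
print(alpha)  # -> 1.0
```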


Case 3: Less than perfect agreement between raters:

| Rater | Ice cream A | Ice cream B | Ice cream C |
|:--|--:|--:|--:|
| Subject 1 | 4 | 2 | 1 |
| Subject 2 | 3 | 3 | 2 |
| Subject 3 | 5 | 3 | 1 |
| Subject 4 | 4 | 2 | 1 |

Anova: Two-Factor Without Replication

| SUMMARY | Count | Sum | Average | Variance |
|:--|--:|--:|--:|--:|
| Subject 1 | 3 | 7 | 2.333333 | 2.333333 |
| Subject 2 | 3 | 8 | 2.666667 | 0.333333 |
| Subject 3 | 3 | 9 | 3 | 4 |
| Subject 4 | 3 | 7 | 2.333333 | 2.333333 |
| Ice cream A | 4 | 16 | 4 | 0.666667 |
| Ice cream B | 4 | 10 | 2.5 | 0.333333 |
| Ice cream C | 4 | 5 | 1.25 | 0.25 |

| Source of Variation | SS | df | MS | F | P-value | F crit |
|:--|--:|--:|--:|--:|--:|--:|
| Rows | 0.916667 | 3 | 0.305556 | 0.647059 | 0.612811 | 4.757063 |
| Columns | 15.16667 | 2 | 7.583333 | 16.05882 | 0.0039 | 5.143253 |
| Error | 2.833333 | 6 | 0.472222 | | | |
| Total | 18.91667 | 11 | | | | |

Cronbach's α = 0.937729
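The same check for Case 3, this time from the raw ratings using the classic variance formulation (the layout is mine: rows are flavors, columns are raters):

```python
# Case 3 ratings: rows = ice creams A..C, columns = Subjects 1..4
rows = [[4, 3, 5, 4],
        [2, 3, 3, 2],
        [1, 2, 1, 1]]
k = len(rows[0])                           # number of raters

def svar(xs):                              # sample variance (ddof = 1)
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rater_vars = sum(svar([r[j] for r in rows]) for j in range(k))
totals_var = svar([sum(r) for r in rows])
alpha = k / (k - 1) * (1 - rater_vars / totals_var)
print(round(alpha, 6))  # -> 0.937729
```

Imperfect but broadly consistent raters still land well inside the "Excellent" band, which is exactly the point.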