I'm not sure I did the best job in the world of explaining exactly what I found in yesterday's post.

I have to explain a wee bit about statistics for any of this to make sense to many of my readers. So those of you who already had this in school, just roll your eyes, please.

Psychology has a problem that sets it apart from other fields: it has to study live, sentient beings. Dead people don't exhibit "psychology." And trying to take a living brain apart to study it just shows how a damaged brain responds. It doesn't really tell you how a healthy brain responds. To make matters even more interesting, psychologists have to deal with the fact that some of the things they are asking those conscious, sentient beings about are things that even a conscious, sentient being isn't aware of doing. Or the subject IS aware of what they are doing and is embarrassed by it. And if not embarrassed, convinced that some social norm dictates a different answer than the actual truth. And if not hiding something, or trying to enforce a social norm, some go out of their way to mess with social norms.

Yeah, humans.

One of the workarounds is to study large populations of people answering the same questions. With a large enough sample size (around 1000 or so, though there is actually a formula), scientists can be confident that any trend they see will overcome individual tendencies to skew results. That is why you see "ranges" for probabilities in the capt.org spreadsheets. There are formulas that tell you, given your sample size and what your subjects answered, the range of numbers that can mathematically explain the behavior we are seeing. So, reading the figures off of CAPT's spreadsheet for Extraverted vs. Introverted females:

Females

     Range     Best estimate
E    45-55%    52.5%
I    45-55%    47.5%

That tells us that the real number could be anywhere between 45 and 55%. The computer's best guess is 52.5% for E and 47.5% for I. But the actual error in the data could support 50/50, 45/55, 55/45 or any ratio in between. And you would think that sampling more and more people would help clean up the noise. But not as much as you would hope. It's like measuring an impact crater. You can put a ruler on where you think the phenomenon starts and stops, but the middle is an estimation. (Albeit a mathematically sound one.)
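For the curious, here is a rough sketch of the kind of formula I mean. It uses the textbook normal-approximation margin of error for a proportion at 95% confidence. That is my assumption for illustration; I don't know exactly how CAPT arrived at its ranges, and the function names are mine.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation interval for a proportion.

    p: observed share (0..1), n: sample size, z: 1.96 for ~95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

def sample_size_for(margin, p=0.5, z=1.96):
    """Smallest sample size whose margin of error is at most `margin`."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# How the margin around the 52.5% Extraverted figure shrinks with sample size.
for n in (100, 1000, 10000):
    print(f"n={n}: +/- {100 * margin_of_error(0.525, n):.1f} points")
# n=100: +/- 9.8 points
# n=1000: +/- 3.1 points
# n=10000: +/- 1.0 points

# Sample size needed for a 3-point margin in the worst case (p = 0.5).
print(sample_size_for(0.03))  # 1068
```

Notice that going from 1,000 to 10,000 respondents only trims the margin from about 3 points to about 1. Ten times the work buys you two points, which is why "around 1000" keeps showing up as the practical sweet spot.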

Now we are going to get into a wee bit of Philosophy. I am going to give you two different answers to the same question that are both technically correct. They hinge on whether you consider Gender to be a defining characteristic of personality.

Now, in the "Gender is a real thing" school, we look at the numbers where Gender is controlled as a variable. The way we do that is to put a little question in your survey: "Do you identify as a Male or a Female?" With a second question of "If yes, which." Though odds are most of the survey data from the 70s and 80s probably just has a Male/Female multiple choice box.

Now if you choose to use that variable, you partition the samples into a male bin and a female bin, and discard any survey for which you don't have a clear gender assignment. (Or at least that seems to be what this study did.) The computer then analyzes the questions and gives you back your metrics.

The figures I was concerned with were the Thinking vs Feeling metric. Here's a quick summary of that data again:

            Male %    Female %    Combined
Thinking    55-67     24-35       40-50
Feeling     33-45     65-75       50-60

Wait a minute. How on Earth do we end up with ranges for the combined numbers that are outside of the bracket for either gender? And the answer is ... numbers. If you feed the numerical software data that isn't controlled for a known variable, it simply produces muddier numbers. The software's job is to screen out noise, after all. The differences between men and women are treated as individual variation. And so the "average" ends up being a number that really doesn't fit either of the two populations you are surveying. But it does answer the question "given a person at random, and you don't know the gender, what can you expect?"
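If you want to see the arithmetic, here is a minimal sketch. I am assuming the midpoint of each Thinking range as a stand-in for the real estimate (the spreadsheet's own best guesses may differ), and using the 46% male share the report gives for its sample. The function name is mine.

```python
def pooled_share(male_share, female_share, male_fraction):
    """Share of a trait in a mixed group: a head-count-weighted average."""
    return male_fraction * male_share + (1 - male_fraction) * female_share

# ASSUMPTION: midpoints of the ranges in the table, not CAPT's actual estimates.
male_thinking = (55 + 67) / 2    # 61.0
female_thinking = (24 + 35) / 2  # 29.5

# 46% of the respondents in the study were male.
combined = pooled_share(male_thinking, female_thinking, male_fraction=0.46)
print(f"Combined Thinking: {combined:.1f}%")  # Combined Thinking: 44.0%
```

The pooled figure lands inside the 40-50 combined bracket, and outside both the male and the female bracket. It has to: a weighted average of two different numbers always falls somewhere between them.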

Except it doesn't. Go back to page 1 of the report. There is a profound under-sampling of males in this study. Males are 50% of the population between the ages of 25 and 54. (Odd quirk of humanity: there are more male children born, but they are more likely to die earlier. At birth the ratio of male to female is 1.04:1. By age 25, it's around 1:1, and after age 55, women outnumber men.) Yet they are only 46% of the sample population of this study. And the notes explain why: they can't get enough males to sit down and finish the survey. (Or take the survey over the phone, back in the old days.)
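One standard patch for a lopsided sample is to reweight it, so each group counts in proportion to its share of the real population rather than its share of the respondents. Here is a hedged sketch of that idea. The Thinking figures are midpoints of the published ranges (my assumption), the helper names are mine, and nothing here says this is what the study's authors did.

```python
def poststratification_weights(sample_shares, population_shares):
    """Weight per group = share of the population / share of the sample."""
    return {g: population_shares[g] / sample_shares[g] for g in sample_shares}

def weighted_estimate(values, sample_shares, weights):
    """Average of per-group values after each group is reweighted."""
    total = sum(sample_shares[g] * weights[g] for g in values)
    return sum(sample_shares[g] * weights[g] * values[g] for g in values) / total

sample_shares = {"male": 0.46, "female": 0.54}      # from the report
population_shares = {"male": 0.50, "female": 0.50}  # ages 25-54

# ASSUMPTION: midpoints of the Thinking ranges, used as stand-in estimates.
thinking = {"male": 61.0, "female": 29.5}

weights = poststratification_weights(sample_shares, population_shares)
reweighted = weighted_estimate(thinking, sample_shares, weights)

print({g: round(w, 3) for g, w in weights.items()})  # {'male': 1.087, 'female': 0.926}
print(f"Reweighted Thinking: {reweighted:.2f}%")
```

Each male respondent ends up counting a little more than one person and each female respondent a little less. Keep in mind that this only fixes the head count. If the men who refused the survey differ from the men who finished it, no amount of reweighting will tell you how.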

Remember what I said about humans being tricky things to study?

Other things to look out for: these studies tended to be carried out around college campuses, so intellectuals are probably overrepresented too. For my purposes, as a game writer trying to build an authentic population, I need to factor in that males are probably more introverted than the numbers suggest (thus the lack of responses), and that while my initial crew may be college educated, their children are probably going to show lower levels of pure-intellectual traits (NT), which I can probably estimate by looking at the proportion of people who never go to college.

Although there is an argument to be made that the population on this ship will kind of stop being a representative sample of humanity. With universal access to quality medical care and early childhood education, there will probably be some minor changes in personality distributions. Could be more intellectual. Could be less intellectual. As a speculative fiction writer, I basically get to pick. Or not.