Use normal distribution to generate random height in new character screen #49270

Merged
merged 9 commits into from
Jul 7, 2021

Conversation

Tairesh
Contributor

@Tairesh Tairesh commented Jun 12, 2021

Summary

Bugfixes "Use normal distribution to generate random height in new character screen"

Purpose of change

When you press [*] on the final tab of the new character screen, it generates a random sex, name, height, and blood type. For blood type, it uses the real distribution of blood types, but for height it uses rng() with a uniform distribution. So you have an equal chance of getting a male character who is 69 inches tall and one who is 58 inches tall, which is absurd.

Describe the solution

Created function Character::randomize_height() that generates height with normal distribution.
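
As a rough illustration of the idea (not the C++ code in this PR), a normally distributed height with clamping could be sketched in Python as follows. The mean, standard deviation, and the minimum/maximum bounds are hypothetical values chosen for the example; they are not taken from the game's source.

```python
import random

# Hypothetical parameters for illustration only (not from the PR).
MEAN_CM = 175.0
STD_CM = 7.0
MIN_CM = 145
MAX_CM = 200


def randomize_height(rng: random.Random) -> int:
    """Draw a height in cm from a normal distribution, clamped to [MIN_CM, MAX_CM]."""
    height = round(rng.gauss(MEAN_CM, STD_CM))
    # Clamp so that rare extreme draws stay inside the allowed range.
    return max(MIN_CM, min(MAX_CM, height))


rng = random.Random(0)
heights = [randomize_height(rng) for _ in range(10_000)]
print(min(heights), max(heights), sum(heights) / len(heights))
```

Unlike a uniform draw over the whole range, most results land close to the mean and extreme heights become rare.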

Describe alternatives you've considered

Probably it also should use appearance traits like skin tone to adjust the mean height.

Testing

Compiled the game, opened a new character screen, and pressed * a lot of times. The distribution looks realistic.

Additional context

It's not necessary, but a ton of very small and very big characters hits my perfectionist sensibilities hard.

src/newcharacter.cpp (outdated, resolved review thread)
@anothersimulacrum anothersimulacrum added the [C++] Changes (can be) made in C++. Previously named `Code` label Jun 12, 2021
@Tairesh Tairesh force-pushed the fix_random_character_height branch from a8e3ebb to 57e9b97 Compare June 12, 2021 15:50
@actual-nh actual-nh added the Character / World Generation Issues and enhancements concerning stages of creating a character or a world label Jun 12, 2021
@lgtm-com

lgtm-com bot commented Jun 12, 2021

This pull request introduces 1 alert when merging 0fca28f into a76f7e8 - view on LGTM.com

new alerts:

  • 1 for Use of c-style math functions

src/character.cpp (outdated, resolved review thread)
@kevingranade
Member

We intentionally do not represent biological sex and do not want to open that particular can of worms.

When creating a character, you are not sampling randomly from a population, but rather choosing parameters directly, so there is no distribution to follow.

@pehamm
Contributor

pehamm commented Jun 13, 2021

The player is still given the option to start randomized runs, and I would argue that using a bell curve instead of a uniform distribution is reasonable for that. It feels weird when most of my random characters are much taller or shorter than expected.

But you are probably right that distinguishing between sexes here would open a can of worms; I did not think of that myself.

src/character.cpp (outdated, resolved review thread)
@pehamm
Contributor

pehamm commented Jun 13, 2021

@Tairesh you will probably want to follow Kevin's advice and use one distribution for males and females. Having a normal height distribution that ignores biological sex is still better than the current system. You can use an average of 166.43 and a standard deviation of 9.75.

How I got these numbers: From the CIA's world factbook (https://web.archive.org/web/20210609225248if_/https://www.cia.gov/the-world-factbook/countries/united-states/#people-and-society) in the entry Age Structure we get 50.36% males and 49.64% females between the ages of 15 and 54 for the US.

The average is simply the weighted average 0.5036*170 + 0.4964*162.8 = 166.42592 (~166.43), but the standard deviation is more complex.

I simulated this in Python: I created separate samples from the male and female population statistics, merged them into a single large sample, and computed the sample moments. This new sample had a mean of 169.4384 and a standard deviation of 9.7497 (~9.75). The distribution was not really normal (actual distribution blue, idealized normal distribution orange), but it works well enough and is much easier to work with:
MF_Distribution
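
The simulation itself is not shown in the comment. A minimal sketch of that kind of simulation might look like the following; the per-group means and standard deviations (176.0/7.4 and 162.8/7.0) are the ones used in Qrox's later comment in this thread, while the sample size, the seed, and the function name `simulate_mixture` are assumptions made for the example.

```python
import math
import random


def simulate_mixture(n_total=100_000, seed=0):
    """Pool normally distributed male and female samples; return (mean, std)."""
    rng = random.Random(seed)
    n_male = round(n_total * 0.5036)   # population shares quoted in the thread
    n_female = n_total - n_male
    males = [rng.gauss(176.0, 7.4) for _ in range(n_male)]
    females = [rng.gauss(162.8, 7.0) for _ in range(n_female)]
    combined = males + females
    mean = sum(combined) / len(combined)
    # Population standard deviation of the pooled sample.
    var = sum((x - mean) ** 2 for x in combined) / len(combined)
    return mean, math.sqrt(var)


mean, std = simulate_mixture()
print(f"mean={mean:.2f}, std={std:.2f}")
```

With a sample this large, the pooled moments should land close to the simulated values reported above, up to sampling noise.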

@Qrox
Contributor

Qrox commented Jun 15, 2021

weighted average 0.5036*170 + 0.4964*162.8 = 166.42592

The average male height is 176 so the average should be 169.44752

To calculate the stddev:

import numpy as np

x_mean = 176.0
x_std = 7.4
x_sqr_mean = x_mean ** 2 + x_std ** 2
y_mean = 162.8
y_std = 7.0
y_sqr_mean = y_mean ** 2 + y_std ** 2
p_x = 0.5036
p_y = 0.4964
z_mean = x_mean * p_x + y_mean * p_y
z_sqr_mean = x_sqr_mean * p_x + y_sqr_mean * p_y
z_std = np.sqrt(z_sqr_mean - z_mean ** 2)
print(f'mean={z_mean}, std={z_std}')

Outputs mean=169.44752, std=9.77028545384455

And to be honest, the source of the height data (http://www.biostat.jhsph.edu/bstcourse/bio751/papers/bimodalHeight.pdf) is a bit too old (published in 2002). Maybe we should find a more recent source.

@pehamm
Contributor

pehamm commented Jun 15, 2021

@Qrox Good points, and good catch on my typo. I overthought it: I assumed that sampling from two different distributions would make this complicated, and forgot that we can just apply Var(X) = E[X^2] - E[X]^2.

On the data: I just checked, and the CDC publishes statistics for the US population. The most recent data is for 2015-2018 and gives a mean of 175.3 cm and a standard deviation of 13.55 for men (table 11, std dev = standard error * sqrt(N)), and a mean of 161.3 cm and a standard deviation of 14.10 for women.

That would imply a mean of
175.3 * 0.5036 + 161.3 * 0.4964 = 168.35

And a standard deviation of
sqrt(0.5036*(175.3^2 + 13.55^2) + 0.4964*(161.3^2 + 14.1^2) - 168.35^2) = 15.50

It is interesting that the standard deviations in this source are generally higher; I am not sure where this comes from.
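
For anyone who wants to re-check the arithmetic above, here is a small sketch that computes the mean and standard deviation of a two-group mixture from each group's weight, mean, and standard deviation, using Var(Z) = E[Z^2] - E[Z]^2. The helper name `mixture_moments` is made up for this example; the input numbers are the CDC figures quoted in this comment.

```python
import math


def mixture_moments(components):
    """components: list of (weight, mean, std) tuples whose weights sum to 1."""
    mean = sum(w * m for w, m, _ in components)
    # E[X^2] of each component is mean^2 + std^2.
    second_moment = sum(w * (m ** 2 + s ** 2) for w, m, s in components)
    return mean, math.sqrt(second_moment - mean ** 2)


cdc = [(0.5036, 175.3, 13.55), (0.4964, 161.3, 14.10)]
mean, std = mixture_moments(cdc)
print(f"mean={mean:.2f}, std={std:.2f}")  # prints mean=168.35, std=15.50
```

The same helper reproduces the earlier result when given the 2002 figures: `mixture_moments([(0.5036, 176.0, 7.4), (0.4964, 162.8, 7.0)])`.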

@Qrox
Contributor

Qrox commented Jun 15, 2021

I think it's because the data cited by the 2002 paper are from the 20-29 age bracket, whereas the 2015-2018 data here are from all ages > 20. Which means we should use the 2015-2018 data because it is more recent and covers a more representative age range.

@I-am-Erk I-am-Erk merged commit efcbdcc into CleverRaven:master Jul 7, 2021
@Tairesh Tairesh deleted the fix_random_character_height branch July 7, 2021 07:41
Labels
[C++] Changes (can be) made in C++. Previously named `Code` Character / World Generation Issues and enhancements concerning stages of creating a character or a world
8 participants