Use normal distribution to generate random height in new character screen #49270
This pull request introduces 1 alert when merging 0fca28f into a76f7e8 - view on LGTM.com new alerts:
Co-authored-by: pehamm <[email protected]>
We intentionally do not represent biological sex and do not want to open that particular can of worms. When creating a character, you are not sampling randomly from a population, but rather choosing parameters directly, so there is no distribution to follow.
The player is still given the option to start randomized runs, and I would argue that using a bell curve instead of a uniform distribution is reasonable for that. It feels weird when most of my random characters are much taller or shorter than expected. But you are probably right that distinguishing between sexes here would open a can of worms; I did not think of that myself.
Co-authored-by: John Bytheway <[email protected]>
@Tairesh you will probably want to follow Kevin's advice and use one distribution for males and females. Having a normal height distribution that ignores biological sex is still better than the current system. You can use an average of 166.43 and a standard deviation of 9.75. How I got these numbers: from the CIA's World Factbook (https://web.archive.org/web/20210609225248if_/https://www.cia.gov/the-world-factbook/countries/united-states/#people-and-society), in the entry Age Structure, we get 50.36% males and 49.64% females between the ages of 15 and 54 for the US. The average is simply the weighted average 0.5036 * 170 + 0.4964 * 162.8 = 166.42592 (~166.43), but the standard deviation is more complex. I simulated this in Python, creating separate samples from the male and female population statistics, merging them into a single large sample, and computing the sample moments. This new sample had a mean of 169.4384 and a standard deviation of 9.7497 (~9.75). The distribution was not really normal (actual distribution blue, idealized normal distribution orange), but it works well enough and is much easier to work with.
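The pooled-sample simulation described in this comment was not posted, so the following is only a minimal sketch of what it might look like. The per-sex means and standard deviations (176.0/7.4 cm and 162.8/7.0 cm) are assumptions borrowed from the reply that follows, and the function name `pooled_height_moments` is hypothetical.

```python
import math
import random

# Assumed per-sex height statistics in cm (borrowed from the follow-up reply).
MALE_MEAN, MALE_STD = 176.0, 7.4
FEMALE_MEAN, FEMALE_STD = 162.8, 7.0
# Population shares quoted from the World Factbook entry.
P_MALE = 0.5036


def pooled_height_moments(n=200_000, seed=0):
    """Draw male and female samples, pool them, and return (mean, std)."""
    rng = random.Random(seed)
    n_male = round(n * P_MALE)
    n_female = n - n_male
    pooled = [rng.gauss(MALE_MEAN, MALE_STD) for _ in range(n_male)]
    pooled += [rng.gauss(FEMALE_MEAN, FEMALE_STD) for _ in range(n_female)]
    mean = sum(pooled) / len(pooled)
    # Population-style standard deviation of the pooled sample.
    var = sum((h - mean) ** 2 for h in pooled) / len(pooled)
    return mean, math.sqrt(var)


mean, std = pooled_height_moments()
print(f"mean={mean:.2f}, std={std:.2f}")
```

Because the pooled sample is a mixture of two normals, its histogram is slightly flatter than a single bell curve, which matches the remark that the distribution is "not really normal".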
The average male height is 176, so the average should be 169.44752. To calculate the stddev:
import numpy as np
# Male height statistics (cm) and the second moment E[X^2] = mean^2 + std^2
x_mean = 176.0
x_std = 7.4
x_sqr_mean = x_mean ** 2 + x_std ** 2
# Female height statistics (cm) and the second moment E[Y^2]
y_mean = 162.8
y_std = 7.0
y_sqr_mean = y_mean ** 2 + y_std ** 2
# Population shares of males and females
p_x = 0.5036
p_y = 0.4964
# Mixture moments: weighted first and second moments, then std = sqrt(E[Z^2] - E[Z]^2)
z_mean = x_mean * p_x + y_mean * p_y
z_sqr_mean = x_sqr_mean * p_x + y_sqr_mean * p_y
z_std = np.sqrt(z_sqr_mean - z_mean ** 2)
print(f'mean={z_mean}, std={z_std}')
Outputs
And to be honest, the source of the height data (http://www.biostat.jhsph.edu/bstcourse/bio751/papers/bimodalHeight.pdf) is a bit too old (published in 2002). Maybe we should find a more recent source.
@Qrox Good points, and good catch on my typo. I overthought it and assumed that because we sample from two different distributions it should be complex, and forgot that we can just apply E[X^2] - E[X]^2. On the data: I just checked, and the CDC distributes statistics for the US population. The most recent data is for 2015-2018 and has a mean of 175.3 cm and a standard deviation of 13.55 for men (table 11, std dev = standard error * sqrt(N)) and a mean of 161.3 cm and a standard deviation of 14.10 for women. That would imply a mean of … and a standard deviation of …. It is interesting that the standard deviations in this source are generally higher; I am not sure where this comes from.
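As a rough illustration of the E[X^2] - E[X]^2 shortcut applied to the CDC figures, here is a minimal sketch. It assumes the same 50.36% / 49.64% population split used earlier in the thread; the weights actually used in this comment are not preserved, so the result is not claimed to match the original values. The helper name `mixture_moments` is hypothetical.

```python
import math


def mixture_moments(mean_a, std_a, p_a, mean_b, std_b, p_b):
    """Mean and std of a two-component mixture via E[Z^2] - E[Z]^2."""
    mean = p_a * mean_a + p_b * mean_b
    # Second moment of each component is mean^2 + std^2.
    second = p_a * (mean_a ** 2 + std_a ** 2) + p_b * (mean_b ** 2 + std_b ** 2)
    return mean, math.sqrt(second - mean ** 2)


# CDC 2015-2018 figures quoted above; the population shares are an assumption.
mean, std = mixture_moments(175.3, 13.55, 0.5036, 161.3, 14.10, 0.4964)
print(f"mean={mean:.2f}, std={std:.2f}")
```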
I think it's because the data cited by the 2002 paper are from the 20-29 age bracket, whereas the 2015-2018 data here are from all ages > 20. Which means we should use the 2015-2018 data because it is more recent and covers a more representative age range.
Summary
Bugfixes "Use normal distribution to generate random height in new character screen"
Purpose of change
When you press [*] on the final tab of the new character screen, it generates random sex, name, height, blood type. For blood type, it uses the real distribution of blood types, but for generating height, it uses rng() with uniform distribution. So you have an equal chance of getting a male character who is 69 inches tall and one who is 58 inches tall, which is absurd.
Describe the solution
Created function Character::randomize_height() that generates height with normal distribution.
Describe alternatives you've considered
Probably it also should use appearance traits like skin tone to adjust the mean height.
Testing
Compiled the game, opened a new character screen, and pressed * a lot of times. The distribution looks realistic.
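The manual check above could also be approximated numerically. The sketch below is a Python stand-in, not the game's C++ `Character::randomize_height()`: the mean, standard deviation, and height bounds are hypothetical placeholders, and out-of-range draws are simply resampled, which may differ from what the PR does.

```python
import random

# Hypothetical parameters in cm; the values used by the game may differ.
MEAN_CM = 170.0
STD_CM = 10.0
MIN_CM, MAX_CM = 145, 200


def randomize_height(rng):
    """Draw a whole-centimetre height from a normal curve, resampling out-of-range values."""
    while True:
        height = round(rng.gauss(MEAN_CM, STD_CM))
        if MIN_CM <= height <= MAX_CM:
            return height


rng = random.Random(42)
heights = [randomize_height(rng) for _ in range(100_000)]
# Share of characters within one standard deviation of the mean.
near = sum(abs(h - MEAN_CM) <= STD_CM for h in heights) / len(heights)
print(f"min={min(heights)}, max={max(heights)}, within one std: {near:.1%}")
```

With a uniform draw over the same range, only about 38% of characters would fall in that band, so the share within one standard deviation is a quick way to tell the two behaviours apart.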
Additional context
It's not necessary, but a ton of very small and very big characters hits my perfectionist sensibilities hard.