Filter NTS data to study area to avoid unrepresentative travel distances or mode share #16

Closed
Hussein-Mahfouz opened this issue Apr 18, 2024 · 1 comment · Fixed by #67
Assignees: Hussein-Mahfouz
Labels: enhancement, Task 1 creating activity chains

Comments

@Hussein-Mahfouz
Collaborator

Travel distances vary across different areas. For example, we cannot assume that commute distances in London are the same as those in Cambridge.

Matching (#8) is done based on socioeconomic and demographic variables, but individuals/households in different parts of the country that share the same variables may exhibit different travel behaviour due to land use / transport options. If we don't filter, we may end up with travel distances in our study area that are too long, or a mode share that is not representative.

When carrying out matching for any area, we should filter the NTS survey data for that area. Initially I was doing this (see this function), but the sample became too small, and matching at the household level (notebook) resulted in a low matching rate.

Possible workarounds:

  • Use filter_by_region() but include all regions that would have similar travel patterns to the study area
  • Use filter_by_region() to filter the NTS to the study area, then apply propensity score matching at the household level, as described in "Improve matching approach" (#13)
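
The first workaround could be sketched roughly as below. This is only an illustration, not the repository's filter_by_region(): the helper name `filter_records_by_region`, the `region` key, the region labels, and the `min_households` threshold are all assumptions, and plain dictionaries stand in for the NTS household table.

```python
def filter_records_by_region(records, regions, region_key="region", min_households=0):
    """Keep only the records whose region is in `regions`.

    Hypothetical helper: `records` is a list of dicts standing in for NTS
    household rows, and `region_key` names the (assumed) region field.
    Raises ValueError if fewer than `min_households` rows survive, so a
    sample that is too small is noticed before matching starts.
    """
    wanted = set(regions)
    kept = [row for row in records if row[region_key] in wanted]
    if len(kept) < min_households:
        raise ValueError(
            f"Only {len(kept)} households left after filtering to {sorted(wanted)}; "
            f"need at least {min_households}. Consider adding similar regions."
        )
    return kept


# Toy data standing in for NTS households (values are made up).
households = [
    {"household_id": 1, "region": "Yorkshire and The Humber"},
    {"household_id": 2, "region": "London"},
    {"household_id": 3, "region": "North West"},
    {"household_id": 4, "region": "Yorkshire and The Humber"},
]

# Study area only: the strictest filter, smallest sample.
study_area_only = filter_records_by_region(households, ["Yorkshire and The Humber"])

# Study area plus a region assumed to have similar travel patterns.
with_similar = filter_records_by_region(
    households, ["Yorkshire and The Humber", "North West"]
)
```

Raising an error when the sample is too small is one possible design choice; a warning would also work if a low matching rate is acceptable.
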
@Hussein-Mahfouz added the enhancement label on Apr 18, 2024
@Hussein-Mahfouz self-assigned this on Apr 18, 2024
@Hussein-Mahfouz added the Task 1 creating activity chains label on Apr 19, 2024
@Hussein-Mahfouz
Collaborator Author

@sgreenbury @BZ-BowenZhang and I discussed this today, and we should implement it given regional variations in travel time and mode share.

Steps:

  • Uncomment filter_by_region() here and here in script 2
  • Remove regions from here and add to config
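
As a rough sketch of the second step, the list of regions could be read from a config file instead of being hard-coded. The config format, the `matching` / `nts_regions` keys, and the `load_nts_regions` helper are assumptions made for illustration; the project's actual config layout may differ.

```python
import json

# Hypothetical config contents; in practice this would live in a file.
CONFIG_TEXT = """
{
  "matching": {
    "nts_regions": ["Yorkshire and The Humber", "North West"]
  }
}
"""


def load_nts_regions(config_text):
    """Return the list of NTS regions named in the (assumed) config layout."""
    config = json.loads(config_text)
    regions = config["matching"]["nts_regions"]
    if not regions:
        # An empty list would silently filter out the whole survey.
        raise ValueError("Config must list at least one NTS region")
    return list(regions)


nts_regions = load_nts_regions(CONFIG_TEXT)
```

The resulting list can then be passed to the region filter, so users change the study area in one place rather than editing the script.
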

Nice to have

  • Jupyter notebook showing regional variations in mode share, travel times etc. This could inform user choice on which regions from the NTS to include in their model. If a user is applying the model to Leeds, they may decide to use Yorkshire and The Humber only, or include other regions that have similar characteristics. The latter is useful so that the sample size is not too small
  • Include more detailed regional filtering: The PSUStatsReg_B01ID column in the NTS has the breakdown below. The metropolitan / non-metropolitan categorisation is useful for getting representative data (but will reduce our sample size)
	Value = -10.0	Label = DEAD
	Value = -9.0	Label = DNA
	Value = -8.0	Label = NA
	Value = 1.0	Label = Northern, Metropolitan
	Value = 2.0	Label = Northern, Non-metropolitan
	Value = 3.0	Label = Yorkshire / Humberside, Metropolitan
	Value = 4.0	Label = Yorkshire / Humberside, Non-metropolitan
	Value = 5.0	Label = East Midlands
	Value = 6.0	Label = East Anglia
	Value = 7.0	Label = South East (excluding London Boroughs)
	Value = 8.0	Label = London Boroughs
	Value = 9.0	Label = South West
	Value = 10.0	Label = West Midlands, Metropolitan
	Value = 11.0	Label = West Midlands, Non-metropolitan
	Value = 12.0	Label = North West, Metropolitan
	Value = 13.0	Label = North West, Non-metropolitan
	Value = 14.0	Label = Wales
	Value = 15.0	Label = Scotland
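
A minimal sketch of how the breakdown above could drive a finer filter. The code-to-label mapping is copied from the list; the toy rows and variable names are assumptions for illustration, and plain dictionaries again stand in for the NTS table.

```python
# PSUStatsReg_B01ID codes and labels, as listed above.
# Negative codes (-10 DEAD, -9 DNA, -8 NA) are left out on purpose.
PSU_STATS_REGION_LABELS = {
    1: "Northern, Metropolitan",
    2: "Northern, Non-metropolitan",
    3: "Yorkshire / Humberside, Metropolitan",
    4: "Yorkshire / Humberside, Non-metropolitan",
    5: "East Midlands",
    6: "East Anglia",
    7: "South East (excluding London Boroughs)",
    8: "London Boroughs",
    9: "South West",
    10: "West Midlands, Metropolitan",
    11: "West Midlands, Non-metropolitan",
    12: "North West, Metropolitan",
    13: "North West, Non-metropolitan",
    14: "Wales",
    15: "Scotland",
}

# Codes explicitly labelled as metropolitan. Matching on the exact suffix
# avoids accidentally picking up "Non-metropolitan".
metropolitan_codes = [
    code
    for code, label in PSU_STATS_REGION_LABELS.items()
    if label.endswith(", Metropolitan")
]

# Toy rows (made up); the survey stores the codes as floats such as 3.0.
psu_rows = [
    {"psu_id": "a", "PSUStatsReg_B01ID": 3.0},
    {"psu_id": "b", "PSUStatsReg_B01ID": 4.0},
    {"psu_id": "c", "PSUStatsReg_B01ID": 12.0},
    {"psu_id": "d", "PSUStatsReg_B01ID": -8.0},
]

metropolitan_rows = [
    row for row in psu_rows if row["PSUStatsReg_B01ID"] in set(metropolitan_codes)
]
```

Note that London Boroughs (8) carries no metropolitan tag in the labels, so whether to treat it as metropolitan is a separate decision.
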
