export_gtfs() #10
Hi all. @dhersz, thank you again for such an excellent contribution! Here are my quick 2 cents:
Changes introduced in last commit:
The standard_only argument:
library(gtfsio)
gtfs_path <- system.file("extdata/ggl_gtfs.zip", package = "gtfsio")
gtfs <- import_gtfs(gtfs_path)
tmpf <- tempfile(fileext = ".zip")
gtfs$ola <- data.table::data.table(oi = 1:2, ola = 3:4)
export_gtfs(gtfs, tmpf)
zip::zip_list(tmpf)$filename
#> [1] "calendar_dates.txt" "fare_attributes.txt" "fare_rules.txt"
#> [4] "feed_info.txt" "frequencies.txt" "levels.txt"
#> [7] "pathways.txt" "routes.txt" "shapes.txt"
#> [10] "stop_times.txt" "stops.txt" "transfers.txt"
#> [13] "translations.txt" "trips.txt" "agency.txt"
#> [16] "attributions.txt" "calendar.txt" "ola.txt"
export_gtfs(gtfs, tmpf, standard_only = TRUE)
zip::zip_list(tmpf)$filename
#> [1] "calendar_dates.txt" "fare_attributes.txt" "fare_rules.txt"
#> [4] "feed_info.txt" "frequencies.txt" "levels.txt"
#> [7] "pathways.txt" "routes.txt" "shapes.txt"
#> [10] "stop_times.txt" "stops.txt" "transfers.txt"
#> [13] "translations.txt" "trips.txt" "agency.txt"
#> [16] "attributions.txt" "calendar.txt"
Note that this argument affects both extra files and extra fields in required/optional files:
gtfs$levels
#>    level_id level_index level_name elevation
#> 1:       L0           0     Street         0
#> 2:       L1          -1  Mezzanine        -6
#> 3:       L2          -2 Southbound       -18
#> 4:       L3          -3 Northbound       -24
export_gtfs(gtfs, tmpf, files = "levels", standard_only = TRUE)
import_gtfs(tmpf)
#> $levels
#>    level_id level_index level_name
#> 1:       L0           0     Street
#> 2:       L1          -1  Mezzanine
#> 3:       L2          -2 Southbound
#> 4:       L3          -3 Northbound
An error is thrown if an extra file is specified in files while standard_only is set to TRUE:
export_gtfs(gtfs, tmpf, files = c("levels", "ola"), standard_only = TRUE)
#> Error in export_gtfs(gtfs, tmpf, files = c("levels", "ola"), standard_only = TRUE): Non-standard file specified in 'files', even though 'standard_only' is set to TRUE: 'ola'
The compression_level argument:
export_gtfs(gtfs, tmpf, compression_level = 1)
file.size(tmpf)
#> [1] 4130
export_gtfs(gtfs, tmpf)
file.size(tmpf)
#> [1] 4080
The as_dir argument:
tmpf <- tempfile(fileext = ".zip")
tmpf
#> [1] "/tmp/Rtmp5NnqB2/file523c1461af29.zip"
export_gtfs(gtfs, tmpf, as_dir = TRUE)
dir.exists(tmpf)
#> [1] TRUE
tmpf <- tempfile()
tmpf
#> [1] "/tmp/Rtmp5NnqB2/file523c38ddd388"
export_gtfs(gtfs, tmpf, as_dir = TRUE)
dir.exists(tmpf)
#> [1] TRUE
The function, thus, doesn't try to guess from the path whether you mean a directory:
export_gtfs(gtfs, tmpf)
#> Error in export_gtfs(gtfs, tmpf): 'path' must have '.zip' extension. If you meant to create a directory please set 'as_dir' to TRUE.
tmpf <- tempfile(fileext = ".zip")
export_gtfs(gtfs, tmpf)
dir.exists(tmpf)
#> [1] FALSE
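The path handling shown above can be summarised in a small sketch. This is not gtfsio's actual code: check_export_path() is a hypothetical helper, written only to illustrate the rule that a path without a '.zip' extension is rejected unless as_dir is TRUE, while as_dir = TRUE accepts any path. The error text is copied from the output above.

```r
# Hypothetical helper, for illustration only (not part of gtfsio).
# Returns what would be created at `path`, or stops with an error.
check_export_path <- function(path, as_dir = FALSE) {
  has_zip_ext <- grepl("\\.zip$", path)

  # No guessing: a path without '.zip' is only valid for directories.
  if (!as_dir && !has_zip_ext) {
    stop(
      "'path' must have '.zip' extension. ",
      "If you meant to create a directory please set 'as_dir' to TRUE."
    )
  }

  if (as_dir) "directory" else "zip file"
}

check_export_path("feed.zip")
check_export_path("feed.zip", as_dir = TRUE)
check_export_path("feed", as_dir = TRUE)

# The only rejected combination: no '.zip' extension and as_dir = FALSE.
rejected <- tryCatch(check_export_path("feed"), error = function(e) e)
```

Making as_dir explicit keeps the behaviour predictable: the extension alone never decides whether a directory or a .zip file is created.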
Hello folks,
As of 6185db1 a first version of export_gtfs() is up and running. Here is how it works right now. Much improvement is surely yet to come.
Basic usage
Just input a GTFS object and the path where it should be written to. By default it writes every element inside it:
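A minimal sketch of the default behaviour, assuming the {gtfsio} and {zip} packages are installed and using the sample feed that ships with gtfsio (the same file used elsewhere in this thread):

```r
library(gtfsio)

# Sample feed bundled with the package.
gtfs_path <- system.file("extdata/ggl_gtfs.zip", package = "gtfsio")
gtfs <- import_gtfs(gtfs_path)

# Write every element of the GTFS object to a temporary .zip file.
tmpf <- tempfile(fileext = ".zip")
export_gtfs(gtfs, tmpf)

# List the .txt files that ended up in the archive.
written <- zip::zip_list(tmpf)$filename
written
```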
But you can control which files are written to disk with the files argument.
If an element named "." is present, which is used by {tidytransit} to hold "auxiliary" tables, it is not exported.
You can use the overwrite argument to control whether existing files should be overwritten or not.
And trying to export an element that doesn't exist results in an error:
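The files and overwrite arguments, and the error for a missing element, might be exercised as in the sketch below. It assumes {gtfsio} and {zip} are installed; "not_a_table" is a made-up name chosen only because it is not an element of the sample feed. Errors are captured with tryCatch() so the script runs to the end, and no particular error message is assumed.

```r
library(gtfsio)

gtfs <- import_gtfs(system.file("extdata/ggl_gtfs.zip", package = "gtfsio"))
tmpf <- tempfile(fileext = ".zip")

# Write only two of the tables in the feed.
export_gtfs(gtfs, tmpf, files = c("shapes", "trips"))
written <- zip::zip_list(tmpf)$filename

# tmpf now exists, so exporting again with overwrite = FALSE should fail.
overwrite_result <- tryCatch(
  export_gtfs(gtfs, tmpf, overwrite = FALSE),
  error = function(e) e
)

# Asking for an element that the feed does not contain should also fail.
missing_result <- tryCatch(
  export_gtfs(gtfs, tmpf, files = "not_a_table"),
  error = function(e) e
)
```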
Notes
I see that tidytransit::write_gtfs() has a few other parameters not included in the function (compression_level and as_dir).
I assumed that using the strictest compression would be desirable, but I'm happy to change it if you think otherwise (I'm not sure how much it affects performance, to be honest). Regarding as_dir, if I read the code correctly it creates a directory instead of a .zip file, right? I'm not sure if I like it, but I'd like to hear your opinion on it.
Basic behaviour and handling auxiliary columns
No conversions are made inside export_gtfs() (i.e. the function expects a GTFS that follows the standards). I figured that, since each one of our packages might handle some columns differently, especially those that are date and time related, it would be better to leave any conversions to be made outside the function. The workflow I'm thinking of, very roughly: each package converts its own columns back to the standard format and only then calls export_gtfs(). Approaching the problem this way makes sure that fields are correctly formatted in the final .zip file. But one issue remains unsolved:
Right now export_gtfs() deals with auxiliary tables exactly like {tidytransit} does. Elements inside the "." sub-list are not written to disk, while any data frames located outside "." are still exported, even if they are not specified in the official reference. This is very useful when dealing with non-standard GTFS (I've never used them much, but a few extensions are built on top of the official GTFS format).
The problem arises when we're dealing with auxiliary columns (e.g. arrival_time_hms and departure_time_hms, created by tidytransit::set_hms_times() in stop_times). These columns should not be exported, but how do we differentiate them from extra columns that must be? @polettif suggested in an earlier discussion using a naming convention here, and I think it's a great idea. We could perhaps use a prefix (e.g. aux, resulting in column names such as aux_arrival_time_hms) that would signal that a column is auxiliary and thus should not be exported.
Another possible solution would be to create an argument to specify fields that should not be exported (e.g. something like no_export = list(stop_times = c("arrival_time_hms", "departure_time_hms"))).
I prefer the naming convention. In my opinion it makes for a simpler way of specifying which columns should be written, both for end users and for developers, but I'd like to hear your thoughts on it, as usual.
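To make the two options concrete, here is a rough base R sketch of how auxiliary columns could be dropped before export. Nothing here exists in any package: drop_aux_columns(), drop_listed_columns(), the "aux_" prefix and the wheelchair_note column are all assumptions made for illustration. Plain data frames are used to keep the example dependency-free; data.table objects would need different column subsetting.

```r
# Option 1 (naming convention): drop every column whose name starts with
# the auxiliary prefix. Hypothetical helper, not part of any package.
drop_aux_columns <- function(gtfs, prefix = "aux_") {
  lapply(gtfs, function(tbl) {
    keep <- !startsWith(names(tbl), prefix)
    tbl[, keep, drop = FALSE]
  })
}

# Option 2 (explicit argument): drop the columns listed for each table in
# `no_export`. Listed columns that are absent are simply ignored.
drop_listed_columns <- function(gtfs, no_export) {
  for (tbl_name in names(no_export)) {
    tbl <- gtfs[[tbl_name]]
    keep <- !(names(tbl) %in% no_export[[tbl_name]])
    gtfs[[tbl_name]] <- tbl[, keep, drop = FALSE]
  }
  gtfs
}

# Toy feed: one auxiliary column and one made-up non-standard column that
# must survive the export.
feed_prefixed <- list(
  stop_times = data.frame(
    trip_id = c("t1", "t1"),
    arrival_time = c("08:00:00", "08:05:00"),
    aux_arrival_time_hms = c(28800, 29100),
    wheelchair_note = c("ramp", "none")
  )
)
cols_option1 <- names(drop_aux_columns(feed_prefixed)$stop_times)

# Same feed without the prefix, cleaned with the explicit list instead.
feed_unprefixed <- list(
  stop_times = data.frame(
    trip_id = c("t1", "t1"),
    arrival_time = c("08:00:00", "08:05:00"),
    arrival_time_hms = c(28800, 29100),
    wheelchair_note = c("ramp", "none")
  )
)
cols_option2 <- names(drop_listed_columns(
  feed_unprefixed,
  no_export = list(stop_times = c("arrival_time_hms", "departure_time_hms"))
)$stop_times)
```

With the prefix, the exporter needs no extra information from the caller; with the explicit list, every caller has to know and repeat which columns are auxiliary.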
Cheers!