It occurred to me that there's some code in mapping.jl that no one uses. It's undocumented and probably buggy, but it's a potentially interesting experiment I started working on a while back. This issue is to solicit feedback on whether it should be completed (with tests, docs and such), removed, or just left as a secret feature.
Background: long-form versus wide-form
Data with corresponding factors can be stored either by separating the values into rows and columns according to the factors, or by keeping all the values in one column and adding other columns to store the corresponding factors. These layouts are sometimes called "wide-form" and "long-form", respectively. Long-form tends to be how things are stored in databases.
Here's some temperature data from last week in Seattle to illustrate.
long-form

| StationName | DateTime | AirTemperature |
|---|---|---|
| "AuroraBridge" | 2017-05-08T00:00:00 | 50.37 |
| "MagnoliaBridge" | 2017-05-08T00:00:00 | 51.57 |
| "NE45StViaduct" | 2017-05-08T00:00:00 | 51.2 |
| "AlaskanWayViaduct_KingSt" | 2017-05-08T00:00:00 | 57.87 |
| "AlbroPlaceAirportWay" | 2017-05-08T00:00:00 | 51.82 |
| "HarborAveUpperNorthBridge" | 2017-05-08T00:00:00 | 50.37 |
wide-form

| DateTime | 35thAveSW_SWMyrtleSt | AlaskanWayViaduct_KingSt | AlbroPlaceAirportWay | AuroraBridge | HarborAveUpperNorthBridge | JoseRizalBridgeNorth | MagnoliaBridge | NE45StViaduct | RooseveltWay_NE80thSt | SpokaneSwingBridge |
|---|---|---|---|---|---|---|---|---|---|---|
| 2017-05-08T00:00:00 | 59.27 | 57.87 | 51.82 | 50.37 | 50.37 | 52.31 | 51.57 | 51.2 | 55.91 | 67.2 |
| 2017-05-08T00:01:00 | 59.26 | 57.85 | 51.92 | 50.4 | 50.37 | 52.27 | 51.54 | 51.21 | 55.89 | 67.17 |
| 2017-05-08T00:02:00 | 59.24 | 57.84 | 51.82 | 50.35 | 50.36 | 52.24 | 51.51 | 51.23 | 55.88 | 67.16 |
| 2017-05-08T00:03:00 | 59.24 | 57.84 | 51.74 | 50.27 | 50.36 | 52.18 | 51.47 | 51.23 | 55.85 | 67.15 |
| 2017-05-08T00:04:00 | 59.23 | 57.83 | 51.66 | 50.22 | 50.39 | 52.15 | 51.47 | 51.25 | 55.83 | 67.11 |
| 2017-05-08T00:05:00 | 59.21 | 57.8 | 51.53 | 50.17 | 50.41 | 52.09 | 51.48 | 51.24 | 55.82 | 67.11 |
Gadfly is a library for plotting long-form data. If you want to plot wide-form data, you pretty much have to transform it to long-form first. (Wide-form can always be transformed to long-form; the opposite transformation is not always possible without inserting NAs.)
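A minimal sketch of both directions in plain Julia (no DataFrames), assuming a wide table stored as a `Dict` from station name to a vector of temperatures aligned with a shared list of timestamps. `wide_to_long` and `long_to_wide` are hypothetical helpers, not Gadfly or DataFrames functions. Stacking never loses anything, while pivoting back has to leave `missing` wherever a station has no reading at some timestamp.

```julia
# Hypothetical helper: wide -> long is always possible. Every (timestamp, station)
# cell becomes one long-form row, with the column name stored as the factor.
function wide_to_long(times::Vector{String}, wide::Dict{String,Vector{Float64}})
    rows = NamedTuple{(:StationName, :DateTime, :AirTemperature),
                      Tuple{String,String,Float64}}[]
    for station in sort(collect(keys(wide)))   # sorted for a deterministic order
        for (i, t) in enumerate(times)
            push!(rows, (StationName = station, DateTime = t,
                         AirTemperature = wide[station][i]))
        end
    end
    return rows
end

# Hypothetical helper: long -> wide. A station with no reading at a timestamp
# leaves an empty cell, so the wide columns must allow `missing` (the NA case).
function long_to_wide(rows)
    times = sort(unique([r.DateTime for r in rows]))
    stations = sort(unique([r.StationName for r in rows]))
    wide = Dict(s => Vector{Union{Float64,Missing}}(missing, length(times))
                for s in stations)
    for r in rows
        i = findfirst(==(r.DateTime), times)
        wide[r.StationName][i] = r.AirTemperature
    end
    return times, wide
end

# Two stations, two timestamps (values taken from the tables above).
times = ["2017-05-08T00:00:00", "2017-05-08T00:01:00"]
wide = Dict("AuroraBridge" => [50.37, 50.40], "MagnoliaBridge" => [51.57, 51.54])
long = wide_to_long(times, wide)

# Drop MagnoliaBridge's 00:01 reading and pivot back: that cell becomes `missing`.
times2, wide2 = long_to_wide(long[1:3])
```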
The people who intensely dislike this style of plotting tend to be people who work with a lot of wide-form data. I don't think we should try to be all things to all people, but it would be nice to have a better answer to the inconvenience of plotting wide-form data, as long as it doesn't compromise the elegance of plotting long-form data.
Plotting implicit long-form data
Towards this goal, I implemented an experiment that plots an implicitly transformed version of the data. It introduces two functions, `Col.value` and `Col.index`, which let you use the standard plotting interface while treating a group of columns and their names as long-form data: the values become one column and the names become the factor.
To demonstrate:
```julia
using Gadfly

# plotting the long-form
plot(weather_long, x=:DateTime, y=:AirTemperature, color=:StationName, Geom.line)
```

has an essentially equivalent call for the wide-form:
`Col.index` uses the column names as a factor, while `Col.value` uses the column values, in an implicit long-form transformation of the data. Without parameters, they use every column in the data.
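To make the implicit transformation concrete, here is a sketch of what `Col.value` and `Col.index` conceptually produce. It is not the code in mapping.jl: `implicit_long` and the `WideTable` alias are hypothetical names, and the wide table is assumed to be an ordered list of `column name => values` pairs. The repeated `x` line reflects an assumption of this sketch: any other mapped column (here DateTime) has to be repeated once per stacked column so each value keeps its row's x.

```julia
# Hypothetical sketch, not Gadfly's implementation.
# A wide table as an ordered list of `column name => values` pairs.
const WideTable = Vector{Pair{Symbol,Vector{Float64}}}

# Stack the selected columns (default: every column, in table order).
function implicit_long(table::WideTable, names::Vector{Symbol} = first.(table))
    value = Float64[]   # analogue of Col.value: the stacked column values
    index = Symbol[]    # analogue of Col.index: the column each value came from
    for name in names
        j = findfirst(p -> first(p) == name, table)
        j === nothing && error("no column named $name")
        col = last(table[j])
        append!(value, col)
        append!(index, fill(name, length(col)))
    end
    return value, index
end

# Two stations, three timestamps (values taken from the wide-form table above).
table = [:AuroraBridge   => [50.37, 50.40, 50.35],
         :MagnoliaBridge => [51.57, 51.54, 51.51]]
dates = ["2017-05-08T00:00:00", "2017-05-08T00:01:00", "2017-05-08T00:02:00"]

value, index = implicit_long(table)                         # every column
x = repeat(dates, length(table))                            # x repeated per stacked column
mvalue, mindex = implicit_long(table, [:MagnoliaBridge])    # a selected subset
```

Selecting a subset of names mirrors passing parameters to `Col.value`/`Col.index`; calling with no names mirrors using every column.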
This also makes plotting matrices much shorter.
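```julia
using Gadfly, DataFrames

# every column of the wide-form table except DateTime, as a plain matrix
M = Matrix(weather_wide[:, 2:end])
plot(M, y=Col.value, color=Col.index, Scale.color_discrete, Geom.line)
```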
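For a matrix there are no column names, so in this sketch the factor is the column number. The `Scale.color_discrete` in the call suggests that `Col.index` yields numbers here, though that is an inference, not something confirmed above. Julia's `vec` stacks a matrix column by column, which is exactly the implicit long form; the row numbers are included only as one plausible x for `Geom.line`, an assumption of this sketch.

```julia
# First two rows and first three station columns of the wide-form table.
M = [59.27 57.87 51.82;
     59.26 57.85 51.92]

value = vec(M)                                    # column-major stack: column 1, then 2, ...
index = repeat(1:size(M, 2), inner = size(M, 1))  # column number of each stacked entry
row   = repeat(1:size(M, 1), outer = size(M, 2))  # row number of each stacked entry
```

Each triple `(row[k], index[k], value[k])` is one long-form row, so `M[row[k], index[k]] == value[k]` for every `k`.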
That's the basic idea. Is this feature worth including and officially supporting? If so, I can write docs and tests for it.