Plan for plotting wide-form data #1003

Closed
dcjones opened this issue May 17, 2017 · 4 comments

@dcjones
Collaborator

dcjones commented May 17, 2017

It occurred to me that there's some code in mapping.jl that no one uses, is undocumented, and is probably buggy, but is potentially an interesting experiment I had started working on a while back. This issue is to solicit feedback on whether it should be completed (add tests and docs and such), removed, or just left as a secret feature.

Background: long-form versus wide-form

Data with some corresponding factors can be stored either by separating values into rows and columns by factor, or by keeping all the values in one column and adding another column to store the corresponding factors. These are sometimes called "wide-form" and "long-form", respectively. Long-form tends to be how things are stored in databases.

Here's some temperature data from last week in Seattle to illustrate.

long-form

StationName DateTime AirTemperature
"AuroraBridge" 2017-05-08T00:00:00 50.37
"MagnoliaBridge" 2017-05-08T00:00:00 51.57
"NE45StViaduct" 2017-05-08T00:00:00 51.2
"AlaskanWayViaduct_KingSt" 2017-05-08T00:00:00 57.87
"AlbroPlaceAirportWay" 2017-05-08T00:00:00 51.82
"HarborAveUpperNorthBridge" 2017-05-08T00:00:00 50.37

wide-form

DateTime 35thAveSW_SWMyrtleSt AlaskanWayViaduct_KingSt AlbroPlaceAirportWay AuroraBridge HarborAveUpperNorthBridge JoseRizalBridgeNorth MagnoliaBridge NE45StViaduct RooseveltWay_NE80thSt SpokaneSwingBridge
2017-05-08T00:00:00 59.27 57.87 51.82 50.37 50.37 52.31 51.57 51.2 55.91 67.2
2017-05-08T00:01:00 59.26 57.85 51.92 50.4 50.37 52.27 51.54 51.21 55.89 67.17
2017-05-08T00:02:00 59.24 57.84 51.82 50.35 50.36 52.24 51.51 51.23 55.88 67.16
2017-05-08T00:03:00 59.24 57.84 51.74 50.27 50.36 52.18 51.47 51.23 55.85 67.15
2017-05-08T00:04:00 59.23 57.83 51.66 50.22 50.39 52.15 51.47 51.25 55.83 67.11
2017-05-08T00:05:00 59.21 57.8 51.53 50.17 50.41 52.09 51.48 51.24 55.82 67.11

Gadfly is a library for plotting long-form data. If you want to plot wide-form data, you pretty much have to transform it to long-form first. (Wide-form can always be transformed to long-form; the opposite transformation is not necessarily possible without inserting NAs.)
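To make the transformation concrete, here is a minimal sketch in plain Julia (no DataFrames, Julia 1.x syntax assumed). It melts a tiny wide table into long-form rows, then pivots back after dropping one row to show where a gap has to be filled. The variable names (`times`, `wide`, `long`, `wide2`) are illustrative only and are not part of Gadfly; the readings are taken from the tables above.

```julia
# Wide-form: one shared time axis plus one vector of readings per station.
times = ["2017-05-08T00:00:00", "2017-05-08T00:01:00"]
wide = Dict(
    "AuroraBridge"   => [50.37, 50.4],
    "MagnoliaBridge" => [51.57, 51.54],
)

# Wide -> long: emit one (station, time, temperature) row per cell.
# This direction always works, because every cell becomes exactly one row.
long = [(StationName = station, DateTime = t, AirTemperature = temp)
        for station in sort(collect(keys(wide)))
        for (t, temp) in zip(times, wide[station])]

# Long -> wide: drop one row to simulate a station that skipped a reading.
long2 = long[1:end-1]
lookup = Dict((r.StationName, r.DateTime) => r.AirTemperature for r in long2)
stations = unique(r.StationName for r in long2)

# Any (station, time) pair without a row has to be filled with `missing`.
wide2 = Dict(s => [get(lookup, (s, t), missing) for t in times]
             for s in stations)

foreach(println, long)
println(wide2)
```

The second half is why the reverse direction is the awkward one: a long table does not guarantee a value for every combination of factors, so the wide table needs a placeholder for the holes.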

The people who intensely dislike this style of plotting tend to be people who work with a lot of wide-form data. I don't think we should try to be all things to all people, but it would be nice to have a better answer to the inconvenience of plotting wide-form data, as long as it doesn't compromise the elegance of plotting long-form data.

Plotting implicit long-form data

Towards this goal, I implemented an experiment to allow plotting of an implicitly transformed version of the data. It introduces two functions, Col.value and Col.index, which allow you to use the standard plotting interface but treat a group of columns and their corresponding names as a long-form factor.

To demonstrate:

# plotting the long-form
plot(weather_long, x=:DateTime, y=:AirTemperature, color=:StationName, Geom.line)

has an essentially equivalent call for the wide-form

# plotting the corresponding wide-form
# every column after DateTime holds one station's readings
value_columns = names(weather_wide)[2:end]
plot(weather_wide, x=Col.value(:DateTime),
     y=Col.value(value_columns...), color=Col.index(value_columns...), Geom.line)

[Plot: output of the wide-form call above]

Col.index uses the column names as a factor, while Col.value uses the column values in an implicit long-form transformation of the data. Without parameters, they use every column in the data.

This also makes plotting matrices much shorter.

M = convert(Matrix, weather_wide[:,2:end])
plot(M, y=Col.value, color=Col.index, Scale.color_discrete, Geom.line)

[Plot: output of the matrix call above]
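For intuition about what the matrix call is asking for, the sketch below builds the implied long-form by hand in plain Julia. It is not Gadfly's implementation, and the names `vals`, `colidx`, and `rowidx` are hypothetical; it only shows the conceptual pairing of each value with the index of the column it came from, which is what `y=Col.value, color=Col.index` describes.

```julia
# Three time steps (rows) for two stations (columns), values from the table above.
M = [50.37 51.57
     50.40 51.54
     50.35 51.51]

nrows, ncols = size(M)

# Flatten column by column (Julia arrays are column-major).
vals = vec(M)

# For each flattened value, remember which column and row it came from.
colidx = repeat(1:ncols, inner = nrows)   # plays the role of the color factor
rowidx = repeat(1:nrows, outer = ncols)   # a natural default x position

for (r, c, v) in zip(rowidx, colidx, vals)
    println("row=", r, " col=", c, " value=", v)
end
```

Each printed line corresponds to one row of the long-form table that would otherwise have to be built explicitly before plotting.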

That's the basic idea. Is this feature worth including and officially supporting? If so, I can write docs and tests for it.

@dcjones
Collaborator Author

dcjones commented May 17, 2017

Related issues: #327, #563, #526, #529 (probably others, this comes up a lot)

@bjarthur
Member

that'd be awesome!

@tlnagy
Member

tlnagy commented May 23, 2017

Great to see you around here again @dcjones!

@bjarthur
Member

closed by #1013
