This package provides utility functions for manipulating large histograms. It includes methods to trim histograms, take subsets, merge buckets, merge histograms, convert a histogram to a CDF, and calculate the information loss due to binning. It also provides a protocol buffer representation of the default R histogram class, so that histograms over large data sets can be computed and manipulated in a MapReduce environment.
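As a rough illustration of the utilities listed above, the sketch below builds a histogram from simulated data and runs each kind of operation once. The data, the break points, and the variable names are made up for the example; the argument values are one plausible choice, not a recommendation.

```r
library(HistogramTools)

# Simulated data and break points, chosen only for illustration.
set.seed(42)
x <- rexp(1000)
h <- hist(x, breaks = seq(0, ceiling(max(x)) + 2, by = 0.5), plot = FALSE)

# Trim: drop the empty buckets at the tails of the histogram.
h.trimmed <- TrimHistogram(h)

# Subset: keep only the buckets between two existing break points.
h.subset <- SubsetHistogram(h, minbreak = 1, maxbreak = 3)

# Merge buckets: combine each pair of adjacent buckets into one.
h.merged <- MergeBuckets(h, adj.buckets = 2)

# Merge histograms: add the counts of histograms that share the same breaks.
h.sum <- AddHistograms(h, h)

# Convert to CDF: an approximate empirical CDF derived from the buckets.
cdf <- HistToEcdf(h)

# Information loss due to binning, as two alternative measures.
ks.loss <- KSDCC(h)
emd.loss <- EMDCC(h)
```

Trimming and bucket merging change only the bucket layout, so the total count is unchanged; adding a histogram to itself doubles it.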
You can either install from source via this repo, or install the CRAN package the usual way from R.
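A minimal sketch of both routes from an R session follows. The CRAN call is the standard one; the local path in the source install is a hypothetical placeholder for wherever you have checked out this repo.

```r
# Option 1: install the released package from CRAN.
install.packages("HistogramTools")

# Option 2: install from a local checkout of this repo.
# "path/to/HistogramTools" is a placeholder, not a real path.
# install.packages("path/to/HistogramTools", repos = NULL, type = "source")

# Load the package once it is installed.
library(HistogramTools)
```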
"RProtoBuf & HistogramTools: Statistical Analysis Tools for Large Data Sets", Google Open Source Blog, October 10, 2013
Murray Stokely
Apache 2.0