by Pilot Sampling and One-Step Updating
- Spark >= 2.3.1
- Python >= 3.7.0
  - pyarrow >= 0.15.0 (please read the compatibility issue with Spark 2.3.x or 2.4.x)
  - statsmodels >= 0.12.0
- See setup.py for detailed requirements.
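As a quick sanity check before submitting a job, the minimum versions listed above can be compared against what is installed. The sketch below is not part of the project: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes the installed `pyspark` package version matches the Spark version of the cluster.

```python
import importlib
import re
import sys

# Minimum versions taken from the requirements list above.
# Assumption: the pyspark package version mirrors the Spark version.
MINIMUMS = {"pyspark": "2.3.1", "pyarrow": "0.15.0", "statsmodels": "0.12.0"}


def parse_version(text):
    """Turn '0.15.0' or '3.0.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a dict mapping each package to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, required in MINIMUMS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", "0")
        report[name] = "ok" if meets_minimum(installed, required) else "too old"
    return report


# The interpreter itself must be at least Python 3.7.
python_ok = sys.version_info >= (3, 7)
```

Calling `check_environment()` on the driver machine gives a quick report; it does not inspect the executors, which may run a different Python environment.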
- Zip the code into a portable package (a zipped file dqr.zip will be placed into the projects folder)
    make zip
- Run the project on a Spark platform
    PYSPARK_PYTHON=/usr/local/bin/python3.7 \
    spark-submit --py-files projects/dqr.zip \
    projects/dqr_spark.py
You can also build the code into a standard Python module and deploy it to Spark clusters.
    python setup.py bdist
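The zip-based workflow above works because `spark-submit --py-files` places the archive on the Python search path, and Python can import packages directly from a zip file. The following sketch illustrates that mechanism locally, without Spark. It is not the project's Makefile logic: `zip_package` and the `toy_pkg` package are hypothetical names used only for illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def zip_package(package_dir, zip_path):
    """Archive a package so that its folder name is the top-level zip entry."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w") as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to the parent: toy_pkg/__init__.py
                    archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# Build a throwaway package in a temporary directory (hypothetical example).
workdir = tempfile.mkdtemp()
package_dir = os.path.join(workdir, "src", "toy_pkg")
os.makedirs(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("VALUE = 42\n")

# Zip it, then import it from the archive rather than from the source folder.
zip_path = zip_package(package_dir, os.path.join(workdir, "toy_pkg.zip"))
sys.path.insert(0, zip_path)
importlib.invalidate_caches()
toy_pkg = importlib.import_module("toy_pkg")
print(toy_pkg.VALUE, toy_pkg.__file__)
```

The key design point is that the package folder sits at the root of the archive; if the archive instead contained a leading directory such as `src/`, the import would fail on the workers.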
- Contributed by @edwardguo61
- The required R version: 3.5.1
- Files:
  - dqr/Restimator.R: one-shot estimation and one-step estimation for distributed quantile regression
  - dqr/R/simulator.R: simulation functions to generate random/non-random data
  - dqr/R/uilts.R: other functions used
  - projects/dqr_demo.R: generate data, conduct estimation and generate plot. Please run dqr_demo.R to see how to use the functions.
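To give a feel for the two estimators named in the file list, here is a minimal, standard-library sketch for the simplest possible case: an intercept-only model, where quantile regression reduces to estimating a single quantile. It is not the code in this repository. The function names, the Gaussian kernel, and the rule-of-thumb bandwidth are all assumptions made for illustration. The one-shot estimator averages the local estimates from each partition; the one-step estimator starts from a pilot-sample estimate and applies a single Newton-type correction using scores aggregated over all partitions.

```python
import math
import random
import statistics


def local_quantile(sample, tau):
    """Empirical tau-quantile, i.e. intercept-only quantile regression."""
    ordered = sorted(sample)
    index = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[index]


def one_shot(partitions, tau):
    """Average the estimates computed independently on each partition."""
    local = [local_quantile(part, tau) for part in partitions]
    return sum(local) / len(local)


def density_at_zero(residuals, bandwidth):
    """Gaussian kernel estimate of the residual density at zero."""
    scale = len(residuals) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * (r / bandwidth) ** 2) for r in residuals) / scale


def one_step(partitions, tau, pilot_per_partition, rng):
    """Pilot estimate plus one Newton-type update from aggregated scores."""
    # Pilot sample: a few observations drawn from every partition.
    pilot = [y for part in partitions for y in rng.sample(part, pilot_per_partition)]
    q0 = local_quantile(pilot, tau)
    # Density of the pilot residuals at zero (rule-of-thumb bandwidth, assumed).
    residuals = [y - q0 for y in pilot]
    bandwidth = 1.06 * statistics.pstdev(residuals) * len(pilot) ** (-0.2)
    f0 = density_at_zero(residuals, bandwidth)
    # Each partition reports only the sum of its scores tau - 1{y <= q0}.
    scores = [sum(tau - (1.0 if y <= q0 else 0.0) for y in part) for part in partitions]
    total = sum(len(part) for part in partitions)
    return q0 + sum(scores) / (total * f0)


# Simulated data: 20 partitions of standard normal draws (true median is 0).
rng = random.Random(0)
partitions = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(20)]
one_shot_estimate = one_shot(partitions, tau=0.5)
one_step_estimate = one_step(partitions, tau=0.5, pilot_per_partition=25, rng=rng)
print(one_shot_estimate, one_step_estimate)
```

In this sketch the one-step update needs only a single pass over the full data, with each partition returning one number, which is the property that makes the approach attractive on a cluster. With covariates, the scalar density correction would be combined with the inverse of the covariate second-moment matrix; see the paper cited below for the actual estimator and its assumptions.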
- Rui Pan, Tunan Ren, Baishan Guo, Feng Li, Guodong Li and Hansheng Wang (2021). A Note on Distributed Quantile Regression by Pilot Sampling and One-Step Updating, Journal of Business and Economic Statistics. (in press).