© 2015-2017 by Mekki MacAulay. Some rights reserved.
This program is free and open source software. The author licenses it to you under the terms of the GNU General Public License (GPL), as published by the Free Software Foundation, either version 3, or (at your option) any later version (GPLv3+).
There is NO WARRANTY for this software, express or implied, including the implied warranties of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
For the full text of the GNU General Public License, please visit http://gnu.org/licenses/
Should you require an alternative licensing arrangement for this software, please contact the author.
To run this script from within the R shell, use:
source('<NAME OF THIS FILE>.r', echo=TRUE, max.deparse.length=100000, keep.source=TRUE, verbose=TRUE);
These source() parameters make the R shell print the script commands and their responses, which are otherwise hidden by default.
Or, run it from the command prompt directly as follows (assuming the R binary is in the PATH environment variable):
is necessary because, by default, R writes its output to a file rather than to the command prompt
This script requires:
- An updated R installation (version 3.3.2 or later) with the appropriate packages
- A MySQL installation (tested on 5.5.xx) containing the Mozilla Bugzilla database to analyze
- A PHP installation (tested on 5.6.14)
- A PHP Composer installation
- A PHP utility script for domain name parsing
- A Perl installation (tested on ActivePerl)
- A Tree Tagger installation
The following sections describe the process for installing these necessities.
INSTALL AND CONFIGURE MYSQL
Visit: https://dev.mysql.com/downloads/windows/installer/5.5.html
Download the MySQL MSI installer for Windows (version 5.5.xx will do just fine; later versions use the annoying Oracle installer, which makes things more complicated)
Run the installer as administrator and complete the MySQL install with default settings (or minor tweaks if you wish)
During install, set the default username to "root" and password to "password" in the configuration. Default host will be "localhost" and default port will be "3306"
Test the connection with the MySQL Workbench client, which is also installed with the package, to ensure the server is working
Once connected with the MySQL Workbench client, under the menu "Server", select "Options File"
Click on the "Networking" tab
Check the box for "max_allowed_packet" and set its value to "1G" (the maximum MySQL accepts) or another suitably large value
Click "Apply" and then restart the server. (In the Navigator pane, click "Startup / Shutdown", then click "Stop Server" in the main window, followed by "Start Server".)
Add the mysql/bin folder to the system PATH environment variable. Its default location is C:\Program Files\MySQL\MySQL Server 5.5\bin, but it could vary depending on your install parameters
Decompress/untar the Bugzilla database. The result is a MySQL-formatted dumpfile with a .sql extension. I'll assume that the file name is "bugzilla.sql"; if it's not, change the name in the commands below.
NOTE: This dumpfile only works with MySQL databases. It cannot be restored to other databases such as SQLite, PostgreSQL, MSSQL, etc. It is also sufficiently complex that scripts cannot readily convert it to a dumpfile of a different format
Open the standard command prompt as administrator and type the following 3 commands, hitting enter after each one:
mysql -uroot -ppassword --execute="DROP DATABASE IF EXISTS `bugs`;"
mysql -uroot -ppassword --execute="CREATE DATABASE `bugs`;"
mysql -uroot -ppassword bugs < bugzilla.sql
The last command will execute for several minutes as it populates the database with the dumpfile data. The result will be a database named "bugs" on the MySQL server, filled with the Bugzilla data
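Once the restore finishes, a quick check from R can confirm that the "bugs" database is reachable and populated. The sketch below is not part of the original script. It assumes the DBI and RMySQL packages are installed (RMySQL appears in the install.packages() command further below) and uses the root/password credentials on localhost:3306 configured above. The function name verify_bugs_restore() is hypothetical. It returns FALSE instead of stopping when the packages, the server, or the table are unavailable.

```r
# Hypothetical helper (sketch only): assumes DBI and RMySQL are installed and the
# server accepts the root/password credentials on localhost:3306 set up above.
verify_bugs_restore <- function(user = "root", password = "password",
                                host = "localhost", port = 3306L,
                                dbname = "bugs") {
  # Report FALSE (rather than stopping) when the database packages are absent
  if (!requireNamespace("DBI", quietly = TRUE) ||
      !requireNamespace("RMySQL", quietly = TRUE)) {
    message("Install DBI and RMySQL to run this check")
    return(FALSE)
  }
  con <- tryCatch(
    DBI::dbConnect(RMySQL::MySQL(), username = user, password = password,
                   host = host, port = port, dbname = dbname),
    error = function(e) {
      message("Connection failed: ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(con)) return(FALSE)
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Show whether the Workbench change took effect (1G = 1073741824 bytes)
  packet <- DBI::dbGetQuery(con, "SHOW VARIABLES LIKE 'max_allowed_packet'")
  message("max_allowed_packet = ", packet$Value)

  # Bugzilla keeps one row per bug in its "bugs" table
  n_bugs <- tryCatch(
    DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM bugs")$n,
    error = function(e) NA
  )
  message("Rows in bugs table: ", n_bugs)
  isTRUE(n_bugs > 0)
}

# Usage:
# verify_bugs_restore()
```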
INSTALL AND CONFIGURE R (Statistical package) or Microsoft R Open (MRO - From Microsoft, formerly Revolution Analytics)
Visit: http://cran.utstat.utoronto.ca/bin/windows/base/ or another mirror
Download the installer for the latest version for Windows (32/64 bit)
Alternatively, visit: http://mran.revolutionanalytics.com/download/#download
Download the installer for the latest version of Microsoft R Open (MRO), an alternative open-source R distribution primarily developed by Revolution Analytics (http://revolutionanalytics.com/, now owned by Microsoft). Revolution Analytics maintains a Managed R Archive Network (MRAN) that mirrors the base CRAN with optimizations.
NOTE: This script might execute faster with MRO than with base R, especially when using multiple cores
NOTE: You are encouraged to use an R or MRO version of at least 3.3.2, as versions 3.1.3 and earlier execute this script significantly slower (~45% speed decrease), likely due to different memory heap management, as discussed here: http://cran.r-project.org/src/base/NEWS
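The version requirement can be checked from inside R or MRO using base R's getRversion(). This is a minimal sketch. The helper name r_version_ok() is hypothetical.

```r
# Hypothetical helper: returns TRUE if the running R/MRO is at least `minimum`,
# and warns (returning FALSE) when it is older.
r_version_ok <- function(minimum = "3.3.2") {
  ok <- getRversion() >= minimum  # numeric_version comparison, not string comparison
  if (!ok) {
    warning("R ", getRversion(), " is older than ", minimum,
            "; this script may run significantly slower")
  }
  ok
}

r_version_ok()
```

Comparing through getRversion() avoids the pitfall of comparing version strings alphabetically (for example, "3.10.0" vs. "3.3.2").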
Run the installer (either one) as administrator and complete the R install with default settings (or minor tweaks if you wish)
Create a shortcut to R x64 X.X.X on the desktop (or suitable place - the installers offer to create one for you)
Right-click on the shortcut and choose "Properties"
Change the "Start in:" field to the location of this script file
That will ensure that R can find this script when executed from within the R shell
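To confirm that the shortcut's "Start in:" folder took effect, a hypothetical helper like the one below checks whether the script file is in R's current working directory (getwd()). The default file name is the same placeholder used in the source() command at the top.

```r
# Hypothetical helper: TRUE if `script` exists in R's current working directory.
script_in_working_dir <- function(script = "<NAME OF THIS FILE>.r") {
  found <- file.exists(file.path(getwd(), script))
  if (!found) {
    message("Not found in ", getwd(),
            "; use setwd() or fix the shortcut's \"Start in:\" field")
  }
  found
}
```

If it returns FALSE, setwd() can point R at the script's folder for the current session.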
Install additional packages from the package manager including at least the following:
- betareg
- bit64
- car
- chron
- curl
- data.table
- devtools
- dplyr
- dtplyr
- DT
- e1071
- FactoMineR
- ggplot2
- graphics (ships with base R; no separate install needed)
- gWidgets
- gWidgetsRGtk2
- highr
- itertools
- iterators
- koRpus (note the capitalization: "koRpus"). Development is moving quickly, so it might be best to use the development version: install.packages("koRpus", repos="http://R.reaktanz.de"). It depends on package:devtools, so install that first
- lme4
- lmtest
- longitudinalData
- lubridate
- moments
- nnet
- nortest
- doParallel (and its many Windows dependencies, including "foreach", "snow", and "parallel")
- piecewiseSEM
- plyr
- pscl
- qpcR
- Rcmdr (and its many plugins)
- RCurl
- rggobi
- RGraphics
- RGtk2
- RGtk2Extras
- RLRsim
- RODBCext
- sqldf
- sqlutils
- stargazer
- textcat
- tidyr
- timeDate
- tm (for text mining)
- xkcd
- xlsx
- zipcode
and all of the recursive dependencies of these listed packages (the package manager should install them automatically)
NOTE: This might take a while...
install.packages(c("bit64", "curl", "data.table", "devtools", "dplyr", "ggplot2", "RGtk2", "RMySQL", "stargazer", "textcat", "tidyr", "xlsx", "doParallel", "itertools", "iterators", "RCurl", "sqlutils", "timeDate", "tm", "chron", "knitr", "xtable"));
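Because the package list is long, re-running install.packages() for every package on each setup repeats the downloads. The sketch below installs only what is missing, using base R's installed.packages(). The helper names are hypothetical. Packages such as graphics and parallel ship with base R, so they are never reported as missing.

```r
# Hypothetical helpers: install only the packages that are not already present.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

install_missing <- function(pkgs, ...) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    install.packages(todo, ...)  # extra arguments (e.g. repos) are passed through
  }
  invisible(todo)  # the names that were (attempted to be) installed
}

# Usage, with a subset of the package names above:
# install_missing(c("data.table", "dplyr", "ggplot2", "RMySQL", "stargazer"))
```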
INSTALL AND CONFIGURE PHP
Download the latest zip package from http://windows.php.net/download/
NOTE: This script was tested with PHP 5.6.14, which was the current release at the time. I chose x64 Thread Safe, but the other builds should work too.
Extract the contents of the zip file to whatever directory you want to keep PHP in. I chose "C:\php"
Create a new file called "php.ini" in the PHP directory
Edit the php.ini file to add the following lines:
memory_limit = 2048M
Open the ext directory in the PHP directory
Copy php_curl.dll, php_openssl.dll, php_intl.dll, and php_mbstring.dll to the root PHP directory
Change the system PATH environment variable to include "C:\php" or whatever directory you chose for the PHP install
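After php.ini and the PATH are set up, the PHP command-line binary can report the memory_limit it actually loaded. This is a sketch run from R, assuming "php" is reachable on the PATH (it returns NA otherwise). php_memory_limit() is a hypothetical helper that runs `php -r` with PHP's ini_get(). If php.ini is being read, the result should be "2048M".

```r
# Hypothetical helper: asks the PHP CLI for its effective memory_limit.
php_memory_limit <- function() {
  if (!nzchar(Sys.which("php"))) {
    return(NA_character_)  # php is not on the PATH
  }
  out <- suppressWarnings(
    system2("php", c("-r", shQuote("echo ini_get('memory_limit');")),
            stdout = TRUE)
  )
  # Keep the last line of output, in case PHP printed startup notices first
  if (length(out) == 0) NA_character_ else out[length(out)]
}

php_memory_limit()
```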
INSTALL PHP COMPOSER
Download the latest version of PHP Composer from: https://getcomposer.org/Composer-Setup.exe
Run the installer with default settings
INSTALL PHP DOMAIN PARSER
Follow the instructions here to install "PHP Domain Parser" using Composer: https://github.com/jeremykendall/php-domain-parser/blob/develop/README.md
NOTE: The small PHP program "domainparser.php" is provided; it uses the installed domain parser library
INSTALL AND CONFIGURE TREE TAGGER
Download the latest Tree Tagger version from http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
If using Windows, see the specific details for Windows further down the page
Download the English parameter file from here: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/english-par-linux-3.2-utf8.bin.gz (or as listed elsewhere on the page above)
If desired, download other language parameter files (not described in this script)
Unzip the downloaded package and VERY CAREFULLY follow ALL the instructions in the INSTALL.txt file it contains.
NOTE: This script assumes you install Tree Tagger to its default location of c:/TreeTagger. If not, adjust accordingly in the user-defined parameters below.
NOTE: Pay particular attention to the PATH environment variable setting so that the R script can find the Tree Tagger script files
Go to the treetagger/lib directory and copy the english-utf8.par file to overwrite english.par (back up the latter first if you wish)
Go to the treetagger/cmd directory and copy the utf8-tokenize.perl file to overwrite tokenize.pl (back up the latter first if you wish)
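This is a sketch for verifying the Tree Tagger layout, assuming the default c:/TreeTagger root mentioned above. The lib/english.par and cmd/tokenize.pl paths come from the two steps above. The bin/tree-tagger.exe path is an assumption about the Windows distribution's layout, so adjust it if your INSTALL.txt says otherwise. check_treetagger() is a hypothetical name.

```r
# Hypothetical helper: checks that the Tree Tagger files this setup relies on exist.
check_treetagger <- function(root = "c:/TreeTagger") {
  expected <- c(
    binary    = file.path(root, "bin", "tree-tagger.exe"),  # assumed binary name
    parameter = file.path(root, "lib", "english.par"),      # overwritten with the UTF-8 file
    tokenizer = file.path(root, "cmd", "tokenize.pl")       # overwritten with utf8-tokenize.perl
  )
  found <- file.exists(expected)
  names(found) <- names(expected)
  if (!all(found)) {
    message("Missing: ", paste(expected[!found], collapse = ", "))
  }
  found
}
```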
INSTALL PERL
- Download the latest 64-bit version of ActivePerl here: http://www.activestate.com/activeperl/downloads
- Run the installer as administrator. Default values should be fine.
- Reboot the system
- Verify that the Perl executable shows up in the system PATH
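With every component installed, a final sketch can report which command-line tools R can see on the PATH, using base R's Sys.which(). The command names are assumptions based on the installs above (for example, "tree-tagger" for the Tree Tagger binary). tools_on_path() is a hypothetical helper.

```r
# Hypothetical helper: TRUE/FALSE per tool, depending on whether Sys.which() finds it.
tools_on_path <- function(tools = c("mysql", "php", "composer", "perl", "tree-tagger")) {
  paths <- Sys.which(tools)  # "" for any command not found on the PATH
  found <- nzchar(paths)
  names(found) <- tools
  if (!all(found)) {
    message("Not on PATH: ", paste(tools[!found], collapse = ", "))
  }
  found
}

tools_on_path()
```

A FALSE entry usually means the corresponding PATH step above was skipped or needs a reboot to take effect.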