Use RVM to install the version of Ruby specified in .ruby-version.
Use Bundler to install the required gems:
$ gem install bundler
$ bundle install
The required services (Redis, MySQL, etc.) can all be installed and run using Docker. If you prefer to install the services without Docker, see the wiki. We recommend setting the maximum memory allotted to Docker to 4GB (in Docker Desktop, Preferences > Resources > Advanced). See the wiki for more documentation on basic Docker commands.
All the required services below can be run using Docker Compose:
$ docker-compose up
Alternatively, run the services individually, e.g.:
$ docker-compose up redis
- Elasticsearch 6.8 - for full-text search and query analytics
We have configured Elasticsearch 6.8 to run on port 9268, and Elasticsearch 7.8 to run on 9278. (Currently, only 6.8 is used in production, but some tests run against both versions.) To check Elasticsearch settings and directory locations:
$ curl "localhost:9268/_nodes/settings?pretty=true"
$ curl "localhost:9278/_nodes/settings?pretty=true"
Some specs depend upon Elasticsearch having a valid trial license. A 30-day trial license is automatically applied when the cluster is initially created. If your license expires, you can rebuild the cluster by rebuilding the container and its data volume, as shown in the example following this list of services.
- Kibana - Kibana is not required, but can be very useful for debugging Elasticsearch. Confirm Kibana is available for the Elasticsearch 6.8 cluster by visiting http://localhost:5668. Kibana for the Elasticsearch 7 cluster should be available on http://localhost:5678.
- MySQL 5.6 - database, accessible from user 'root' with no password
- Redis 5.0 - We're using the Redis key-value store for caching, queue workflow via Resque, and some analytics.
- Tika - for extracting plain text from PDFs, etc. The Tika REST server runs on http://localhost:9998/.
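For example, if your Elasticsearch trial license has expired, a rebuild along these lines should work. The service and volume names below are assumptions; check docker-compose.yml and the output of docker volume ls for the actual names:
$ docker-compose rm -sf elasticsearch6
$ docker volume rm search-gov_elasticsearch6
$ docker-compose up elasticsearch6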
We recommend using Homebrew for local package installation on a Mac.
Use the package manager of your choice to install the following packages:
- C++ compiler - required by the cld3 gem, which we use for language detection
- Google's protocol buffers - also required by the cld3 gem
- Java Runtime Environment
- PhantomJS - required to run JavaScript in Cucumber features
- ImageMagick - required by the Paperclip gem, used for image attachments
Example of installation on Mac using Homebrew:
$ brew install gcc
$ brew install protobuf
$ brew install java
$ brew install imagemagick
$ brew install --cask phantomjs
Example of installation on Linux:
$ apt-get install protobuf-compiler
$ apt-get install libprotobuf-dev
$ apt-get install imagemagick
$ apt-get install default-jre
The app does its best to avoid interacting with most remote services during the test phase through heavy use of the VCR gem.
You should be able to simply run this command to get a valid secrets.yml file that will work for running the existing specs:
$ cp config/secrets.yml.dev config/secrets.yml
If you find that you need to run specs that interact with a remote service, you'll need to put valid credentials into your secrets.yml file. Anything listed in the secret_keys entry of that file will automatically be masked by VCR in newly-recorded cassettes.
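For reference, this masking relies on VCR's filter_sensitive_data hook. Below is a minimal sketch of the mechanism, assuming a hypothetical bing key in secrets.yml; the project's actual VCR configuration lives in the spec support files and may differ:

# spec/support/vcr.rb (illustrative only)
VCR.configure do |config|
  config.hook_into :webmock
  # Replace the real key with a placeholder in newly-recorded cassettes
  config.filter_sensitive_data('<BING_API_KEY>') do
    Rails.application.secrets.dig(:bing, :key)
  end
end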
Create and set up your development and test databases:
$ rails db:setup
$ rails db:test:prepare
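To sanity-check that the databases exist and are reachable (not an official setup step, just a quick check):
$ rails dbconsole
mysql> show databases;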
A few tips when working with the asset pipeline:
- Ensure that your asset directory is in the asset paths by running the following in the console:
  Rails.application.assets.paths
- Find out which file is served for a given asset path by running the following in the console:
  Rails.application.assets['relative_path/to_asset.ext']
You can create the USASearch-related indexes like this:
$ rake usasearch:elasticsearch:create_indexes
You can index all the records from ActiveRecord-backed indexes like this:
$ rake usasearch:elasticsearch:index_all[FeaturedCollection+BoostedContent]
If you want it to run in parallel using Resque workers, call it like this:
$ rake usasearch:elasticsearch:resque_index_all[FeaturedCollection+BoostedContent]
Note that indexing everything uses whatever index/mapping/setting is in place. If you need to change the Elasticsearch schema first, do this:
$ rake usasearch:elasticsearch:recreate_index[FeaturedCollection]
If you are changing a schema and want to migrate the index without having it be unavailable, do this:
$ rake usasearch:elasticsearch:migrate[FeaturedCollection]
Same thing, but using Resque to index in parallel:
$ rake usasearch:elasticsearch:resque_migrate[FeaturedCollection]
Make sure the unit, functional, and integration tests all pass:
# Run the RSpec tests
$ rspec spec/
# Run the Cucumber integration tests
$ cucumber features/
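While developing, you can also run a single spec file or feature instead of the whole suite (standard RSpec/Cucumber usage; the paths below are just placeholders):
$ rspec spec/models/user_spec.rb
$ cucumber features/searches.feature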
We require 100% code coverage. After running the tests (both RSpec and Cucumber), open coverage/index.html in your favorite browser to view the report. You can click around on the files that have < 100% coverage to see what lines weren't exercised.
We use CircleCI for continuous integration. Build artifacts, such as logs, are available in the 'Artifacts' tab of each CircleCI build.
We use Rubocop for static code analysis. Settings specific to search-gov are configured via .rubocop.yml. Settings that can be shared among all Search.gov repos should be configured via the searchgov_style gem.
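To run the same checks locally before pushing (standard Rubocop CLI usage):
# Lint the whole project
$ rubocop
# Or lint just the files you changed
$ rubocop app/models/user.rb spec/models/user_spec.rb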
Fire up a server and try it all out:
$ rails server
Visit http://127.0.0.1:3000
To run test searches, you will need a working Bing API key. You can request one from Bing, or ask a friendly coworker. Add the key to config/secrets.yml.
Login.gov is used for authentication.
To create a new local admin account we will need to:
- Create an account on Login's sandbox environment.
- Get the Login sandbox private key from a team member.
- Add an admin user to your local app.
Create an account on Login's sandbox environment. This must be a valid email address where you can receive messages; you'll get a validation email prompting you to set a password and a secondary authentication method.
Ask your team members for the current config/logindotgov.pem file. This private key will let your local app complete the handshake with the Login sandbox servers.
Open the Rails console and add a new user with the matching email:
u = User.where(email: '[email protected]').first_or_initialize
u.assign_attributes(
  contact_name: 'admin',
  first_name: 'search',
  last_name: 'admin',
  default_affiliate: Affiliate.find_by_name('usagov'),
  is_affiliate: true,
  organization_name: 'GSA'
)
u.approval_status = 'approved'
u.is_affiliate_admin = true
u.save!
You should now be able to log in to your local instance of search.gov. Your user account should have admin privileges set. Now go to the admin center and poke around.
Several long-running tasks have been moved to the background for processing via Resque.
- If you haven't already, run docker-compose up to start all the services.
- Launch the Sinatra app to see the queues and jobs:
  $ resque-web ./lib/setup_resque.rb
- Visit the resque-web Sinatra app at http://0.0.0.0:5678/overview to inspect queues, workers, etc.
- In your admin center, create a type-ahead suggestion (SAYT) "delete me". Now create a SAYT filter on the word "delete".
- Look in the Resque web queue to see the job enqueued.
- Start a Resque worker to run the job:
  $ QUEUE=* rake environment resque:work
- You should see log lines indicating that a Resque worker has processed an ApplySaytFilters job:
  resque-workers_1 | *** Running before_fork hooks with [(Job{primary_low} | ApplySaytFilters | [])]
At this point, you should see the queue empty in Resque web, and the suggestion "delete me" should be gone from the sayt_suggestions table.
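You can verify this from the Rails console (assuming the sayt_suggestions table is backed by a SaytSuggestion model with a phrase column; adjust if the names differ):
> SaytSuggestion.find_by(phrase: 'delete me')
=> nil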
Each Resque job runs in the context of a queue named 'primary', with priorities assigned at job creation time using the resque-priority gem. We have queues named :primary_low, :primary, and :primary_high. When creating a new background job model, consider the priorities of the existing jobs to determine where your job should go. Things like fetching and indexing all Odie documents will take days and should run at low priority, but fetching and indexing a single URL uploaded by an affiliate should be high priority. When in doubt, use Resque.enqueue() instead of Resque.enqueue_with_priority() to put the job on the normal priority queue.
(Note: newer jobs inherit from ActiveJob, using the resque queue adapter. We are in the process of migrating the older jobs to ActiveJob.)
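For illustration, the two styles look roughly like this (ExampleJob is a hypothetical job class, not one from this codebase):

# Older Resque-style jobs, with priority chosen at enqueue time (resque-priority gem)
Resque.enqueue_with_priority(:high, ExampleJob, url)
Resque.enqueue(ExampleJob, url) # normal priority when in doubt

# Newer jobs inherit from ActiveJob and use the resque queue adapter
class ExampleJob < ActiveJob::Base
  queue_as :primary

  def perform(url)
    # fetch and index the given URL
  end
end
ExampleJob.perform_later(url)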
We use the resque-scheduler gem to schedule delayed jobs. Use ActiveJob's :wait or :wait_until options to enqueue delayed jobs, or schedule them in config/resque_schedule.yml.
Example:
- In the Rails console, schedule a delayed job:
  > SitemapMonitorJob.set(wait: 5.minutes).perform_later
- Run the resque-scheduler rake task:
  $ rake resque-scheduler
- Check the 'Delayed' tab in Resque web to see your job.
We use New Relic to monitor our site performance, especially on search requests. If you are doing something around search, make sure you aren't introducing anything to make it much slower. If you can, make it faster.
You can configure your local app to send metrics to New Relic:
- Edit config/secrets.yml, changing enabled to true and adding your name to app_name in the newrelic section.
- Edit config/secrets.yml and set license_key to your New Relic license key in the newrelic_secrets section.
- Run mongrel/thin.
- Run a few representative SERPs with news items, gov boxes, etc.
- The database calls view was the most useful one for me. How many extra database calls did your feature introduce? Yes, they are fast, but at 10-50 searches per second, it adds up.
You can also turn on profiling and look into that (see https://newrelic.com/docs/general/profiling-ruby-applications).