
Setup skills-airflow and skills-api #173

Open
matthaeusheer opened this issue Mar 28, 2019 · 1 comment

@matthaeusheer

Hello everyone!

The Problem

First of all, thanks for the fantastic work. I would like to get skills-airflow and the skills-api up and running. However, the instructions provided are not enough for me to make them run. Maybe we can clarify things and improve the documentation together as well.

What I did so far for skills-airflow

  1. set up the virtual environment in the skills-airflow repo (Python 3.6.0) and pip-installed requirements.txt and requirements_dev.txt
  2. installed PostgreSQL and created a database I called daw_db (roughly as in the sketch below the config snippet)
  3. updated config/api_v1_db_config.yaml to:
PGPORT: 5432
PGHOST: localhost
PGDATABASE: daw_db
PGUSER: daw_db
PGPASSWORD:
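
For reference, creating a role and database that match this config might look like the following (a sketch; the passwordless login mirrors the empty PGPASSWORD above and assumes a local Postgres that lets the current OS user create roles and databases):

# sketch: create a daw_db role and database matching api_v1_db_config.yaml
createuser --createdb daw_db
createdb --owner=daw_db daw_db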
  4. the `alembic upgrade head` command fails for me; I'm not sure whether that's important?

  5. set up the following S3 buckets, currently empty (creation sketched below the list):

my-geo-bucket
my-job-postings 
my-labeled-postings
my-model-cache 
my-onet
my-output-tables
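
For what it's worth, creating those buckets with the AWS CLI would look something like this (a sketch; the names are just the placeholders above):

# sketch: create the six still-empty buckets
for b in my-geo-bucket my-job-postings my-labeled-postings \
         my-model-cache my-onet my-output-tables; do
  aws s3 mb "s3://$b"
done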
  6. copied example_config.yaml to config.yaml
  7. ran the airflow scheduler, which gives the following output:
[2019-03-28 12:01:06,790] {__init__.py:51} INFO - Using executor SequentialExecutor
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/

[2019-03-28 12:01:07,490] {jobs.py:1477} INFO - Starting the scheduler
[2019-03-28 12:01:07,490] {jobs.py:1485} INFO - Running execute loop for -1 seconds
[2019-03-28 12:01:07,491] {jobs.py:1486} INFO - Processing each file at most -1 times
[2019-03-28 12:01:07,491] {jobs.py:1489} INFO - Searching for files in /Users/matthausheer/airflow/dags
[2019-03-28 12:01:07,504] {jobs.py:1491} INFO - There are 19 files in /Users/matthausheer/airflow/dags
[2019-03-28 12:01:07,506] {jobs.py:1534} INFO - Resetting orphaned tasks for active dag runs
[2019-03-28 12:01:07,517] {dag_processing.py:453} INFO - Launched DagFileProcessorManager with pid: 34311
[2019-03-28 12:01:07,536] {settings.py:51} INFO - Configured default timezone <Timezone [UTC]>
[2019-03-28 12:01:07,568] {dag_processing.py:663} ERROR - Cannot use more than 1 thread when using sqlite. Setting parallelism to 1
[2019-03-28 12:01:08,002] {jobs.py:1559} INFO - Harvesting DAG parsing results
[2019-03-28 12:01:09,630] {jobs.py:1559} INFO - Harvesting DAG parsing results
...
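
Side note: the "Cannot use more than 1 thread when using sqlite" line suggests Airflow is still on its default SQLite metadata DB. If the idea is to reuse the Postgres instance from above, an untested sketch (Airflow 1.x configuration via environment variables; the connection string assumes the passwordless daw_db role):

# untested sketch: move Airflow off the default sqlite metadata DB
export AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql://daw_db@localhost:5432/daw_db"
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
airflow initdb      # Airflow 1.x: (re)create the metadata tables
airflow scheduler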

What I did so far for skills-api

  1. set up a virtual env (Python 2.7.11) in the skills-api repo and installed requirements.txt
  2. ran bin/make_config.sh, specifying postgresql://localhost/daw_db
  3. ran python server.py runserver, which starts a server on http://127.0.0.1:5000/v1/jobs (commands sketched below)
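
In command form, the three steps above were roughly this (a sketch; the virtualenv name is arbitrary, and I'm writing the database URL as a make_config.sh argument, though it may be prompted for instead):

# sketch of the skills-api setup (Python 2 virtualenv)
virtualenv -p python2.7 venv
source venv/bin/activate
pip install -r requirements.txt
bin/make_config.sh postgresql://localhost/daw_db
python server.py runserver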

I then get the following error:
ProgrammingError: (psycopg2.ProgrammingError) relation "jobs_alternate_titles" does not exist
LINE 3: FROM jobs_alternate_titles) AS anon_1
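
One quick way to check whether any tables were ever created in the database (plain psql, nothing project-specific):

# list the tables in daw_db; if the alembic migration never ran,
# jobs_alternate_titles (and friends) will be missing
psql -h localhost -U daw_db -d daw_db -c '\dt'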

The Question

  1. What exactly do I have to place into the S3 buckets, and in what format and/or with what naming conventions?
  2. Did I miss anything else?

Some help would be greatly appreciated!
Cheers


philipwhitt commented Apr 8, 2020

A docker-compose file would be hugely beneficial. Plus having this project work on Python 3.5+. @rayidghani @thcrock thoughts?
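
For instance, a minimal starting point might be a compose file that at least provides the shared Postgres instance (entirely hypothetical; images, versions, and the two app services are placeholders still to be filled in):

# hypothetical sketch: write a docker-compose.yml with a shared Postgres
cat > docker-compose.yml <<'EOF'
version: "3"
services:
  postgres:
    image: postgres:9.6
    environment:
      POSTGRES_DB: daw_db
      POSTGRES_USER: daw_db
      POSTGRES_HOST_AUTH_METHOD: trust
    ports:
      - "5432:5432"
EOF
docker-compose up -d postgres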
