Dev to GraphQL-API-Experiment #2529

Merged
merged 31 commits into from
Sep 21, 2023
06f242d
Alembic upgrade not working right ...
sgoggins Sep 2, 2023
feb9a3d
fixing another materialized view
sgoggins Sep 2, 2023
7ee216e
updating augur_new_contributors
sgoggins Sep 2, 2023
b8c16ef
trying to get everything to execute in sequence.
sgoggins Sep 2, 2023
71bc747
setting up commits to make it execute sequentially.
sgoggins Sep 2, 2023
2736b2e
fix
sgoggins Sep 2, 2023
b9b936c
updating alembic message to alert admin that this upgrade takes a lit…
sgoggins Sep 2, 2023
d6809e7
update indexes
sgoggins Sep 2, 2023
dd0d700
index update
sgoggins Sep 2, 2023
bc0b209
checking new indices
sgoggins Sep 2, 2023
02d3e4b
making materialized view refreshes not wholly dependent on each other.
sgoggins Sep 2, 2023
eea24cc
logger update
sgoggins Sep 2, 2023
76e9210
logger already created
sgoggins Sep 2, 2023
fd9f8ca
unsure what logger to get.
sgoggins Sep 2, 2023
909e097
updated explorer new contributors view. Rare duplicates on large scal…
sgoggins Sep 2, 2023
630854b
correct ordering of view
sgoggins Sep 2, 2023
e9496d7
fixing double drop
sgoggins Sep 2, 2023
eb81e6f
separate drop from create transaction
sgoggins Sep 2, 2023
f9c6e01
materialized view fixes
sgoggins Sep 2, 2023
e7363f8
making contributor breadth worker get contributors without breadth da…
sgoggins Sep 5, 2023
d6c665e
Merge pull request #2512 from chaoss/dev-alembic-fix-123
sgoggins Sep 5, 2023
034e4ed
version update
sgoggins Sep 5, 2023
ee7bd71
version update in dev
sgoggins Sep 5, 2023
e51ded1
documentation update
sgoggins Sep 5, 2023
87bfc46
Merge pull request #2516 from chaoss/dev-docsupdate-osx-aa
sgoggins Sep 5, 2023
611afe0
docs update
sgoggins Sep 5, 2023
5284b91
Merge pull request #2519 from chaoss/dev-docsupdate-osx-aa
sgoggins Sep 5, 2023
d967620
fixing materialized view refresh bug
sgoggins Sep 7, 2023
a84fef7
fixing logging in materialized view refresh.
sgoggins Sep 8, 2023
874bd5d
Fix pr reviews issue
ABrain7710 Sep 14, 2023
b42bd73
Merge pull request #2526 from chaoss/fix-pr-reviews
sgoggins Sep 14, 2023
3 changes: 1 addition & 2 deletions .gitignore
Expand Up @@ -4,13 +4,12 @@ env.txt
docker_env.txt
pyenv.txt
augur_export_env.sh
.DS_Store
*DS_Store
*.config.json
!docker.config.json
config.yml
reports.yml


node_modules/
.idea/
logs/
Expand Down
4 changes: 2 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
# Augur NEW Release v0.52.0
# Augur NEW Release v0.53.1

[![first-timers-only](https://img.shields.io/badge/first--timers--only-friendly-blue.svg?style=flat-square)](https://www.firsttimersonly.com/) We follow the [First Timers Only](https://www.firsttimersonly.com/) philosophy of tagging issues for first timers only, and walking one newcomer through the resolution process weekly. [You can find these issues tagged with "first timers only" on our issues list.](https://github.com/chaoss/augur/labels/first-timers-only).

Expand All @@ -8,7 +8,7 @@
### [If you want to jump right in, updated docker build/compose and bare metal installation instructions are available here](docs/new-install.md)


Augur is now releasing a dramatically improved new version to the main branch. It is also available here: https://github.com/chaoss/augur/releases/tag/v0.52.0
Augur is now releasing a dramatically improved new version to the main branch. It is also available here: https://github.com/chaoss/augur/releases/tag/v0.53.1
- The `main` branch is a stable version of our new architecture, which features:
- Dramatic improvement in the speed of large scale data collection (100,000+ repos). All data is obtained for 100k+ repos within 2 weeks.
- A new job management architecture that uses Celery and Redis to manage queues, and enables users to run a Flower job monitoring dashboard
Expand Down
512 changes: 402 additions & 110 deletions augur/application/schema/alembic/versions/25_unique_on_mataview.py

Large diffs are not rendered by default.

Expand Up @@ -26,11 +26,34 @@ def contributor_breadth_model() -> None:
tool_version = '0.0.1'
data_source = 'GitHub API'


# This version of the query pulls contributors who have not had any data collected yet
# To the top of the list
cntrb_login_query = s.sql.text("""
SELECT DISTINCT gh_login, cntrb_id
FROM augur_data.contributors
WHERE gh_login IS NOT NULL
SELECT DISTINCT
gh_login,
cntrb_id
FROM
(
SELECT DISTINCT
gh_login,
cntrb_id,
data_collection_date
FROM
(
SELECT DISTINCT
contributors.gh_login,
contributors.cntrb_id,
contributor_repo.data_collection_date :: DATE
FROM
contributor_repo
RIGHT OUTER JOIN contributors ON contributors.cntrb_id = contributor_repo.cntrb_id
AND contributors.gh_login IS NOT NULL
ORDER BY
contributor_repo.data_collection_date :: DATE NULLS FIRST
) A
ORDER BY
data_collection_date DESC NULLS FIRST
) b
""")

result = engine.execute(cntrb_login_query)
Expand Down
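The intent stated in the comment above (contributors with no collected breadth data go to the top of the list) can be sketched in plain Python. This is an illustration only, not the worker's code: the `Row` shape and the function name `order_for_breadth_collection` are hypothetical. It makes the ordering explicit because an outer `SELECT DISTINCT` without its own `ORDER BY` is not guaranteed to preserve the ordering of its subqueries.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical row shape: (gh_login, cntrb_id, data_collection_date or None)
Row = Tuple[Optional[str], str, Optional[date]]


def order_for_breadth_collection(rows: List[Row]) -> List[Tuple[str, str]]:
    """Return unique (gh_login, cntrb_id) pairs, never-collected contributors first."""
    # Rows with no collection date sort first; dated rows follow, newest first.
    # This mirrors "ORDER BY data_collection_date DESC NULLS FIRST".
    ordered = sorted(
        rows,
        key=lambda r: (r[2] is not None, -r[2].toordinal() if r[2] is not None else 0),
    )

    seen = set()
    result = []
    for gh_login, cntrb_id, _ in ordered:
        if gh_login is None:
            # Mirrors the "gh_login IS NOT NULL" condition in the query
            continue
        pair = (gh_login, cntrb_id)
        if pair not in seen:
            # Keep only the first (highest priority) occurrence of each contributor
            seen.add(pair)
            result.append(pair)
    return result


rows = [
    ("alice", "1", date(2023, 9, 1)),
    ("bob", "2", None),
    ("carol", "3", date(2023, 9, 5)),
    ("alice", "1", date(2023, 8, 1)),
]
print(order_for_breadth_collection(rows))
```

Deduplicating after sorting, rather than before, keeps each contributor at the position of their highest priority row.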
109 changes: 105 additions & 4 deletions augur/tasks/db/refresh_materialized_views.py
Expand Up @@ -6,27 +6,128 @@

from augur.tasks.init.celery_app import celery_app as celery
from augur.application.db.session import DatabaseSession
from augur.application.logs import AugurLogger


@celery.task
def refresh_materialized_views():

#self.logger = AugurLogger("data_collection_jobs").get_logger()

from augur.tasks.init.celery_app import engine

logger = logging.getLogger(refresh_materialized_views.__name__)
#self.logger = logging.getLogger(refresh_materialized_views.__name__)

refresh_view_query = s.sql.text("""
mv1_refresh = s.sql.text("""
REFRESH MATERIALIZED VIEW concurrently augur_data.api_get_all_repo_prs with data;
COMMIT;
""")

mv2_refresh = s.sql.text("""
REFRESH MATERIALIZED VIEW concurrently augur_data.api_get_all_repos_commits with data;
COMMIT;
""")

mv3_refresh = s.sql.text("""
REFRESH MATERIALIZED VIEW concurrently augur_data.api_get_all_repos_issues with data;
COMMIT;
""")

mv4_refresh = s.sql.text("""
REFRESH MATERIALIZED VIEW concurrently augur_data.augur_new_contributors with data;
COMMIT;
""")
mv5_refresh = s.sql.text("""
REFRESH MATERIALIZED VIEW concurrently augur_data.explorer_commits_and_committers_daily_count with data;
COMMIT;
""")

mv6_refresh = s.sql.text("""
REFRESH MATERIALIZED VIEW concurrently augur_data.explorer_new_contributors with data;
COMMIT;
""")

mv7_refresh = s.sql.text("""
REFRESH MATERIALIZED VIEW concurrently augur_data.explorer_entry_list with data;
REFRESH MATERIALIZED VIEW concurrently augur_data.explorer_new_contributors with data;
COMMIT;
""")

mv8_refresh = s.sql.text("""

REFRESH MATERIALIZED VIEW concurrently augur_data.explorer_contributor_actions with data;
COMMIT;
""")

with DatabaseSession(logger, engine) as session:

session.execute_sql(refresh_view_query)

try:
with DatabaseSession(logger, engine) as session:
session.execute_sql(mv1_refresh)
except Exception as e:
logger.info(f"error is {e}")
pass


try:
with DatabaseSession(logger, engine) as session:
session.execute_sql(mv1_refresh)
except Exception as e:
logger.info(f"error is {e}")
pass

try:
with DatabaseSession(logger, engine) as session:
session.execute_sql(mv2_refresh)
except Exception as e:
logger.info(f"error is {e}")
pass

try:
with DatabaseSession(logger, engine) as session:
session.execute_sql(mv3_refresh)
except Exception as e:
logger.info(f"error is {e}")
pass

try:
with DatabaseSession(logger, engine) as session:
session.execute_sql(mv4_refresh)
except Exception as e:
logger.info(f"error is {e}")
pass

try:
with DatabaseSession(logger, engine) as session:
session.execute_sql(mv5_refresh)
except Exception as e:
logger.info(f"error is {e}")
pass

try:
with DatabaseSession(logger, engine) as session:
session.execute_sql(mv6_refresh)
except Exception as e:
logger.info(f"error is {e}")
pass

try:
with DatabaseSession(logger, engine) as session:
session.execute_sql(mv7_refresh)
except Exception as e:
logger.info(f"error is {e}")
pass

try:
with DatabaseSession(logger, engine) as session:
session.execute_sql(mv8_refresh)
except Exception as e:
logger.info(f"error is {e}")
pass
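The repeated try/except blocks above make each refresh independent, so one failing view does not stop the rest. A minimal sketch of the same idea as a loop follows. It is not the code in this PR: `refresh_views` and its injected `execute` callable are hypothetical stand-ins for `session.execute_sql`, and only the view names are taken from the diff.

```python
import logging
from typing import Callable, Dict, Iterable, Optional

# View names as they appear in the diff above
MATERIALIZED_VIEWS = [
    "api_get_all_repo_prs",
    "api_get_all_repos_commits",
    "api_get_all_repos_issues",
    "augur_new_contributors",
    "explorer_commits_and_committers_daily_count",
    "explorer_new_contributors",
    "explorer_entry_list",
    "explorer_contributor_actions",
]


def refresh_views(
    execute: Callable[[str], None],
    views: Iterable[str] = MATERIALIZED_VIEWS,
    logger: Optional[logging.Logger] = None,
) -> Dict[str, bool]:
    """Refresh each view on its own; return a map of view name to success flag."""
    logger = logger or logging.getLogger(__name__)
    outcome = {}
    for view in views:
        statement = (
            f"REFRESH MATERIALIZED VIEW CONCURRENTLY augur_data.{view} WITH DATA;"
        )
        try:
            # `execute` is an assumption: any callable that runs one SQL statement
            execute(statement)
            outcome[view] = True
        except Exception:
            # Log with traceback and continue with the next view
            logger.exception("refresh of %s failed", view)
            outcome[view] = False
    return outcome
```

Driving the refresh from a list means every view is referenced by exactly one name, which avoids mismatches between the variable that defines a statement and the one that executes it.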







42 changes: 22 additions & 20 deletions augur/tasks/github/pull_requests/tasks.py
Expand Up @@ -333,7 +333,7 @@ def collect_pull_request_reviews(repo_git: str) -> None:

pr_count = len(prs)

all_raw_pr_reviews = []
all_pr_reviews = {}
for index, pr in enumerate(prs):

pr_number = pr.pr_src_number
Expand All @@ -343,40 +343,46 @@ def collect_pull_request_reviews(repo_git: str) -> None:

pr_review_url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/reviews"

pr_reviews = GithubPaginator(pr_review_url, manifest.key_auth, logger)

for page_data, page in pr_reviews.iter_pages():
pr_reviews = []
pr_reviews_generator = GithubPaginator(pr_review_url, manifest.key_auth, logger)
for page_data, page in pr_reviews_generator.iter_pages():

if page_data is None:
break

if len(page_data) == 0:
break

all_raw_pr_reviews.extend(page_data)
pr_reviews.extend(page_data)

if pr_reviews:
all_pr_reviews[pull_request_id] = pr_reviews

if not all_raw_pr_reviews:
if not list(all_pr_reviews.keys()):
logger.info(f"{owner}/{repo} No pr reviews for repo")
return

contributors = []
for raw_pr_review in all_raw_pr_reviews:
contributor = process_pull_request_review_contributor(raw_pr_review, tool_source, tool_version, data_source)
if contributor:
contributors.append(contributor)
for pull_request_id in all_pr_reviews.keys():

reviews = all_pr_reviews[pull_request_id]
for review in reviews:
contributor = process_pull_request_review_contributor(review, tool_source, tool_version, data_source)
if contributor:
contributors.append(contributor)

logger.info(f"{owner}/{repo} Pr reviews: Inserting {len(contributors)} contributors")
augur_db.insert_data(contributors, Contributor, ["cntrb_id"])


pr_reviews = []
for raw_pr_review in all_raw_pr_reviews:

logger.info(f"Pr review type: {type(raw_pr_review)}")
logger.info(raw_pr_review)
for pull_request_id in all_pr_reviews.keys():

if "cntrb_id" in raw_pr_review:
pr_reviews.append(extract_needed_pr_review_data(raw_pr_review, pull_request_id, repo_id, platform_id, tool_source, tool_version))
reviews = all_pr_reviews[pull_request_id]
for review in reviews:

if "cntrb_id" in review:
pr_reviews.append(extract_needed_pr_review_data(review, pull_request_id, repo_id, platform_id, tool_source, tool_version))

logger.info(f"{owner}/{repo}: Inserting pr reviews of length: {len(pr_reviews)}")
pr_review_natural_keys = ["pr_review_src_id",]
Expand All @@ -395,7 +401,3 @@ def collect_pull_request_reviews(repo_git: str) -> None:
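The change above replaces a flat list of reviews with a dict keyed by `pull_request_id`, apparently so that each review stays associated with the pull request it was fetched for. A minimal sketch of that grouping, under stated assumptions: `group_reviews_by_pull_request`, `flatten_with_ids`, and the `fetch_pages` callable are hypothetical and stand in for the `GithubPaginator` loop.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def group_reviews_by_pull_request(
    pull_request_ids: Iterable[int],
    fetch_pages: Callable[[int], Iterable[Optional[List[dict]]]],
) -> Dict[int, List[dict]]:
    """Collect review pages per pull request, keyed by pull request id."""
    grouped = {}
    for pull_request_id in pull_request_ids:
        reviews = []
        for page_data in fetch_pages(pull_request_id):
            # A missing or empty page ends pagination, as in the diff
            if not page_data:
                break
            reviews.extend(page_data)
        if reviews:
            # Pull requests without reviews are left out entirely
            grouped[pull_request_id] = reviews
    return grouped


def flatten_with_ids(grouped: Dict[int, List[dict]]) -> List[Tuple[int, dict]]:
    """Pair every review with the id of the pull request it belongs to."""
    return [
        (pull_request_id, review)
        for pull_request_id, reviews in grouped.items()
        for review in reviews
    ]
```

With this shape, the later extraction step can pass the correct `pull_request_id` for every review instead of relying on whatever value the loop variable held last.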







4 changes: 2 additions & 2 deletions augur/tasks/github/util/github_api_key_handler.py
Expand Up @@ -76,8 +76,8 @@ def get_api_keys_from_database(self) -> List[str]:
#select.order_by(func.random())
where = [WorkerOauth.access_token != self.config_key, WorkerOauth.platform == 'github']

#return [key_tuple[0] for key_tuple in self.session.query(select).filter(*where).order_by(func.random()).all()]
return [key_tuple[0] for key_tuple in self.session.query(select).filter(*where).all()]
return [key_tuple[0] for key_tuple in self.session.query(select).filter(*where).order_by(func.random()).all()]
#return [key_tuple[0] for key_tuple in self.session.query(select).filter(*where).all()]


def get_api_keys(self) -> List[str]:
Expand Down
4 changes: 2 additions & 2 deletions augur/tasks/util/redis_list.py
Expand Up @@ -170,8 +170,8 @@ def pop(self, index: int = None):
if index is None:
# This will get a random index from the list and remove it,
# decreasing the likelihood of everyone using the same key all the time
redis.rpop(self.redis_list_key)
#redis.spop(self.redis_list_key)
#redis.rpop(self.redis_list_key)
redis.spop(self.redis_list_key)

else:
# calls __delitem__
Expand Down
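The comment in this hunk describes removing a random element so that workers do not all use the same key. In Redis, `RPOP` removes the last element of a list, while `SPOP` removes a random member of a set and does not operate on list keys. Assuming the key holds a list, as the class name `RedisList` suggests, the intended behavior can be sketched in plain Python. `pop_random` is a hypothetical helper, not part of Augur or the Redis client.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def pop_random(items: List[T], rng: Optional[random.Random] = None) -> T:
    """Remove and return a randomly chosen element from a list."""
    if not items:
        raise IndexError("pop from empty list")
    # Fall back to the module-level generator when no seeded one is supplied
    generator = rng or random
    index = generator.randrange(len(items))
    # list.pop(index) removes the element at that position and returns it
    return items.pop(index)


keys = ["key-a", "key-b", "key-c"]
chosen = pop_random(keys, random.Random(0))
print(chosen, keys)
```

Passing a seeded `random.Random` keeps the choice reproducible in tests while leaving production calls unpredictable.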
2 changes: 1 addition & 1 deletion docker/backend/Dockerfile
Expand Up @@ -2,7 +2,7 @@
FROM python:3.8.11-slim-buster

LABEL maintainer="[email protected]"
LABEL version="0.52.0"
LABEL version="0.53.1"

ENV DEBIAN_FRONTEND=noninteractive

Expand Down
2 changes: 1 addition & 1 deletion docker/database/Dockerfile
Expand Up @@ -2,7 +2,7 @@
FROM postgres:12

LABEL maintainer="[email protected]"
LABEL version="0.52.0"
LABEL version="0.53.1"

ENV POSTGRES_DB "test"
ENV POSTGRES_USER "augur"
Expand Down
Binary file removed docs/.DS_Store
Binary file not shown.
28 changes: 15 additions & 13 deletions docs/dev-osx-install.md
@@ -1,5 +1,7 @@
## Augur Setup

**NOTE**: Currently, our machine learning dependencies allow Augur to only fully support python 3.8 to python 3.10. Python 3.11 will sometimes work, but often there are libraries at the operating system level that have not yet been updated to support machine learning libraries at python 3.11.

# OSX: Note: This has **MOSTLY** been tested on Apple Silicon with Python 3.11 at this time, however, one user has been successful with Intel based Apple computers.
## For OSX You Need to make sure to install XCode Command line tools:
```shell
Expand Down Expand Up @@ -38,6 +40,19 @@ export PKG_CONFIG_PATH="/opt/homebrew/opt/openblas/lib/pkgconfig"
## Pre-Requisite Operating System Level Packages
Here we ensure your system is up to date, install required python libraries, install postgresql, and install our queuing infrastructure, which is composed of redis-server and rabbitmq-server.

### Updating your Path: Necessary for rabbitmq on OSX
#### for macOS Intel
`export PATH=$PATH:/usr/local/sbin`
#### for Apple Silicon
`export PATH=$PATH:/opt/homebrew/sbin`

***These should be added to your .zshrc or other environment file loaded when you open a terminal***

#### for macOS Intel
`export PATH=$PATH:/usr/local/sbin:$PATH`
#### for Apple Silicon
`export PATH=$PATH:/opt/homebrew/sbin:$PATH`

### Executable
```shell
brew update ;
Expand Down Expand Up @@ -77,19 +92,6 @@ rabbitmqctl set_user_tags augur augurTag administrator;
rabbitmqctl set_permissions -p augur_vhost augur ".*" ".*" ".*";
```

### Updating your Path: Necessary for rabbitmq on OSX
#### for macOS Intel
`export PATH=$PATH:/usr/local/sbin`
#### for Apple Silicon
`export PATH=$PATH:/opt/homebrew/sbin`

***These should be added to your .zshrc or other environment file loaded when you open a terminal***

#### for macOS Intel
`export PATH=$PATH:/usr/local/sbin:$PATH`
#### for Apple Silicon
`export PATH=$PATH:/opt/homebrew/sbin:$PATH`

- We need rabbitmq_management so we can purge our own queues with an API call
- We need a user
- We need a vhost
Expand Down
Binary file removed docs/source/.DS_Store
Binary file not shown.
Binary file removed docs/source/development-guide/.DS_Store
Binary file not shown.
Binary file removed docs/source/getting-started/.DS_Store
Binary file not shown.