Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Daily and night fishing events incremental implementation by Chris #104

Merged
merged 36 commits into from
Dec 17, 2024

Conversation

smpiano
Copy link
Contributor

@smpiano smpiano commented Oct 17, 2024

  • Adds the branch where @Chr96er work on implementing the incremental fishing events.
  • Adds the schemas with a abstract description of each field meaning.
  • Adds the parser of the arguments to guide to the commands and a cli.
  • Adds the bigquery helper that wraps the bigquery actions.
  • Creates the tables with the specific schema. Removes from the SQL the creation.
  • Uses BQ sessions to track easy the queries when debugging.

Related with> https://globalfishingwatch.atlassian.net/browse/PIPELINE-2170

Copy link

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 39.68750% with 193 lines in your changes missing coverage. Please review.

Project coverage is 41.83%. Comparing base (2972336) to head (178175c).
Report is 37 commits behind head on develop.

Files with missing lines Patch % Lines
pipe_events/utils/bigquery.py 20.26% 122 Missing ⚠️
pipe_events/fishing_events_incremental.py 11.42% 31 Missing ⚠️
pipe_events/utils/parse.py 70.83% 21 Missing ⚠️
pipe_events/fishing_events_auth_and_regions.py 30.00% 7 Missing ⚠️
pipe_events/cli.py 83.78% 6 Missing ⚠️
pipe_events/fishing_events_restricted_view.py 33.33% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop     #104       +/-   ##
============================================
- Coverage    85.71%   41.83%   -43.88%     
============================================
  Files            1        7        +6     
  Lines           21      337      +316     
============================================
+ Hits            18      141      +123     
- Misses           3      196      +193     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Chr96er and others added 26 commits October 18, 2024 10:09
…r oldest event only; set correct column order
…ve fishing vessel filter; include and combine night loitering events; finish overall orchestration pipeline
@smpiano smpiano force-pushed the feature/PIPELINE-2170 branch from e61e318 to 1fb1ca9 Compare October 18, 2024 13:10
@smpiano smpiano requested a review from Chr96er October 18, 2024 13:30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file as well as requirements.txt could be removed. They were only required for development. I'm not sure though whether we include resources like that in the repo or should clean up?

"description": "Type of event."
},
{
"name": "vessel_id",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the right description for vessel_id (it's for ssvid). It should be:

unique vessel identifier. Vessel IDs are an internal GFW value derived from the set of core identity characteristics for a vessel (MMSI, shipname, callsign, and IMO).

"name": "prod_shiptype",
"type": "STRING",
"mode": "NULLABLE",
"description": "type of ship from prod eyes."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that looks like a typo or unclear. I'd say: "type of ship as determined for products".

"description": "Id of the segment involved in the event."
},
{
"name": "timestamp",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, at this point the table only contains messages (with event start and end timestamp columns). So this timestamp (and year, meters_to_prev and hours) all refer to messages and not events.

@Chr96er
Copy link
Contributor

Chr96er commented Oct 18, 2024

@smpiano looks good to me apart from some column descriptions. We need to decide though whether this table design is OK or whether we need to provide a separate table for products that filters for prod_shiptype = 'fishing'.

@aperdizs @rrequero we decided to create a single final fishing events table in the new incremental fishing events step. We've added the column prod_shiptype which could be used by the API to filter only for fishing vessels, instead of creating another table that is prefiltered by prod_shiptype = 'fishing. Does that work for you? I.e. you would either:

  1. have to filter the BQ table hardcoded for prod_shiptype='fishing in the API (or publication query); or
  2. add a parameter to the API call that lets the API call filter for specific prod_shiptypes.

If you don't like either of the options we can also just provide a separate table (or view) that applies the filter.

@aperdizs
Copy link
Contributor

aperdizs commented Oct 28, 2024

@Chr96er Sorry for the delay. Last week, I was focused on the VV release.

Regarding your question: The publication process just execute a SELECT * FROM xxxx. In this case, if the table is product_fishing_events_v2. We copy this table to the API database engine. There are not more filters. So, in terms of the API, this column is not necessary is all the events in this table are fishing events. If we need to implement the filter in the publication process for any reason, it should be enough :)

@smpiano One important thing, we decided a long time ago that all the events table must have the same schema, although they are different tables. Then, if we add this new column (prod_shiptype) to the fishing events, we should do it to all off them (gaps, encounters, port visits, ...)

@smpiano
Copy link
Contributor Author

smpiano commented Nov 22, 2024

NOTE: @andres-arana please review and merge #105 before merging this.

…-restricted-view

Adds creation of the restricted view
@smpiano smpiano merged commit 72fd54d into develop Dec 17, 2024
5 checks passed
@smpiano smpiano deleted the feature/PIPELINE-2170 branch December 17, 2024 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants