
Implement Logging and Setting Status/BlockHeight for new Tables #299

Closed · 5 tasks done · Tracked by #298
pkudinov opened this issue Oct 17, 2023 · 10 comments · Fixed by #608

@pkudinov
Collaborator

pkudinov commented Oct 17, 2023

  • Partition by date
  • Don't set primary key to block_height
  • Index by severity
  • Introduce the new Hasura endpoint for logs
  • Add log schema provisioning step

Tasks

5 sub-tasks, all done (assignees: morgsmccauley, Kevin101Zhang; label: component: Runner enhancement)

Blocked

  • Delete the table on indexer deletion
@pkudinov changed the title from "Partition indexer_log_entries table by function_name" to "Move indexer_log_entries table into function_name schema" on Dec 19, 2023
@roshaans
Contributor

roshaans commented Jan 10, 2024

Should we (Option A) provision one indexer_log_entries table per user DB, or (Option B) provision a new logs table for every new user indexer?

Option A:

Pros

  • No time wasted in provisioning after the user's first indexer.
  • The Data Retention Job targets a single table rather than many.

Cons

  • For users with quite a few indexers, this shared table will get slower.
  • Deleting a large indexer's logs becomes an expensive query.

Option B:

Pros

  • For users with many indexers, like Dataplatform, log query performance for one indexer is not affected by the others.
  • Indexer logs are easier to clean up when deleting an indexer: simply drop that indexer's logs table.
  • Enhances discoverability in the GraphQL playground: every indexer has a logs table the user can search through.

Cons

  • Every new indexer also waits for indexer_logs_schema provisioning (not that big of a deal IMO).
  • The Data Retention policy targets many different tables at the same time.

Option B looks like the more performant solution here for querying, writing, and maintenance.
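
For illustration, a minimal sketch of Option B under the assumption that each indexer already gets its own Postgres schema (all names here are hypothetical):

-- Hypothetical per-indexer logs table, created during provisioning (Option B).
CREATE TABLE morgs_near_my_indexer.logs (
    log_id BIGSERIAL PRIMARY KEY,
    log_timestamp TIMESTAMP NOT NULL DEFAULT now(),
    message TEXT NOT NULL
);

-- Cleanup on indexer deletion is then a single statement:
DROP TABLE morgs_near_my_indexer.logs;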

@roshaans
Contributor

@pkudinov I do not think we need to partition by date anymore, especially if we are going to have a data retention policy of 2 weeks.

@pkudinov changed the title from "Move indexer_log_entries table into function_name schema" to "Implement the new logs schema" on Feb 27, 2024
@Kevin101Zhang
Contributor

Kevin101Zhang commented Mar 1, 2024

Draft:
CREATE TABLE logs (
    log_id INT PRIMARY KEY,
    timestamp TIMESTAMP,
    log_type VARCHAR(50),
    description TEXT,
    severity VARCHAR(20) NOT NULL CHECK (severity IN ('Low', 'Medium', 'High')),
    block_height INT NOT NULL,
    additional_data JSON
);

CREATE INDEX idx_timestamp ON logs USING btree (timestamp);
CREATE INDEX idx_log_type ON logs USING btree (log_type);
CREATE INDEX idx_severity ON logs USING btree (severity);
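
For illustration only, a hedged pair of examples against this draft schema (all values hypothetical):

-- Hypothetical write from the runner.
INSERT INTO logs (log_id, timestamp, log_type, description, severity, block_height, additional_data)
VALUES (1, now(), 'system', 'Indexer lagging behind chain head', 'High', 112233445, '{"lag_blocks": 120}');

-- A filter the idx_severity index can serve.
SELECT log_id, timestamp, description FROM logs
WHERE severity = 'High'
ORDER BY timestamp DESC;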

@eduohe

eduohe commented Mar 4, 2024

Hi @Kevin101Zhang, here are some docs we can talk through in today's meeting:

  1. Text Search: https://www.postgresql.org/docs/current/textsearch.html
  2. Partition by Range: https://www.postgresql.org/docs/current/ddl-partitioning.html (automatic partition creation example)
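
From the linked partitioning docs, a minimal sketch of declarative range partitioning (table and dates are hypothetical):

-- Parent table partitioned by date range.
CREATE TABLE example_logs (
    id BIGSERIAL,
    log_date DATE NOT NULL,
    message TEXT
) PARTITION BY RANGE (log_date);

-- One partition per day; rows route automatically by log_date.
CREATE TABLE example_logs_p20240304 PARTITION OF example_logs
    FOR VALUES FROM ('2024-03-04') TO ('2024-03-05');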

@eduohe

eduohe commented Mar 4, 2024

@Kevin101Zhang let's figure out how many records we have per indexer, and then per indexer/date, to validate the real need for partitions on the logs table:

something like:

select indexer_name, count(1) from logs
group by 1
order by 2 desc;

select "user", count(1) from logs
group by 1
order by 2 desc;

select "user", indexer_name, count(1) from logs
group by 1, 2
order by 3 desc;

select date(log_date), indexer_name, count(1) from logs
group by 1, 2
order by 3 desc;

@pkudinov
Collaborator Author

pkudinov commented Mar 5, 2024

Separate table per indexer, one partition per day.

CREATE TABLE logs (
    log_id INT PRIMARY KEY,
    log_date DATE,
    log_type TEXT NOT NULL CHECK (log_type IN ('system', 'indexer')),
    timestamp TIMESTAMP,
    block_height INT NOT NULL,
    message TEXT,
    -- TODO: add tsvector for message full-text search
    level TEXT NOT NULL CHECK (level IN ('DEBUG', 'INFO', 'WARN', 'ERROR'))
);
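
A minimal sketch of the tsvector TODO above, assuming Postgres 12+ generated columns (column and index names are hypothetical):

-- Hypothetical: keep the search vector as a stored generated column.
ALTER TABLE logs
    ADD COLUMN message_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', message)) STORED;

CREATE INDEX logs_message_tsv_idx ON logs USING GIN (message_tsv);

-- Full-text search then reads:
SELECT * FROM logs WHERE message_tsv @@ to_tsquery('english', 'error & block');

An expression index directly over to_tsvector('english', message) is an equivalent alternative that avoids storing the extra column.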

@eduohe

eduohe commented Mar 7, 2024

-- 1) Let's create the logs table
CREATE TYPE log_type AS ENUM (
    'system',
    'indexer'
    );

CREATE TYPE log_level AS ENUM (
    'DEBUG',
    'INFO',
    'WARN',
    'ERROR'
    );

CREATE TABLE logs (
    id BIGSERIAL NOT NULL,
    block_height NUMERIC(20) NOT NULL,
    log_date DATE NOT NULL,
    log_timestamp TIMESTAMP NOT NULL,
    log_type log_type NOT NULL,
    log_level log_level NOT NULL,
    message TEXT NOT NULL,
    PRIMARY KEY (log_date, id)
) PARTITION BY RANGE (log_date);

CREATE INDEX logs_log_timestamp_idx ON logs USING btree (log_timestamp);
CREATE INDEX logs_log_type_idx ON logs USING btree (log_type);
CREATE INDEX logs_log_level_idx ON logs USING btree (log_level);
CREATE INDEX logs_block_height_idx ON logs USING btree (block_height);


-- 2) Create the tsvector index for the message column
-- Query example: SELECT * FROM logs WHERE message @@ to_tsquery('english', 'lag & block');
CREATE INDEX logs_search_vector_idx ON logs USING GIN (to_tsvector('english', message));

-- 3) Define a function to create new partitions. Examples of usage:
-- select fn_create_partition('myschema.mytable', current_date, '0 day', '1 day'); -- today
-- select fn_create_partition('myschema.mytable', current_date, '1 day', '2 day'); -- tomorrow
CREATE OR REPLACE FUNCTION fn_create_partition(_tbl text, _date date, _interval_start text, _interval_end text)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
    _start text;
    _end text;
    _partition_name text;
BEGIN
    _start := TO_CHAR(date_trunc('day', _date + (_interval_start)::interval), 'YYYY-MM-DD');
    _end := TO_CHAR(date_trunc('day', _date + (_interval_end)::interval), 'YYYY-MM-DD');
    _partition_name := TO_CHAR(date_trunc('day', _date + (_interval_start)::interval), 'YYYYMMDD');
    -- Create the daily partition if it does not exist yet
    EXECUTE 'CREATE TABLE IF NOT EXISTS ' || _tbl || '_p' || _partition_name || ' PARTITION OF ' || _tbl || ' FOR VALUES FROM (''' || _start || ''') TO (''' || _end || ''')';
END
$func$;

SELECT fn_create_partition('eduohe_near_access_keys_v1.logs', CURRENT_DATE, '0 day', '1 day');
SELECT fn_create_partition('eduohe_near_access_keys_v1.logs', CURRENT_DATE, '1 day', '2 day');


-- 4) Define a function to delete partitions. Example of usage:
-- select fn_delete_partition('myschema.mytable', current_date, '-15 day', '-14 day'); -- 2-week retention
CREATE OR REPLACE FUNCTION fn_delete_partition(_tbl text, _date date, _interval_start text, _interval_end text)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
    _start text;
    _end text;
    _partition_name text;
BEGIN
    _start := TO_CHAR(date_trunc('day', _date + (_interval_start)::interval), 'YYYY-MM-DD');
    _end := TO_CHAR(date_trunc('day', _date + (_interval_end)::interval), 'YYYY-MM-DD');
    _partition_name := TO_CHAR(date_trunc('day', _date + (_interval_start)::interval), 'YYYYMMDD');
    -- Detach the partition first, then drop it
    EXECUTE 'ALTER TABLE ' || _tbl || ' DETACH PARTITION ' || _tbl || '_p' || _partition_name;
    EXECUTE 'DROP TABLE ' || _tbl || '_p' || _partition_name;
END
$func$;


-- 5) To create new partitions automatically and delete old ones we can use pg_cron
-- GCP Cloud SQL requires these flags:
-- cloudsql.enable_pg_cron = On
-- cron.database_name = indexer_balances_mainnet

-- IMPORTANT: Needs a SUPERUSER (e.g. postgres or an admin user) to create the extension
-- and grant access to the indexer user
CREATE EXTENSION pg_cron;
GRANT CREATE, USAGE ON SCHEMA cron TO eduohe_near;

-- Every day at 1am, create a new partition for the next day
SELECT cron.schedule('eduohe_near_access_keys_v1_logs_create_partition', '0 1 * * *', $$SELECT fn_create_partition('eduohe_near_access_keys_v1.logs', CURRENT_DATE, '1 day', '2 day')$$);
-- Every day at 2am, delete the partition from 14 days ago (2-week retention)
SELECT cron.schedule('eduohe_near_access_keys_v1_logs_delete_partition', '0 2 * * *', $$SELECT fn_delete_partition('eduohe_near_access_keys_v1.logs', CURRENT_DATE, '-15 day', '-14 day')$$);
-- To unschedule: SELECT cron.unschedule('eduohe_near_access_keys_v1_logs_create_partition');
SELECT * FROM cron.job;

-- END

SELECT * FROM logs WHERE message @@ to_tsquery('english', 'lag & block');
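
As a hedged sanity check (a standard catalog query, not part of the script above), the attached partitions can be listed with:

-- List partitions currently attached to the logs table.
SELECT inhrelid::regclass AS partition
FROM pg_inherits
WHERE inhparent = 'eduohe_near_access_keys_v1.logs'::regclass;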


@Kevin101Zhang here is the script and how the components for the indexer DBs should look.
With this I think you are unblocked to implement executing the script in the QueryAPI indexer DB provisioning step. If you need any changes or have any questions about the script, I'm happy to help.

@eduohe removed their assignment on Mar 8, 2024
@darunrs changed the title from "Implement the new logs schema" to "Add Code to Support Writes to new Logs Table" on Apr 4, 2024
@darunrs changed the title from "Add Code to Support Writes to new Logs Table" to "Implement and use New Logs Table" on Apr 4, 2024
@darunrs reopened this on Apr 4, 2024
@darunrs changed the title from "Implement and use New Logs Table" to "Implement Logging and Setting Status/BlockHeight for new Tables" on Apr 5, 2024
@Kevin101Zhang removed their assignment on Apr 17, 2024
@Kevin101Zhang
Contributor

Is this one redundant? Can we close it, @pkudinov?

@pkudinov
Collaborator Author

This is a parent ticket. Let's mark it as done when all sub-tickets are in production.

@pkudinov
Collaborator Author

@morgsmccauley please close this ticket when everything is in prod
