
Implement Logging and Setting Status/BlockHeight for new Tables #299

Closed · 5 tasks done · Tracked by #298
pkudinov opened this issue Oct 17, 2023 · 10 comments · Fixed by #608

@pkudinov
Collaborator

pkudinov commented Oct 17, 2023

  • Partition by date
  • Don't set primary key to block_height
  • Index by severity
  • Introduce the new Hasura endpoint for logs
  • Add log schema provisioning step

Tasks

5 sub-tasks, all done (assignees: morgsmccauley, Kevin101Zhang; label: component: Runner enhancement)

Blocked

  • Delete the table on indexer deletion
@pkudinov changed the title from "Partition indexer_log_entries table by function_name" to "Move indexer_log_entries table into function_name schema" on Dec 19, 2023
@roshaans
Contributor

roshaans commented Jan 10, 2024

Should we (Option A) provision one indexer_log_entries table per user DB, or (Option B) provision a new logs table for every new user indexer?

Option A:

Pros

  • No time wasted in provisioning after the user's first indexer.
  • The Data Retention Job targets a single table rather than many.

Cons

  • For users with quite a few indexers, this shared table will get slower.
  • Deleting a large indexer's logs becomes an expensive query.

Option B:

Pros

  • For users with many indexers, like Dataplatform, log query performance for one indexer is not affected by the others.
  • Indexer logs are easier to clean up when deleting an indexer: simply drop that indexer's logs table.
  • Enhances discoverability in the GraphQL playground: every indexer has a logs table the user can search through.

Cons

  • Every new indexer also waits for indexer_logs_schema provisioning (not that big of a deal IMO).
  • The Data Retention policy targets many different tables at the same time.

Option B looks like the more performant solution here for querying, writing, and maintenance.
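
For illustration, a minimal sketch of Option B under the assumption that each indexer already gets its own Postgres schema (all names here are hypothetical):

-- Hypothetical per-indexer logs table, created during provisioning (Option B).
CREATE TABLE morgs_near_my_indexer.logs (
    log_id BIGSERIAL PRIMARY KEY,
    log_timestamp TIMESTAMP NOT NULL DEFAULT now(),
    message TEXT NOT NULL
);

-- Cleanup on indexer deletion is then a single statement:
DROP TABLE morgs_near_my_indexer.logs;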

@roshaans
Contributor

@pkudinov I do not think we need to partition by date anymore, especially if we are going to have a data retention policy of 2 weeks.

@pkudinov changed the title from "Move indexer_log_entries table into function_name schema" to "Implement the new logs schema" on Feb 27, 2024
@Kevin101Zhang
Contributor

Kevin101Zhang commented Mar 1, 2024

Draft:
CREATE TABLE logs (
    log_id INT PRIMARY KEY,
    timestamp TIMESTAMP,
    log_type VARCHAR(50),
    description TEXT,
    severity VARCHAR(20) NOT NULL CHECK (severity IN ('Low', 'Medium', 'High')),
    block_height INT NOT NULL,
    additional_data JSON
);

CREATE INDEX idx_timestamp ON logs USING btree (timestamp);
CREATE INDEX idx_log_type ON logs USING btree (log_type);
CREATE INDEX idx_severity ON logs USING btree (severity);
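
For illustration only, a hedged pair of examples against this draft schema (all values hypothetical):

-- Hypothetical write from the runner.
INSERT INTO logs (log_id, timestamp, log_type, description, severity, block_height, additional_data)
VALUES (1, now(), 'system', 'Indexer lagging behind chain head', 'High', 112233445, '{"lag_blocks": 120}');

-- A filter the idx_severity index can serve.
SELECT log_id, timestamp, description FROM logs
WHERE severity = 'High'
ORDER BY timestamp DESC;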

@eduohe

eduohe commented Mar 4, 2024

Hi @Kevin101Zhang, here are some docs we can talk through in today's meeting:

  1. Text Search: https://www.postgresql.org/docs/current/textsearch.html
  2. Partition by Range: https://www.postgresql.org/docs/current/ddl-partitioning.html (automatic partition creation example)
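
From the linked partitioning docs, a minimal sketch of declarative range partitioning (table and dates are hypothetical):

-- Parent table partitioned by date range.
CREATE TABLE example_logs (
    id BIGSERIAL,
    log_date DATE NOT NULL,
    message TEXT
) PARTITION BY RANGE (log_date);

-- One partition per day; rows route automatically by log_date.
CREATE TABLE example_logs_p20240304 PARTITION OF example_logs
    FOR VALUES FROM ('2024-03-04') TO ('2024-03-05');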

@eduohe

eduohe commented Mar 4, 2024

@Kevin101Zhang let's figure out how many records we have per indexer, and then per indexer/date, to validate the real need for partitions on the logs table:

something like:

select indexer_name, count(1) from logs
group by 1
order by 2 desc;

select "user", count(1) from logs
group by 1
order by 2 desc;

select "user", indexer_name, count(1) from logs
group by 1, 2
order by 3 desc;

select date(log_date), indexer_name, count(1) from logs
group by 1, 2
order by 3 desc;

@pkudinov
Collaborator Author

pkudinov commented Mar 5, 2024

Separate table per indexer, one partition per day.

CREATE TABLE logs (
    log_id INT PRIMARY KEY,
    log_date DATE,
    log_type TEXT NOT NULL CHECK (log_type IN ('system', 'indexer')),
    timestamp TIMESTAMP,
    block_height INT NOT NULL,
    message TEXT,
    -- TODO: add tsvector for message full-text search
    level TEXT NOT NULL CHECK (level IN ('DEBUG', 'INFO', 'WARN', 'ERROR'))
);
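
A minimal sketch of the tsvector TODO above, assuming Postgres 12+ generated columns (column and index names are hypothetical):

-- Hypothetical: keep the search vector as a stored generated column.
ALTER TABLE logs
    ADD COLUMN message_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', message)) STORED;

CREATE INDEX logs_message_tsv_idx ON logs USING GIN (message_tsv);

-- Full-text search then reads:
SELECT * FROM logs WHERE message_tsv @@ to_tsquery('english', 'error & block');

An expression index directly over to_tsvector('english', message) is an equivalent alternative that avoids storing the extra column.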

@eduohe

eduohe commented Mar 7, 2024

-- 1) Let's create the logs table
CREATE TYPE log_type AS ENUM (
    'system',
    'indexer'
    );

CREATE TYPE log_level AS ENUM (
    'DEBUG',
    'INFO',
    'WARN',
    'ERROR'
    );

CREATE TABLE logs (
    id BIGSERIAL NOT NULL,
    block_height NUMERIC(20) NOT NULL,
    log_date DATE NOT NULL,
    log_timestamp TIMESTAMP NOT NULL,
    log_type log_type NOT NULL,
    log_level log_level NOT NULL,
    message TEXT NOT NULL,
    PRIMARY KEY (log_date, id)
) PARTITION BY RANGE (log_date);

CREATE INDEX logs_log_timestamp_idx ON logs USING btree (log_timestamp);
CREATE INDEX logs_log_type_idx ON logs USING btree (log_type);
CREATE INDEX logs_log_level_idx ON logs USING btree (log_level);
CREATE INDEX logs_block_height_idx ON logs USING btree (block_height);


-- 2) Create the tsvector index for the message column
-- Query example: SELECT * FROM logs WHERE message @@ to_tsquery('english', 'lag & block');
CREATE INDEX logs_search_vector_idx ON logs USING GIN (to_tsvector('english', message));

-- 3) Define a function to create new partitions. Examples of usage:
-- select fn_create_partition('myschema.mytable', current_date, '0 day', '1 day'); -- today
-- select fn_create_partition('myschema.mytable', current_date, '1 day', '2 day'); -- tomorrow
CREATE OR REPLACE FUNCTION fn_create_partition(_tbl text, _date date, _interval_start text, _interval_end text)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
    _start text;
    _end text;
    _partition_name text;
BEGIN
    _start := TO_CHAR(date_trunc('day', _date + (_interval_start)::interval), 'YYYY-MM-DD');
    _end := TO_CHAR(date_trunc('day', _date + (_interval_end)::interval), 'YYYY-MM-DD');
    _partition_name := TO_CHAR(date_trunc('day', _date + (_interval_start)::interval), 'YYYYMMDD');
    -- Create the daily partition if it does not exist yet
    EXECUTE 'CREATE TABLE IF NOT EXISTS ' || _tbl || '_p' || _partition_name || ' PARTITION OF ' || _tbl || ' FOR VALUES FROM (''' || _start || ''') TO (''' || _end || ''')';
END
$func$;

SELECT fn_create_partition('eduohe_near_access_keys_v1.logs', CURRENT_DATE, '0 day', '1 day');
SELECT fn_create_partition('eduohe_near_access_keys_v1.logs', CURRENT_DATE, '1 day', '2 day');


-- 4) Define a function to delete partitions. Example of usage:
-- select fn_delete_partition('myschema.mytable', current_date, '-15 day', '-14 day'); -- 2-week retention
CREATE OR REPLACE FUNCTION fn_delete_partition(_tbl text, _date date, _interval_start text, _interval_end text)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
    _start text;
    _end text;
    _partition_name text;
BEGIN
    _start := TO_CHAR(date_trunc('day', _date + (_interval_start)::interval), 'YYYY-MM-DD');
    _end := TO_CHAR(date_trunc('day', _date + (_interval_end)::interval), 'YYYY-MM-DD');
    _partition_name := TO_CHAR(date_trunc('day', _date + (_interval_start)::interval), 'YYYYMMDD');
    -- Detach the partition first, then drop it
    EXECUTE 'ALTER TABLE ' || _tbl || ' DETACH PARTITION ' || _tbl || '_p' || _partition_name;
    EXECUTE 'DROP TABLE ' || _tbl || '_p' || _partition_name;
END
$func$;


-- 5) To create new partitions automatically and delete old ones we can use pg_cron
-- GCP Cloud SQL requires these flags:
-- cloudsql.enable_pg_cron = On
-- cron.database_name = indexer_balances_mainnet

-- IMPORTANT: Needs a SUPERUSER (e.g. postgres or an admin user) to create the extension
-- and grant access to the indexer user
CREATE EXTENSION pg_cron;
GRANT CREATE, USAGE ON SCHEMA cron TO eduohe_near;

-- Every day at 1am, create a new partition for the next day
SELECT cron.schedule('eduohe_near_access_keys_v1_logs_create_partition', '0 1 * * *', $$SELECT fn_create_partition('eduohe_near_access_keys_v1.logs', CURRENT_DATE, '1 day', '2 day')$$);
-- Every day at 2am, delete the partition from 14 days ago (2-week retention)
SELECT cron.schedule('eduohe_near_access_keys_v1_logs_delete_partition', '0 2 * * *', $$SELECT fn_delete_partition('eduohe_near_access_keys_v1.logs', CURRENT_DATE, '-15 day', '-14 day')$$);
-- To unschedule: SELECT cron.unschedule('eduohe_near_access_keys_v1_logs_create_partition');
SELECT * FROM cron.job;

-- END

SELECT * FROM logs WHERE message @@ to_tsquery('english', 'lag & block');
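
As a hedged sanity check (a standard catalog query, not part of the script above), the attached partitions can be listed with:

-- List partitions currently attached to the logs table.
SELECT inhrelid::regclass AS partition
FROM pg_inherits
WHERE inhparent = 'eduohe_near_access_keys_v1.logs'::regclass;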


@Kevin101Zhang here is the script and how the components for the indexer DBs should look.
With this I think you are unblocked to implement executing the script in the QueryAPI indexer DB provisioning step. If you need any changes or have any questions about the script, I'm happy to help.

@eduohe removed their assignment on Mar 8, 2024
@darunrs changed the title from "Implement the new logs schema" to "Add Code to Support Writes to new Logs Table" on Apr 4, 2024
@darunrs changed the title from "Add Code to Support Writes to new Logs Table" to "Implement and use New Logs Table" on Apr 4, 2024
@darunrs reopened this on Apr 4, 2024
@darunrs changed the title from "Implement and use New Logs Table" to "Implement Logging and Setting Status/BlockHeight for new Tables" on Apr 5, 2024
@Kevin101Zhang removed their assignment on Apr 17, 2024
@Kevin101Zhang
Contributor

Is this one redundant? Can we close it, @pkudinov?

@pkudinov
Collaborator Author

This is a parent ticket. Let's mark it as done when all sub-tickets are in production.

@pkudinov
Collaborator Author

@morgsmccauley please close this ticket when everything is in prod
