Removed transforming of hasharray to hashstring (#14121) #816 (Merged)

Conversation
Since the hashing of text arrays was fixed in PR https://github.com/greenplum-db/gpdb/pull/4940, remove the dependency on transforming hasharray to hash string, which was done in the function transformTextArrayCols().
In Greenplum, unlike upstream, the planner will try to pull up a correlated EXISTS sublink that contains an aggregate, for example:

```
postgres=# create table t(a int, b int);
postgres=# create table s(a int, b int);
postgres=# explain (costs off) select * from t where exists (select sum(s.a) from s where s.a = t.a group by s.a);
                QUERY PLAN
------------------------------------------
 Gather Motion 3:1 (slice1; segments: 3)
   ->  Hash Join
         Hash Cond: (t.a = s.a)
         ->  Seq Scan on t
         ->  Hash
               ->  HashAggregate
                     Group Key: s.a
                     ->  Seq Scan on s
 Optimizer: Postgres query optimizer
(9 rows)
```

To support this, Greenplum changed the behavior of `convert_EXISTS_sublink_to_join()` and `simplify_EXISTS_query()`: `simplify_EXISTS_query()` now also simplifies an EXISTS sublink that contains an aggregate node, resetting `targetList` and other clauses to null:

```
query->targetList = NIL;
query->distinctClause = NIL;
query->sortClause = NIL;
query->hasDistinctOn = false;
```

However, when executing an uncorrelated EXISTS sublink that also contains an aggregate node, we must not reset `targetList`; otherwise we cannot build a plan for an aggregate node that has no aggregate column, as in #11849. This patch fixes the issue by **not** simplifying the uncorrelated EXISTS sublink in that situation.
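As a hedged sketch (not the actual planner C code), the decision described above can be modeled in a few lines of Python; the function and parameter names here are invented for illustration:

```python
def should_simplify_exists(has_aggregate: bool, is_correlated: bool) -> bool:
    """Model of the fix: an EXISTS subquery containing an aggregate is
    simplified (targetList and other clauses reset) only when the sublink
    is correlated, because only then is it pulled up into a join. An
    uncorrelated EXISTS with an aggregate keeps its targetList so the
    aggregate node still has a column to aggregate over."""
    if not has_aggregate:
        return True          # no aggregate: simplifying is always safe
    return is_correlated     # aggregate present: simplify only if correlated
```

The sketch only captures the control decision; the real code also resets the individual query fields shown above.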
Commit 4aaf666ff removed system column stats from Orca, but did not update the expected output plan.
…multi-DQA expr. (#14135)

This patch fixes #14096.

How multi-DQA works: when creating a path for multi-DQA (multiple distinct-qualified aggregates), the planner generates a plan of the following shape:

```
Finalize Aggregate
  -> Gather Motion
     -> Partial Aggregate
        -> HashAggregate, to remove duplicates
           -> Redistribute Motion
              -> TupleSplit (according to DISTINCT expr)
                 -> input
```

The plan is generated via the following steps:

1. The planner compares each DQA expr with the input tuples; if the DQA expr is not in the PathTarget, the TupleSplit node appends the corresponding column to the PathTarget, together with a special column 'AggExprId', in the output tuple. E.g., if the input of TupleSplit is '(a: 1, b: 2, c: 3)' and the DQA expressions are 'DQA(a)' and 'DQA(b)', the outputs of TupleSplit are '(a: 1, b: null, c: 3, AggExprId: 1)' and '(a: null, b: 2, c: 3, AggExprId: 2)'. Only one DQA expr's column is set to non-null per tuple, and AggExprId indicates which DQA expr's column is non-null in that tuple.
2. The output tuples of TupleSplit are hashed and sent to different segments via Redistribute Motion. Because only one DQA expr's column is non-null per tuple, DQA exprs with the same value are gathered on the same segment, so HashAggregate can be applied to these tuples to remove duplicate values.
3. We apply the Partial Aggregate function on tuples across segments.
4. We gather tuples to the QD and apply the Finalize Aggregate function.

The RCA is given below:

1. When there are type conversions in DQA exprs, the planner wraps the expr with a type-conversion node. E.g., for binary-compatible datatypes such as 'VARCHAR' and 'TEXT', the expr is wrapped with a 'RelabelType' node.
2. For queries like 'SELECT DQA(a::TEXT) FROM foo(a VARCHAR)', the planner compares the DQA expr with the exprs in the PathTarget directly, without removing the 'RelabelType' node.
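The TupleSplit behavior in step 1 above can be sketched as a small Python model (this is an illustration of the described semantics, not the executor code; `tuple_split` is a hypothetical name):

```python
def tuple_split(row: dict, dqa_cols: list) -> list:
    """Model of TupleSplit: emit one copy of the input row per DQA
    expression. In each copy, every DQA column except one is nulled out,
    and AggExprId records which DQA column was kept non-null."""
    out = []
    for i, keep in enumerate(dqa_cols, start=1):
        copy = dict(row)
        for col in dqa_cols:
            if col != keep:
                copy[col] = None   # null out the other DQA columns
        copy["AggExprId"] = i      # 1-based id of the surviving DQA column
        out.append(copy)
    return out
```

Running `tuple_split({"a": 1, "b": 2, "c": 3}, ["a", "b"])` reproduces the two output tuples from the example in step 1.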
So the TupleSplit looks like:

```
TupleSplit:
    Input: a
    Output: a, a::TEXT
```

3. When generating the Redistribute Motion operator, the planner tries to find the expr of the equivalence class matching the expr in the targetlist to use as the hash key (src/backend/cdb/cdbpullup.c:cdbpullup_findEclassInTargetList). However, at this stage the planner ignores the 'RelabelType' node when comparing two exprs, so it may choose the wrong column as the hash key.

This patch resolves the issue by teaching the 'TupleSplit' node to ignore the 'RelabelType' node when comparing the DQA expr with the one in the PathTarget. The TupleSplit node then looks like:

```
TupleSplit:
    Input: a
    Output: a
```

when planning queries like 'SELECT DQA(a::TEXT) FROM foo(a VARCHAR)'.
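The comparison change can be sketched as follows (a hedged Python model; `Relabel`, `strip_relabel`, and `exprs_match` are invented stand-ins for the planner's node types and matching logic):

```python
class Relabel:
    """Stand-in for a RelabelType node wrapping a base expression,
    as used for binary-compatible casts like VARCHAR -> TEXT."""
    def __init__(self, arg):
        self.arg = arg

def strip_relabel(expr):
    # Unwrap any chain of relabel nodes down to the base expression.
    while isinstance(expr, Relabel):
        expr = expr.arg
    return expr

def exprs_match(a, b) -> bool:
    # Compare two expressions while ignoring binary-compatible
    # relabeling, as the patched TupleSplit does when matching a DQA
    # expr against the PathTarget. Without the stripping step,
    # Relabel("a") and "a" would compare unequal and the PathTarget
    # would grow a redundant a::TEXT column.
    return strip_relabel(a) == strip_relabel(b)
```

With this matching rule, `DQA(a::TEXT)` is recognized as referring to the existing column `a`, so TupleSplit does not add a second output column.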
Background: 1c1f164 introduced the waitOnOutbound() call for non-error cases as well during TCP IC teardown, after "Interconnect error: connection closed prematurely." ERRORs were seen in Azure environments. This is essential because performing a close() does not guarantee that the peer has received any unflushed data in the kernel send buffers, nor does it flag cases where the peer may have crashed after the sender performed the close(). Thus an app-defined protocol is needed to ensure delivery; this is where the EOS message comes in. Motion senders send an EOS to the motion receiver. The receiver then closes the socket (after optionally sending a 'stop' message), which the sender detects in waitOnOutbound() by recv()ing 0 or the stop message. Only after this detection step can the sender safely close() its socket.

Implementation: We currently poll for a stop message during motion sender teardown. However, we were retrying indefinitely (we weren't handling a return value of 0 from select()), which can cause hung queries. So, use a timeout equal to gp_interconnect_transmit_timeout, which is used throughout the interconnect to enforce an upper bound on the time it takes to transmit a packet (in this case the EOS). Once this timeout elapses, an ERROR is raised.

PS: The implementation is inspired by PerformRadiusTransaction().
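The bounded wait described above can be sketched in Python (a simplified model of the protocol, not the GPDB C code; `wait_for_peer_close` and the single-byte 'stop' encoding are illustrative assumptions):

```python
import select
import time

def wait_for_peer_close(sock, timeout_secs):
    """Model of the fixed teardown loop: poll the socket until the peer
    closes it (recv() returns b'') or sends a 'stop' message, and raise
    an error once an overall deadline (standing in for
    gp_interconnect_transmit_timeout) elapses. A select() that returns
    no ready fds now counts against the deadline instead of being
    retried forever."""
    deadline = time.monotonic() + timeout_secs
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("peer did not close within transmit timeout")
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            continue           # timeout slice expired; re-check the deadline
        data = sock.recv(1)
        if data == b"" or data == b"S":   # EOF (peer closed) or 'stop'
            return data
```

The key design point is re-checking an absolute deadline on every iteration, so spurious wakeups cannot extend the total wait.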
MemtupleBinding is column-aligned by 4 or 8 bytes. It can be set to 8 as soon as we find an attribute whose attalign is 'd' (8 bytes on most machines); there is no need to check the remaining attributes. Also add comments about how to compute the space for null-bitmap entries. Authored-by: Mingli Zhang <[email protected]>
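A rough Python sketch of the two computations mentioned above (the function names are invented, and the padding of the null bitmap to the column alignment is an assumption about the layout, not taken from the GPDB source):

```python
def memtuple_alignment(attaligns):
    """Column alignment for a memtuple binding: 8 bytes if any attribute
    has attalign 'd' (double alignment), else 4. Scanning can stop at
    the first 'd' -- later attributes cannot change the answer."""
    for a in attaligns:
        if a == "d":
            return 8
    return 4

def null_bitmap_bytes(natts, align):
    """Assumed model of null-bitmap space: one bit per attribute,
    rounded up to whole bytes, then padded to the column alignment so
    the following data stays aligned."""
    raw = (natts + 7) // 8          # bits -> bytes, rounded up
    return (raw + align - 1) // align * align
```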
User-defined access methods can be created in GPDB 7, so filling in the storage type according to the access method name is more accurate.
The pipeline failed with the log 'Table pg_am has a dependency issue on oid 152563 at content ...'. The access method created in the psql_gp_commands regress test was not cleaned up at the end, so gpcheckcat found that the newly created object in the pg_am catalog has no reference in pg_depend under the objid or refobjid column and reported a dependency issue.
%s/compatable/compatible/g in file gparray.py. Since initFromString is already compatible with repr() formatting, remove the TODO mark. Search for `compatable` globally and change all occurrences to `compatible`. Signed-off-by: Junwang Zhao <[email protected]>
netstat was replaced with ss earlier, but some of the variable names and commented-out code were left as-is. This commit cleans up the old mentions of netstat and renames the variables to avoid any confusion in the future.
my-ship-it approved these changes Dec 26, 2024
yjhjstz approved these changes Dec 27, 2024
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions