PPL `fieldsummary` command #766
Merged: seankao-az merged 18 commits into opensearch-project:main from YANG-DB:ppl-fieldsummary-command on Oct 25, 2024
Commits:
cea6e44 add support for FieldSummary (YANG-DB)
fd1375a add support for FieldSummary (YANG-DB)
4f30306 update sample query (YANG-DB)
21924ae Merge branch 'main' into ppl-fieldsummary-command (YANG-DB)
a9e7c6e Merge branch 'main' into ppl-fieldsummary-command (YANG-DB)
7bcce2f support spark prior to 3.5 with its extended table identifier (existi… (YANG-DB)
c170208 update union queries based summary (YANG-DB)
63c6118 update scala fmt style (YANG-DB)
049be03 update scala fmt style (YANG-DB)
ea5cbdf update query with where clause predicate (YANG-DB)
7a672fc update command and remove the topvalues (YANG-DB)
c5c1cd3 Merge branch 'main' into ppl-fieldsummary-command (YANG-DB)
9236999 update command docs (YANG-DB)
b483ff1 Merge branch 'main' into ppl-fieldsummary-command (YANG-DB)
ed5ccd9 Merge branch 'main' into ppl-fieldsummary-command (YANG-DB)
132978b update with comments feedback (YANG-DB)
94acd56 Merge branch 'main' into ppl-fieldsummary-command (YANG-DB)
e3005d4 update `FIELD SUMMARY` symbols to the keywordsCanBeId bag of words (YANG-DB)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
## PPL `fieldsummary` command | ||
|
||
**Description** | ||
Use the `fieldsummary` command to: | ||
- Calculate basic statistics for each field (count, distinct count, min, max, avg, stddev, mean) | ||
- Determine the data type of each field | ||
|
||
**Syntax** | ||
|
||
`... | fieldsummary includefields=<field-list> (nulls=true/false)` | ||
|
||
* The command accepts any preceding pipe stages before the terminal `fieldsummary` command and takes them into account. | ||
* `includefields`: comma-separated list of the columns whose statistics are collected into a unified result set | ||
* `nulls`: optional; if true, null values are included in the aggregation calculations (nulls are replaced with zero for numeric values) | ||
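
The effect of the `nulls` option on a numeric column can be sketched in plain Python. This is only an illustration of the description above, not the command's implementation: the helper `summarize` and the sample values are hypothetical, and the behaviour for `nulls=false` (nulls are simply skipped, as SQL aggregates usually do) is an assumption.

```python
from typing import Optional, Sequence


def summarize(values: Sequence[Optional[float]], nulls: bool) -> dict:
    """Hypothetical helper mimicking the documented `nulls` option for one numeric column."""
    present = [v for v in values if v is not None]
    # nulls=True: nulls take part in the aggregation as zeros.
    # nulls=False: nulls are skipped (assumed, the usual SQL aggregate behaviour).
    used = [0 if v is None else v for v in values] if nulls else present
    return {
        "COUNT": len(present),
        "COUNT_DISTINCT": len(set(present)),
        "AVG": sum(used) / len(used) if used else None,
        "Nulls": len(values) - len(present),
    }


column = [200, 200, 301, 403, None, None]  # made-up sample values
with_nulls = summarize(column, nulls=True)
without_nulls = summarize(column, nulls=False)
print(with_nulls["AVG"], without_nulls["AVG"])  # 184.0 276.0
```

With `nulls=True` the two nulls pull the average down because they are counted as zeros, while `COUNT` and `Nulls` still report how many values were actually present or missing.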
|
||
### Example 1: | ||
|
||
PPL query: | ||
|
||
os> source = t | where status_code != 200 | fieldsummary includefields= status_code nulls=true | ||
+---------------+-------+----------------+-----+-----+-------+-------+-------------------+-------+--------+ | ||
| Fields        | COUNT | COUNT_DISTINCT | MIN | MAX | AVG   | MEAN  | STDDEV            | Nulls | TYPEOF | | ||
+---------------+-------+----------------+-----+-----+-------+-------+-------------------+-------+--------+ | ||
| "status_code" | 2     | 2              | 301 | 403 | 352.0 | 352.0 | 72.12489168102785 | 0     | "int"  | | ||
+---------------+-------+----------------+-----+-----+-------+-------+-------------------+-------+--------+ | ||
|
||
### Example 2: | ||
|
||
PPL query: | ||
|
||
os> source = t | fieldsummary includefields= id, status_code, request_path nulls=true | ||
+----------------+-------+----------------+--------+-------+-------+-------+--------------------+-------+----------+ | ||
| Fields         | COUNT | COUNT_DISTINCT | MIN    | MAX   | AVG   | MEAN  | STDDEV             | Nulls | TYPEOF   | | ||
+----------------+-------+----------------+--------+-------+-------+-------+--------------------+-------+----------+ | ||
| "id"           | 6     | 6              | 1      | 6     | 3.5   | 3.5   | 1.8708286933869707 | 0     | "int"    | | ||
| "status_code"  | 4     | 3              | 200    | 403   | 184.0 | 184.0 | 161.16699413961905 | 2     | "int"    | | ||
| "request_path" | 2     | 2              | /about | /home | 0.0   | 0.0   | 0                  | 2     | "string" | | ||
+----------------+-------+----------------+--------+-------+-------+-------+--------------------+-------+----------+ | ||
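
The `STDDEV` figures in the two examples are consistent with the sample standard deviation (an `n - 1` denominator), which is what Spark SQL's `stddev` computes. The check below is a small sketch: the input values are not listed in the examples but follow from the reported `COUNT`, `COUNT_DISTINCT`, `MIN` and `MAX` of the integer columns.

```python
import math
import statistics

# Example 1: COUNT=2, COUNT_DISTINCT=2, MIN=301, MAX=403 pins the two values down.
example1_status = [301, 403]
# Example 2: six distinct integer ids between 1 and 6.
example2_ids = [1, 2, 3, 4, 5, 6]

# statistics.stdev is the sample standard deviation (n - 1 denominator).
stddev_status = statistics.stdev(example1_status)
stddev_ids = statistics.stdev(example2_ids)

print(stddev_status)  # approximately 72.1249
print(stddev_ids)     # approximately 1.8708

# The population variant (n denominator) would give smaller numbers.
print(statistics.pstdev(example1_status))  # 51.0
```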
|
||
### Additional Info | ||
The actual query is translated into the following SQL-like statement: | ||
|
||
```sql | ||
SELECT | ||
id AS Field, | ||
COUNT(id) AS COUNT, | ||
COUNT(DISTINCT id) AS COUNT_DISTINCT, | ||
MIN(id) AS MIN, | ||
MAX(id) AS MAX, | ||
AVG(id) AS AVG, | ||
MEAN(id) AS MEAN, | ||
STDDEV(id) AS STDDEV, | ||
(COUNT(1) - COUNT(id)) AS Nulls, | ||
TYPEOF(id) AS TYPEOF | ||
FROM | ||
t | ||
GROUP BY | ||
TYPEOF(id), id | ||
UNION | ||
SELECT | ||
status_code AS Field, | ||
COUNT(status_code) AS COUNT, | ||
COUNT(DISTINCT status_code) AS COUNT_DISTINCT, | ||
MIN(status_code) AS MIN, | ||
MAX(status_code) AS MAX, | ||
AVG(status_code) AS AVG, | ||
MEAN(status_code) AS MEAN, | ||
STDDEV(status_code) AS STDDEV, | ||
(COUNT(1) - COUNT(status_code)) AS Nulls, | ||
TYPEOF(status_code) AS TYPEOF | ||
FROM | ||
t | ||
GROUP BY | ||
TYPEOF(status_code), status_code; | ||
``` | ||
A separate statement is generated for each column (here `id` and `status_code`), and the per-field results are combined into a single result set with the `UNION` operator. | ||
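
The per-field pattern can be illustrated with a short Python sketch that assembles the same SQL-like text for an arbitrary field list. The function `build_fieldsummary_sql` is hypothetical and exists only for this illustration; the command itself builds a Spark logical plan rather than a string, and the sketch does no identifier escaping.

```python
from typing import Sequence


def build_fieldsummary_sql(table: str, fields: Sequence[str]) -> str:
    """Hypothetical helper: emit one SELECT per field and join them with UNION."""
    selects = []
    for field in fields:
        # Plain string formatting for illustration only; no quoting or escaping.
        selects.append(
            "SELECT\n"
            f"    {field} AS Field,\n"
            f"    COUNT({field}) AS COUNT,\n"
            f"    COUNT(DISTINCT {field}) AS COUNT_DISTINCT,\n"
            f"    MIN({field}) AS MIN,\n"
            f"    MAX({field}) AS MAX,\n"
            f"    AVG({field}) AS AVG,\n"
            f"    MEAN({field}) AS MEAN,\n"
            f"    STDDEV({field}) AS STDDEV,\n"
            f"    (COUNT(1) - COUNT({field})) AS Nulls,\n"
            f"    TYPEOF({field}) AS TYPEOF\n"
            f"FROM\n    {table}\n"
            f"GROUP BY\n    TYPEOF({field}), {field}"
        )
    # Only the final statement is terminated; UNION sits between the SELECTs.
    return "\nUNION\n".join(selects) + ";"


sql = build_fieldsummary_sql("t", ["id", "status_code"])
print(sql)
```

Adding a third field to the list adds one more `SELECT` and one more `UNION`, which is why the result set grows by one row per requested field.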
|
||
|
||
### Limitation: | ||
- The `topvalues` option was removed from this command due to the possible performance impact of such a sub-query. As an alternative, use the `top` command directly, as shown [here](ppl-top-command.md). | ||
|
Fiels -> Fields?