Commit

Revert "[libbeat][reader][parquet] - Updated Apache Arrow library from v11 to v12.0.1 (elastic#35640)"

This reverts commit 54b5d97.
faec committed Jun 14, 2023
1 parent 4a16c6a commit 8bffc3d
Showing 12 changed files with 1,429 additions and 559 deletions.
CHANGELOG.next.asciidoc (2 changes: 1 addition & 1 deletion)
@@ -67,7 +67,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
 - 'add_cloud_metadata' processor - add cloud.region field for GCE cloud provider
 - 'add_cloud_metadata' processor - update azure metadata api version to get missing `cloud.account.id` field
 - Make sure k8s watchers are closed when closing k8s meta processor. {pull}35630[35630]
-- Upgraded apache arrow library used in x-pack/libbeat/reader/parquet from v11 to v12.0.1 in order to fix cross-compilation issues {pull}35640[35640]
+

 *Auditbeat*

NOTICE.txt (85 changes: 78 additions & 7 deletions)
@@ -2757,12 +2757,12 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


 --------------------------------------------------------------------------------
-Dependency : github.com/apache/arrow/go/v12
-Version: v12.0.1-0.20230605094802-c153c6d36ccf
+Dependency : github.com/apache/arrow/go/v11
+Version: v11.0.0
 Licence type (autodetected): Apache-2.0
 --------------------------------------------------------------------------------

-Contents of probable licence file $GOMODCACHE/github.com/apache/arrow/go/v12@v12.0.1-0.20230605094802-c153c6d36ccf/LICENSE.txt:
+Contents of probable licence file $GOMODCACHE/github.com/apache/arrow/go/v11@v11.0.0/LICENSE.txt:


 Apache License
@@ -2969,6 +2969,77 @@ Contents of probable licence file $GOMODCACHE/github.com/apache/arrow/go/v12@v12

 --------------------------------------------------------------------------------

+src/plasma/fling.cc and src/plasma/fling.h: Apache 2.0
+
+Copyright 2013 Sharvil Nanavati
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+--------------------------------------------------------------------------------
+
+src/plasma/thirdparty/ae: Modified / 3-Clause BSD
+
+Copyright (c) 2006-2010, Salvatore Sanfilippo <antirez at gmail dot com>
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+* Neither the name of Redis nor the names of its contributors may be used
+to endorse or promote products derived from this software without
+specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+
+--------------------------------------------------------------------------------
+
+src/plasma/thirdparty/dlmalloc.c: CC0
+
+This is a version (aka dlmalloc) of malloc/free/realloc written by
+Doug Lea and released to the public domain, as explained at
+http://creativecommons.org/publicdomain/zero/1.0/ Send questions,
+comments, complaints, performance data, etc to [email protected]
+
+--------------------------------------------------------------------------------
+
+src/plasma/common.cc (some portions)
+
+Copyright (c) Austin Appleby (aappleby (AT) gmail)
+
+Some portions of this file are derived from code in the MurmurHash project
+
+All code is released to the public domain. For business purposes, Murmurhash is
+under the MIT license.
+
+https://sites.google.com/site/murmurhash/
+
+--------------------------------------------------------------------------------
+
 src/arrow/util (some portions): Apache 2.0, and 3-clause BSD

 Some portions of this module are derived from code in the Chromium project,
@@ -11932,11 +12003,11 @@ OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR

 --------------------------------------------------------------------------------
 Dependency : github.com/dustin/go-humanize
-Version: v1.0.1
+Version: v1.0.0
 Licence type (autodetected): MIT
 --------------------------------------------------------------------------------

-Contents of probable licence file $GOMODCACHE/github.com/dustin/go-humanize@v1.0.1/LICENSE:
+Contents of probable licence file $GOMODCACHE/github.com/dustin/go-humanize@v1.0.0/LICENSE:

 Copyright (c) 2005-2008 Dustin Sallings <[email protected]>

@@ -43282,11 +43353,11 @@ SOFTWARE.

 --------------------------------------------------------------------------------
 Dependency : github.com/mattn/go-isatty
-Version: v0.0.17
+Version: v0.0.16
 Licence type (autodetected): MIT
 --------------------------------------------------------------------------------

-Contents of probable licence file $GOMODCACHE/github.com/mattn/go-isatty@v0.0.17/LICENSE:
+Contents of probable licence file $GOMODCACHE/github.com/mattn/go-isatty@v0.0.16/LICENSE:

 Copyright (c) Yasuhiro MATSUMOTO <[email protected]>

go.mod (6 changes: 3 additions & 3 deletions)
@@ -66,7 +66,7 @@ require (
 	github.com/dolmen-go/contextio v0.0.0-20200217195037-68fc5150bcd5
 	github.com/dop251/goja v0.0.0-20200831102558-9af81ddcf0e1
 	github.com/dop251/goja_nodejs v0.0.0-20171011081505-adff31b136e6
-	github.com/dustin/go-humanize v1.0.1
+	github.com/dustin/go-humanize v1.0.0
 	github.com/eapache/go-resiliency v1.2.0
 	github.com/eclipse/paho.mqtt.golang v1.3.5
 	github.com/elastic/elastic-agent-client/v7 v7.1.2
@@ -192,7 +192,7 @@ require (
 	github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources v1.0.0
 	github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v0.4.1
 	github.com/Azure/go-autorest/autorest/adal v0.9.14
-	github.com/apache/arrow/go/v12 v12.0.1-0.20230605094802-c153c6d36ccf
+	github.com/apache/arrow/go/v11 v11.0.0
 	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.12.7
 	github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.11.17
 	github.com/aws/aws-sdk-go-v2/service/cloudformation v1.20.4
@@ -311,7 +311,7 @@ require (
 	github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
 	github.com/mailru/easyjson v0.7.6 // indirect
 	github.com/markbates/pkger v0.17.1 // indirect
-	github.com/mattn/go-isatty v0.0.17 // indirect
+	github.com/mattn/go-isatty v0.0.16 // indirect
 	github.com/mattn/go-runewidth v0.0.9 // indirect
 	github.com/matttproud/golang_protobuf_extensions v1.0.2-0.20181231171920-c182affec369 // indirect
 	github.com/minio/asm2plan9s v0.0.0-20200509001527-cdd76441f9d8 // indirect
go.sum (11 changes: 5 additions & 6 deletions)
@@ -233,8 +233,8 @@ github.com/antlr/antlr4/runtime/Go/antlr/v4 v4.0.0-20230305170008-8188dc5388df/g
 github.com/aokoli/goutils v1.0.1/go.mod h1:SijmP0QR8LtwsmDs8Yii5Z/S4trXFGFC2oO5g9DP+DQ=
 github.com/apache/arrow/go/arrow v0.0.0-20191024131854-af6fa24be0db/go.mod h1:VTxUBvSJ3s3eHAg65PNgrsn5BtqCRPdmyXh6rAfdxN0=
 github.com/apache/arrow/go/arrow v0.0.0-20200923215132-ac86123a3f01/go.mod h1:QNYViu/X0HXDHw7m3KXzWSVXIbfUvJqBFe6Gj8/pYA0=
-github.com/apache/arrow/go/v12 v12.0.1-0.20230605094802-c153c6d36ccf h1:s5MDQXJmEalr0Urt0rPlX5UAE2BcHTiex/2Lt2O9p84=
-github.com/apache/arrow/go/v12 v12.0.1-0.20230605094802-c153c6d36ccf/go.mod h1:weuTY7JvTG/HDPtMQxEUp7pU73vkLWMLpY67QwZ/WWw=
+github.com/apache/arrow/go/v11 v11.0.0 h1:hqauxvFQxww+0mEU/2XHG6LT7eZternCZq+A5Yly2uM=
+github.com/apache/arrow/go/v11 v11.0.0/go.mod h1:Eg5OsL5H+e299f7u5ssuXsuHQVEGC4xei5aX110hRiI=
 github.com/apache/thrift v0.12.0/go.mod h1:cp2SuWMxlEZw2r+iP2GNCdIi4C1qmUzdZFSVb+bacwQ=
 github.com/apache/thrift v0.13.0/go.mod h1:cp2SuWMxlEZw2r+iP2GNCdIi4C1qmUzdZFSVb+bacwQ=
 github.com/apache/thrift v0.16.0/go.mod h1:PHK3hniurgQaNMZYaCLEqXKsYK8upmhPbmdP2FXSqgU=
@@ -496,9 +496,8 @@ github.com/dolmen-go/contextio v0.0.0-20200217195037-68fc5150bcd5/go.mod h1:cxc2
 github.com/dop251/goja_nodejs v0.0.0-20171011081505-adff31b136e6 h1:RrkoB0pT3gnjXhL/t10BSP1mcr/0Ldea2uMyuBr2SWk=
 github.com/dop251/goja_nodejs v0.0.0-20171011081505-adff31b136e6/go.mod h1:hn7BA7c8pLvoGndExHudxTDKZ84Pyvv+90pbBjbTz0Y=
 github.com/dustin/go-humanize v0.0.0-20171111073723-bb3d318650d4/go.mod h1:HtrtbFcZ19U5GC7JDqmcUSB87Iq5E25KnS6fMYU6eOk=
+github.com/dustin/go-humanize v1.0.0 h1:VSnTsYCnlFHaM2/igO1h6X3HA71jcobQuxemgkq4zYo=
 github.com/dustin/go-humanize v1.0.0/go.mod h1:HtrtbFcZ19U5GC7JDqmcUSB87Iq5E25KnS6fMYU6eOk=
-github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
-github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
 github.com/eapache/go-resiliency v1.1.0/go.mod h1:kFI+JgMyC7bLPUVY133qvEBtVayf5mFgVsvEsIPBvNs=
 github.com/eapache/go-resiliency v1.2.0 h1:v7g92e/KSN71Rq7vSThKaWIq68fL4YHvWyiUKorFR1Q=
 github.com/eapache/go-resiliency v1.2.0/go.mod h1:kFI+JgMyC7bLPUVY133qvEBtVayf5mFgVsvEsIPBvNs=
@@ -1212,8 +1211,8 @@ github.com/mattn/go-isatty v0.0.10/go.mod h1:qgIWMr58cqv1PHHyhnkY9lrL7etaEgOFcME
 github.com/mattn/go-isatty v0.0.11/go.mod h1:PhnuNfih5lzO57/f3n+odYbM4JtupLOxQOAqxQCu2WE=
 github.com/mattn/go-isatty v0.0.12/go.mod h1:cbi8OIDigv2wuxKPP5vlRcQ1OAZbq2CE4Kysco4FUpU=
 github.com/mattn/go-isatty v0.0.14/go.mod h1:7GGIvUiUoEMVVmxf/4nioHXj79iQHKdU27kJ6hsGG94=
-github.com/mattn/go-isatty v0.0.17 h1:BTarxUcIeDqL27Mc+vyvdWYSL28zpIhv3RoTdsLMPng=
-github.com/mattn/go-isatty v0.0.17/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
+github.com/mattn/go-isatty v0.0.16 h1:bq3VjFmv/sOjHtdEhmkEV4x1AJtvUvOJ2PFAZ5+peKQ=
+github.com/mattn/go-isatty v0.0.16/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
 github.com/mattn/go-runewidth v0.0.2/go.mod h1:LwmH8dsx7+W8Uxz3IHJYH5QSwggIsqBzpuz5H//U1FU=
 github.com/mattn/go-runewidth v0.0.3/go.mod h1:LwmH8dsx7+W8Uxz3IHJYH5QSwggIsqBzpuz5H//U1FU=
 github.com/mattn/go-runewidth v0.0.9 h1:Lm995f3rfxdpd6TSmuVCHVb/QhupuXlYr8sCI/QdE+0=
x-pack/libbeat/reader/parquet/parquet.go (8 changes: 4 additions & 4 deletions)
@@ -10,10 +10,10 @@ import (
 	"fmt"
 	"io"

-	"github.com/apache/arrow/go/v12/arrow/memory"
-	"github.com/apache/arrow/go/v12/parquet"
-	"github.com/apache/arrow/go/v12/parquet/file"
-	"github.com/apache/arrow/go/v12/parquet/pqarrow"
+	"github.com/apache/arrow/go/v11/arrow/memory"
+	"github.com/apache/arrow/go/v11/parquet"
+	"github.com/apache/arrow/go/v11/parquet/file"
+	"github.com/apache/arrow/go/v11/parquet/pqarrow"
 )

 // BufferedReader parses parquet inputs from io streams.
x-pack/libbeat/reader/parquet/parquet_test.go (68 changes: 32 additions & 36 deletions)
@@ -5,6 +5,7 @@
 package parquet

 import (
+	"bufio"
 	"bytes"
 	"encoding/json"
 	"fmt"
@@ -14,10 +15,10 @@ import (
 	"path/filepath"
 	"testing"

-	"github.com/apache/arrow/go/v12/arrow"
-	"github.com/apache/arrow/go/v12/arrow/array"
-	"github.com/apache/arrow/go/v12/arrow/memory"
-	"github.com/apache/arrow/go/v12/parquet/pqarrow"
+	"github.com/apache/arrow/go/v11/arrow"
+	"github.com/apache/arrow/go/v11/arrow/array"
+	"github.com/apache/arrow/go/v11/arrow/memory"
+	"github.com/apache/arrow/go/v11/parquet/pqarrow"
 	"github.com/stretchr/testify/assert"
 )

@@ -170,22 +171,20 @@ func createRandomParquet(t testing.TB, fname string, numCols int, numRows int) m

 func TestParquetWithFiles(t *testing.T) {
 	testCases := []struct {
-		parquetFile      string
-		jsonFile         string
-		maxRowsToCompare int
+		parquetFile string
+		jsonFile    string
 	}{
 		{
-			parquetFile: "cloudtrail.parquet",
-			jsonFile:    "cloudtrail.json",
+			parquetFile: "vpc_flow.gz.parquet",
+			jsonFile:    "vpc_flow.ndjson",
 		},
 		{
-			parquetFile: "route53.parquet",
-			jsonFile:    "route53.json",
+			parquetFile: "cloudtrail.parquet",
+			jsonFile:    "cloudtrail.ndjson",
 		},
 		{
-			parquetFile:      "vpc_flow.gz.parquet",
-			jsonFile:         "vpc_flow.json",
-			maxRowsToCompare: 4,
+			parquetFile: "route53.parquet",
+			jsonFile:    "route53.ndjson",
 		},
 	}

@@ -199,38 +198,43 @@ func TestParquetWithFiles(t *testing.T) {
 			}
 			defer parquetFile.Close()

-			orderedJSON, rows := readJSONFromFile(t, filepath.Join(testDataPath, tc.jsonFile))
+			jsonFile, err := os.Open(filepath.Join(testDataPath, tc.jsonFile))
+			if err != nil {
+				t.Fatalf("Failed to open json test file: %v", err)
+			}
+			defer jsonFile.Close()
+
+			orderedJSON, rows := readJSONFromFile(t, jsonFile)
 			cfg := &Config{
 				// we set ProcessParallel to true as this always has the best performance
 				ProcessParallel: true,
 				// batch size is set to 1 because we need to compare individual records one by one
 				BatchSize: 1,
 			}
-			readAndCompareParquetFile(t, cfg, parquetFile, orderedJSON, rows, tc.maxRowsToCompare)
+			readAndCompareParquetFile(t, cfg, parquetFile, orderedJSON, rows)
 		})
 	}
 }

 // readJSONFromFile reads the json file and returns the data as an ordered map (row number -> json string)
 // along with the number of rows in the file
-func readJSONFromFile(t *testing.T, filepath string) (map[int]string, int) {
-	fileBytes, err := os.ReadFile(filepath)
-	assert.NoError(t, err)
-	var rawMessages []json.RawMessage
-	err = json.Unmarshal(fileBytes, &rawMessages)
-	assert.NoError(t, err)
+func readJSONFromFile(t *testing.T, file *os.File) (map[int]string, int) {
 	data := make(map[int]string)
-	var row int
-	for _, rawMsg := range rawMessages {
-		data[row] = string(rawMsg)
+	scanner := bufio.NewScanner(file)
+	row := 0
+	for scanner.Scan() {
+		data[row] = scanner.Text()
 		row++
 	}
+	if err := scanner.Err(); err != nil {
+		t.Fatalf("failed to read ndjson file: %v", err)
+	}

 	return data, row
 }

 // readAndCompareParquetFile reads the parquet file and compares the data with the input data
-func readAndCompareParquetFile(t *testing.T, cfg *Config, file *os.File, data map[int]string, rows int, maxRowsToCompare int) {
+func readAndCompareParquetFile(t *testing.T, cfg *Config, file *os.File, data map[int]string, rows int) {
 	sReader, err := NewBufferedReader(file, cfg)
 	if err != nil {
 		t.Fatalf("failed to init stream reader: %v", err)
@@ -244,17 +248,9 @@ func readAndCompareParquetFile(t *testing.T, cfg *Config, file *os.File, data ma
 		if val != nil {
 			rowCount = readAndCompareParquetJSON(t, bytes.NewReader(val), data, rowCount)
 		}
-		if maxRowsToCompare > 0 && rowCount == maxRowsToCompare {
-			break
-		}
 	}
-	// if maxRowsToCompare == 0 then we compare the row count
-	if maxRowsToCompare == 0 {
-		// asserts of number of rows read is the same as the number of rows from the input file
-		assert.Equal(t, rows, rowCount)
-	} else {
-		assert.EqualValues(t, rowCount, maxRowsToCompare)
-	}
+	// asserts of number of rows read is the same as the number of rows from the input file
+	assert.Equal(t, rows, rowCount)
 	// closes the stream reader and asserts that there are no errors
 	err = sReader.Close()
 	assert.NoError(t, err)