INT96 is converted to datetime #592

shiyuhang0 opened this issue Jul 17, 2024 · 1 comment

@shiyuhang0

Write with the CSV writer:

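package main

import (
	"fmt"
	"log"

	"github.com/xitongsys/parquet-go-source/local"
	"github.com/xitongsys/parquet-go/writer"
)

func main() {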
	var err error
	md := []string{
		"name=id, type=INT96",
		"name=name, type=BYTE_ARRAY",
	}

	fw, err := local.NewLocalFileWriter("csv.parquet")
	if err != nil {
		log.Println("Can't open file", err)
		return
	}
	pw, err := writer.NewCSVWriter(md, fw, 4)
	if err != nil {
		log.Println("Can't create csv writer", err)
		return
	}

	data := []string{"18446744073709551615", "b"}
	rec := make([]*string, len(data))
	for j := 0; j < len(data); j++ {
		rec[j] = &data[j]
	}
	if err = pw.WriteString(rec); err != nil {
		log.Println("WriteString error", err)
	}

	if err = pw.WriteStop(); err != nil {
		log.Println("WriteStop error", err)
	}
	for _, s := range pw.Footer.Schema {
		println(fmt.Sprintf("%v", *s))
	}
	log.Println("Write Finished")
	fw.Close()
}

Read with DuckDB:

id,name
"4714-11-24 (BC) 00:00:00",b

The id column comes back as something like a datetime. Is this expected?

@hangxie
Contributor

hangxie commented Oct 3, 2024

First, INT96 was deprecated more than 6 years ago (https://github.com/apache/parquet-format/blob/master/CHANGES.md#version-250), so you may want to consider moving to a different type.

It looks like some tools and libraries don't interpret INT96 consistently. I ran your code, opened the resulting file in several online Parquet viewers, and got different results:

  1. https://www.parquet-viewer.com/ gives 1717-12-28 19:20:10.805067775
  2. https://dataconverter.io/view/parquet/ gives 4713-01-01T11:59:59.999Z
  3. https://parquetreader.com/result gives 1970-01-01

Various CLI tools also returned different results. If you expect 1717-12-28 19:20:10.805067775, you may want to try the tool I built:

$ parquet-tools cat csv.parquet
[{"Id":"1717-12-28T19:20:10.805067776Z","Name":"b"}]
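
A possible explanation for the disagreement, sketched under assumptions: readers that treat INT96 as a legacy timestamp conventionally split the 12 bytes into nanoseconds within the day (first 8 bytes, little-endian) and a Julian day number (last 4 bytes, little-endian). If the CSV writer stores 18446744073709551615 as a plain little-endian 96-bit integer (an assumption, not verified against the library), the Julian day field is 0 and the nanosecond field is all ones. The helper name decodeInt96 is hypothetical and not part of any library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeInt96 (hypothetical helper) splits a 12-byte INT96 value into the two
// fields that timestamp-aware readers conventionally expect: nanoseconds
// within the day (first 8 bytes, little-endian) and a Julian day number
// (last 4 bytes, little-endian).
func decodeInt96(b [12]byte) (nanosOfDay uint64, julianDay uint32) {
	nanosOfDay = binary.LittleEndian.Uint64(b[0:8])
	julianDay = binary.LittleEndian.Uint32(b[8:12])
	return nanosOfDay, julianDay
}

func main() {
	// Assumption: 18446744073709551615 (2^64 - 1) is stored as a little-endian
	// 96-bit integer, i.e. eight 0xFF bytes followed by four 0x00 bytes.
	var raw [12]byte
	for i := 0; i < 8; i++ {
		raw[i] = 0xFF
	}

	nanos, day := decodeInt96(raw)
	fmt.Printf("julianDay=%d nanosOfDay=%d\n", day, nanos)
}
```

Julian day 0 falls in 4714 BC in the proleptic Gregorian calendar, which is consistent with the DuckDB output above. The nanosecond field is far larger than the number of nanoseconds in a day, so how each tool handles that out-of-range value (ignoring it, treating it as signed, or overflowing while converting to a Unix timestamp) could plausibly account for the different results.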
