CSV file fields contain line breaks; how can the data be read, written, and stored to a Hive table correctly? #6
Labels
question
Further information is requested
Comments
The first problem is fairly easy to solve:
ides> spark.read.option("multiline",true).option("header", true).csv("file:///Users/sgr/test").show
ides> load csv.`file:///Users/sgr/test` where multiline='true' and header='true' as tb;
| tb.show
The second problem may be somewhat trickier:
save tb overwrite into hive.`test.sgrtb` where fileFormat='csv';
The resulting table structure in Hive is:
CREATE TABLE `sgrtb`(
`id` string,
`name` string,
`desc` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoop:9000/usr/hive/warehouse/test.db/sgrtb'
However, this can be solved by setting fileFormat to a different format, for example:
save tb overwrite into hive.`test.sgrtb` where fileFormat='parquet';
The Hive table structure is then:
CREATE TABLE `sgrtb`(
`id` string,
`name` string,
`desc` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://hadoop:9000/usr/hive/warehouse/test.db/sgrtb'
However, what should I do if the data must be stored as CSV files when saving to the Hive table?
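One possible workaround, not proposed anywhere in this thread: a table stored with TextInputFormat treats every line of the file as one row, so a line break inside a field splits the record. If losing the original line breaks is acceptable (an assumption), each field can be flattened before the table is saved in a text format. The sketch below is plain Scala; `LineBreaks` and `flatten` are hypothetical names. In Spark, the same pattern could be applied to each string column with `regexp_replace` from `org.apache.spark.sql.functions`.

```scala
object LineBreaks {
  // Matches CRLF, a bare CR, or a bare LF.
  // CRLF is listed first so that it becomes one space rather than two.
  private val AnyLineBreak = "\\r\\n|\\r|\\n"

  // Hypothetical helper: replace every line break inside a field value with a space.
  def flatten(field: String): String =
    if (field == null) null else field.replaceAll(AnyLineBreak, " ")

  def main(args: Array[String]): Unit = {
    // Illustrative row only; the thread does not include sample data.
    val row = Seq("1", "sgr", "first line\r\nsecond line")
    println(row.map(flatten).mkString(","))
  }
}
```

The trade-off is that the stored text no longer matches the source exactly. If the line breaks must survive, a columnar format such as the parquet option shown above avoids the problem.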
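The CSV data contains \r characters. How should that be handled? Setting multiLine=true and escaping via the escape and quote options both fail.
What are you using to process it? Do you have a sample file?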
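Without a sample file the cause can only be guessed. If records end with \r\n and the stray \r characters occur alone inside fields (an assumption, not confirmed in this thread), one option is to rewrite the bare \r characters in the raw text before handing it to the CSV reader. A minimal sketch in plain Scala; `BareCr` and `normalize` are hypothetical names:

```scala
object BareCr {
  // A CR that is not immediately followed by LF (negative lookahead),
  // so CRLF record terminators are left untouched.
  private val BareCarriageReturn = "\\r(?!\\n)"

  // Hypothetical helper: replace each bare CR in the raw file content with a space.
  def normalize(raw: String): String =
    raw.replaceAll(BareCarriageReturn, " ")

  def main(args: Array[String]): Unit = {
    // Illustrative content only: one quoted field contains a bare CR.
    val raw = "id,desc\r\n1,\"a\rb\"\r\n"
    // Print with escapes visible so the remaining CRLF terminators can be checked.
    println(normalize(raw).replace("\r", "\\r").replace("\n", "\\n"))
  }
}
```

This only helps when \r is never used on its own as the record separator; a file with classic Mac line endings would be collapsed into a single line.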
Some text fields in the CSV file contain line breaks, for example:
The questions are: