Scrub CSV.File types for Tables.Schema (#811)

Fixes #808. While parsing, we use an internal `PooledString` type to signal that a column is currently being pooled. Once parsing is done, however, there's no value in keeping `PooledString` around. Indeed, `Tables.rowtable` gets confused by it: it expects `PooledString` objects, but we always return `String` objects when indexing string columns, pooled or not. The fix is to scrub `PooledString` from the column types so that the `Tables.Schema` on `CSV.File` is correct. Let's see if CI points out any problems with this approach.
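
As a rough, standalone sketch of the scrubbing step (not the actual `File` constructor): the `PooledString` struct below is a local stand-in for the internal marker type, and the `types` vector is a made-up example of what the per-column types might look like after parsing. The `schematype` methods mirror the ones added in this commit.

```julia
# Local stand-in for the internal marker type defined in src/utils.jl.
struct PooledString <: AbstractString end

# Identity for ordinary types; the pooled marker types map to plain String types.
schematype(::Type{T}) where {T} = T
schematype(::Type{PooledString}) = String
schematype(::Type{Union{Missing, PooledString}}) = Union{Missing, String}

# Hypothetical column types as they might look once parsing has finished.
types = Type[Int64, PooledString, Union{Missing, PooledString}]

# Scrub the marker types so a schema built from `types` only reports String.
for i in eachindex(types)
    if schematype(types[i]) !== types[i]
        types[i] = schematype(types[i])
    end
end

println(types)
```

After the loop, the `Int64` column is unchanged and the two pooled columns report `String` and `Union{Missing, String}`, which matches the values that indexing those columns returns.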
JuliaData · Feb 25, 2021 · c386238
1 parent 425f2de

Showing 2 changed files with 6 additions and 0 deletions.
diff --git a/src/file.jl b/src/file.jl
@@ -331,6 +331,8 @@ function File(h::Header;
             if types[i] === Union{}
                 types[i] = Missing
                 columns[i] = MissingVector(finalrows)
+            elseif schematype(types[i]) !== types[i]
+                types[i] = schematype(types[i])
             end
         end
     end

diff --git a/src/utils.jl b/src/utils.jl
@@ -7,6 +7,10 @@ with the output array type being a `PooledArray`.
 """
 struct PooledString <: AbstractString end
 
+schematype(::Type{T}) where {T} = T
+schematype(::Type{PooledString}) = String
+schematype(::Type{Union{Missing, PooledString}}) = Union{Missing, String}
+
 # PointerString is an internal-only type for efficiently tracking string data + length
 # all strings indexed from a column/row will always be a full String
 # specifically, it allows avoiding materializing full Strings for pooled string columns while parsing