Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text fields in local NASIS are truncated at 255 characters #145

Closed
dylanbeaudette opened this issue Sep 28, 2020 · 10 comments
Closed

text fields in local NASIS are truncated at 255 characters #145

dylanbeaudette opened this issue Sep 28, 2020 · 10 comments

Comments

@dylanbeaudette
Copy link
Member

This seems to have occured after the recent switch to a newer ODBC driver.

Put something into your selected set and try it.

library(RODBC)

q <- "SELECT projectiid, uprojectid, projectname, projectdesc
  
  FROM 
  project_View_1
  ;
  "
# setup connection local NASIS
channel <- RODBC::odbcDriverConnect(connection=getOption('soilDB.NASIS.credentials'))

# exec query
d <- RODBC::sqlQuery(channel, q, stringsAsFactors=FALSE)

# close connection
RODBC::odbcClose(channel)

d
@brownag
Copy link
Member

brownag commented Sep 28, 2020

Interesting. Perhaps an interaction with {RODBC} package, as with {DBI}+{odbc} I get max 1500 characters not 255.

library(RODBC)

q <- "SELECT projectiid, uprojectid, projectname, projectdesc
  
  FROM 
  project_View_1
  ;
  "
# setup connection local NASIS
channel <- soilDB:::.openNASISchannel()

# exec query
d <- RODBC::sqlQuery(channel, q, stringsAsFactors=FALSE)

# close connection
RODBC::odbcClose(channel)

max(nchar(d$projectdesc), na.rm = T)
# 255
library(DBI)
con <- dbConnect(odbc::odbc(), 
                 DSN = "nasis_local", 
                 UID="NasisSqlRO",
                 PWD="nasisRe@d0n1y365")
res <- dbSendQuery(con, q)
dat <- dbFetch(res)
max(nchar(dat$projectdesc), na.rm=T)
# 1500

DBI::dbDisconnect(con)

@dylanbeaudette
Copy link
Member Author

That is odd, RODBC has worked for a long time without issue. Any thoughts on switching over? Is the 1500 limit a self-imposed limit or the driver?

@brownag
Copy link
Member

brownag commented Sep 28, 2020

I have wondered whether something might happen that would tip the tables in favor of DBI/odbc. Until that little snafu with R 4.0/Rcpp i hadn't seriously considered it.

If i drop the _View_1 from the query I get 9999 which IIRC is the max for that field in nasis. So, 1500 is max in my selected set (I have ~all 2-SON projects I think)

@brownag
Copy link
Member

brownag commented Sep 28, 2020

Castting varchar to ntext seems to do the trick with {RODBC}.

library(RODBC)

q <- "SELECT projectiid, uprojectid, projectname, CAST(projectdesc AS ntext) as 'projectdesc'
  
  FROM 
  project_View_1
  ;
  "
# setup connection local NASIS
channel <- soilDB:::.openNASISchannel()

# exec query
d <- RODBC::sqlQuery(channel, q, stringsAsFactors=FALSE)

max(nchar(d$projectdesc), na.rm=T)
# close connection

@dylanbeaudette
Copy link
Member Author

That is a good temporary fix, but maybe a long-term strategy is to switch over to DBI + odbc. What else has silently changed since the ODBC driver version change?

@brownag
Copy link
Member

brownag commented Sep 29, 2020

https://stackoverflow.com/questions/44249788/rodbc-and-microsoft-sql-server-truncating-long-character-strings

This seems to have been known behavior for some time, but we were way behind the curve with updating to this new MS SQL driver... I found related issues dating back almost 10 years but the above one seems most relevant. I would support moving to DBI/odbc.

@brownag
Copy link
Member

brownag commented Sep 29, 2020

Explanation from Brian Ripley. I guess the answer is to go to DBI/odbc if we want a more "sane" default for varchar(MAX)

Some DBMSs such as PostgreSQL and Microsoft SQL Server allow variable-length
character strings of length only limited by resources. These do not fit well with the
ODBC model that requires buffers to be allocated to transfer character data, and so
such types may be subjected (by the ODBC driver) to a fixed limit or not work at
all.

@brownag
Copy link
Member

brownag commented Sep 29, 2020

image

An added benefit of migrating to DBI for NASIS workflows is that it will integrate beautifully with RStudio Connections tab, if one is so inclined.

@dylanbeaudette
Copy link
Member Author

dylanbeaudette commented Sep 29, 2020

That is nice and reason enough to switch over: much simpler than the DB layout PDFs or manual inquiry.

This issue started as part of QA but there are pending tasks which could be affected by this new (old) bug. Map unit text notes, certification notes, and possibly even some aspects of calculations / interpretations. I have not tested those yet but they do make use of very large text chunks.

I'm going to do some testing and then make the case for a full transition to {DBI} / {odbc}.

Done here, moving on to #146.

@brownag
Copy link
Member

brownag commented Sep 29, 2020

Here are a couple interesting links related to getting long data from MS SQL -- it appears that nanodbc (used by {odbc}) has said that this is essentially not their problem to fix, and it is clear that Microsoft has no intention of fixing it -- it appears to be a "feature" not a "bug" that appeared after moving from "SQL Server" driver to later MS drivers. I am not sure why {odbc} seems to handle this case "better" out of the box since it is apparently related to the MS side of things... may just be that nanodbc is a bit more sensible, or they have addressed this some other way in odbc.

https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/getting-long-data?view=sql-server-ver15
nanodbc/nanodbc#149
r-dbi/odbc#112

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants