Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extracting dates from Excel #116

Closed
kper opened this issue Feb 18, 2018 · 7 comments
Closed

Extracting dates from Excel #116

kper opened this issue Feb 18, 2018 · 7 comments

Comments

@kper
Copy link

kper commented Feb 18, 2018

In my excel sheet, I've got a column with dates. Unfortunately, calamine::DateType has no variant for dates and when I handle a date as a string I get for 29.08.2012 -> 41181.

It would be nice to have built-in date conversion

@kper
Copy link
Author

kper commented Feb 18, 2018

Notes:

Excel stores dates as sequential serial numbers so that they can be used in calculations. By default, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,447 days after January 1, 1900.

Source: https://support.office.com/en-us/article/datevalue-function-df8b07d4-7761-4a93-bc33-b7471bbff252

@tafia
Copy link
Owner

tafia commented Mar 1, 2018

Please refer to #95.

The issue is that the logic to decide whether the cell is a Date, an integer or a datetime is not yet implemented and unfortunately not straightforward.

@sts10
Copy link

sts10 commented Aug 18, 2018

I'm just learning Rust, but I got a work-around that might be a starting point?

Basically I get an i64 from any cells where I know ahead of time there will be a date (in other words, I haven't solved the detection logic mentioned above). These i64s are, as @kper's comment clued me to, the number of days since 1900. I then wrote the function below, using the chrono crate's NaiveDate type, to get a workable Date type.

extern crate chrono;
use chrono::NaiveDate; 
// ...
fn from_days_since_1900(days_since_1900: i64) -> NaiveDate {
    let d1900 = NaiveDate::from_ymd(1900, 1, 1);
    d1900 + Duration::days(days_since_1900 - 2)
}

I believe that quirky - 2 at the end is necessary for accuracy, perhaps to compensate for the Excel bug mentioned above? Anyway, hope this is helpful for other people who landed here looking to better handle dates, and/or an eventual PR!

@tafia
Copy link
Owner

tafia commented Aug 20, 2018

Thanks @sts10.

I believe that quirky - 2 at the end is necessary for accuracy, perhaps to compensate for the Excel bug mentioned above?

On js-xlsx side, they seem to have a different logic

I don't have a lot of time to dig much deeper. I'd be happy to add a util function anyway so people can manually use it now, and hopefully it'll get converted automatically in the future.

@sts10
Copy link

sts10 commented Aug 20, 2018

Oh gosh, that's gonna take me a while to figure out. Luckily, I'm only doing dates (as opposed to datetimes) and they'll all be today's date, so nothing far in the future or past. So I'm hoping my little - 2 does the trick for me. But regardless, thanks for pointing to that part of their code.

@wilfredwee
Copy link

I'm solving a similar issue and bumped into this thread. I took some time to understand SheetJS' implementation, here's the commented code, hopefully it'll be helpful to future readers:

// Define `basedate` as Dec 30th, 1899. In current locale timezone.
var basedate = new Date(1899, 11, 30, 0, 0, 0);

// (new Date().getTimezoneOffset() - basedate.getTimezoneOffset())
// The section of code above basically seems unnecessary to me. It'll always result in 0 because both Date objects have the same timezone. Perhaps it's for browser compatibility?
// Which leaves basedate.getTime(): This returns the unix epoch time of Dec 30th, 1899.
// Note that dnthresh is thus equal to Excel's epoch (Jan 1st 1900 minus 2 days).
// This calculation is not entirely accurate, and I *suspect* will give a different answer than Excel if there's a date function referring to some time on or before Feb 28th, 1900. But should be fine any time after.
var dnthresh = basedate.getTime() + (new Date().getTimezoneOffset() - basedate.getTimezoneOffset()) * 60000;

function datenum(v/*:Date*/, date1904/*:?boolean*/)/*:number*/ {
        // Get the unix epoch of the argument.
	var epoch = v.getTime();
       // If using Excel's 1904 date format, shift the argument's date by 1462 days backwards.
      // I think a more intuitive approach is just to shift the Excel epoch itself, add 1462 days to dnthresh.
	if(date1904) epoch -= 1462*24*60*60*1000;

      // Now, get time since Excel epoch by subtracting the Excel epoch time from the arg's date.
     // This is no different from any other epoch date calculations.
	return (epoch - dnthresh) / (24 * 60 * 60 * 1000);
}

function numdate(v/*:number*/)/*:Date*/ {
  // v is days since Jan 1st, 1900
	var out = new Date();
	out.setTime(v * 24 * 60 * 60 * 1000 + dnthresh);
	return out;
}

In short, @sts10 basically has the same implementation as SheetsJS. However, SheetsJS implementation doesn't seem completely correct.

@tafia
Copy link
Owner

tafia commented Mar 7, 2020

Thanks for the analysis!
I believe the dates feature now implements a working as_dates fn. This particular issue remains open because the date detection is not automatic (it is up to the caller to convert float to date).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants