
read_csv epoch parsing with pd.to_datetime() #3757

Closed
nipunbatra opened this issue Jun 5, 2013 · 16 comments · Fixed by #3771

@nipunbatra
Contributor

References #3764 , #3540

!head 001EC00CC49D_processed        

timestamp,frequency,voltage,active_power,energy,cost,current,reactive_power,apparent_power,power_factor,phase_angle
1369210417,49.99,234.187,1.138,0.131,0.000,0.014,3.036,3.242,0.351,69.458
1369210418,49.98,234.276,1.043,0.131,0.000,0.014,3.183,3.350,0.311,71.855
1369210419,49.97,234.306,1.043,0.132,0.000,0.014,3.100,3.271,0.319,71.411
1369210420,49.97,234.288,1.045,0.132,0.000,0.014,3.155,3.324,0.315,71.668
1369210421,49.98,234.330,1.047,0.133,0.000,0.014,3.099,3.271,0.320,71.332
1369210422,49.97,234.228,1.004,0.134,0.000,0.014,3.140,3.296,0.304,72.275
1369210423,49.99,234.260,1.036,0.134,0.000,0.013,2.986,3.161,0.328,70.861
1369210424,49.97,234.292,1.073,0.135,0.000,0.014,3.089,3.270,0.328,70.842
1369210425,49.98,234.259,1.073,0.135,0.000,0.014,3.089,3.270,0.328,70.839

import datetime
import pandas as pd

def epoch_to_date(timestamp):
    # Per-row conversion via the stdlib (interprets the epoch in local time).
    return datetime.datetime.fromtimestamp(int(timestamp))

def epoch_to_date_2(timestamp):
    # Per-row conversion via pandas, scaling seconds to nanoseconds.
    return pd.to_datetime(float(timestamp) * int(1e9))

df = pd.read_csv('001EC00CC49D_processed', parse_dates=[0],
                 date_parser=epoch_to_date, index_col=0, error_bad_lines=False)
df2 = pd.read_csv('001EC00CC49D_processed', parse_dates=[0],
                  date_parser=epoch_to_date_2, index_col=0, error_bad_lines=False)

df
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 327818 entries, 2013-05-22 13:43:37 to 2013-06-05 12:22:16
Data columns (total 10 columns):
frequency         327810  non-null values
...
dtypes: float64(9), object(1)

df2
<class 'pandas.core.frame.DataFrame'>
Index: 327818 entries, 1.369210417e+18 to 1.370415136e+18
Data columns (total 10 columns):
frequency         327810  non-null values
....
dtypes: float64(9), object(1)

Using pd.to_datetime() this way took more time, and the resulting index is not a DatetimeIndex.
Moreover, since a lot of people record Unix timestamps to avoid dealing with different date formats, it might be handy to have this functionality built in.
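For reference, a minimal sketch of what built-in epoch handling could look like, assuming a pandas version where `pd.to_datetime` accepts a `unit` argument: read the epoch column as plain integers (no per-row `date_parser`), then convert the whole index in one vectorized call. The inline CSV here is a trimmed stand-in for the file above.

```python
import io

import pandas as pd

# Two rows of the data above, trimmed to two columns for illustration.
csv = io.StringIO(
    "timestamp,frequency,voltage\n"
    "1369210417,49.99,234.187\n"
    "1369210418,49.98,234.276\n"
)

# Read the epoch column as ordinary integers (no per-row parser)...
df = pd.read_csv(csv, index_col=0)

# ...then convert the whole index at once, treating values as epoch seconds.
df.index = pd.to_datetime(df.index, unit='s')

print(df.index[0])  # 2013-05-22 08:13:37 (naive, UTC wall time)
```

Note that the result is naive but represents UTC instants, which is why it differs from the `datetime.fromtimestamp` output above (that call uses the machine's local time zone).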

@jreback
Contributor

jreback commented Jun 5, 2013

This is a MUCH faster way of doing this

In [17]: df=pd.read_csv('proc.txt',index_col=0,error_bad_lines=False)

In [18]: df
Out[18]: 
            frequency  voltage  active_power  energy  cost  current  reactive_power  apparent_power  power_factor  phase_angle
timestamp                                                                                                                     
1369210417      49.99  234.187         1.138   0.131     0    0.014           3.036           3.242         0.351       69.458
1369210418      49.98  234.276         1.043   0.131     0    0.014           3.183           3.350         0.311       71.855
1369210419      49.97  234.306         1.043   0.132     0    0.014           3.100           3.271         0.319       71.411
1369210420      49.97  234.288         1.045   0.132     0    0.014           3.155           3.324         0.315       71.668
1369210421      49.98  234.330         1.047   0.133     0    0.014           3.099           3.271         0.320       71.332
1369210422      49.97  234.228         1.004   0.134     0    0.014           3.140           3.296         0.304       72.275
1369210423      49.99  234.260         1.036   0.134     0    0.013           2.986           3.161         0.328       70.861
1369210424      49.97  234.292         1.073   0.135     0    0.014           3.089           3.270         0.328       70.842
1369210425      49.98  234.259         1.073   0.135     0    0.014           3.089           3.270         0.328       70.839

In [19]: df.index = pd.to_datetime((df.index.values*1e9).astype(int))

In [20]: df
Out[20]: 
                     frequency  voltage  active_power  energy  cost  current  reactive_power  apparent_power  power_factor  phase_angle
2013-05-22 08:13:37      49.99  234.187         1.138   0.131     0    0.014           3.036           3.242         0.351       69.458
2013-05-22 08:13:38      49.98  234.276         1.043   0.131     0    0.014           3.183           3.350         0.311       71.855
2013-05-22 08:13:39      49.97  234.306         1.043   0.132     0    0.014           3.100           3.271         0.319       71.411
2013-05-22 08:13:40      49.97  234.288         1.045   0.132     0    0.014           3.155           3.324         0.315       71.668
2013-05-22 08:13:41      49.98  234.330         1.047   0.133     0    0.014           3.099           3.271         0.320       71.332
2013-05-22 08:13:42      49.97  234.228         1.004   0.134     0    0.014           3.140           3.296         0.304       72.275
2013-05-22 08:13:43      49.99  234.260         1.036   0.134     0    0.013           2.986           3.161         0.328       70.861
2013-05-22 08:13:44      49.97  234.292         1.073   0.135     0    0.014           3.089           3.270         0.328       70.842
2013-05-22 08:13:45      49.98  234.259         1.073   0.135     0    0.014           3.089           3.270         0.328       70.839

@jreback
Contributor

jreback commented Jun 5, 2013

@nipunreddevil also: your conversion using pd.to_datetime needs an integer (not a float) in order to create a DatetimeIndex, and it is slower because it tries several formats per value; converting all at once (as I did above) is much faster in any event.
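A minimal sketch of the all-at-once pattern, assuming the index already holds epoch seconds as integers: multiplying by 1e9 promotes the values to float64, so an explicit cast back to int64 is needed before pd.to_datetime treats them as nanosecond timestamps.

```python
import numpy as np
import pandas as pd

# Epoch seconds, as they come out of read_csv for this file.
secs = np.array([1369210417, 1369210418])

# secs * 1e9 is float64; cast back to int64 so the values are
# interpreted as integer nanoseconds since the epoch.
ns = (secs * 1e9).astype('int64')

idx = pd.to_datetime(ns)
print(idx[0])  # 2013-05-22 08:13:37
```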

@nipunbatra
Contributor Author

Great!
Can there be a dedicated section for such stuff in the official documentation, lest I keep bugging you guys :)
It would be great if this use of pd.to_datetime for such CSVs were abstracted away from the user.
Also, how do you post code here on issues? I struggle with copying stuff from the notebook and then indenting 4 spaces per line.
Should this be closed?

@jreback
Contributor

jreback commented Jun 5, 2013

Click on the 'github flavored markdown' link to the right of the comment box; it will tell you how.

This might make a nice little doc for the cookbook or main docs, want to put a short example together?

It is really hard for pandas to 'figure out' what the user wants sometimes; that's why we have these options!
That said, it should be possible for the user to indicate that a date column is an epoch date and have pandas do the right thing.

If you could post a small example I'll update the docs, then close

thanks

@nipunbatra
Contributor Author

Do you have a recommended dataset, or should I put up my own (I can't put up the whole thing since it is not published yet)?
Would surely like to contribute!


@jreback
Contributor

jreback commented Jun 5, 2013

Just make up a simple example, e.g. take the first 10 rows and first couple of columns from your dataset (replace with random data), and rename the columns to A, B, C, D...

@nipunbatra
Contributor Author

Here is a quick attempt from my side: http://nbviewer.ipython.org/5714493
I haven't talked about time zones, as I am not very sure about them myself as of now.
The gist is located here: https://gist.github.com/nipunreddevil/5714493
Let me know what can or should be added.


@jreback
Contributor

jreback commented Jun 5, 2013

I would just do it using read_csv; after all, this is pandas!
Make it a bit shorter if you can. Can we link directly to this viewer (e.g. is it 'permanent')?
I am just going to put a link (and short description) here:
http://pandas.pydata.org/pandas-docs/dev/cookbook.html

@nipunbatra
Contributor Author

I think this should be permanent. nbviewer doesn't host the content itself; it only pulls it from the source (in my case, a gist). It's pretty much the standard way, so it should be up if GitHub is up!
OK, I will make it shorter.
My pandas version (0.11.0) doesn't allow me to specify a time zone in .to_datetime() (reference #3746), so I haven't added that bit yet.


@jreback
Contributor

jreback commented Jun 5, 2013

OK... do you want to do a PR to add it?
Since you are providing a UTC date (by definition, epoch seconds are UTC), you can do:
index.tz_localize('your_time_zone')
to localize (you can show that too).
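A sketch of that localization step, assuming the epoch-derived index is naive but represents UTC instants; 'Asia/Kolkata' stands in for 'your_time_zone' (it matches the local-time output seen in the original df above):

```python
import pandas as pd

# Epoch seconds -> naive DatetimeIndex; the instants are UTC by definition.
idx = pd.to_datetime([1369210417], unit='s')

# Mark the naive index as UTC, then convert for local display.
local = idx.tz_localize('UTC').tz_convert('Asia/Kolkata')
print(local[0])  # 2013-05-22 13:43:37+05:30
```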

@nipunbatra
Contributor Author

I haven't done a Pull Request before. It would be great if you could help with that.


@jreback
Contributor

jreback commented Jun 5, 2013

Read this; it pretty much explains how to add to the docs: http://pandas.pydata.org/developers.html

@nipunbatra
Contributor Author

Will get back pretty soon after reviewing it.


@jreback
Contributor

jreback commented Jun 6, 2013

thanks for the PR!

@nipunbatra
Contributor Author

Thanks for merging my first PR in a big project!
Is there any other similar stuff where I can do a PR?


@jreback
Contributor

jreback commented Jun 6, 2013

in issues, filter on 'Good as first PR' / DOCS

nipunbatra referenced this issue in nilmtk/nilmtk Dec 7, 2013
* Can now load arbitrary channel.
* Loads mains data into memory.
* added a `load_redd.py` example.

Also changed a small detail in the comments for Electricity:
* REDD channels are indexed from 1 so I figured that we should
  probably index channels and meters from 1 too.