
filter server logs so as to analyse user behaviour #670

Open
PatGendre opened this issue Sep 16, 2021 · 8 comments
Comments

@PatGendre (Contributor)

For our project in La Rochelle, France, we'd like to monitor how users actually interact with the app.
Has another prior or current project already done something similar?
The idea is to filter the log files and extract (anonymous/aggregate) daily usage stats, e.g.
how often users opened the app, looked at the dashboard or the journal, clicked on a notification, etc.

@shankari (Contributor) commented Sep 16, 2021

@PatGendre would this be close to what you are looking for?
https://github.com/e-mission/e-mission-eval-private-data/blob/master/emission_overview_trb_2017/app_usage_metrics.ipynb
or
https://github.com/e-mission/e-mission-eval-private-data/tree/master/tripaware_2017/Uncleared%20Outputs%20Notebooks

In general, I think you want to look at current entries for stats/client_nav_event and stats/client_time, and add new ones if you don't find what you are looking for.

Similar to e-mission/e-mission-phone#770 or e-mission/e-mission-phone#772.

You can make the same calls using a JavaScript visualization framework and the REST API. Alternatively, you can use the public dashboard code to run Jupyter notebooks (like the ones above) once a day and generate simple static images on a website (similar to https://dashboard.canbikeco.org/):

https://github.com/e-mission/em-public-dashboard
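As a sketch of what such an offline aggregation could look like, here is a minimal, self-contained Python example. The entry shape mirrors my understanding of `stats/client_nav_event` documents (`metadata.key`, `data.name`, `data.ts`); the event names are purely illustrative, not e-mission's actual ones.

```python
# Sketch: aggregate daily counts of client navigation events from a
# list of timeseries-style entries. Entry shape and event names are
# assumptions for illustration.
from collections import Counter
from datetime import datetime, timezone

def daily_nav_event_counts(entries):
    """Count (day, event_name) pairs over stats/client_nav_event entries."""
    counts = Counter()
    for e in entries:
        if e["metadata"]["key"] != "stats/client_nav_event":
            continue  # skip other stats keys, e.g. stats/client_time
        day = datetime.fromtimestamp(e["data"]["ts"], tz=timezone.utc).date().isoformat()
        counts[(day, e["data"]["name"])] += 1
    return counts

entries = [
    {"metadata": {"key": "stats/client_nav_event"},
     "data": {"name": "opened_app", "ts": 1631779200}},    # 2021-09-16 08:00 UTC
    {"metadata": {"key": "stats/client_nav_event"},
     "data": {"name": "opened_app", "ts": 1631782800}},    # 2021-09-16 09:00 UTC
    {"metadata": {"key": "stats/client_nav_event"},
     "data": {"name": "checked_diary", "ts": 1631786400}}, # 2021-09-16 10:00 UTC
    {"metadata": {"key": "stats/client_time"},
     "data": {"name": "launch_time", "ts": 1631790000}},   # different key, skipped
]

print(daily_nav_event_counts(entries))
```

In a real deployment, the `entries` list would come from the timeseries query layer (e.g. a `get_data_df`-style call) rather than being hard-coded.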

@shankari (Contributor)

I'd be really interested in seeing which option you go with, please share!

@PatGendre (Contributor, Author)

@shankari that's great!

Actually, I was thinking of a simple daily grep for a few expressions in the log files, to be exploited manually by the researchers, but the notebook approach is certainly more general (and shows how well designed the software is ;-)

It may take a while before I can send you a screenshot with our data, but I will tell you what we do in the end :-)
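For reference, the daily-grep idea above can be sketched in a few lines of Python. The log line layout and the two endpoint patterns here are illustrative assumptions; they would need to be adapted to the server's actual log format and routes.

```python
# Sketch of the "daily grep" idea: count how often a few expressions
# appear in a day's server log. Patterns and log layout are assumed,
# not taken from an actual e-mission deployment.
import re
from collections import Counter

PATTERNS = {  # illustrative expressions to look for
    "opened_app": re.compile(r"POST /usercache/get"),
    "dashboard": re.compile(r"GET /result/metrics"),
}

def count_matches(log_lines):
    """Return a Counter of pattern labels over the given log lines."""
    counts = Counter()
    for line in log_lines:
        for label, pat in PATTERNS.items():
            if pat.search(line):
                counts[label] += 1
    return counts

sample_log = [
    '127.0.0.1 - - [16/Sep/2021] "POST /usercache/get HTTP/1.1" 200',
    '127.0.0.1 - - [16/Sep/2021] "POST /usercache/get HTTP/1.1" 200',
    '127.0.0.1 - - [16/Sep/2021] "GET /result/metrics HTTP/1.1" 200',
]
print(count_matches(sample_log))
```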

@PatGendre (Contributor, Author)

Hi @shankari, I tested the notebook and encountered an error for this instruction:

server_api_calls_df = agts.get_data_df("stats/server_api_time", time_query=sep_dec_tq_data_ts)

DEBUG:root:curr_query = {'$or': [{'metadata.key': 'stats/server_api_time'}], 'data.ts': {'$lte': <bound method Arrow.timestamp of <Arrow [2021-09-20T09:53:58.674905+02:00]>>, '$gte': <bound method Arrow.timestamp of <Arrow [2021-02-02T00:00:00+00:00]>>}}, sort_key = None
DEBUG:root:orig_ts_db_keys = ['stats/server_api_time'], analysis_ts_db_keys = []


InvalidDocument Traceback (most recent call last)
in
----> 1 server_api_calls_df = agts.get_data_df("stats/server_api_time", time_query=sep_dec_tq_data_ts)

~/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py in get_data_df(self, key, time_query, geo_query, extra_query_list, map_fn)
261 :return:
262 """
--> 263 result_it = self.find_entries([key], time_query, geo_query, extra_query_list)
264 return self.to_data_df(key, result_it, map_fn)
265

~/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py in find_entries(self, key_list, time_query, geo_query, extra_query_list)
194 geo_query,
195 extra_query_list,
--> 196 sort_key)
197
198 (analysis_ts_db_count, analysis_ts_db_result) = self._get_entries_for_timeseries(self.analysis_timeseries_db,

~/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py in _get_entries_for_timeseries(self, tsdb, key_list, time_query, geo_query, extra_query_list, sort_key)
214 extra_query_list)
215 ts_db_cursor = tsdb.find(ts_query)
--> 216 ts_db_count = tsdb.count_documents(ts_query)
217 if sort_key is None:
218 ts_db_result = ts_db_cursor

~/miniconda3/envs/emission/lib/python3.7/site-packages/pymongo/collection.py in count_documents(self, filter, session, **kwargs)
1784
1785 return self.__database.client._retryable_read(
-> 1786 _cmd, self._read_preference_for(session), session)
1787
1788 def count(self, filter=None, session=None, **kwargs):

~/miniconda3/envs/emission/lib/python3.7/site-packages/pymongo/mongo_client.py in _retryable_read(self, func, read_pref, session, address, retryable, exhaust)
1469 # not support retryable reads, raise the last error.
1470 raise last_error
-> 1471 return func(session, server, sock_info, slave_ok)
1472 except ServerSelectionTimeoutError:
1473 if retrying:

~/miniconda3/envs/emission/lib/python3.7/site-packages/pymongo/collection.py in _cmd(session, server, sock_info, slave_ok)
1778 def _cmd(session, server, sock_info, slave_ok):
1779 result = self._aggregate_one_result(
-> 1780 sock_info, slave_ok, cmd, collation, session)
1781 if not result:
1782 return 0

~/miniconda3/envs/emission/lib/python3.7/site-packages/pymongo/collection.py in _aggregate_one_result(self, sock_info, slave_ok, cmd, collation, session)
1675 read_concern=self.read_concern,
1676 collation=collation,
-> 1677 session=session)
1678 batch = result['cursor']['firstBatch']
1679 return batch[0] if batch else None

~/miniconda3/envs/emission/lib/python3.7/site-packages/pymongo/collection.py in _command(self, sock_info, command, slave_ok, read_preference, codec_options, check, allowable_errors, read_concern, write_concern, collation, session, retryable_write, user_fields)
251 client=self.__database.client,
252 retryable_write=retryable_write,
--> 253 user_fields=user_fields)
254
255 def __create(self, options, collation, session):

~/miniconda3/envs/emission/lib/python3.7/site-packages/pymongo/pool.py in command(self, dbname, spec, slave_ok, read_preference, codec_options, check, allowable_errors, check_keys, read_concern, write_concern, parse_write_concern_error, collation, session, client, retryable_write, publish_events, user_fields, exhaust_allowed)
697 # Catch socket.error, KeyboardInterrupt, etc. and close ourselves.
698 except BaseException as error:
--> 699 self._raise_connection_failure(error)
700
701 def send_message(self, message, max_doc_size):

~/miniconda3/envs/emission/lib/python3.7/site-packages/pymongo/pool.py in command(self, dbname, spec, slave_ok, read_preference, codec_options, check, allowable_errors, check_keys, read_concern, write_concern, parse_write_concern_error, collation, session, client, retryable_write, publish_events, user_fields, exhaust_allowed)
692 unacknowledged=unacknowledged,
693 user_fields=user_fields,
--> 694 exhaust_allowed=exhaust_allowed)
695 except OperationFailure:
696 raise

~/miniconda3/envs/emission/lib/python3.7/site-packages/pymongo/network.py in command(sock_info, dbname, spec, slave_ok, is_mongos, read_preference, codec_options, session, client, check, allowable_errors, address, check_keys, listeners, max_bson_size, read_concern, parse_write_concern_error, collation, compression_ctx, use_op_msg, unacknowledged, user_fields, exhaust_allowed)
120 request_id, msg, size, max_doc_size = message._op_msg(
121 flags, spec, dbname, read_preference, slave_ok, check_keys,
--> 122 codec_options, ctx=compression_ctx)
123 # If this is an unacknowledged write then make sure the encoded doc(s)
124 # are small enough, otherwise rely on the server to return an error.

~/miniconda3/envs/emission/lib/python3.7/site-packages/pymongo/message.py in _op_msg(flags, command, dbname, read_preference, slave_ok, check_keys, opts, ctx)
713 flags, command, identifier, docs, check_keys, opts, ctx)
714 return _op_msg_uncompressed(
--> 715 flags, command, identifier, docs, check_keys, opts)
716 finally:
717 # Add the field back to the command.

InvalidDocument: cannot encode object: <bound method Arrow.timestamp of <Arrow [2021-09-20T09:53:58.674905+02:00]>>, of type: <class 'method'>

@PatGendre (Contributor, Author)

I suspect this is due to Arrow, so I tried the call without the time_query:
server_api_calls_df = agts.get_data_df("stats/server_api_time")
and this worked (which is fine, because I want to query all entries anyway).
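The InvalidDocument in the traceback points at a bound method: in arrow 1.0, `Arrow.timestamp` changed from a property to a method, so older code written as `arw.timestamp` now hands the method object itself to the Mongo query, which pymongo cannot encode to BSON. The mechanism can be shown with stdlib datetime alone; the suggested `timestamp()` / `int_timestamp` fix for the notebook is my assumption, not a verified patch.

```python
# Demonstrate the bound-method-vs-value pitfall behind the
# InvalidDocument error, using stdlib datetime (same pattern as
# arrow >= 1.0, where `timestamp` became a method).
from datetime import datetime, timezone

dt = datetime(2021, 9, 20, tzinfo=timezone.utc)

broken = dt.timestamp    # a bound method object: not BSON-encodable
fixed = dt.timestamp()   # a float: safe to put in a Mongo query

assert callable(broken)
assert isinstance(fixed, float)

# For the notebook, the likely fix is to build the time query with
# arw.timestamp() (or arw.int_timestamp) instead of arw.timestamp.
```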

@PatGendre (Contributor, Author)

Except for this Arrow exception with the time query, the rest of the notebook seems to work fine, thanks! (I don't have time to investigate, though...)

@PatGendre (Contributor, Author)

I've also noticed that the API calls stored in the database contain many calls apparently not linked to e-mission (see the attached result for the non_timeline_calls_list), so I suspect some attempted attacks on our server ;-)
api_call_list.txt

@shankari (Contributor)

Yup, these are definitely attack probes:

 'GET_/dana-na/../dana/html5acc/guacamole/../../../../../../etc/passwd',
 'GET_/dana-na/../dana/html5acc/guacamole/../../../../../../etc/passwd_cputime',
 'GET_/vpn/../vpns/cfg/smb.conf',
 'GET_/vpn/../vpns/cfg/smb.conf_cputime',
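A quick way to triage such a list: anything containing a `..` path-traversal segment is almost certainly a probe. A minimal sketch (the `POST_/usercache/get` entry is an illustrative legitimate call, not taken from the attachment, and the heuristic is deliberately simple rather than a complete scanner):

```python
# Sketch: split API call names into probe-looking and legitimate
# entries, using a simple ".." path-traversal heuristic.
def split_probes(call_names):
    probes = [c for c in call_names if ".." in c]
    legit = [c for c in call_names if ".." not in c]
    return probes, legit

calls = [
    "GET_/dana-na/../dana/html5acc/guacamole/../../../../../../etc/passwd",
    "GET_/vpn/../vpns/cfg/smb.conf",
    "POST_/usercache/get",  # illustrative legitimate call
]
probes, legit = split_probes(calls)
print(len(probes), legit)
```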
