New York City has publicly released taxi trip data covering every trip by every taxi from 2014 to 2018. This web application maps the pick-up locations of trips in New York City during January 2015. Zooming in to an individual point displays it as a blue circle marker; zooming out groups the points in each area into clusters.
- Clean the dataset by removing invalid points and points outside New York City - Python
Ref: New York City Borough Boundary
- Create one database with two collections - MongoDB
The first collection holds the full set of fields for each trip and is used when trip details are needed.
The second holds only the id and the [longitude, latitude] array of the pick-up location, which allows faster queries when the map only has to show pick-up locations without further details.
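
The cleaning step and the two document shapes described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the field names `pickup_longitude` and `pickup_latitude`, the `loc` key, the helper names, and the rectangular `PLACEHOLDER_BOUNDARY` are all assumptions. A real run would load the polygon(s) from the borough boundary file instead of using the placeholder.

```python
# Placeholder polygon as (longitude, latitude) vertices. This is NOT the real
# borough boundary, only a rough box around the city used for illustration.
PLACEHOLDER_BOUNDARY = [
    (-74.26, 40.49), (-73.70, 40.49), (-73.70, 40.92), (-74.26, 40.92),
]


def is_valid_point(lng, lat):
    """Reject missing values, the (0, 0) placeholder and out-of-range values."""
    if lng is None or lat is None:
        return False
    if lng == 0 and lat == 0:
        return False
    return -180 <= lng <= 180 and -90 <= lat <= 90


def point_in_polygon(lng, lat, polygon):
    """Ray casting: count the edges crossed by a ray going east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def clean(rows, polygon=PLACEHOLDER_BOUNDARY):
    """Keep rows whose pick-up point is valid and inside the polygon."""
    kept = []
    for row in rows:
        lng = row.get("pickup_longitude")  # assumed field name
        lat = row.get("pickup_latitude")   # assumed field name
        if is_valid_point(lng, lat) and point_in_polygon(lng, lat, polygon):
            kept.append(row)
    return kept


def to_documents(row_id, row):
    """Build one document per collection: full detail and location only."""
    detail = {"_id": row_id, **row}
    location = {
        "_id": row_id,
        "loc": [row["pickup_longitude"], row["pickup_latitude"]],
    }
    return detail, location
```

Sharing the same id between the two documents lets the application look up the detailed record for a marker only when it is needed.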
The dataset contains 12.5 million records, and queries become slow (5-10 seconds at the maximum scale) when zooming out makes the map request more data to render on the client side. Indexing and redesigning the data structure in the database did not improve performance. Sharding the database across several servers may be required to make queries more efficient.
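
To make the slow case concrete, here is a hedged sketch of how a filter for the visible map area might be built. The project's real query is not shown in this document, so the function name `viewport_query` and the `loc` field are assumptions. `$geoWithin` with `$box` is a standard MongoDB operator for legacy [longitude, latitude] pairs.

```python
def viewport_query(west, south, east, north, field="loc"):
    """Build a MongoDB filter matching points inside the visible rectangle.

    `field` is assumed to hold a [longitude, latitude] array.
    """
    if west > east or south > north:
        raise ValueError("expected west <= east and south <= north")
    # $box takes the bottom-left corner first, then the upper-right corner.
    return {field: {"$geoWithin": {"$box": [[west, south], [east, north]]}}}


# Example: a viewport roughly covering midtown Manhattan.
midtown = viewport_query(-74.00, 40.74, -73.96, 40.77)
```

With PyMongo, such a filter would be passed to `collection.find(...)` on the location-only collection. When the user zooms out far enough, the rectangle covers most of the city and nearly every document matches, so an index can narrow the search but cannot reduce the amount of data returned. This may be one reason indexing alone did not help.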