The problem at hand is how to use data mining techniques to visualize and analyze water quality patterns across locations in various states, so that people in each location can be informed of the chemical characteristics of their water and proper measures can be taken. Using this data set, we would like to:
- Obtain correlations between different attributes, for example between temperature and nitrate levels.
- Conduct a region-based analysis of water quality in different areas and spot danger zones.
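As a rough illustration of these two goals, the sketch below computes pairwise correlations and a per-region summary with pandas. The data is synthetic, and the column names (`state`, `temperature`, `ph`, `dissolved_oxygen`, `nitrate`) and the `DO_THRESHOLD` cut-off are assumptions made for the example, not values taken from the actual data set.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names and values are illustrative assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "state": rng.choice(["A", "B", "C", "D"], size=n),
    "temperature": rng.normal(25, 3, size=n),
    "ph": rng.normal(7.2, 0.5, size=n),
    "dissolved_oxygen": rng.normal(6.5, 1.0, size=n),
})
# Make nitrate loosely depend on temperature so that a correlation shows up.
df["nitrate"] = 0.3 * df["temperature"] + rng.normal(0, 1, size=n)

# Goal 1: pairwise Pearson correlations between the numeric attributes.
numeric_cols = ["temperature", "ph", "dissolved_oxygen", "nitrate"]
corr = df[numeric_cols].corr()

# Goal 2: region-based view, here the mean of selected attributes per state.
by_state = df.groupby("state")[["dissolved_oxygen", "nitrate"]].mean()

# Flag states whose mean dissolved oxygen falls below a hypothetical threshold.
DO_THRESHOLD = 6.5  # assumed cut-off, for illustration only
danger_zones = by_state.index[by_state["dissolved_oxygen"] < DO_THRESHOLD].tolist()

print(corr.round(2))
print(by_state.round(2))
print("Flagged states:", danger_zones)
```

The correlation matrix could then be passed to a Seaborn heatmap for visual inspection; a real danger-zone rule would need thresholds taken from an applicable water quality standard.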
Some of the data mining techniques that may prove useful for this analysis are preprocessing techniques like:
- Principal component analysis
- Attribute subset selection
- Sampling
- Aggregation
- Binning
Analysis techniques that may prove useful include:
- Clustering
- Outlier analysis
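To make a few of these techniques concrete, here is a minimal sketch of binning, principal component analysis, and outlier analysis using only NumPy and pandas. The measurements are synthetic, the column names are assumptions, PCA is done through a plain SVD of the standardized data, and the 1.5 × IQR rule is just one common choice for flagging outliers, not necessarily the one this project will settle on.

```python
import numpy as np
import pandas as pd

# Synthetic measurements; names and distributions are assumptions.
rng = np.random.default_rng(1)
n = 150
X = pd.DataFrame({
    "temperature": rng.normal(25, 3, n),
    "ph": rng.normal(7.2, 0.4, n),
    "conductivity": rng.normal(400, 80, n),
    "bod": rng.normal(3, 1, n),
})
X.loc[0, "bod"] = 40.0  # inject one obvious outlier for the demonstration

# Binning: split temperature into three equal-width intervals.
temp_bins = pd.cut(X["temperature"], bins=3, labels=["low", "medium", "high"])

# PCA: standardize each column, then take the SVD of the data matrix.
Z = ((X - X.mean()) / X.std(ddof=0)).to_numpy()
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)  # share of variance per component
scores = Z @ Vt[:2].T                  # projection onto the first two components

# Outlier analysis: flag BOD values outside 1.5 * IQR of the quartiles.
q1 = X["bod"].quantile(0.25)
q3 = X["bod"].quantile(0.75)
iqr = q3 - q1
is_outlier = (X["bod"] < q1 - 1.5 * iqr) | (X["bod"] > q3 + 1.5 * iqr)

print(temp_bins.value_counts())
print("Explained variance ratio:", explained_ratio.round(3))
print("Outlier rows:", X.index[is_outlier].tolist())
```

The two-column `scores` array is the kind of reduced representation that could be fed to a clustering step or plotted directly as a scatter plot.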
The data set consists of around 1300 records, each with 27 attributes. The main attributes are:
- Station code of the location
- Name of the location
- Temperature of the location
- Dissolved oxygen (D.O.)
- pH of water
- Conductivity, biochemical oxygen demand (B.O.D.), nitrate level and nitrite level
- Fecal coliform and total coliform
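A first pass over such a table might look like the sketch below, which reads a few rows into a pandas DataFrame, treats a placeholder string as missing, and summarizes the numeric attributes. The CSV excerpt, the header names, and the `NAN` placeholder are all made up for illustration; the real file's layout and missing-value markers may differ.

```python
import io
import pandas as pd

# Hypothetical excerpt; header names and values are invented for illustration.
csv_text = """station_code,location,temperature,do,ph,conductivity,bod,nitrate,fecal_coliform,total_coliform
1001,Site A,29.5,6.7,7.5,203,NAN,0.1,11,27
1002,Site B,30.0,5.8,7.2,189,2.0,0.2,4953,8391
1003,Site C,NAN,6.1,6.9,179,1.7,0.1,3243,5330
"""

# Treat the assumed "NAN" placeholder as a missing value while parsing.
df = pd.read_csv(io.StringIO(csv_text), na_values=["NAN"])

# Everything except the identifier columns is expected to be numeric;
# coerce defensively so any stray text becomes NaN instead of raising.
numeric_cols = df.columns.drop(["station_code", "location"])
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

missing_per_column = df.isna().sum()   # how many values are missing per attribute
summary = df[numeric_cols].describe()  # count, mean, std, quartiles, min, max

print(missing_per_column)
print(summary.round(2))
```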
During this project, we will use the following tools:
- Python 3
- Jupyter
- NumPy
- Pandas
- Matplotlib
- Seaborn