News

Reality Changers Challenge Assembly: Combining Data Into Single File

Added by Eric Busboom 22 days ago

Mariel shared a set of files with me, from which I've extracted a subset covering students for 2016-2017 and 2017-2018 at a few schools, with their GPA 1 and GPA 2. This looks like it will be the main set of data to analyze. Here is the combined file:

https://docs.google.com/spreadsheets/d/16ENWTVx4c_IuQqYDpMVLGIslnSuI9wZF6adK3qO6gHg/edit?usp=sharing

If this is the right set of data for the Challenge Assemblies, I'll start cleaning it.

SD Water Quality: Answer Some Questions on the New Insights Site

Added by Eric Busboom 8 months ago

I'm trying a new way of handling data questions, the Insights website. The water-project tag identifies questions specific to this project, but there are also questions on the site that are not part of the project. Feel free to comment on or answer any of them.

I'll be promoting this site to journalists, nonprofits and others who are interested in getting data answers, and I hope that you'll follow the questions and answer some of them. You can get immediate notification of new questions and answers from the Slack channel.

After the next meeting, I'll probably move the Water Quality project into this format; rather than having a meeting dedicated to just water quality, we'll have Data Library meetings that are focused on answering a few of the open questions.

Please visit the site and review a few of the questions with the water-project tag, and if you have analyses, link them in as answers. If you are working on an analysis that is different from the current questions, you can create and answer your own question. Just be sure to apply the water-project tag.

SD Water Quality: Interactive Station Map, Creek Monitoring Data

Added by Eric Busboom 8 months ago

We've got a new dataset for the San Diego Coastkeeper water monitoring program. This data comes from CEDEN, so it has the same format as the Beachwatch bacteria counts, but also includes measurements for temperature, pH, turbidity and nutrients.

https://data.sandiegodata.org/dataset/ceden-waterboards-ca-gov-sdck_monitoring

I've extracted all of the measurement stations from this dataset and added them to the Beachwatch stations to produce this map:

http://water.sandiegodata.org/maps/stations/

You can get the link to the map from the main webpage for the project:

http://water.sandiegodata.org

SD Water Quality: Full Featured Beachwatch Data

Added by Eric Busboom 9 months ago

I've just released a new version of the Beachwatch data which includes "features", new columns specifically designed for analysis:

https://data.sandiegodata.org/dataset/sandiegodata-org-beachwatch

These features include a "measure code" that groups together each unique combination of analyte/methodname/unit, so you can easily select a specific group for analysis. As shown in the example notebook, this will usually be 24, the Enterolert measurements of Enterococcus.

The dataset also has mean, median and quantile groups for each of the combinations of station and method code, so rather than analyzing the raw result measurements, you can work with "high" and "low" values, scaled to a particular measurement and station. Some examples of this sort of analysis, using Logistic Regression, are in this example notebook:

https://github.com/san-diego-water-quality/water-datasets/blob/master/derived/sandiegodata.org-beachwatch/notebooks/Examples.ipynb
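As a rough illustration of selecting one measure code and working with "high" and "low" values, here is a minimal sketch on a small made-up table. The column names (stationcode, measure_code, result) and the values are assumptions for illustration, not the dataset's actual schema, so check the package documentation for the real names. It labels a result "high" when it is above the median for its own station, which is only one simple way to scale results to a station.

```python
import pandas as pd

# Made-up rows standing in for Beachwatch measurements.
# The column names are hypothetical, not the dataset's real schema.
df = pd.DataFrame({
    "stationcode": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "measure_code": [24, 24, 24, 24, 24, 24, 24, 7],
    "result": [10.0, 20.0, 400.0, 30.0, 100.0, 120.0, 90.0, 5.0],
})

# Keep a single analyte/method/unit combination (24 is the code mentioned above).
entero = df[df["measure_code"] == 24].copy()

# Scale each result to its own station by comparing it with that station's median.
station_median = entero.groupby("stationcode")["result"].transform("median")
entero["is_high"] = entero["result"] > station_median

print(entero[["stationcode", "result", "is_high"]])
```

A boolean column like is_high is the kind of target a logistic regression could be fit against.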

SD Water Quality: New Dataset for Tides, River Flow and Rain

Added by Eric Busboom 9 months ago

Here is a new data set to make analysis a bit easier:

https://data.sandiegodata.org/dataset/sandiegodata-org-water_quality

It combines rain, tides and San Diego river flow, re-sampled to days. It is also a good example of how to combine datasets. See the notebook used to build the package for examples of resampling and joining:

https://github.com/san-diego-water-quality/water-datasets/blob/master/sandiegodata.org-water_quality/notebooks/Combine.ipynb
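If you just want the general shape of resampling and joining, here is a minimal sketch with synthetic data. The series names, units, sampling intervals and aggregation choices (sum for rain, max for tides, mean for flow) are assumptions for illustration; the Combine notebook is the reference for what the package actually does.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the three sources, each on its own time grid
# and covering a different date range. Names and units are hypothetical.
rain_idx = pd.date_range("2018-01-01", periods=72, freq="60min")   # 3 days
tide_idx = pd.date_range("2018-01-02", periods=96, freq="30min")   # 2 days, starts later
flow_idx = pd.date_range("2018-01-01", periods=192, freq="15min")  # 2 days

rain = pd.Series(0.1, index=rain_idx, name="rain_in")
tide = pd.Series(np.linspace(0, 5, len(tide_idx)), index=tide_idx, name="tide_ft")
flow = pd.Series(20.0, index=flow_idx, name="flow_cfs")

# Resample each series to one value per day, then align them on the date index.
daily = pd.concat(
    [
        rain.resample("D").sum(),   # total rain per day
        tide.resample("D").max(),   # highest tide per day
        flow.resample("D").mean(),  # average flow per day
    ],
    axis=1,
)

print(daily)

# Days covered by one series but not another show up as nulls after the join.
print(daily.isnull().sum())
```

Because the concat keeps every date that appears in any series, counting nulls per column is a quick way to see where the series do not overlap.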

The data packaging system I created, Metapack, now has a new feature to automatically create Exploratory Data Analysis (EDA) notebooks. The notebook for this package is an example of basic EDA, and it is also important for its analysis of null values (see the Nulls section of the notebook), because the three data series in the dataset all have different time ranges for valid values.

https://github.com/san-diego-water-quality/water-datasets/blob/master/sandiegodata.org-water_quality/notebooks/eda-tides_river_rain.ipynb

If you'd like to do more exploration, combine this dataset with the Beachwatch data to see how bacteria counts at various stations correlate with rain and river flow.
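One possible starting point for that exploration is sketched below, using made-up numbers. The column names and values are assumptions, not real measurements, and taking the log of the counts before correlating is my own choice because bacteria counts tend to be heavily skewed.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2018-01-01", periods=10, freq="D")

# Hypothetical daily weather table, standing in for the combined dataset.
weather = pd.DataFrame(
    {"rain_in": [0, 0, 1.2, 0.4, 0, 0, 0, 2.0, 0.1, 0]},
    index=days,
)

# Hypothetical bacteria counts for one station, sampled on only some days.
counts = pd.DataFrame(
    {"result": [10, 15, 900, 300, 20, 2500, 60]},
    index=days[[0, 1, 2, 3, 5, 7, 8]],
)

# An inner join keeps only the days present in both tables.
joined = counts.join(weather, how="inner")

# Correlate on a log scale so a few very large counts do not dominate.
joined["log_result"] = np.log10(joined["result"])
r = joined["log_result"].corr(joined["rain_in"])

print(f"Pearson r between log10(count) and rain: {r:.2f}")
```

With real data you would probably also want to try lagged rain (the previous day or two), since runoff may take time to reach a station.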

SD Water Quality: What's Happening at Dog Beach?

Added by Eric Busboom 9 months ago

I've posted an issue to look at possible correlations with bacteria counts at Dog Beach.

https://redmine.civicknowledge.com/issues/153?issue_count=2&issue_position=1&next_issue_id=93

If you'd like a challenge, work on the ticket and present your work at the next meeting.

The issue references a map of water quality stations:

https://san-diego-water-quality.github.io/ericbusboom/pl_stations.html

This map was created with a Jupyter notebook, saved to HTML in the notebook, and checked into GitHub. Then I turned on web publishing in GitHub for the repository, and I've got an interactive map!

Here is the notebook: https://github.com/san-diego-water-quality/ericbusboom/blob/master/Stream%20Flow.ipynb
Here is the repo, showing the map file, pl_stations.html: https://github.com/san-diego-water-quality/ericbusboom

Using Folium like this is one of the easiest ways to get an interactive map, with a good basemap and markers.

SD Water Quality: Correlations Challenge

Added by Eric Busboom 9 months ago

Here is a new notebook I created that looks at the geographic distribution of measurement stations in the Beachwatch bacteria counts and how station results are correlated with each other over time:

https://github.com/san-diego-water-quality/ericbusboom/blob/master/Beachwatch%20station%20correlations.ipynb

For the next meeting (August 1), I'd like to have a few other analysts present extensions to this analysis, looking at:

  • Compare the correlations between stations to the distance between them. Is the correlation a function of distance?
  • Look at the correlations over time. Are they getting stronger or weaker?
  • This analysis looks at only one analyte, 'Coliform, Total'. Are there similar correlations for the others?
  • With additional work, you may be able to find the creeks, streams, or drainage pipes that are close to these stations. Are there relationships between the station results and the locations of these water outlets?
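
To get started on the first question, one approach is to build a table with one row per pair of stations, holding both the distance between them and the correlation of their results. The sketch below uses invented stations and random data, so the names, coordinates and values are all assumptions; the haversine formula is a standard great-circle approximation.

```python
import itertools
import math

import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical stations; names and coordinates are illustrative only.
stations = {
    "S1": (32.75, -117.25),
    "S2": (32.76, -117.25),
    "S3": (32.95, -117.27),
}

# Synthetic results: one column per station, one row per sampling date.
# S2 is built to track S1 closely; S3 is independent of both.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
results = pd.DataFrame({
    "S1": base,
    "S2": base + rng.normal(scale=0.3, size=200),
    "S3": rng.normal(size=200),
})

# Pairwise correlation matrix between station columns.
corr = results.corr()

# One row per unordered pair of stations: distance and correlation together.
pairs = pd.DataFrame([
    {
        "a": a,
        "b": b,
        "distance_km": haversine_km(*stations[a], *stations[b]),
        "correlation": corr.loc[a, b],
    }
    for a, b in itertools.combinations(stations, 2)
])

print(pairs.sort_values("distance_km"))
```

Plotting correlation against distance_km from a table like this is one way to see whether the correlation falls off with distance.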

SD Water Quality: Did you Get the GitHub Invite?

Added by Eric Busboom 9 months ago

I think I've invited everyone into the GitHub organization. If so, you should either already be a member, or have an invitation waiting for you, which you can accept at:

https://github.com/san-diego-water-quality

I added everyone from the "github account" entry in your Redmine account profile. So, if you didn't get a GitHub invite and don't have a Redmine account, please create one. If you didn't get a GitHub invite and you do have a Redmine account, email me with your GitHub user ID and I'll add you.

eric.

SD Water Quality: New Packages

Added by Eric Busboom 9 months ago

We have three new data packages in the data repository, under the 'water-project' tag. See them all at:

https://data.sandiegodata.org/dataset?tags=water-project

To use these packages, click into the resource for the .csv or .zip package file, and you'll see that the resource documentation includes some Python code. For instance, for the CSV package for the Beachwatch data you'll see:

import metapack as mp
pkg = mp.open_package('http://library.metatab.org/ceden.waterboards.ca.gov-beachwatch-sandiego-1.csv')

Then, you can use the pkg to get a Pandas dataframe:

df = pkg.read_csv()

Later today or tomorrow I'll be posting some examples and challenges using these data packages.
