Data-projects-with-R-and-GitHub

This project accompanies the course

“R2 - Data projects with R and GitHub”

at the Dr. Eberle Centre for Digital Competencies at the University of Tübingen.

Tutorials

All tutorials are summarized within the

Overview


Project descriptions

During the course, you have to formulate a data wrangling project. That is, you should name or provide a data set, say how the data should be (re)structured, and set some visualization goals. You can use a data set you are working on (note: you might have to anonymize it before sharing) or a data set that is freely available online, taken from a publication, or from any other source. I strongly suggest “dirty” data that has to be cleaned up and reformatted! Data cleanup, transformation and extension should be one (big) part of your project!

The formulated projects are the set of exercises you and your fellow students will pick from during the rest of the course, which is discussed below. But first, some details concerning project definition.


Phase 1 - Drafting a project description

Before you can define a project you need some data! In order to select something, you might want to “reach high”! That is, think about something you would like to know or see and what data might be needed for that. Don’t think in terms of “I know how to do” but more “I would like to see” (like “A BOSS”)! Given an idea, start looking for data sets that might help you to provide the information for your idea. Either you find something useful, or you might change your idea while looking for and investigating available data.

Data sets

The best data sets are the ones you are working on anyway or that are connected to your field of interest: such data makes the most sense to you, and you will be most creative about possible analyses. Something you have discussed in another course or project works as well. Ideally, the data is already in some table form but not tidy, i.e. there is still a need for some (extensive) data cleaning, formatting, …

You might not have a data set at hand, so check out open data repositories, websites, etc. Some open data repositories or search engines are listed at

Note down where you got your data from, since you will later have to provide some details about your data!

In the end: the uglier the data, the better! 😜 Don’t try to be nice but provide what you have. Reality is neither nice nor free of errors, bugs and misformatted data… Let’s get used to it!

online vs. local

If the data set is online and available for download without registration or user accounts, you can directly link it. This is often the case for data from databases or supplements from articles.

If user credentials are needed to access the data, please

If the data is large (>50MB per file),
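To find out whether any file in your working directory crosses the 50 MB mark mentioned above, a small check along the following lines may help. This is only a sketch: the function name `list_large_files` is made up for illustration, and `-size +50M` counts in units of 1 MiB, so it is a rough check rather than an exact one.

```shell
# List files larger than about 50 MB below a directory (default: current one).
# `list_large_files` is a hypothetical helper name, not part of any tool.
list_large_files() {
  # -size +50M : more than 50 units of 1 MiB
  # ! -path    : skip git's own internal files
  find "${1:-.}" -type f -size +50M ! -path '*/.git/*'
}

# Usage: prints one path per line, or nothing if no file is that large.
list_large_files .
```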

Visualization goals

Next, formulate a rough idea of what you would like to see. If you want to (re)produce a plot you have seen, store the image. Or just draw a sketch by hand of what it should look like and take a photo. Anything that conveys your idea is fine.

Try to think of something “non-standard”…

Double-check that the data set you picked provides (somehow) all the information needed to draw your plot of interest.

Write up your project description

Write an R Markdown file project-description.Rmd to

It is fine to be vague at some points but you should formulate a clear goal and roadmap.

The output format should be “normal” Markdown! To this end you have to

Upload to GitHub

In order to submit your project proposal, you have to upload it to GitHub as part of this project! To this end:

Available projects

Example:

Current projects:

Goals

At the end of Phase 1 you will have a better understanding of


Phase 2 - Reviewing a project description

To ensure the drafted projects are understandable and doable, we will do a peer review. To this end, you will be assigned two projects to give feedback on. Review comments should be made via GitHub issues, where you can also discuss your ideas and suggestions with the respective project owner.

Raising issues

For each project draft, we will assign two reviewers at random. The reviewer assignments are as follows:

Each reviewer is supposed to

Each project owner is supposed to

DON’T CHANGE THE PROJECT DRAFT YET!!! (Since this would interfere with the second review!)

Goals

At the end of Phase 2 you will


Phase 3 - Finalizing your project description

Now it is time to rework your project draft in the light of the received reviews and the project drafts you have reviewed yourself. You might want/need to change a few bits and pieces. In the end, you might do the following:

Goals

At the end of Phase 3 you will


Tackling a suggested project

Given a project description, you will try to solve the task. In order to practice a realistic workflow, you will first create your solution in your own git branch and then propose it via a pull request on GitHub. This gives the project owner the opportunity to review your solution and to give you feedback, which you can discuss within the pull request. Once everyone is happy with the solution, it can be merged into the main branch of the course repository and thus be published.

This workflow is described and summarized in

Note: we are all still working in ONE GITHUB REPOSITORY! We do not create a fork, i.e. our own copy of the repository on GitHub, which is also detailed in the linked material. Forking is needed if you don’t have write permission for a repository. But the overall workflow is more or less the same.
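The local part of this branch workflow can be sketched as follows. To keep the example harmless, it runs in a throwaway repository created in a temporary directory; the branch name `solution-project-x`, the file name `solution.Rmd` and the demo identity are placeholders chosen for illustration, not names prescribed by the course. The push is only shown as a comment, since it needs the real course repository as remote.

```shell
# Throwaway repository so the sketch does not touch any real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo Student"          # demo identity, local to this repo
git config user.email "student@example.org"

# Stand-in for the content that already exists in the shared repository.
echo "course material" > README.md
git add README.md
git commit --quiet -m "Initial course material"

# 1. Create your own branch and switch to it (hypothetical branch name).
git checkout --quiet -b solution-project-x

# 2. Add your solution file and commit it on that branch.
echo "my analysis" > solution.Rmd
git add solution.Rmd
git commit --quiet -m "Add first draft of solution"

# 3. In the real course repository you would now publish the branch
#    and afterwards open the pull request on the GitHub website:
#    git push -u origin solution-project-x

current_branch="$(git rev-parse --abbrev-ref HEAD)"
echo "Working on branch: $current_branch"
```

Committing on a separate branch keeps the main branch untouched until the pull request is merged.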


Phase 1 - Posting your initial solution

Prepare the file

Work on your solution and call for help

When you work on your solution, you should at least once a day

This ensures you will not lose your work (backup) and that everything is stored where it belongs.

Furthermore, it opens up a new way to get help! In case you get stuck somewhere, it is a good idea to

Create your pull request

At some point you will be satisfied with your project solution and all changes are committed and pushed to GitHub.

Now it is time to open a pull request.

Goal

At the end of Phase 1 you will

Phase 2 - Reviewing and finalizing

Now it is time for the project owner to check your solution and for both of you to discuss possible changes, extensions, … This should, as before, be done on GitHub, but now directly within the pull request! All comments, answers, changes etc. will be listed there. Even if you are meeting in person, please note down the main points and goals within the pull request (together).

The project owner should

The solution author should

You can already work on the changes while you are discussing! Any change you commit and push to your branch is automatically visible in the pull request (and via the HTML preview link you provided).

Thus, you can directly discuss whether your changes match the ideas of the project owner or suggest alternative ideas.

You will get a loooot of GitHub emails this week! 😁

Goal

At the end of Phase 2 you will

Beautifying your project

Finally, it is not only about content: presentation matters, too. Thus, you will have to beautify your HTML output. Here are some ideas for where to start:

Common Issues

Rendering HTML files stored on GitHub

If your solution generates HTML output files, you cannot directly view/render them on GitHub, since GitHub displays HTML files as source code instead of rendering them.

If your HTML file does not use JavaScript:

In case your HTML file works without JavaScript (just static text and image output), you can use https://htmlpreview.github.io/

Note: htmlpreview only works for HTML pages without JavaScript content!
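As a rough sketch of how such a preview link is put together: htmlpreview expects the GitHub URL of the HTML file appended after a `?`. The helper name `htmlpreview_url` and the `USER/REPO/BRANCH` parts below are placeholders for illustration only.

```shell
# Build an htmlpreview link from the GitHub URL of an HTML file.
# `htmlpreview_url` is a hypothetical helper name.
htmlpreview_url() {
  printf 'https://htmlpreview.github.io/?%s\n' "$1"
}

# Placeholder URL: replace USER, REPO, BRANCH and the path with your own.
htmlpreview_url "https://github.com/USER/REPO/blob/BRANCH/path/to/solution.html"
```

The printed link can be pasted into an issue or pull request so that others can view the rendered page.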

If your HTML file is based on JavaScript:

In case your HTML file is making use of JavaScript, you can use https://raw.githack.com/

The procedure is the same as above, but the final URL is slightly different; see the website for details.
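The following sketch shows the URL rewriting as I understand it, assuming the pattern `raw.githack.com/USER/REPO/BRANCH/FILE` for files hosted on GitHub. The converter on the raw.githack.com website remains the authoritative source; the helper name `githack_url` and the placeholder URL are made up for illustration.

```shell
# Turn the GitHub URL of a file into a raw.githack.com link.
# Assumed pattern: https://raw.githack.com/USER/REPO/BRANCH/path/to/file
# `githack_url` is a hypothetical helper name.
githack_url() {
  printf '%s\n' "$1" |
    sed -e 's#^https://github\.com/#https://raw.githack.com/#' \
        -e 's#/blob/#/#'   # drop the first "/blob" path segment
}

# Placeholder URL: replace USER, REPO, BRANCH and the path with your own.
githack_url "https://github.com/USER/REPO/blob/BRANCH/path/to/solution.html"
```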