This project accompanies the course
“R2 - Data projects with R and Github”
at the Dr. Eberle Centre for Digital Competencies at the University of Tübingen.
All tutorials are summarized within the
During the course, you have to formulate a data wrangling project. That is, you should name or provide a data set, say how the data should be (re)structured, and set some visualization goals. You can use a data set you are working on (note: you might have to anonymize it before sharing) or a data set freely available online, from a publication, or anywhere else. I strongly suggest “dirty” data that has to be cleaned up and reformatted! Data cleanup, transformation and extension should be one (big) part of your project!
The formulated projects are the set of exercises you and your fellow students will pick from during the rest of the course, which is discussed below. But first, some details concerning project definition.
Before you can define a project you need some data! In order to select something, you might want to “reach high”! That is, think about something you would like to know or see and what data might be needed for that. Don’t think in terms of “I know how to do” but more “I would like to see” (like “A BOSS”)! Given an idea, start looking for data sets that might help you to provide the information for your idea. Either you find something useful, or you might change your idea while looking for and investigating available data.
Best are data sets you are working on anyway or that are connected to your field of interest: such data makes the most sense to you, and you will be most creative about possible analyses. Data you have discussed in some other course or project works, too. Ideally, the data is already in some table form but not tidy, i.e. there is still a need for some (extensive) data cleaning, formatting, …
You might not have a data set at hand, so check out open data repositories, websites, etc. Some open data repositories or search engines are listed at
Note down where you got your data from, since you will later have to provide some details about your data!
Ultimately: the uglier the data, the better! 😜 Don’t try to be nice but provide what you have. Reality is neither nice nor free of errors, bugs and misformatted data… Let’s get used to it!
online vs. local
If the data set is online and available for download without registration or user accounts, you can directly link it. This is often the case for data from databases or supplements from articles.
If user credentials are needed to access the data, please
If the data is large (>50MB per file),
Next, formulate a rough idea of what you would like to see. If you want to (re)produce a plot you have seen, store the image. Or just draw a sketch by hand of how it should look and take a photo. Anything that conveys your idea is fine.
Try to think of something “non-standard”…
Double-check that the data set you picked provides (somehow) all the information needed to draw your plot of interest.
Write an R Markdown file project-description.Rmd
to
It is fine to be vague at some points but you should formulate a clear goal and roadmap.
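A minimal sketch of how such a file could be started from a terminal. The title and the three section headings are placeholders (assumptions, not course requirements); they simply mirror the ingredients named earlier: the data set, the intended (re)structuring, and the visualization goals.

```shell
# Write a skeleton project-description.Rmd into the current folder.
# Title and headings are hypothetical placeholders; adapt them to your project.
cat > project-description.Rmd <<'EOF'
---
title: "My project idea"
output: md_document
---

# Data set

Where the data comes from and how to access it.

# Data manipulation goals

How the data should be cleaned up and (re)structured.

# Visualization goals

What the final plot(s) should show.
EOF

# Optional, assuming R and the rmarkdown package are installed:
# Rscript -e 'rmarkdown::render("project-description.Rmd")'
```

Knitting the file from within RStudio should work just as well; the command-line call is only one possible route.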
The output format should be “normal” Markdown! To this end, you have to set `output: md_document` in the Rmd header!

In order to submit your project proposal, you have to upload it to GitHub as part of this project! To this end:
1. […] Project folder
2. […] `project-description.Rmd` file from above
3. […] (`.md`) output (you have to use `output: md_document` in the header)
4. Pull the recent project version from the GitHub repository
5. Commit all new files to git versioning
   - […] `.md` output file, …
6. Push your changes to GitHub
7. […] `README.md` file and add your project to the following list of Available projects below
   - […] `.md` page, see example link
8. Commit and Push your changes to GitHub

Example:
Current projects:
At the end of Phase 1 you will have a better understanding of
To ensure the drafted projects are understandable and doable, we will do a peer review. To this end, you will be assigned two projects to give feedback on. Review comments should be made via GitHub issues, where you can also discuss your ideas and suggestions with the respective project owner.
For each project draft, we will assign two reviewers at random. The reviewer assignments are as follows:
Each reviewer is supposed to
Hi @martin-raden, what do you think about my comments!
Each project owner is supposed to
DON’T CHANGE THE PROJECT DRAFT YET!!! (Since this would interfere with the second review!)
At the end of Phase 2 you will
Now it is time to rework your project draft in the light of the received reviews and the project drafts you have reviewed yourself. You might want/need to change a few bits and pieces. In the end, you might do the following:
1. Pull the current state of the project (just to be up-to-date)
2. […] `.md` output (see Project description guide)
3. Pull again the current project state from GitHub
4. Commit your changes
5. Push your committed changes to GitHub

At the end of Phase 3 you will
Given a project description, you will try to solve the task. In order to practice a real-life workflow, you will first create your solution in your own git branch and suggest it via a pull request on GitHub. This gives the project owner the opportunity to review your solution and to give you feedback, which you can discuss within the pull request. Once everyone is happy with the solution, it can be merged into the main branch of the course repository and thus be published.
This workflow is described and summarized in
Note: we are all still working on ONE GITHUB REPOSITORY! We do not create a fork, i.e. our own copy of the repository on GitHub, which is also detailed in the linked material. Forking is needed if you don’t have write permissions for a repository. But the overall workflow is more or less the same.
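If you prefer the terminal over a git GUI, the branch part of this workflow could look roughly like the sketch below. So that it can be run safely, it first builds a throwaway sandbox in which a local bare repository stands in for the GitHub repository; for the real course repository you would skip that setup part. The branch name `solution-yourGithubName`, the folder `Projects/project-owner` and the user identity are hypothetical placeholders, not course conventions.

```shell
# --- Sandbox setup (practice only; skip this part for the real course repository) ---
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/remote.git"          # local stand-in for the GitHub repository
git clone --quiet "$sandbox/remote.git" "$sandbox/course" 2>/dev/null
cd "$sandbox/course"
git config user.name "Your Name"                        # placeholder identity for the sandbox
git config user.email "you@example.com"
git symbolic-ref HEAD refs/heads/main                   # make sure the first branch is called main
git commit --quiet --allow-empty -m "Initial commit"
git push --quiet -u origin main

# --- The actual workflow ---
git checkout --quiet main                               # switch to main BEFORE creating the new branch
git pull --quiet                                        # fetch the most recent state of main
git checkout --quiet -b solution-yourGithubName         # hypothetical branch name

mkdir -p Projects/project-owner                         # hypothetical project folder
printf '%s\n' '---' 'output: md_document' '---' > Projects/project-owner/yourGithubName.Rmd

git add Projects/project-owner/yourGithubName.Rmd
git commit --quiet -m "Add solution draft"
git push --quiet -u origin solution-yourGithubName      # publish the branch on the remote
```

The pull request itself is then opened on the GitHub website, based on the pushed branch.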
- […] `main` branch BEFORE creating your new branch!!! Otherwise, your new branch will be an offspring of the currently loaded branch (and its changes)!
- […] `yourGithubName.Rmd`
- […] `output: md_document` in your header to produce a “normal” Markdown file
- […] `.md` output file for you
- […] `yourGithubName.Rmd`
- […] `yourGithubName.md` (knit it, if not existing yet)

When you work on your solution, you should at least once a day
This ensures you will not lose your work (backup) and stores the stuff where it belongs.
Furthermore, it opens up a new way to get help! In case you get stuck somewhere, it is a good idea to
At some point you will be satisfied with your project solution and all changes are committed and pushed to GitHub.
Now it is time to open a pull request.
[…] `.md` page in the browser:
https://github.com/Dr-Eberle-Zentrum/Data-projects-with-R-and-GitHub/blob/main/Projects/martin-raden/project-description.md
At the end of Phase 1 you will
Now it is time for the project owner to check your solution and for both of you to discuss possible changes, extensions, … This should, as before, be done on GitHub, but now directly within the pull request! All comments, answers, changes etc. will be listed there. Even if you are meeting in person, please note down the main points and goals within the pull request (together).
The project owner should
The solution author should
You can already work on the changes while you are discussing! Any change you commit and push to your branch is automatically visible in the pull request (and via the HTML preview link you provided).
Thus, you can directly discuss whether you meet the ideas of the project owner or suggest alternative ideas.
You will get a loooot of GitHub emails this week! 😁
At the end of Phase 2 you will
Finally, it is not only about content; presentation matters, too. Thus, you will have to beautify your HTML output. Here are some ideas on where to start:
If your solution generates HTML output files, you cannot directly view/render them on GitHub, since the page is made to work on source files, not rendered output.
In case your HTML file works without JavaScript (just static text and image output), you can use https://htmlpreview.github.io/
- […] Raw button in the upper right corner
- […] `http://htmlpreview.github.io/?`

Note: htmlpreview only works for HTML pages without JavaScript content!
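The URL assembly for htmlpreview can be sketched in a few shell lines: the preview prefix is put in front of the raw URL of your HTML file. `USER`, `REPO`, `BRANCH` and the file path are hypothetical placeholders; replace them with the raw URL you see in your browser.

```shell
# Raw URL of the HTML file, as shown in the browser after clicking "Raw".
# USER, REPO, BRANCH and the path are placeholders (assumptions).
raw_url="https://raw.githubusercontent.com/USER/REPO/BRANCH/path/to/solution.html"

# Put the htmlpreview prefix in front of the raw URL.
preview_url="https://htmlpreview.github.io/?${raw_url}"

echo "$preview_url"
```

The printed URL is the one to share, for instance in the pull request.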
In case your HTML file is making use of JavaScript, you can use https://raw.githack.com/
The procedure is the same as above, but the final URL is slightly different; see the website.