👋 Welcome to the dPrep course chatbot!
📘 Here's how to make the most out of your interaction with me:
- Ask Questions: Feel free to ask questions about the lecture material. I'm here to assist you with anything related to the Data Preparation and Workflow Management course.
- Check Sources: After providing an answer, I'll share the source from which the information was derived. You are encouraged to delve deeper into that material for a better understanding.
- Stay on Topic: Please keep your questions within the scope of the lecture materials. While I'm eager to help, I'm tailored specifically to assist with course-related topics.
- For general inquiries or discussions about the course material, please use the discussion board on Canvas.
- If you have individual questions, feel free to email us at: h.datta@tilburguniversity.edu.
🤖 This chatbot is brought to you by Tilburg.ai, where you can explore articles on AI and discover other useful AI tools.
Documents behind this Chatbot 📚
-
dPrep - Opening lecture.pdf
In this document, the opening lecture for the dPrep course introduces the agenda, emphasizing the importance of reproducibility, efficiency, and collaboration in data projects. It outlines the course structure, focusing on building coding skills, using GitHub, and automating workflows. The lecture also covers practical tips, project details, grading, and the commitment to helping students succeed in future data-intensive work.
Author: Hannes Datta
Pages: 32
-
R Bootcamp (in-class tutorial).pdf
In this document, the R Bootcamp in-class tutorial provides an introduction to R and RStudio, focusing on data wrangling and manipulation using the dplyr package. It covers basic R concepts like creating projects, interacting with R, and using R as a calculator. The tutorial includes practical exercises on data handling, such as subsetting data, creating new variables, and summarizing data. It emphasizes hands-on practice with R code and encourages students to complete additional chapters on their own to reinforce learning. The document also introduces writing and running R scripts in both RStudio and the terminal.
Author: Hannes Datta
Pages: 24
-
Exploring and auditing new data with RMarkdown (in-class tutorial).pdf
In this document, the tutorial focuses on exploring and auditing new data using RMarkdown. It begins with a review of data preparation theory and introduces RMarkdown as a tool for producing well-formatted reports that mix code and text. The session includes exercises on basic R programming concepts, data manipulation, and creating functions. The document also guides students through practical tasks such as downloading data, using loops and the apply family, and generating summary statistics. The tutorial emphasizes the importance of understanding the unit of analysis and concludes with steps to clean and prepare data, ensuring it is ready for further analysis.
Author: Hannes Datta
Pages: 45
-
Engineering data sets (in-class tutorial).pdf
In this document, the tutorial covers the process of engineering datasets using R, focusing on practical data wrangling techniques with the tidyverse packages. It emphasizes the importance of structuring scripts in a way that facilitates automation, using the setup and input-transformation-output (ITO) framework. The tutorial guides students through exercises involving data exploration, filtering, arranging, mutating data, and summarizing results. It also covers merging datasets, dealing with missing values, and feature engineering. The session aims to prepare students for creating well-structured, automated workflows and handling common data operations efficiently.
Author: Hannes Datta
Pages: 32
-
Automation with make (in-class tutorial).pdf
In this document, the tutorial focuses on automating data projects using the make tool. It introduces the use of make for automating workflows, particularly in the context of data analysis projects, and explains how to create and manage makefiles. The session guides students through writing their first makefile, modularizing code, and setting up an automated pipeline. The document also emphasizes best practices in coding, such as maintaining a clean project directory and using relative file paths. The tutorial concludes with a reflection on the importance of integrating these automation techniques into the students' broader workflow to enhance productivity.
Author: Hannes Datta
Pages: 23
-
tsh_make_cheatsheet.pdf
In this document, the Make cheatsheet provides a concise guide on how to use the make tool to automate research projects and ensure reproducibility. It explains the basic structure of a Makefile, which includes targets, prerequisites, and commands to build the targets. The cheatsheet outlines steps to create a Makefile, structure project directories, and use variables to simplify paths. It also covers how to write rules within Makefiles to execute code and manage project dependencies, emphasizing the efficiency and error reduction that automation with make can bring to research workflows.
Author: Hannes Datta
Pages: 1
-
GitHub (in-class tutorial).pdf
In this document, the tutorial introduces students to using GitHub for managing and collaborating on projects. It covers the basics of Git and GitHub, including setting up repositories, understanding branches, commits, pull requests, and organizing tasks using project boards. The session emphasizes the importance of using GitHub for version control and project management, following the Scrum methodology. It includes practical exercises for creating repositories, managing issues, and performing the Git workflow both online and locally. The tutorial also encourages students to enhance their GitHub profiles, which can be beneficial for their professional development.
Author: Hannes Datta
Pages: 38
-
Preparation before the course starts _ Data Preparation and Workflow Management.pdf
In this document, the instructions are provided for preparing before starting the 'Data Preparation and Workflow Management' course. It details the necessary steps, including installing R and RStudio, setting up a GitHub account, installing Git and Make, and obtaining a good text editor like Visual Studio. Additionally, the document recommends obtaining premium access to Datacamp.com for tutorials and suggests completing specific introductory tutorials on R, GitHub, and command line/terminal usage. The goal is to ensure students are well-prepared and familiar with the tools they will use throughout the course.
Author: Hannes Datta
Pages: 3
-
Team Project _ Data Preparation and Workflow Management.pdf
In this document, the team project for the course focuses on using GitHub to create reproducible workflows, manage files with Git, and automate data preparation and analysis. The project emphasizes infrastructure over writing a research paper. Teams of 4-5 students will work together, with guidance provided in coaching sessions. The final submission is a self-documenting GitHub repository, due by October 13, 2024.
Author: Hannes Datta
Pages: 3
-
Exam _ Data Preparation and Workflow Management.pdf
In this document, the final exam format, technical requirements, content focus, and preparation strategies for the 'Data Preparation and Workflow Management' course are outlined. It includes information on the exam structure, submission process, and tips for effective preparation.
Author: Hannes Datta
Pages: 3
-
Example questions.pdf
In this document, example questions for the 'Data Preparation and Workflow Management' course exam are provided. The questions cover various levels of complexity, including comprehension, application, synthesis, and evaluation. The document includes both open-ended and multiple-choice questions, focusing on tasks such as data manipulation in R, creating reproducible workflows, and applying concepts like Git, Make, and RMarkdown. These examples are meant to help students prepare for the final exam by practicing the types of questions they may encounter.
Author: Hannes Datta
Pages: 4
-
dPrep - Course Summary & Exam Preparation.pdf
In this document, the exam format, technical requirements, content focus, and preparation strategies for the 'Data Preparation and Workflow Management' course are outlined. It includes instructions on how to prepare, use of TestVision for the exam, and tips for managing files and code during the exam. The document also emphasizes the importance of practicing with the tools and skills covered in the course.
Author: Hannes Datta
Pages: 4
-
Workplan and coaching _ Data Preparation and Workflow Management.pdf
In this document, a workplan and coaching schedule for the 'Data Preparation and Workflow Management' course are provided. The plan outlines the weekly stages of the team project, from initial data exploration to final deployment, with specific tasks and deliverables for each week. Coaching sessions are scheduled to provide feedback on progress, aligned with the grading criteria. The document emphasizes following the plan to ensure timely project completion.
Author: Hannes Datta
Pages: 2
-
Course _ Data Preparation and Workflow Management.pdf
In this document, the course syllabus for 'Data Preparation and Workflow Management' is outlined. It includes the course description, learning goals, prerequisites, teaching format, and assessment details. The document also covers passing requirements, resit policies, and opportunities for earning bonus points. Additionally, it provides guidelines for class conduct, emphasizing diversity, inclusion, and open communication.
Author: Hannes Datta
Pages: 6
-
Enroll _ Data Preparation and Workflow Management.pdf
In this document, information about enrollment and obtaining course credits for the 'Data Preparation and Workflow Management' course is provided. It includes details for Tilburg University students, including course codes and enrollment procedures.
Author: Hannes Datta
Pages: 1
-
Grading _ Data Preparation and Workflow Management.pdf
In this document, the grading criteria for the 'Data Preparation and Workflow Management' course are detailed. The team project is evaluated based on three main components: the GitHub repository (30%), data preparation and analysis (45%), and the quality of source code and automation (25%). Each component is further broken down into specific criteria, such as research motivation, repository structure, data exploration, and code quality.
Author: Hannes Datta
Pages: 4