Course Description

Quarter 1 of the capstone covers two parallel topics:

  1. The basics of “data science methodology” for a large project, including best practices for data handling and project reproducibility (“lecture”).
  2. Beginning research into your choice of domain. You become acquainted with a domain by replicating a specified result in your domain of inquiry (“discussion”).

The Q1 Project that results from (2) will use the best practices learned in (1). This work then serves as a foundation for the project proposals due at the end of the quarter. The proposed projects are carried out, in groups, in the second quarter. While the methodology portion is taught in a traditional lecture setting, most of the material in this course is covered through self-guided learning: readings, data exploration, and the ensuing discussion.

Data scientists typically work on projects in groups. As such, the large assignments throughout the sequence (the replication, the project proposal, and the project itself) are worked on in groups. As in a career or research situation, these groups will be formed using a variety of factors, including academic background, mutual interests, and a little randomness.

Course Components

Lecture (data science methodology)

One hour per week will be devoted to lecture on data science methodology. There will be accompanying light homework assignments.

Discussion (domain)

Two hours per week will be devoted to discussion of domain-specific topics (1hr in section; 1hr in office hours); you must attend the section for your choice of domain. Discussion will center on readings and assignments, so it is imperative that you complete the relevant assignments before attending discussion section. Each section begins with a set of questions to which you will write a response; your response will serve to drive the class discussion. If you do not ask questions in discussion section, no discussion will occur. Domains are run by mentors, not instructors; you must actively participate in discussion.

Remark on how the course is split

As is common in Data Science, you will likely find yourself as a bridge between domain specialists and (computing) methodology specialists. In the case of this course, it is expected and normal that discussion section leaders will not know specifics of your code (or even know the language you are coding in!). You will have both (1) office hours with a methodology expert and (2) office hours and discussion with domain experts. As such, it is up to you to formulate your questions for the appropriate audience (domain expert or computing expert), so that you can adequately communicate with them to solve the problem you are facing.

Course Deliverables

See the page on Q1 assignments

Assessments and Grades

The course grade will be computed using the following proportions:

Component                                               % of Grade
Methodology HW                                          5%
Discussion Section Participation (Mentor)               5%
Discussion Section Participation (Written Responses)    5%
Domain Q1 Project (reports + checkpoint)                50%
Domain Q1 Project (code)                                20%
Project proposal                                        15%

Grading Policy

Implementing a consistent grading scheme for work in such a diverse collection of areas is helped by both a clear rubric and a coarse grading scheme.

  • Each assignment will have a (generally applicable) grading rubric that will help guide grading.
  • Each assignment will be graded using a coarse schema that reflects the broad checkpoints that students met. This schema helps maintain focus on large, impactful things that students can improve on and should reduce grading disagreements.

Assignments in the course are graded on an A/B/C/F scale (without plus/minus). Generally, these grades reflect the following criteria (credit: Shannon Ellis):

Grade Criteria
A (4.0) Accomplishes the task accurately, completely, and clearly. Code is clear, effective, and efficient. Written component is concise, at the appropriate level, and correct. Oral component (when applicable) is effective both visually and in explanation, and is within the time limit.
B (3.0) Accomplishes the task well, but lacks some completeness or clarity. Code runs but lacks some aspect of clarity, effectiveness, and/or efficiency. Written component is logical and generally correct, but lacks either clarity or accuracy. Oral component (when applicable) is moderately effective and/or slightly outside the time window.
C (2.0) The task is somewhat accomplished, but falls significantly short in completeness and clarity. Code is present but does not accomplish the task up to the standards of a data science graduating senior. Written component lacks substantial clarity/correctness. Oral component (when applicable) significantly lacks effectiveness/clarity.
F (0.0) The task largely remains unaccomplished. Code lacks completeness and structure and is unclear. Written component lacks the information required to understand what you did and/or your results. Oral component (when applicable) is nonsensical/unclear.

Final grades will be computed from the grade points above, weighted by the proportions given in the table under Assessments and Grades. Letter grades will be assigned using the standard university cutoffs.
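
As a rough illustration of how the weighted computation works, the sketch below combines the grade points with the grade proportions. It assumes, for simplicity, that each component receives a single letter grade; the function name and the example grades are hypothetical, and the conversion of the result to a final letter grade is left out because the university cutoffs are not listed here.

```python
# Proportions from the Assessments and Grades table, as fractions of the course grade.
WEIGHTS = {
    "Methodology HW": 0.05,
    "Discussion Section Participation (Mentor)": 0.05,
    "Discussion Section Participation (Written Responses)": 0.05,
    "Domain Q1 Project (reports + checkpoint)": 0.50,
    "Domain Q1 Project (code)": 0.20,
    "Project proposal": 0.15,
}

# Grade points from the Grading Policy table (no plus/minus grades).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "F": 0.0}


def course_grade_points(letter_grades):
    """Return the weighted grade-point average for one student.

    `letter_grades` maps each component name in WEIGHTS to "A", "B", "C", or "F".
    Assumption: one letter grade per component (hypothetical simplification).
    """
    missing = set(WEIGHTS) - set(letter_grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    return sum(
        weight * GRADE_POINTS[letter_grades[component]]
        for component, weight in WEIGHTS.items()
    )


# Hypothetical student: B on the reports and the proposal, A on everything else.
example = {component: "A" for component in WEIGHTS}
example["Domain Q1 Project (reports + checkpoint)"] = "B"
example["Project proposal"] = "B"
print(round(course_grade_points(example), 2))
```

For this hypothetical student the weighted average is 0.15 × 4.0 + 0.50 × 3.0 + 0.20 × 4.0 + 0.15 × 3.0, or about 3.35 grade points. Because the reports and checkpoint carry half the weight, a single letter-grade change there moves the result by half a grade point.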

Collaboration Policy and Academic Integrity

In DSC 180, we expect you to work hard and engage with material that originates outside the academic walls. All ideas and work must be your own, that of your approved group, or properly cited. Act with integrity and don’t cheat.

In DSC 180 you are encouraged to use outside resources to help with your work. However, you must properly cite any concepts, writing, or code that originates from other sources. If you are unsure of whether something needs a citation, it’s best to:

  • consult the domain expert for your section,
  • follow the examples in course readings, and
  • place code citations, with the relevant link, in comments.
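
A code citation placed in comments might look like the following sketch. The URL is a placeholder rather than a real source, and the function is only a stand-in for whatever code you adapted; the point is that the comment names where the code came from and what you changed.

```python
# Adapted from an outside answer on splitting a list into fixed-size chunks.
# Source: https://example.com/original-answer  (placeholder; use the real permalink)
# Changes from the original: renamed variables, added input validation.
def chunks(items, size):
    """Yield successive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


batches = list(chunks([1, 2, 3, 4, 5], 2))
```

Placing the citation directly above the borrowed code keeps the attribution attached to it when the file is later reorganized.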

The following activities are considered cheating and ARE NOT ALLOWED in DSC 180 (this is not an exhaustive list):

  • Using or submitting either writing or code acquired from other students (except your partner, where allowed).
  • Not properly citing ideas, writing, or code acquired from outside sources. (Citations are a good thing!)
  • Having any other student complete any part of an assignment on your behalf.
  • Completing an assignment on behalf of someone else.

The following activities are examples of appropriate collaboration and ARE ALLOWED in DSC 180:

  • Discussing the general approach to understanding or solving a problem.
  • Talking about debugging/cleaning strategies or issues you ran into and how you solved them.
  • Using outside material with proper citations (including StackOverflow code!).