Database Systems

Course instructor: Yanif Ahmad.
Course staff: Michael Rushanan (TA), Mohit Dia (CA).

Class schedule: MW 12-1.15pm, Shaffer 100.

Office hours:
    Yanif: Mon 1.15-2.30pm (or by email), Shaffer 200A
    Michael: Thurs 1.00-2.00pm, Shaffer 204E
    Mohit: TBA, Shaffer 200A



Course Description

This course serves as an introduction to the architecture and design of modern database management systems. Database management systems (DBMS) are widely used to manage, store and query diverse datasets, and have become an invaluable tool in today's enterprises and large web companies, with applications in transaction processing, business intelligence and analytics. Topics include query processing algorithms and data structures, data organization and storage, query optimization and cost modeling, transaction management and concurrency control, high-availability mechanisms, parallel and distributed databases, and a survey of modern architectures including NoSQL, column-oriented and streaming databases. In addition to technical material, we will devote a portion of weekly lectures to looking at the use of database technology in today's enterprises, including document indexing at Google, parallel data warehousing with systems such as Hadoop and Hive, and transactional web applications. Coursework includes programming assignments and experimentation in a simple database framework written in Java.

Database Systems (CS 600.316/416) is complementary to the Databases course (CS 600.315/415) offered in the Fall semester, and together the two make a natural course sequence. 315/415 focuses on the usage of databases, with topics including database schema design and models, and database programming languages. 316/416 focuses on database internals and the implementation of a database runtime to realize declarative queries, including storage and data organization, indexing, query processing and optimization, and transactions. 316/416 does not have 315/415 as a prerequisite, but it is recommended that you have some prior exposure to SQL.

Course area: Systems

Prereq: CS 600.120 and CS 600.226 or equivalent.

Academic Conduct

All activities related to this course are subject to JHU's academic ethics and student conduct policies. Students are also expected to adhere to the Computer Science Academic Integrity Code.

Organization

Database Systems is a 13-week course that meets twice a week. It is subdivided into system design, algorithms and architecture topics, organized in one-week units as listed in the syllabus below. Classes consist of lectures, discussions and reading, with a series of programming assignments comprising the bulk of a student's grade. Further details on the assignments can be found below. In addition, there will be two short in-class quizzes that serve as the midterm and final. The primary source of course material will be lecture slides made available on the course's Blackboard page. The textbooks below are not required, but may be of use as reference material for in-depth study of topics.

Textbooks (recommended, not required):
Database Management Systems (3rd edition).
Raghu Ramakrishnan and Johannes Gehrke.
http://pages.cs.wisc.edu/~dbbook/
(the "Cow" book)

Database System Concepts (6th edition).
Avi Silberschatz, Henry F. Korth, S. Sudarshan.
http://codex.cs.yale.edu/avi/db-book/
(the "Boat" book)

Assignments:
There will be five two-week programming assignments in this course. We will use Java as our programming language. Students may use any development environment of their choice, although we recommend Eclipse. Each assignment will have details on how to hand in your solution. We will ask you to perform a codewalk in the event that we have difficulty getting your assignment to compile and run. Assignment grades will be reported back via email.

Collaboration:
You may discuss Java-related issues, potential "bugs" in your code, and clarifications on the assignment handouts with other students. We will use Piazza rather than Blackboard's discussion forums for this purpose. Your activity on Piazza, in-class questions and office hour interactions will determine the class participation component of your grade. All code that you turn in must be your own. Please adhere to the Computer Science Academic Integrity Code and the University's Ethics Code.

Office hours:
We encourage you to use office hours to ask further questions on course material and any detailed assignment questions specific to your solution that should not be shared with other students. You can also email Yanif for a 1-on-1 if you're unable to meet any of the course staff during office hours due to scheduling conflicts.

Grading

45% Assignments
15% Design documents
15% Midterm
15% Final
10% Class participation

Calendar


Syllabus

Week             Topic                                       Suggested reading
1: January 30    Welcome, SQL+DBMS intro.
2: February 6    Storage and data organization.              Cow 9.3-9.7, Boat 10.5-10.8
3: February 13   Indexing and access methods.                Cow 10, Boat 11-11.4
4: February 20   Query processing.                           Cow 12-14, Boat 12
5: February 27   Query optimization.                         Cow 15, Boat 13-13.4
6: March 5       Physical database design.                   Cow 20, Boat 13.5-13.6,24.1-24.1
7: March 12      Transaction processing.                     Cow 16-17, Boat 14,15
8: March 26      Logging and recovery.                       Cow 17-18, Boat 15,16
9: April 2       Data analytics.                             Cow 25, Boat 20
10: April 9      Parallel and distributed QP.                Cow 22, Boat 18
11: April 16     Modern architectures: NoSQL.                Cassandra/HBase, MongoDB.
12: April 23     Modern architectures: MPP and streams.      Hadoop, Hive, StreamBase, Storm.
13: April 30     Modern architectures: graph and prob. DBs.  Pregel/GoldenOrb/Giraph, MayBMS, MCDB.

Course material:

All course material will be managed through our Blackboard and Piazza pages.
Look there for course announcements, assignment discussions, etc.