Database Systems
Course instructor: Yanif Ahmad.
Course staff: Michael Rushanan (TA), Mohit Dia (CA).
Class schedule: MW 12-1.15pm, Shaffer 100.
Office hours:
Yanif: Mon 1.15-2.30pm (or by email), Shaffer 200A
Michael: Thurs 1.00-2.00pm, Shaffer 204E
Mohit: TBA, Shaffer 200A
This course serves as an introduction to the architecture and design of modern
database management systems.
Database management systems (DBMS) are widely used to manage, store and query
diverse datasets and have become an invaluable tool in today's enterprises and
large web companies with applications in transaction processing, business intelligence
and analytics.
Topics include query processing algorithms and data
structures, data organization and storage, query optimization and cost modeling,
transaction management and concurrency control, high-availability mechanisms,
parallel and distributed databases, and a survey of modern architectures including
NoSQL, column-oriented and streaming databases.
In addition to technical material, we will devote a portion of each
week's lectures to the use of database technology in today's
enterprises, including document indexing at Google, parallel data warehousing
with systems such as Hadoop and Hive, and transactional web applications.
Coursework includes programming assignments and experimentation in a
simple database framework written in Java.
Database Systems (CS 600.316/416) is complementary to the Databases
course (CS 600.315/415) offered in the Fall semester; together the two make a
natural course sequence. 315/415 focuses on the usage of databases, with topics
including database schema design and models, and database programming languages.
316/416 focuses on database internals and the implementation of a database
runtime to realize declarative queries, including storage and data organization,
indexing, query processing and optimization, and transactions. 316/416 does not
have 315/415 as a prerequisite, but we recommend that you have some prior
exposure to SQL.
Course area: Systems
Prereq: CS 600.120 and CS 600.226, or equivalent.
Academic Conduct
All activities related to this course are subject to JHU's academic ethics
and student conduct policies. Students are also expected to adhere to the
Computer Science Academic Integrity Code.
Database Systems is a 13-week course that meets twice a week and is
subdivided into system design, algorithms and architecture topics,
listed in the syllabus below in one-week units. Classes consist of
lectures, discussions and reading, with a series of programming
assignments comprising the bulk of a student's grade. Further details
on the assignments can be found below. In addition, there will be two
short quizzes making up an in-class midterm and final. The primary source
of course material will be lecture slides made available on the course's
Blackboard page. The textbooks below are not required, but may be of use
as reference material for in-depth study of topics.
Textbooks (recommended, not required):
Database Management Systems. (3rd edition)
Raghu Ramakrishnan and Johannes Gehrke.
http://pages.cs.wisc.edu/~dbbook/
(the "Cow" book)
Database System Concepts. (6th edition)
Avi Silberschatz, Henry F. Korth, S. Sudarshan.
http://codex.cs.yale.edu/avi/db-book/
(the "Boat" book)
Assignments:
There will be five two-week programming assignments in this course on the
following topics:
- Database storage, implementing a heap file and buffer pool.
- Indexing, implementing a B+-tree.
- Query processing, implementing query operators.
- Query optimization, implementing the System-R optimizer.
- Transactions, implementing a locking and deadlock detection scheme.
We will use Java as our programming language in this course. Students
may use any development environment of their choice, although we
recommend Eclipse. Each assignment will have details on how to hand in
your solution.
We will ask you to perform a codewalk in the event that we have
difficulty getting your assignment to compile and run.
Assignment grades will be reported back via email.
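To give a flavor of the kind of code the transactions assignment listed above
involves, here is a minimal sketch of deadlock detection using a waits-for
graph. It is only an illustration, not the assignment's framework or a
reference solution: the class name WaitsForGraph and its methods are
hypothetical, and transactions are assumed to be identified by plain integers.
A deadlock corresponds to a cycle in the graph, found here with a depth-first
search.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of the course framework.
class WaitsForGraph {
    // An edge t -> u means transaction t is waiting for a lock held by u.
    private final Map<Integer, Set<Integer>> waitsFor = new HashMap<>();

    // Record that 'waiter' is blocked on a lock held by 'holder'.
    void addWait(int waiter, int holder) {
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    // Drop a transaction (e.g. after commit or abort) and all edges touching it.
    void removeTransaction(int txn) {
        waitsFor.remove(txn);
        for (Set<Integer> holders : waitsFor.values()) {
            holders.remove(txn);
        }
    }

    // Returns true if the graph contains a cycle, i.e. a deadlock.
    boolean hasDeadlock() {
        Set<Integer> done = new HashSet<>();    // fully explored, no cycle found
        Set<Integer> onPath = new HashSet<>();  // on the current DFS path
        for (Integer start : waitsFor.keySet()) {
            if (visit(start, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean visit(int txn, Set<Integer> done, Set<Integer> onPath) {
        if (onPath.contains(txn)) {
            return true;   // reached a transaction already on the path: cycle
        }
        if (done.contains(txn)) {
            return false;  // already explored from here without finding a cycle
        }
        onPath.add(txn);
        Set<Integer> holders =
            waitsFor.getOrDefault(txn, Collections.<Integer>emptySet());
        for (int next : holders) {
            if (visit(next, done, onPath)) {
                return true;
            }
        }
        onPath.remove(txn);
        done.add(txn);
        return false;
    }
}
```

In this sketch, a lock manager would call addWait whenever a transaction
blocks, check hasDeadlock, and on a cycle abort one participant and call
removeTransaction for it. How a victim is chosen is left open here.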
Collaboration:
You may discuss Java-related issues, potential "bugs" in your code, and
clarifications on the assignment handouts with other students. We will
use Piazza rather than Blackboard's discussion forums for this purpose.
Your activity on Piazza, in-class questions and office hour interactions
will determine the class participation component of your grade.
All code that you turn in must be your own. Please adhere to the Computer
Science Academic Integrity Code and the University's Ethics Code.
Office hours:
We encourage you to use office hours to ask further questions on course
material and any detailed assignment questions specific to your solution
that should not be shared with other students. You can also email Yanif
for a 1-on-1 if you're unable to meet any of the course staff during
office hours due to scheduling conflicts.
Grading
45% Assignments
15% Design documents
15% Midterm
15% Final
10% Class participation
Week | Topic | Suggested reading
1: January 30 | Welcome, SQL+DBMS intro. |
2: February 6 | Storage and data organization. | Cow 9.3-9.7, Boat 10.5-10.8
3: February 13 | Indexing and access methods. | Cow 10, Boat 11-11.4
4: February 20 | Query processing. | Cow 12-14, Boat 12
5: February 27 | Query optimization. | Cow 15, Boat 13-13.4
6: March 5 | Physical database design. | Cow 20, Boat 13.5-13.6, 24.1-24.1
7: March 12 | Transaction processing. | Cow 16-17, Boat 14, 15
8: March 26 | Logging and recovery. | Cow 17-18, Boat 15, 16
9: April 2 | Data analytics. | Cow 25, Boat 20
10: April 9 | Parallel and distributed QP. | Cow 22, Boat 18
11: April 16 | Modern architectures: NoSQL. | Cassandra/HBase, MongoDB.
12: April 23 | Modern architectures: MPP and streams. | Hadoop, Hive, StreamBase, Storm.
13: April 30 | Modern architectures: graph and prob. DBs. | Pregel/GoldenOrb/Giraph, MayBMS, MCDB.
All course material will be managed through our Blackboard and Piazza pages.
Look there for course announcements, assignment discussions, etc.