Large Scale Data Science (CS626)

This is just a placeholder. Information will be available soon. This course was originally developed by Dr. Licong Cui. We will use the original materials for Spring 2021.
Semester: Spring 2021.
Class Time: 12:30-1:45PM, Tuesdays and Thursdays.
Classroom: This class will be offered fully online.

Instructor: Dr. Jun Zhang, Tel: 257-3892.
Office: 321 Marksbury Building.
Office Hours: Mondays and Wednesdays, 9:00AM-10:00AM, or by appointment.

Suggested Optional Textbook:
Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale (4th Edition),
by Tom White
(ISBN-13: 978-149-190-1632, ISBN-10: 149-190-1632).

The Reference Book.

Here is the tentative syllabus for Spring 2021. I reserve March 25, 2021 as the Midterm Day!

Read the following information before you decide to take CS626.

Class Notes

This class was originally developed by Dr. Licong Cui. For Spring 2021, we will use her teaching materials.

This course involves many programming projects using the Hadoop system and the MapReduce programming style. Students are required to be proficient in Java programming and familiar with the Unix operating system. Previous exposure to database systems and data mining concepts is helpful, and knowledge of cloud computing is an advantage.

If you want to install a Hadoop system on your own computer, it must have at least 10 GB of memory.


Homeworks and Projects


Here are some exciting research projects in scientific and parallel computing and my ambitious research team.

