A repository for scripts and notebooks for the UCSD big data course

czarifis/UCSD_BigData

 
 

Big Data Analytics

This course is an introduction to big data analytics. It covers the algorithmic, statistical, and data management aspects of big data analysis. The emphasis is on algorithms whose running time is linear in the size of the input.

For additional information visit:

============

Specifically we will cover the following areas:

AWS, EC2, S3, Git and GitHub.

The IPython notebooks

  • Using notebooks on AWS.
  • NumPy and pandas.
  • Matplotlib.

Performance and the memory hierarchy.

I/O efficient sorting.

Statistical Models and Compression.

  • Linear regression, LPC and vocoders.
  • Vector quantization and K-Means.
  • Singular value decomposition and compression of cyclical signals.
  • Kolmogorov complexity and Kolmogorov sufficient statistics.

The Map-Reduce framework.

  • HDFS, Hadoop and map-reduce.
  • Word-count
  • Vector-Matrix Multiplication
  • Selections
  • Projections
  • Natural Join
  • Aggregation.
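
As a rough illustration of the word-count item above, the map, shuffle, and reduce steps can be sketched in plain Python without a cluster. This is only a single-process sketch, not the course's Hadoop code; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every whitespace-separated word.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework would
    # do between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: sum the counts collected for one word.
    return key, sum(values)


def word_count(lines):
    # Run the three stages in sequence inside a single process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count` on a list of text lines returns a dictionary from lowercased word to count. In a real Hadoop job the shuffle is performed by the framework, and the mappers and reducers run on different machines.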

The art of sampling

  • Estimation through sampling, Hoeffding bound, Glivenko-Cantelli theorem.
  • Empirical Bernstein inequality and sequential estimation.
  • Stratified sampling.
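
To make the Hoeffding bound concrete: for i.i.d. samples bounded in [0, 1], the probability that the sample mean deviates from the true mean by at least ε is at most 2·exp(−2nε²). Solving for n gives a sufficient sample size. The sketch below assumes values in [0, 1]; the function name `hoeffding_sample_size` is hypothetical.

```python
import math


def hoeffding_sample_size(epsilon, delta):
    # Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta, i.e.
    # n >= ln(2 / delta) / (2 * epsilon**2).
    # Assumes i.i.d. samples bounded in [0, 1].
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
```

Note that the required sample size depends only on the accuracy ε and the failure probability δ, not on the size of the underlying data set, which is what makes sampling attractive for very large inputs.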

Column-based databases.

  • HBase, Comparison of HBase to HDFS
  • Hashing
  • Min-Hash and finding similar documents.
  • Locality Sensitive Hashing.
  • LSH for L1 and L2 distances.
  • LSH for the Entity resolution problem
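
The Min-Hash idea listed above can be sketched as follows: for each of several hash functions, keep the minimum hash value over a document's tokens, and estimate the Jaccard similarity of two documents as the fraction of positions where their signatures agree. This is a minimal sketch, assuming non-empty token sets and using seeded SHA-1 as a stand-in for a family of random hash functions; the helper names are hypothetical.

```python
import hashlib


def _seeded_hash(seed, token):
    # Stand-in for the seed-th random hash function: a 64-bit integer
    # derived from SHA-1 of the seed and the token.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes=128):
    # One minimum per hash function; tokens must be non-empty.
    return [min(_seeded_hash(seed, t) for t in tokens)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    # Fraction of signature positions on which the two documents agree.
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

Signatures have a fixed length regardless of document size, so they can be compared cheaply or fed into locality sensitive hashing to find candidate pairs without comparing all documents.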

Streaming algorithms.

  • Counting distinct elements.
  • Estimating moments (Alon-Matias-Szegedy algorithm)
  • Counting ones in a window.
  • Finding heavy hitters.
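
One standard way to find heavy hitters in a single pass is the Misra-Gries summary, sketched below. The outline above does not say which algorithm the course uses, so this choice is an assumption. With `k - 1` counters, every item that occurs more than n/k times in a stream of length n is guaranteed to survive in the summary, although the stored counts are underestimates.

```python
def misra_gries(stream, k):
    # Keep at most k - 1 counters. Any item with frequency > n / k
    # is guaranteed to be among the returned keys.
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The summary may also contain items that are not heavy hitters, so a second pass over the data is needed if exact counts are required.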
