Recommendation System
From Hutch Research
Contributors
- Brian Hutchinson, Assistant Professor, Computer Science Department, WWU
- Aaron Tuor, Graduate Student, Computer Science Department, WWU
- Katy McClintic, Undergraduate, Computer Science Department, WWU
Experiments
Tips and Tricks
Permissions
- The script is hutch_research/bin/group_permissions.sh.
- It takes a single argument: the directory whose contents you want to (recursively) fix permissions on.
- For an individual file or folder, run:
- chmod g+wx /home/hutch_research/restOfFilepath
Wiki tips
General Data Processing
Notes and Posters
- Posters
- Brian's gradient notes for Joint Factorization Model
- Data Notes
- Matrix Factorization Notes
- Similarity Notes
- Correspondence
Background and Basics
Video Lectures
- Linear Algebra MIT Open Courseware (Gilbert Strang)
- Mining Massive Data Sets Coursera (Jure Leskovec, J. McCauley. Stanford)
- Machine Learning Coursera (Andrew Ng. Stanford)
- Recommender Systems Coursera (Joseph Konstan, Michael Ekstrand. University of Minnesota)
Notes
Books
- Mining of Massive Datasets (Leskovec, et al. 2012)
- Resource for data mining, distance measures, and an introduction to recommendation systems
- Collaborative Filtering Recommendation Systems (Konstan, et al. 2011)
- Good introductory survey of the basics and the state of the art in the field circa 2011
- Recommender Systems Handbook (Ricci, et al. 2011)
- A more advanced overview of the field of recommender systems
- Graph Mining: Laws, Tools, and Case Studies (Chakrabarti and Faloutsos 2012)
- A few chapters on tensors (chapters 14 & 15)
- Free online checkout via WWU
- Matrix Cookbook
- Lots of valuable derived formulas for finding gradients of matrix equations
Papers
- Tensor decompositions and applications (Kolda and Bader 2009)
- The basics about tensors
- Accurate Methods for the Statistics of Surprise and Coincidence (Dunning 1993)
- In case you want to know more about why log-likelihood makes sense
- Latent Dirichlet Allocation (Blei, et al. 2002)
- Famous paper on Latent Dirichlet Allocation
Wikis
- ACM RecSys Wiki
- Compiled by the ACM Conference Series on Recommendation Systems
- Links to many resources: datasets, articles, books, projects, software, and other tools
Tutorials
- Matrix Factorization Tutorial in Python
- Great beginning tutorial on matrix factorization as it applies to Recommendation Systems
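For orientation, the basic idea behind matrix factorization for recommendations can be sketched in a few lines of Python. This is a minimal illustration written for this page, not the code from the linked tutorial: the function name `factorize`, the toy ratings matrix, and the hyperparameters (`rank`, `steps`, `lr`, `reg`) are all assumptions chosen for the example. A zero in the ratings matrix is treated as "no rating", and the user and item factors are fit by stochastic gradient descent on the observed entries only.

```python
import numpy as np


def factorize(ratings, rank=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Approximate `ratings` as P @ Q.T, fitting only the observed (non-zero) entries.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = rng.random((n_users, rank))   # user factors
    Q = rng.random((n_items, rank))   # item factors
    observed = np.argwhere(ratings > 0)
    for _ in range(steps):
        for u, i in observed:
            err = ratings[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()  # keep the old user vector for the item update
            # Gradient step on squared error with L2 regularization.
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q


# Toy user-by-item ratings; 0 means the user has not rated the item.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

P, Q = factorize(R)
pred = P @ Q.T                      # dense matrix of predicted ratings
mask = R > 0
rmse = np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))
print("RMSE on observed ratings:", round(float(rmse), 3))
```

The entries of `pred` at positions where `R` is zero are the model's guesses for unrated items, which is what a recommender would rank. Real datasets need a held-out set to choose `rank` and `reg`; the RMSE printed here is measured on the training entries only.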
Web Pages
- Top 10 movie recommendation engines
- Winning the Netflix Prize: A summary
- Funk SVD Explanation
- Why Netflix Never Implemented The Algorithm That Won
- Deconstructing Recommender Systems
- Amazon Reviewers
Conferences and Journals
Public Datasets
- Amazon Reviews SNAP Data (Stanford)
- The dynamics of viral marketing (Leskovec, et al. 2007)
- First citation of Amazon Reviews Dataset
- Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text (McAuley and Leskovec 2013)
- Later citation from more developed dataset
- Max Span Tree Graphs of Cross-Domain reviewers
- The dynamics of viral marketing (Leskovec, et al. 2007)
Processed Datasets (in progress)
Open Source Software
- Apache Mahout
- Tool for building machine learning applications
- Resource for this tool: Mahout in Action pdf Book
- JavaDocs
- Apache Mahout tutorial (series)
- Mahout Item Recommender Tutorial using Java and Eclipse
- MyMediaLite: Recommender System Library
- Written in C#, for the .NET platform; runs on every architecture supported by Mono: Linux, Windows, Mac OS X.
- LensKit
- Recommender Systems toolkit implemented in Java
- Recommender 101
- Framework written in Java to carry out offline experiments for Recommender Systems.
- Local Collective Embeddings
- This repository contains the MATLAB implementation of the Local Collective Embeddings model for cold-start recommendations.
- pmtk3: Matlab probabilistic modelling toolkit
- Matlab implementations of neural networks and other machine learning tools
- Matrix Factorization Jungle
- Downloadable implementations of many advanced matrix factorization algorithms.
- CNTK Computational Network Toolkit
- Computational networks (CNs) generalize models that can be described as a series of computational steps such as DNN, CNN, RNN, LSTM, and maximum entropy models.
- CNTK is research code and an ongoing project, so expect bugs in places.
- Poblano
- Poblano is a Matlab toolbox of large-scale algorithms for unconstrained nonlinear optimization problems. The algorithms in Poblano require only first-order derivative information (e.g., gradients for scalar-valued objective functions), and therefore can scale to very large problems. The driving application for Poblano development has been tensor decompositions in data analysis applications (bibliometric analysis, social network analysis, chemometrics, etc.).
Past Updates
- Winter 2015
- Spring 2015
CourseRank
Literature
Extended Literature
Non-Linearity
MF Optimization Techniques
Combining MovieLens and IMDb data
Tensors
Matrix Factorization
Neural Networks
Joint Factorization
Research Methodology
Review text & Product description analysis
User Cold Start