| Lecture | Contents & Refs | Papers | 
	
	
		| Week 1 | INTRODUCTION (PPT)
		WEB TECHNOLOGIES (PPT) (Baldi) | 
			
			
			Brin, Sergey, and Lawrence Page. "The anatomy of a large-scale 
			hypertextual Web search engine." Computer networks and ISDN systems 
			30, no. 1 (1998): 107-117. (HTML)
			
			Singhal, Amit. "Modern information retrieval: A brief overview." 
			IEEE Data Eng. Bull. 24, no. 4 (2001): 35-43. (PDF)
			
			Broder, Andrei. "A taxonomy of web search." In ACM SIGIR Forum, 
			vol. 36, no. 2, pp. 3-10. ACM, 2002. (PDF) | 
	
		| Week 2 | WEB CRAWLING (PPT) (Ch8-Bing Liu)
		Web Crawling and Basic Text Analysis (PPT) by Hongning Wang
		
		IIR Ch20 (PDF)
		
		Open Source Search Engines in Java
 - http://java-source.net/open-source/search-engines
 - http://www.manageability.org/blog/stuff/open-source-web-crawlers-java
		
		Start with Nutch – http://nutch.apache.org/
		
		Index directly into Solr
		
		
		Create a seed list from the DMOZ RDF dump
		
		http://www.dmoz.org/rdf.html
		
		http://wiki.apache.org/nutch/NutchTutorial
 Entity Extraction
		
		–LingPipe http://alias-i.com/lingpipe/
		
		–OpenNLP http://incubator.apache.org/opennlp/
		
		Entity Identification / Taxonomies
		
		–Freebase http://www.freebase.com/
		
		Basic Web Page Parser – https://github.com/pjaol/Webcrawler
		
		Example of OpenNLP usage
		
		–https://github.com/pjaol/entity_extractor
		
		Wikipedia: http://en.wikipedia.org/wiki/Web_crawler
 | 
			
			
			Olston, Christopher, and Marc Najork. "Web crawling." Foundations 
			and Trends in Information Retrieval 4, no. 3 (2010): 175-246. (PDF)
			
			Abiteboul, Serge, Mihai Preda, and Gregory Cobena. "Adaptive on-line 
			page importance computation." In Proceedings of the 12th international 
			conference on World Wide Web, pp. 280-290. ACM, 2003. (PDF)
			
			Rendle, Steffen, Christoph Freudenthaler, and Lars Schmidt-Thieme. "Factorizing 
			personalized Markov chains for next-basket recommendation." In 
			Proceedings of the 19th international conference on World wide web, 
			pp. 811-820. ACM, 2010. (PDF)
			
			Shkapenyuk, Vladislav, and Torsten Suel. "Design and implementation 
			of a high-performance distributed web crawler." In Data Engineering, 
			2002. Proceedings. 18th International Conference on, pp. 357-368. IEEE, 
			2002. (PDF)
			
			Chakrabarti, Soumen, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, 
			David Gibson, and Jon Kleinberg. "Automatic resource compilation 
			by analyzing hyperlink structure and associated text." Computer 
			Networks and ISDN Systems 30, no. 1 (1998): 65-74. (HTML)
			
			Hull, David A. "Stemming algorithms: A case study for detailed 
			evaluation." JASIS 47, no. 1 (1996): 70-84. (PDF)
			
			Xu, Jinxi, and W. Bruce Croft. "Corpus-based stemming using cooccurrence 
			of word variants." ACM Transactions on Information Systems (TOIS) 
			16, no. 1 (1998): 61-81. (PDF) | 
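The crawling readings (Olston and Najork; Shkapenyuk and Suel) all build on the same core loop: pop a URL from a frontier, fetch it, extract links, and enqueue the unseen ones. The sketch below shows that loop as a breadth-first crawl using only the Python standard library (`html.parser`, `urllib.parse`). The names `LinkExtractor`, `crawl`, and the `fetch` callback are hypothetical, and the toy `pages` dict stands in for real HTTP fetching so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute, fragment-free href targets of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop "#fragment".
                url, _ = urldefrag(urljoin(self.base_url, value))
                self.links.append(url)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: FIFO frontier plus a seen-set so each URL is queued once.

    fetch(url) is an assumed callback returning HTML text, or None on failure.
    """
    frontier, seen, order = deque(seeds), set(seeds), []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical in-memory "web" used instead of network access.
pages = {
    "http://example.com/": '<a href="/a">A</a> <a href="b#top">B</a>',
    "http://example.com/a": '<a href="/">home</a> <a href="/c">C</a>',
    "http://example.com/b": "",
    "http://example.com/c": "",
}
order = crawl(["http://example.com/"], pages.get)
```

A production crawler such as Nutch adds what this sketch omits: robots.txt checks (the standard library offers `urllib.robotparser`), per-host politeness delays, URL normalization beyond fragment removal, and a prioritized rather than purely FIFO frontier.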
	
		| Week 3 | BOOLEAN MODEL (PPT) - IIR Ch. 1
		 - Shakespeare plays
		TERMS AND POSTINGS (PPT) - IIR Ch. 2
		| - http://zembereknlp.blogspot.com.tr/
		 - Porter's stemmer (MIR), Porter stemming algorithm (Official)
		 - A skip list cookbook (Pugh 1990)
		 - Fast phrase querying with combined indexes (Williams, Zobel, Bahle 2004)
		 - Efficient phrase querying with an auxiliary index (Bahle, Williams, Zobel 2002)
		 | 
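The Boolean model of IIR Ch. 1 answers an AND query by intersecting sorted postings lists with a linear-time merge, processing the shortest lists first. A minimal sketch, assuming whitespace tokenization with lowercasing and no stemming; `build_index`, `intersect`, and `query_and` are hypothetical names, and the four toy documents are made up for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to its sorted list of docIDs (postings list)."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping docIDs present in both."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def query_and(index, *terms):
    """Conjunctive query: intersect in order of increasing postings length."""
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0] if lists else []
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


docs = [
    "Brutus killed Caesar",
    "Caesar was ambitious",
    "Brutus is an honourable man",
    "Calpurnia warned Caesar",
]
index = build_index(docs)
```

Starting from the shortest list keeps intermediate results small; skip pointers (Pugh's skip lists, IIR Ch. 2) would let `intersect` jump over runs of non-matching docIDs instead of stepping one at a time.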
	
		| Week 4 | DICTIONARIES AND TOLERANT RETRIEVAL (PPT) - IIR Ch. 3
		| - Techniques for automatically correcting words in text (Kukich 1992)
		 - Finding approximate matches in large lexicons (Zobel and Dart 1995)
		 - Efficient Generation and Ranking of Spelling Error Corrections (Tillenius)
		 - How to write a spelling corrector (Peter Norvig)
		 | 
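Isolated-term spelling correction in IIR Ch. 3 rests on Levenshtein edit distance, computed by dynamic programming over insertions, deletions, and substitutions. The sketch below keeps only two DP rows and picks the closest lexicon entry by brute force. `edit_distance` and `correct` are hypothetical names, and the alphabetical tie-break is an arbitrary choice; the readings above rank candidates with better signals such as term frequency or k-gram overlap.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,                # delete cs
                curr[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),   # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def correct(word, lexicon):
    """Return the lexicon entry nearest to word; ties broken alphabetically."""
    return min(lexicon, key=lambda w: (edit_distance(word, w), w))
```

For example, `correct("speling", ["selling", "spelling", "peeling"])` returns `"spelling"` (distance 1). Scanning the whole lexicon is only viable for small vocabularies; Zobel and Dart, and IIR Ch. 3, first narrow the candidates with k-gram indexes.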
	
		| Week 5 | INDEX CONSTRUCTION (PPT) - IIR Ch. 4
		INDEX COMPRESSION (PPT) - IIR Ch. 5
		| - MapReduce: simplified data processing on large clusters (Dean and Ghemawat 2004)
		 - Efficient single-pass index construction for text databases (Heinz and Zobel 2003)
		 - Compression of inverted indexes for fast query evaluation (Scholer et al. 2002)
		 - Inverted index compression using word-aligned binary codes (Anh and Moffat 2005)
			
			
			Zobel, Justin, and Alistair Moffat. "Inverted files for text search 
			engines." ACM computing surveys (CSUR) 38, no. 2 (2006): 6. (PDF)
			
			Scholer, Falk, Hugh E. Williams, John Yiannis, and Justin Zobel. 
			"Compression of inverted indexes for fast query evaluation." In 
			Proceedings of the 25th annual international ACM SIGIR conference on 
			Research and development in information retrieval, pp. 222-229. ACM, 
			2002. (PDF)
			
			Yan, Hao, Shuai Ding, and Torsten Suel. "Inverted index compression 
			and query processing with optimized document ordering." In 
			Proceedings of the 18th international conference on World wide web, 
			pp. 401-410. ACM, 2009. (PDF) | 
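Postings compression in IIR Ch. 5 stores gaps between successive docIDs rather than the docIDs themselves, then writes each gap with a variable byte (VB) code: 7 payload bits per byte, with the high bit set on the final byte of each number. A minimal sketch of that scheme; the function names are hypothetical.

```python
def vb_encode_number(n):
    """VB-encode a non-negative int; the last byte has its high bit (128) set."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128
    return out


def vb_encode_postings(doc_ids):
    """Encode a sorted docID list as VB-coded gaps."""
    out, prev = [], 0
    for d in doc_ids:
        out.extend(vb_encode_number(d - prev))
        prev = d
    return bytes(out)


def vb_decode_postings(data):
    """Decode VB gaps, then prefix-sum them back into docIDs."""
    doc_ids, n, prev = [], 0, 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte          # continuation byte
        else:
            n = 128 * n + (byte - 128)  # final byte of this gap
            prev += n
            doc_ids.append(prev)
            n = 0
    return doc_ids
```

For the docIDs 824, 829, 215406 the gaps are 824, 5, and 214577, which encode to the six bytes 6, 184 | 133 | 13, 12, 177. Small gaps for frequent terms take a single byte, which is where the savings come from. Byte alignment also keeps decoding fast, the same trade-off the word-aligned codes of Anh and Moffat push further.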
	
		| Week 6 | SCORING, TERM WEIGHTING AND THE VECTOR SPACE MODEL (PPT) - IIR 6.2 - 6.4.3
		IR Models from Chap. 3: Modeling, Baeza-Yates & Ribeiro-Neto, Modern Information Retrieval, 2nd Edition (PDF)
		| - Cosine Similarity
		 - Exploring the similarity space
		 - Okapi BM25
			
			
			Salton, Gerard, and Christopher Buckley. "Term-weighting approaches 
			in automatic text retrieval." Information processing & management 
			24, no. 5 (1988): 513-523. (PDF)
			
			Raghavan, Vijay V., and SK Michael Wong. "A critical analysis of 
			vector space model for information retrieval." Journal of the 
			American Society for information Science 37, no. 5 (1986): 279-287. (PDF)
			
			Singhal, Amit, Chris Buckley, and Mandar Mitra. "Pivoted document 
			length normalization." In Proceedings of the 19th annual 
			international ACM SIGIR conference on Research and development in 
			information retrieval, pp. 21-29. ACM, 1996. (PDF)
			
			Turney, Peter D., and Patrick Pantel. "From frequency to meaning: 
			Vector space models of semantics." Journal of artificial 
			intelligence research 37, no. 1 (2010): 141-188. (PDF)
			
			Sahlgren, Magnus. "The Word-Space Model: Using distributional 
			analysis to represent syntagmatic and paradigmatic relations between 
			words in high-dimensional vector spaces." (2006). (PDF) | 
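The two scoring schemes named for this week can be compared on a toy collection. The sketch assumes log-tf times idf weights, (1 + log10 tf) · log10(N / df), with cosine similarity for the vector space model. For Okapi BM25 it uses one common idf variant, ln((N − df + 0.5) / (df + 0.5) + 1), as in Lucene, with k1 = 1.2 and b = 0.75 as typical defaults. Function names and the three-document corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, df, n_docs):
    """Sparse tf-idf vector; terms unseen in the collection are skipped."""
    tf = Counter(tokens)
    return {t: (1 + math.log10(c)) * math.log10(n_docs / df[t])
            for t, c in tf.items() if df.get(t)}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def bm25(query, doc, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 score of a tokenized doc, summed over distinct query terms."""
    tf = Counter(doc)
    score = 0.0
    for t in set(query):
        if t not in tf:
            continue
        n_t = df.get(t, 0)
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
        f = tf[t]
        # Saturating tf, normalized by document length relative to the average.
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


docs = [d.split() for d in ["car insurance auto insurance",
                            "best car deals",
                            "cheap flights"]]
n = len(docs)
df = Counter(t for d in docs for t in set(d))  # document frequency per term
avgdl = sum(len(d) for d in docs) / n
query = "car insurance".split()
vecs = [tfidf_vector(d, df, n) for d in docs]
qv = tfidf_vector(query, df, n)
```

On this collection both schemes rank the first document above the second and give the third a score of zero. BM25 differs mainly in how it saturates repeated terms (k1) and how strongly it penalizes long documents (b), a concern that pivoted length normalization (Singhal et al.) addresses for the vector space model.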
	
		| Week 7 | SCORES IN A COMPLETE SEARCH SYSTEM (PPT) - IIR Ch. 7 |  | 
	
		| Week 8 | EVALUATION IN INFORMATION RETRIEVAL (PPT) - IIR Ch. 8
		Example (PDF)
		| 
			
			
			Borlund, Pia. "The IIR evaluation model: a framework for evaluation 
			of interactive information retrieval systems." Information research 
			8, no. 3 (2003). (PDF)
			
			Clarke, Charles LA, Maheedhar Kolla, Gordon V. Cormack, Olga 
			Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. 
			"Novelty and diversity in information retrieval evaluation." In 
			Proceedings of the 31st annual international ACM SIGIR conference on 
			Research and development in information retrieval, pp. 659-666. ACM, 
			2008. (PDF)
			
			Smucker, Mark D., James Allan, and Ben Carterette. "A comparison of 
			statistical significance tests for information retrieval 
			evaluation." In Proceedings of the sixteenth ACM conference on 
			Conference on information and knowledge management, pp. 623-632. 
			ACM, 2007. (PDF)
			
			Buckley, Chris, and Ellen M. Voorhees. "Retrieval evaluation with 
			incomplete information." In Proceedings of the 27th annual 
			international ACM SIGIR conference on Research and development in 
			information retrieval, pp. 25-32. ACM, 2004. (PDF)
			
			Carterette, Ben, James Allan, and Ramesh Sitaraman. "Minimal test 
			collections for retrieval evaluation." In Proceedings of the 29th 
			annual international ACM SIGIR conference on Research and 
			development in information retrieval, pp. 268-275. ACM, 2006. (PDF)
			
			 - Common evaluation measures (TREC)
			 - Evaluation methods in text categorization
			 - The use of MMR, diversity-based reranking for reordering documents and producing summaries (Carbonell and Goldstein 1998)
		 | 
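The unranked and ranked measures of IIR Ch. 8 and the MMR reranking of Carbonell and Goldstein fit in a few lines each. This is a minimal sketch under stated assumptions. Average precision follows the TREC convention of dividing by the total number of relevant documents, so relevant documents that are never retrieved count as zero. MMR greedily selects the candidate maximizing λ · sim(d, q) − (1 − λ) · max over already-selected s of sim(d, s). All function names, the λ default, and the toy data are hypothetical.

```python
def precision_recall(ranked, relevant, k=None):
    """Precision and recall over the top k results (all results if k is None)."""
    retrieved = ranked[:k] if k is not None else ranked
    hits = sum(1 for d in retrieved if d in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k at each rank k holding a relevant doc (TREC-style)."""
    hits, total = 0, 0.0
    for k, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mmr(candidates, query_sim, doc_sim, lam=0.7, k=3):
    """Greedy Maximal Marginal Relevance selection of k items.

    query_sim: dict doc -> similarity to the query.
    doc_sim:   function (doc, doc) -> similarity between two documents.
    """
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda d: lam * query_sim[d] - (1 - lam) * max(
            (doc_sim(d, s) for s in selected), default=0.0))
        selected.append(best)
        pool.remove(best)
    return selected


ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
```

With this toy judgment set, the top five results give precision 0.4 and recall 2/3, and AP is (1/1 + 2/3) / 3 = 5/9. Setting λ = 1 reduces MMR to plain relevance ranking; lowering λ trades relevance for novelty, the property the Clarke et al. paper above evaluates.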
	
		| Week 9 | RELEVANCE FEEDBACK AND QUERY EXPANSION (PPT) - IIR Ch. 9 |  | 
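The Rocchio algorithm in IIR Ch. 9 moves the query vector toward the centroid of documents judged relevant and away from the centroid of nonrelevant ones, q_m = α·q_0 + β·centroid(D_r) − γ·centroid(D_nr), with negative weights clipped to zero. The sketch below works on sparse term-to-weight dicts and uses α = 1, β = 0.75, γ = 0.15 as illustrative defaults. The function names and the "jaguar" example are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of sparse term -> weight dicts ({} if none given)."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query modification; terms with non-positive weight are dropped."""
    cr, cnr = centroid(relevant), centroid(nonrelevant)
    terms = set(query) | set(cr) | set(cnr)
    qm = {t: alpha * query.get(t, 0.0)
             + beta * cr.get(t, 0.0)
             - gamma * cnr.get(t, 0.0)
          for t in terms}
    return {t: w for t, w in qm.items() if w > 0}


qm = rocchio(
    {"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "car": 1.0}, {"jaguar": 1.0, "engine": 1.0}],
    nonrelevant=[{"jaguar": 1.0, "cat": 1.0}],
)
```

Here the expanded query gains "car" and "engine" at weight 0.375 each, while "cat" ends up negative and is removed. Weighting positive feedback more heavily than negative feedback (β > γ) reflects the observation in IIR Ch. 9 that positive feedback tends to be more valuable.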
	
		| Week 10 | SOCIAL NETWORK ANALYSIS (PPT) 
		(Ch7-Bing Liu) |  | 
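Link analysis is a central tool for analyzing networks such as the web graph. PageRank, the link-analysis score described in the Brin and Page paper listed for Week 1, can be computed by power iteration. A minimal sketch, assuming damping factor d = 0.85 and that dangling nodes (no out-links) spread their rank uniformly over all nodes; `pagerank` and the three-node graph are hypothetical.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=200):
    """Power-iteration PageRank.

    links maps each node to a list of its out-neighbors. Ranks sum to 1.
    """
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1 - d) / n + d * dangling / n for u in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += d * rank[u] / len(outs)
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In this graph C collects links from both A and B and ends up with the highest score (about 0.397), followed by A (about 0.388) and B (about 0.215). Each iteration costs time proportional to the number of edges, which is what makes the method practical at web scale.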
	
		| Week 11 | OPINION MINING AND SENTIMENT ANALYSIS (PPT, PDF, PDF) 
		(Ch11-Bing Liu) |  |