Projects I Have Authored
AUTO-GFQG
An Automatic Gap-Fill Question Generation system that creates multiple-choice, fill-in-the-blank questions from text corpora. Textbooks, factoid archives, news articles, reports, lecture notes, legal proceedings -- the minimum viable input is a small- to moderate-sized collection of coherent, well-formed English.
github.com/malcolmgreaves/auto-gfqg
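To make the idea of a gap-fill question concrete, here is a minimal sketch in Python. It is not taken from AUTO-GFQG: the function name `make_gap_fill`, the hand-picked answer span, and the hand-written distractors are all assumptions for illustration. A real system would have to select the sentence, the gap, and the distractors automatically.

```python
import random


def make_gap_fill(sentence, answer, distractors, rng=None):
    """Blank out `answer` in `sentence`; return (stem, shuffled choices).

    Hypothetical helper for illustration only: the answer span and the
    distractors are supplied by the caller, not discovered from a corpus.
    """
    if answer not in sentence:
        raise ValueError("answer must appear in the sentence")
    # A fixed seed keeps the example reproducible.
    rng = rng or random.Random(0)
    # Replace only the first occurrence of the answer with a blank.
    stem = sentence.replace(answer, "_____", 1)
    choices = [answer] + list(distractors)
    rng.shuffle(choices)
    return stem, choices


stem, choices = make_gap_fill(
    "Support Vector Machines can be trained with Sequential Minimal Optimization.",
    "Sequential Minimal Optimization",
    ["gradient boosting", "k-means clustering", "beam search"],
)
print(stem)
for label, choice in zip("ABCD", choices):
    print(f"  {label}. {choice}")
```

The stem keeps the original sentence intact apart from the blank, and the correct answer is shuffled in among the distractors to form the multiple-choice options.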
SMO-fun
An efficient Scala implementation of the Sequential Minimal Optimization algorithm for training Support Vector Machines.
github.com/malcolmgreaves/smo-fun
Functional programming for machine learning
fp4ml: A library of machine learning algorithms implemented using principles of functional programming.
github.com/malcolmgreaves/fp4ml
DATA_TC
A type class for data of all sizes: write an algorithm once and run it on local Scala collections or a Spark cluster.
github.com/malcolmgreaves/data-tc
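The "write once, run on any backend" idea can be sketched in a few lines. The example below is in Python and uses an abstract base class rather than a Scala type class, so it only approximates the approach. The names `Data`, `LocalData`, and `mean` are hypothetical and do not come from the data-tc API.

```python
from abc import ABC, abstractmethod
from functools import reduce


class Data(ABC):
    """Hypothetical minimal interface that an algorithm is written against."""

    @abstractmethod
    def map(self, f):
        """Apply f to every element, returning a new Data."""

    @abstractmethod
    def reduce(self, f):
        """Combine all elements with the associative function f."""


class LocalData(Data):
    """In-memory backend wrapping a plain list."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, f):
        return LocalData(f(x) for x in self._items)

    def reduce(self, f):
        return reduce(f, self._items)


def mean(data):
    """Written once against Data; it never touches the backing storage."""
    total, count = data.map(lambda x: (x, 1)).reduce(
        lambda a, b: (a[0] + b[0], a[1] + b[1])
    )
    return total / count


result = mean(LocalData([1.0, 2.0, 3.0, 6.0]))
print(result)
```

A cluster-backed implementation would supply the same `map` and `reduce` operations over a distributed collection, and `mean` would run unchanged. That backend is omitted here because it needs a running cluster.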
REx
Supervised relation extraction from free-form text on Spark. Complete end-to-end implementation of my Master's thesis: http://goo.gl/DzMr6c.
critical_CELL
Critical cell finding algorithm using functional programming. Critical cell finding is a pre-processing step for table data extraction.
github.com/malcolmgreaves/critical_cell
scv
Small library of image processing and computer vision algorithms. Depends on the boof-cv Java library.
FUNFLOW
A functional abstraction for dependent, asynchronous computation.
Projects I Have Collaborated On
AVRO-CODEGEN
Scala code generation from Avro schemas.
Data-PIPELINES
Tutorial code from my Fall 2014 Datapalooza session. Includes beginner to advanced material for functional programming in Scala. Also includes classification and ranking using nearest neighbors as well as clustering using k-means. Includes some elements of "fp4ml".
github.com/Nitro/data-pipelines
pdsimplify
A Scala project that extends PDFBox 2.x with a simplified document object model. Wrote the PDFBox 1.x-based predecessor of this library as a Nitro internal project. Assisted, mentored, and code-reviewed Sagnik Choudhury on the PDFBox 2.x rewrite and open-source development.
ProPPR
ProPPR (pronounced "proper"): Graph-algorithm inferences over local groundings of first-order logic programs.