Projects I Have Authored

AUTO-GFQG

An Automatic Gap-Fill Question Generation system that creates multiple-choice, fill-in-the-blank questions from text corpora. It works on textbooks, factoid archives, news articles, reports, lecture notes, and legal proceedings; the minimum viable input is a small to moderately sized collection of coherent, well-formed English.

github.com/malcolmgreaves/auto-gfqg

 

SMO-fun

An efficient Scala implementation of the Sequential Minimal Optimization algorithm for training Support Vector Machines.

github.com/malcolmgreaves/smo-fun

 

Functional programming for machine learning

fp4ml: A library of machine learning algorithms implemented using principles of functional programming.

github.com/malcolmgreaves/fp4ml

 

DATA_TC

A type class for data of all sizes: write an algorithm once and run on local Scala collections or a Spark cluster.

github.com/malcolmgreaves/data-tc
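
The "write once, run anywhere" idea behind a data type class can be illustrated with a minimal sketch. The names below (`Data`, `vectorData`, `Stats.mean`) are hypothetical and chosen for illustration only; they are not taken from the data-tc API. The sketch shows only a local `Vector` instance and assumes a Spark-backed instance could be supplied separately.

```scala
// Hypothetical type class (not the data-tc API): the operations an
// algorithm needs from a container of data, abstracted over the container D.
trait Data[D[_]] {
  def map[A, B](data: D[A])(f: A => B): D[B]
  def reduce[A](data: D[A])(op: (A, A) => A): A
  def size[A](data: D[A]): Long
}

object Data {
  // Instance for a local Scala collection.
  implicit val vectorData: Data[Vector] = new Data[Vector] {
    def map[A, B](data: Vector[A])(f: A => B): Vector[B] = data.map(f)
    def reduce[A](data: Vector[A])(op: (A, A) => A): A = data.reduce(op)
    def size[A](data: Vector[A]): Long = data.size.toLong
  }
}

object Stats {
  // Written once against the type class: works for any D that has a
  // Data instance in implicit scope. Assumes xs is non-empty.
  def mean[D[_]](xs: D[Double])(implicit ev: Data[D]): Double =
    ev.reduce(xs)(_ + _) / ev.size(xs)
}
```

For example, `Stats.mean(Vector(1.0, 2.0, 3.0))` evaluates to `2.0`. An instance for a Spark `RDD` would follow the same pattern, although `RDD.map` requires a `ClassTag` for its result type, so the signatures in this sketch would need adjusting.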

 

REx

Supervised relation extraction from free-form text on Spark. Complete end-to-end implementation of my Master's thesis: http://goo.gl/DzMr6c.

github.com/malcolmgreaves/rex

 

critical_CELL

A critical-cell-finding algorithm implemented with functional programming. Critical cell finding is a pre-processing step for table data extraction.

github.com/malcolmgreaves/critical_cell

 

scv

A small library of image processing and computer vision algorithms. Depends on the BoofCV Java library.

github.com/malcolmgreaves/scv

 

FUNFLOW

A functional abstraction for dependent, asynchronous computation.

github.com/malcolmgreaves/funflow


 

Projects I Have Collaborated On

AVRO-CODEGEN

Scala code generation from Avro schemas.

github.com/Nitro/avro-codegen

 

Data-PIPELINES

Tutorial code from my Fall 2014 Datapalooza session. Covers beginner to advanced material on functional programming in Scala, classification and ranking using nearest neighbors, and clustering using k-means. Includes some elements of fp4ml.

github.com/Nitro/data-pipelines

 

pdsimplify

A Scala project that extends PDFBox 2.x with a simplified document object model. I wrote the PDFBox 1.x-based predecessor of this library as a Nitro internal project, and I assisted, mentored, and code-reviewed Sagnik Choudhury on the PDFBox 2.x rewrite and open-source development.

github.com/sagnik/pdsimplify

 

ProPPR

ProPPR (pronounced "proper"): Graph-algorithm inferences over local groundings of first-order logic programs.

github.com/TeamCohen/ProPPR