
Overhauled project organization

Jim Wallace requested to merge corpus-refresh into main

Refactored how protocols and classes support data collection and encoding. Removed bag-of-words (BoW) as an encoding option. Plus much more.

This sets a baseline for the project: it compiles on Linux and macOS, and provides basic functionality to load data from Reddit archives and encode it using GloVe-50 embeddings. The next step is to integrate Henry's work and CoreML embeddings.
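As a rough illustration of what encoding with GloVe-50 embeddings can involve, here is a minimal Swift sketch that averages per-word vectors into one document vector. Everything in it is hypothetical: `TextEncoder`, `LookupEncoder`, and `parseGloVe` are not this project's actual types, and mean pooling is only one common way to combine word vectors. It assumes the standard GloVe text format, where each line holds a word followed by its space-separated components (50 of them for GloVe-50).

```swift
// Hypothetical protocol: illustrative only, not the project's real API.
protocol TextEncoder {
    var dimensions: Int { get }
    func encodeToken(_ token: String) -> [Float]?
    func encodeSentence(_ sentence: String) -> [Float]
}

extension TextEncoder {
    // Mean-pool the vectors of known tokens; unknown tokens are skipped.
    // Returns a zero vector if no token is in the vocabulary.
    func encodeSentence(_ sentence: String) -> [Float] {
        var sum = [Float](repeating: 0, count: dimensions)
        var count = 0
        for token in sentence.lowercased().split(separator: " ") {
            guard let vector = encodeToken(String(token)),
                  vector.count == dimensions else { continue }
            for i in 0..<dimensions { sum[i] += vector[i] }
            count += 1
        }
        guard count > 0 else { return sum }
        return sum.map { $0 / Float(count) }
    }
}

// Hypothetical in-memory lookup table standing in for a GloVe vocabulary.
struct LookupEncoder: TextEncoder {
    let dimensions: Int
    let vectors: [String: [Float]]
    func encodeToken(_ token: String) -> [Float]? { vectors[token] }
}

// Parse GloVe-style text: "word v1 v2 ... vN" per line.
// Lines with unparseable numbers are skipped.
func parseGloVe(_ text: String) -> [String: [Float]] {
    var table: [String: [Float]] = [:]
    for line in text.split(separator: "\n") {
        let parts = line.split(separator: " ")
        guard let word = parts.first, parts.count > 1 else { continue }
        let values = parts.dropFirst().compactMap { Float(String($0)) }
        guard values.count == parts.count - 1 else { continue }
        table[String(word)] = values
    }
    return table
}

// Usage with a tiny 2-dimensional stand-in for real GloVe-50 data.
let sample = """
the 0.1 0.2
cat 0.3 0.4
"""
let encoder = LookupEncoder(dimensions: 2, vectors: parseGloVe(sample))
let documentVector = encoder.encodeSentence("The cat")
print(documentVector)
```

With real GloVe-50 data, `dimensions` would be 50 and the table would be loaded from the downloaded vectors file; a CoreML-backed encoder could conform to the same protocol.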
