
Swift NLP

License: MIT

Swift NLP is a research-based, unsupervised topic modelling pipeline for social media, written in Swift. It draws on available Swift libraries to support data collection, document encoding, dimensionality reduction, clustering, and topic modelling. Our goal is to provide a modular and efficient set of unsupervised topic modelling tools that work across macOS and Linux.

We are currently developing a base implementation of a modular, composable topic modelling pipeline. That pipeline will be paired with built-in support for data collection from sources like Reddit; we currently have cross-platform support for data collection from both the Reddit API and Pushshift dump files. This article provides some useful background on sentence embeddings, and outlines much of the fundamental approach.
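As a rough illustration of the sentence-embedding idea mentioned above, the sketch below averages per-word vectors into a single document vector and compares documents by cosine similarity. It does not use the SwiftNLP API: `toyVectors`, `embed`, and `cosineSimilarity` are hypothetical names, and the 3-dimensional vectors are made-up stand-ins for a real GloVe table.

```swift
import Foundation

// Made-up 3-dimensional word vectors standing in for a real GloVe table.
let toyVectors: [String: [Double]] = [
    "exam": [1.0, 0.0, 0.0],
    "midterm": [0.9, 0.1, 0.0],
    "pizza": [0.0, 1.0, 0.0],
    "lunch": [0.0, 0.9, 0.1],
]

/// Averages the vectors of all in-vocabulary words in `text`.
/// Returns nil when none of the words are in the vocabulary.
func embed(_ text: String, using vectors: [String: [Double]]) -> [Double]? {
    let tokens = text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
    let known = tokens.compactMap { vectors[$0] }
    guard let first = known.first else { return nil }

    var sum = [Double](repeating: 0.0, count: first.count)
    for vector in known {
        for i in 0..<sum.count { sum[i] += vector[i] }
    }
    return sum.map { $0 / Double(known.count) }
}

/// Cosine similarity between two equal-length vectors (0 if either has zero length).
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map { $0.0 * $0.1 }.reduce(0.0, +)
    let normA = sqrt(a.map { $0 * $0 }.reduce(0.0, +))
    let normB = sqrt(b.map { $0 * $0 }.reduce(0.0, +))
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA * normB)
}

let studyPost = embed("Midterm exam!", using: toyVectors)!
let examPost = embed("exam", using: toyVectors)!
let foodPost = embed("Pizza lunch", using: toyVectors)!

// Posts about the same subject should score higher than unrelated ones.
print(cosineSimilarity(studyPost, examPost) > cosineSimilarity(studyPost, foodPost))
```

Averaging is only one simple way to build a document vector from word vectors; it ignores word order and treats every word as equally important.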

Our intended use case looks like this:

import SwiftNLP

// Pull live data via the Reddit API and encode it using GloVe embeddings
var corpus = DictionaryCorpus(encoding: .glove6B50d)
corpus.addDocuments(fromSubreddit: "uwaterloo") 

// Create a topic model and print summary 
let topicModel = corpus.cluster() 
print(topicModel)
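To give a feel for what a clustering step over encoded documents does, here is a minimal, self-contained sketch. It is not how `corpus.cluster()` is implemented, which this README does not specify. It swaps in a deliberately simple single-pass threshold clustering, and `cosine`, `greedyClusters`, the `threshold` parameter, and the sample vectors are all assumptions made for illustration.

```swift
import Foundation

/// Cosine similarity between two equal-length vectors (0 if either has zero length).
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map { $0.0 * $0.1 }.reduce(0.0, +)
    let normA = sqrt(a.map { $0 * $0 }.reduce(0.0, +))
    let normB = sqrt(b.map { $0 * $0 }.reduce(0.0, +))
    return (normA > 0 && normB > 0) ? dot / (normA * normB) : 0
}

/// Single-pass threshold clustering (an illustrative stand-in, not SwiftNLP's algorithm).
/// Each vector joins the first cluster whose seed vector is at least `threshold`
/// similar to it; otherwise it becomes the seed of a new cluster.
/// Returns one cluster index per input vector.
func greedyClusters(_ vectors: [[Double]], threshold: Double) -> [Int] {
    var seeds: [[Double]] = []
    var labels: [Int] = []
    for vector in vectors {
        if let index = seeds.firstIndex(where: { cosine($0, vector) >= threshold }) {
            labels.append(index)
        } else {
            seeds.append(vector)
            labels.append(seeds.count - 1)
        }
    }
    return labels
}

// Made-up document embeddings: two about one subject, two about another.
let documentVectors: [[Double]] = [
    [0.95, 0.05, 0.0],
    [0.0, 0.95, 0.05],
    [1.0, 0.0, 0.0],
    [0.1, 0.9, 0.0],
]

let assignments = greedyClusters(documentVectors, threshold: 0.8)
print(assignments) // [0, 1, 0, 1]
```

The result of this approach depends on input order and on the chosen threshold, which is why it serves here only as an illustration of grouping similar embeddings.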

Roadmap

Contributing

This project is developed by a team of researchers from the Human-Computer Interaction and Health Lab at the University of Waterloo. The project is led by Prof. Jim Wallace, with contributions from:

  • Jason Zhao
  • Nicole Mathis
  • Peter Li
  • Adrian Davila
  • Henry Tian

If you would like to contribute to the project, contact Prof. Wallace with "SwiftNLP" in the subject line, and mention one or more of the roadmap items above that you would like to work on.

License

All original code is released under the MIT license for commercial and non-commercial use.