First Draft of reddit archive reading (!3) · Merge requests · Jim Wallace / Curio

Jason Zhao requested to merge data-pipe-v2 into main May 24, 2023

Purpose of this MR:

What is my solution?

I added 2 utility functions in the Test folder that read the RedditComment and RedditSubmission data files from the Test/Resource folder.
The main swiftNLP package has 2 new functions that decode JSON strings into the corresponding RedditComment or RedditSubmission data classes.
The goal is to separate the file-reading logic from the main logic of our package, so that users can load data from whatever source they prefer.
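A minimal sketch of this split, assuming the archive files are newline-delimited JSON (one object per line). The field sets of `RedditComment` and `RedditSubmission`, and the names `decodeRedditComments`, `decodeRedditSubmissions`, and `readResourceFile`, are hypothetical and only illustrate the idea: the package side only turns a `String` into typed values, while the test side owns file access.

```swift
import Foundation

// Hypothetical minimal models; the real RedditComment/RedditSubmission types
// may carry more fields. Keys mirror Reddit's JSON field names.
struct RedditComment: Codable {
    let id: String
    let author: String
    let body: String
    let subreddit: String
    let score: Int
}

struct RedditSubmission: Codable {
    let id: String
    let author: String
    let title: String
    let subreddit: String
    let score: Int
}

// MARK: Package side (decoding only, no file I/O)

// Assumes newline-delimited JSON. Empty lines are skipped because
// split(whereSeparator:) omits empty subsequences by default.
private func decodeLines<T: Decodable>(_ jsonString: String, as type: T.Type) throws -> [T] {
    let decoder = JSONDecoder()
    return try jsonString
        .split(whereSeparator: \.isNewline)
        .map { try decoder.decode(T.self, from: Data($0.utf8)) }
}

// Hypothetical names for the two decoding functions.
func decodeRedditComments(from jsonString: String) throws -> [RedditComment] {
    try decodeLines(jsonString, as: RedditComment.self)
}

func decodeRedditSubmissions(from jsonString: String) throws -> [RedditSubmission] {
    try decodeLines(jsonString, as: RedditSubmission.self)
}

// MARK: Test side (file reading only)

// Hypothetical test utility: reads a resource file into a String. A caller could
// just as well obtain the string from a network download or any other source.
func readResourceFile(at url: URL) throws -> String {
    try String(contentsOf: url, encoding: .utf8)
}

// Usage: write a tiny sample to a temp file, read it back, then decode it.
let sample = """
{"id":"c1","author":"alice","body":"Hello","subreddit":"swift","score":3}
{"id":"c2","author":"bob","body":"World","subreddit":"swift","score":5}
"""
let url = FileManager.default.temporaryDirectory.appendingPathComponent("sample_comments.json")
do {
    try sample.write(to: url, atomically: true, encoding: .utf8)
    let comments = try decodeRedditComments(from: readResourceFile(at: url))
    print(comments.map(\.author)) // ["alice", "bob"]
} catch {
    print("Failed: \(error)")
}
```

Because the decoders accept a plain `String`, swapping the local test files for a cloud-hosted copy later would only change the reading step, not the package API.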

What should the final version look like?

Storing a lot of test files in the repo is not ideal, so I suggest we keep the files in cloud storage.
During the initial testing stage, I think pulling data from the existing OneDrive resources is fine. However, accessing OneDrive through its API requires registering an Azure app to obtain an API key and secret key.
Alternatives such as AWS S3, Google Cloud Storage, and Microsoft Azure Blob Storage are all low-cost options for storing the files. However, I believe OneDrive should be enough for now.
