First draft of Reddit archive reading
Purpose of this MR:
- To set up a basic pipeline that reads in test Reddit archive JSON files
What is my solution?
- I added 2 utility functions in the `Test` folder that read the `RedditComment` or `RedditSubmission` files in the `Test/Resource` folder.
- The main `swiftNLP` package has 2 new functions that decode JSON strings into the respective `RedditComment` or `RedditSubmission` data classes.
- The goal is to separate the file-reading logic from the main logic of our package, i.e. users can pull data using whatever medium they want.
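
As a rough illustration of that separation, here is a minimal sketch, not the actual code in this MR. The `RedditComment` fields, the use of a struct, and the names `readJSONString(at:)` and `decodeRedditComment(from:)` are all assumptions made for the example; the real types in `swiftNLP` may look different.

```swift
import Foundation

// Hypothetical, trimmed-down model. The real RedditComment in swiftNLP
// probably has more fields; these names are assumed for illustration.
struct RedditComment: Codable, Equatable {
    let id: String
    let author: String?
    let body: String?
    let subreddit: String?
}

// Test-side helper (hypothetical name): reads a file into a String.
// It knows nothing about Reddit or JSON decoding.
func readJSONString(at url: URL) throws -> String {
    return try String(contentsOf: url, encoding: .utf8)
}

// Package-side helper (hypothetical name): decodes a JSON string.
// It does not care whether the string came from a file, a cloud
// bucket, or anywhere else. Keys not declared on the model are ignored.
func decodeRedditComment(from json: String) throws -> RedditComment {
    let data = Data(json.utf8)
    return try JSONDecoder().decode(RedditComment.self, from: data)
}
```

With this split, a test would call `readJSONString(at:)` on a file under `Test/Resource` and pass the result to `decodeRedditComment(from:)`, while a user of the package could supply the string from any other source.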
What should the final version look like?
- It is not the best idea to put a lot of test files in the repo, so my suggestion is that we store the files in cloud storage.
- During the initial test stage, I believe that just pulling data from the existing `OneDrive` resources is fine. However, accessing OneDrive through its API will require setting up an `Azure App` to get API keys and secret keys.
- Some alternatives are `AWS S3`, `Google Cloud`, and `Microsoft Azure Blob Storage`; all are low-cost options for storing the files. However, I believe using OneDrive should be enough for now.