I Archived The Entire Subreddit And Coded A Simple Website To Read It

by dream-hunter | October 08, 2018 | TheRedPill

987 upvotes

Link: https://theredarchive.xyz/

Preview: https://i.imgur.com/BXXQOke.png & https://i.imgur.com/niDdoEW.png

As a web developer who discovered TRP a year ago and is very grateful for the subreddit, I've always wanted to contribute here, but I never knew how, until now. After TRP was quarantined, I feared it would get banned one day. So I decided to figure out a way to scrape the entire subreddit and make it viewable on a simple website.

I saw TRP's current backup of the subreddit and wasn't happy with its design, its hard-to-use website, or its lack of posts. So I spent 8 hours figuring out how to scrape the entire subreddit and then coded a website to view the posts as simply as possible.

Features:

  • 160,035 posts from TheRedPill & askTRP & RedPillParenting & RedPillWomen & ThankTRP & becomeaman & altTRP & GEOTRP
  • Comments (+ replies) included
  • TRP's subreddit theme
  • Search through all the posts instantly
  • Lightweight and simple to use website & no ads
  • DDoS protected and secured

Edit: Thank you for the amazing feedback, everyone. I just finished scraping RedPillWomen, RedPillParenting and ThankTRP and added them all to the website. Up to 1.5k posts added.

Edit 2: Almost every single post (around 64k) ever posted on TRP, from all the way back to 2012 till now, can now be viewed on the website.

Edit 3: Option to search through entire archive added; search through titles, posts, authors and even comments.

Edit 4: Every single post from askTRP, becomeaman and RedPillWomen (altTRP + GEOTRP) has been archived; that adds up to 60k posts since 2012. We are now at a total of 160k archived posts!


Post Information
Title: I Archived The Entire Subreddit And Coded A Simple Website To Read It
Author: dream-hunter
Upvotes: 987
Comments: 72
Date: 08 October 2018 04:41 PM UTC
Subreddit: TheRedPill
Link: https://theredarchive.com/post/52869
Original Link: https://old.reddit.com/r/TheRedPill/comments/9mgilv/i_archived_the_entire_subreddit_and_coded_a/

Comments

[–]ReeZoX 49 points (3 children)

Really like it!

The only thing I would like is to be able to sort/search the results by category (flair) and then sort those by most upvoted :)

[–]dream-hunter[S] 15 points (2 children)

Doing that is possible. Click on the Category column on the table two times then you'll see the most upvoted posts on that flair (or type the entire category's name in the search function). You can even see the most upvoted posts of each subreddit; just click the Subreddit column.

Thank you for the feedback!

EDIT: You can now filter which subreddits you want to be shown - check the search tab.

[–]ReeZoX 1 point (1 child)

The search function works, but I can't switch the category on the column, there's either "Science" or none/unspecified displaying for me there ;)

[–]dream-hunter[S] 2 points (0 children)

Refresh page > click Column tab two times then you'll see the posts from each category in order of most upvoted.

[–]Modredpillschool[M] 123 points (5 children)

Thanks. So did we. https://www.forums.red/i/theredpill

We will always welcome more backup points, so thank you for helping out.

Our backup includes 3,176 posts between TRP and AskTRP, with over 519,619 individual comments.

Counting RPW and ThankTRP, our total archive is 4,573 posts.

Instant search can be found here

In case of emergency we will release the entire database as torrent. FYI.

Forums opening soon with new mobile-friendly design.

[–][deleted] 13 points (1 child)

Yours looks great; it's the go-to backup. His, however, is quite a bit faster and very sleek. A collaboration would be nice.

[–]Modredpillschool 1 point (0 children)

While we are working on proper caching at the moment, I'm legitimately curious what makes you say that.

https://www.forums.red/i/theredpill generates in 0.2027 seconds. A page with 345 comments generated in 0.1854 seconds.

A side-by-side load test shows my pages serve the bulk of the page 1 full second quicker (aside from post-loaded JS and images, which are specifically set to load slowly).

http://a.trp.red/files/timing.jpg

I'm hoping to streamline this so please let me know.

[–]drakehfh 8 points (0 children)

It would be good to make the site open source with all the database. That way people would contribute.

[–]throne_deserter 0 points (0 children)

TRP changed my life. Quarantine had me scared but I am glad that you guys are doing all this for those who might need help.

[–]RainySeasonInPH -1 points (0 children)

Not sure if you guys are aware, but the entire Reddit post history is backed up on Google BigQuery. I've used it to dredge up old redpill posts. I'm pretty sure there's also a torrent.

[–]izzyinjurious 24 points (12 children)

What language did you use to scrape it? It's awesome btw good work.

[–]dream-hunter[S] 22 points (11 children)

PHP, my favorite language. :) Thank you!

[–]TheTriviaMan 13 points (1 child)

I write PHP myself, and even though you've already done the scraping, I would suggest that any future developers use a Python library known as Beautiful Soup (https://pypi.org/project/beautifulsoup4/). It's made specifically for web scraping.

[–]SpiderAlpha33 2 points (0 children)

Yeah, BeautifulSoup is fast and reliable, and for dynamically generated websites I use Selenium with headless Firefox.
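For readers who have not used the library, here is a minimal sketch of what Beautiful Soup scraping can look like. It assumes the third-party `beautifulsoup4` package is installed, and the HTML sample, the `comment` and `md` class names, and the `data-author` attribute are simplified inventions for illustration, not Reddit's actual markup.

```python
from bs4 import BeautifulSoup

# Invented, simplified markup; a real comment page is more deeply nested.
SAMPLE_HTML = """
<div class="comment" data-author="alice">
  <div class="md"><p>First comment</p></div>
  <div class="comment" data-author="bob">
    <div class="md"><p>A reply</p></div>
  </div>
</div>
"""


def extract_comments(html):
    """Return one dict per comment node, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    comments = []
    for node in soup.select("div.comment"):
        # select_one returns the first matching descendant, i.e. this
        # comment's own body rather than a nested reply's body.
        body = node.select_one("div.md")
        comments.append({
            "author": node["data-author"],
            "text": body.get_text(strip=True),
        })
    return comments


for comment in extract_comments(SAMPLE_HTML):
    print(comment["author"], "->", comment["text"])
```

The selectors are the fragile part of any scraper like this: they have to be rewritten whenever the page markup changes.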

[–]1McDrMuffinMan 9 points (0 children)

Wow.... You're either a masochist or something else.

God speed Man!

[–]ThePantsThief 2 points (5 children)

Why scrape it when they have an API? Fellow developer here. Curious in case the scraping solution is better somehow

[–]Modredpillschool 2 points (0 children)

The https://forums.red/i/TheRedPill archive was done via API

[–]needz 0 points (3 children)

Sometimes if you already have your favorite tool and it works, there's no need to learn another tool (or in this case, API).

[–]SilkTouchm 3 points (2 children)

Scraping is always harder than just using the api.

[–]needz 0 points (1 child)

Never say never or always.

So if I've been using BeautifulSoup since it came out, have an entire framework and dev environment dedicated to scraping websites and have little to no experience with APIs, which is gonna be harder?

[–]SilkTouchm 1 point (0 children)

> which is gonna be harder?

Scraping, by far. It doesn't matter how much experience you have, you still have to do the legwork. An API is just using a few pre-packaged methods, you don't need experience on it.
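To make the API side of this comparison concrete, here is a minimal sketch of walking a Reddit-style listing page by page with its `after` cursor. The `iter_posts` helper, the `fetch_page` callback, and the canned `PAGES` data are hypothetical names invented for this example; a real `fetch_page` would make an HTTP request and decode the JSON, which is left out so the sketch runs offline.

```python
from typing import Callable, Iterator, Optional

# Hypothetical callback type: takes the "after" cursor (None for the first
# page) and returns one decoded listing page as a dict.
FetchPage = Callable[[Optional[str]], dict]


def iter_posts(fetch_page: FetchPage, max_pages: int = 100) -> Iterator[dict]:
    """Yield post dicts from successive listing pages until the cursor runs out."""
    after = None
    for _ in range(max_pages):
        listing = fetch_page(after)["data"]
        for child in listing["children"]:
            yield child["data"]
        after = listing.get("after")
        if not after:
            break


# Canned pages standing in for real HTTP responses (invented sample data).
PAGES = {
    None: {"kind": "Listing", "data": {"after": "t3_bbb", "children": [
        {"kind": "t3", "data": {"id": "aaa", "title": "First", "score": 10}},
        {"kind": "t3", "data": {"id": "bbb", "title": "Second", "score": 5}},
    ]}},
    "t3_bbb": {"kind": "Listing", "data": {"after": None, "children": [
        {"kind": "t3", "data": {"id": "ccc", "title": "Third", "score": 7}},
    ]}},
}

posts = list(iter_posts(lambda after: PAGES[after]))
print([post["id"] for post in posts])
```

Because the response is already structured data, there is no markup to parse; the loop only has to follow the cursor.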

[–]the-dan-man 0 points (1 child)

Out of curiosity, why is PHP your favourite language, and where did you learn it?

[–]dream-hunter[S] 1 point (0 children)

I don't remember, but long ago I wanted to start coding my own screenshot software, and my friend told me I could do that with my back-then favorite language & PHP, so I started learning PHP.

I learnt PHP by coding, and whenever I wanted to code something, I searched on Google about it and read the code. PHP is really simple.

[–]Brushyourteethm8 13 points (2 children)

Solid work, thanks! Will you be doing the same for AskTRP, MRP and AskMRP? Some solid posts and advice in each

[–]dream-hunter[S] 18 points (1 child)

I'll run my scraper on all the Red Pill subreddits from the sidebar. I kinda forgot about those.

They'll be on the site within 48hr.

[–]Brushyourteethm8 3 points (0 children)

Good effort, thanks

[–]fuckboiwithfeelings 28 points (4 children)

The community that keeps on giving, way to go!

[–]dream-hunter[S] 9 points (3 children)

Looking forward to contributing more. If you have more ideas about TRP for a web developer, let me know!

[–]Modredpillschool 12 points (2 children)

I can invite you to our private dev forum (open soon). Instead of duplicating efforts we have a lot of tasks that need doing.

[–]RedPillHanSolo 0 points (1 child)

I've sent you numerous e-mails, but haven't heard back. Is it because you didn't get them or you only invite trusted members and whatnot?

[–]Modredpillschool 4 points (0 children)

I haven't forgotten about you. I simply haven't gotten the private forum running yet. Lots of work going on in the background.

[–]robodylan123 10 points (2 children)

Did something similar here: http://trpbackup.com I have every single post backed up but only the top 1000 are browsable at the moment. You’ll be able to search all of them soon though.

[–]unn4med 2 points (0 children)

This is great that so many backups are being created! TRP shall live on.

[–][deleted] 10 points (3 children)

Great website but is there an option to search by date?

[–]dream-hunter[S] 18 points (1 child)

Just added. Check the date column and click on it to sort. https://i.imgur.com/kCQf3h9.png

[–]Modredpillschool 7 points (0 children)

[–]hardlifeman 6 points (5 children)

Thanks for doing this.

When I click on "load more comments" in old posts, it opens up a separate "about:blank" page.

[–]dream-hunter[S] 3 points (3 children)

Load more comments, unfortunately, doesn't work, as I wasn't able to scrape those.

[–]Modredpillschool 8 points (2 children)

This might help:

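        // $this->request() is the poster's own HTTP helper (not shown).
        // $link_id is the post's base-36 ID; $children is an array of comment IDs.
        $json = $this->request('/api/morechildren', array(
            'link_id'  => 't3_' . $link_id,
            'children' => implode(',', $children),
            'api_type' => 'json',
        ));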

https://www.reddit.com/dev/api#GET_api_morechildren
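For anyone following along in another language, here is a rough Python sketch of preparing the same `morechildren` parameters. The helper names, the comment IDs, and the batch size of 100 are assumptions made for illustration (check the API documentation linked above for the actual limits), and the HTTP call itself is deliberately left out.

```python
from typing import Iterator, List


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Split a list of comment IDs into batches of at most `size`."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def morechildren_params(link_id: str, children: List[str]) -> dict:
    """Mirror the three parameters used in the PHP snippet above."""
    return {
        "link_id": "t3_" + link_id,      # post ID with the "t3_" link prefix
        "children": ",".join(children),  # comma-separated comment IDs
        "api_type": "json",
    }


def build_requests(link_id: str, children: List[str], batch_size: int = 100) -> List[dict]:
    """One parameter dict per batch; batch_size=100 is an assumed default."""
    return [morechildren_params(link_id, batch)
            for batch in chunked(children, batch_size)]


# "9mgilv" is this thread's post ID; the comment IDs are made up.
requests_to_send = build_requests("9mgilv", ["e7eaaa", "e7ebbb", "e7eccc"], batch_size=2)
for params in requests_to_send:
    print(params)
```

Each dict would then be sent as the query parameters of a GET request to `/api/morechildren`.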

[–]dream-hunter[S] 4 points (1 child)

Thank you, looks good. Unfortunately, I didn't scrape the posts using Reddit's API. I might be able to find another way to do it, though.

[–]Modredpillschool 7 points (0 children)

Gotcha, good luck.

[–]Modredpillschool 0 points (0 children)

Until he fixes it, forums.red does have exhaustive comment chains.

[–]WarViper1337 5 points (0 children)

With the way things are going having extra back ups is probably a good thing.

[–]huhub 9 points (0 children)

I've been working on one of the TRP backups. I tried to post here, but the post did not make it.

The AskTRP announcement is here: asktrp/comments/9jxdtf/trp_sub_offline_backup/

I have completed 2016 and 2017, 2013-2015 are in progress, and I'm going to periodically update 2018. The entire archive contains 68,000+ posts plus countless comments, with no scripts or other unnecessary information (which reduced the size by at least half).

[–]El_Ejcovero 4 points (0 children)

You can never have too many backups, especially with the purging of "controversial" content and people online as of late.

[–]Magnus_ORily 4 points (0 children)

I feel part of an ancient civilisation whose history has been preserved

[–]Mgtow_Maester 3 points (3 children)

Any chance you could do that for mgtow as well? Not to mention some other manosphere subs.

[–]dream-hunter[S] 5 points (1 child)

Not sure, not a big fan of that sub. If more TRP people want that then I can make it happen.

[–]JerryAwesome 1 point (0 children)

Awesome, thanks!

[–]ImmunosuppressedTau 1 point (0 children)

Thanks man!

[–]SalesOverEverything 1 point (0 children)

Just wanted to thank you as well. This makes a real impact on men everywhere, usually for the better.

Thank you.

[–]_Neon_Shadow_ 1 point (0 children)

Amazing! I'm not in love with trp.red but this is an excellent substitute. Thank you.

[–]YesToControversy 1 point (0 children)

Have this virtual beer, mate. Don't drink it and drive though!

🍺

[–]lonefireinwater 1 point (0 children)

Thank you! This is amazing.

[–][deleted] 1 point (0 children)

You're doing god's work my guy, keep it up!

[–]BlueFreedom420 1 point (0 children)

Thank you for this. Keep the memory alive before the big blackout (when the state and the elites finally take the Earth).

[–]MortalSisyphus 1 point (1 child)

This is amazing. It would be great to have this done on my subreddit, which is obviously at high risk of being banned. Since you already developed the code, would it be particularly difficult to set this up for another sub? Please let me know.

[–]ubisoft-vs-ea 1 point (0 children)

You are amazing, if I wasn’t poor I’d give you gold

[–]johnpayne10 1 point (0 children)

Dude, excellent job. The website interface is really smooth, it loads very quickly. I want to give two suggestions though.

 

First: Maybe you shouldn't put asktrp posts in there? Most of the posts in asktrp are not worth anyone's time. It is a good thing to provide advice to someone asking for help. But I don't think it is necessary to add asktrp questions to the archive site. Add more TRP posts if you can. Because the posts and the comments sections on TRP are invaluable. If you already have added the top posts from all time, add the ones from this current year. Add the hot posts. The new posts.

 

Second: For the posts that you have uploaded, only a part of the comments section is available. If possible, try to add the entire comments section for all posts, because there is a lot of gold in the comments.

 

That being said, you have done an excellent job. It will help a lot of people out there. So congratulations and thank you.

[–]wereworm5 1 point (0 children)

You are the hero we all wanted !

[–]Asktrpthrowaway420 1 point (0 children)

Really awesome, clean and easy to navigate

[–]standardmissile 1 point (0 children)

Brilliant. Apart from the practical benefit this is a great example of DOING rather than WHINGING in response to the quarantine.

Some readers here really are internalising TRP and it's fantastic to see. Well done OP.

[–]A_solo_tripper 1 point (2 children)

The fact that you transferred so much data from Reddit is amazing to me. I wish there were multiple platforms or versions of Reddit, so that if someone gets banned from one site, that account could still continue to post unaffected, AND BE SEEN everywhere except on that one site.

How long did it take you to do this?

Why not do this for all of reddit?

Why not allow accounts to post to reddit and your site? If someone posts on your site with their account, it'll show up on reddit as well under their account.

This will solve so many issues I have with reddit. From being shadow banned, to regular banned, to posts being removed, etc.

If you take my advice and you make millions, would you throw me a bag? :)

[–][permanently deleted]

[–]A_solo_tripper 1 point (0 children)

> But I'm considering doing my project open-source in the future for other developers to scrape their own subreddits.

I, along with many more, would definitely appreciate this.

Keep up the good work!! :)

[–]DulceDeLecheMardel 1 point (2 children)

Can you make it downloadable so we can mirror it?

[–][permanently deleted]

[–]modAutoModerator[M] 0 points (0 children)

Just a friendly reminder that as TRP has been quarantined, we have developed backup sites: https://www.trp.red and our full post archive (and future forums) https://www.forums.red/i/TheRedPill. Don't forget to register on TRP.RED and reserve your reddit name today. Forums.Red is currently locked but will be opened soon.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.




created by /u/dream-hunter