August 5, 2014

[Mod] Recruiting volunteers - please read
redpillschool
TheRedPill
1FrogTrainer (3 children)

Would it make sense to put a project such as this on GitHub?

That way you are at less risk of losing work should a dev drop the project before he can finish, and multiple people could submit to it/review it, etc.

I might be able to contribute but I don't want to make any large time commitments at this point. But it would be nice to have a GitHub repo we could all look at.

We'd probably want to create extra accounts on GitHub to prevent doxxing. My normal account is heavily tied to my work projects.

[deleted] (2 children)

Put it on GitHub so anyone can contribute. I might want to contribute a few lines of code myself, and we could have lots of code auditing to guard against some sort of attack on us.

RedPillington (1 child)

that's presuming feminists can code

jolly--roger (0 children)

They have tons of betas who can

antwonedw (0 children)

Definitely interested. Experienced senior software engineer here. Replying publicly instead of by PM because I'm on my phone.

CokeandGrappa (4 children)

I'm not thrilled by the "offline archival" part, but I have the qualities you're asking for. Care to elaborate on your intentions?

redpillschool [Mod] [S] (3 children)

Just looking to back up some of our best posts and discussions in a rare case of emergency.

erqos (1 child)

> to back up some of our best posts and discussions

How does one determine what's "best"? Are you going to go down the list in descending order of link karma? Because I've seen really good posts and comments that got few votes for various reasons.

Also, I'm not opposed to a coordinated, centralized effort, but it'd probably be a good thing for multiple groups or individuals to spearhead their own distinct archival efforts, based on their own preferences. We also need a much better query system than what reddit provides, so we can do specific lookups based on specific criteria. But yeah, regardless of how we do it, if we lose this treasure trove of knowledge, it'd be the equivalent of the burning of the Library of Alexandria.
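A query system with lookups by specific criteria, as suggested above, doesn't need anything exotic. Below is a minimal sketch, assuming the archive has been loaded into a local SQLite table with a hypothetical schema (`id`, `author`, `score`, `body`) and using plain substring matching (`LIKE`) rather than true full-text search. The score filter is optional, so karma can be one criterion among several instead of the only one, which speaks to the concern about good but under-voted posts.

```python
import sqlite3


def build_archive(rows):
    """Load (id, author, score, body) tuples into an in-memory SQLite table.

    The table and column names are illustrative assumptions, not an existing schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id TEXT PRIMARY KEY, author TEXT, score INTEGER, body TEXT)"
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", rows)
    return conn


def lookup(conn, keyword, min_score=None, author=None):
    """Return IDs of posts whose body contains `keyword` (ASCII case-insensitive),
    optionally filtered by minimum score and/or author, highest score first."""
    sql = "SELECT id FROM posts WHERE body LIKE ?"
    params = [f"%{keyword}%"]
    if min_score is not None:
        sql += " AND score >= ?"
        params.append(min_score)
    if author is not None:
        sql += " AND author = ?"
        params.append(author)
    sql += " ORDER BY score DESC"
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical sample rows, for illustration only.
conn = build_archive([
    ("a1", "alice", 120, "On hypergamy and frame"),
    ("b2", "bob", 3, "Hypergamy revisited"),
    ("c3", "alice", 45, "Lifting basics"),
])
print(lookup(conn, "hypergamy"))
print(lookup(conn, "hypergamy", min_score=10))
print(lookup(conn, "", author="alice"))
```

For a real archive, SQLite's FTS5 extension (where the local SQLite build includes it) would give ranked full-text search instead of substring matching.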

[deleted] (0 children)

Why don't we just put it in a torrent?

trpill (0 children)

You can back up this entire sub easily. There are people who back up all of GW.

Head over to /r/datahoarder for more information.

QuietlyLearning (11 children)

Would we be able to download said archive using something like a torrent since I presume it would be large?

AFPJ [Endorsed Contributor] (9 children)

Every post and comment ever made to this sub fits into less than 3GB of UTF-8.

jolly--roger (7 children)

That's a fucking ludicrously huge amount of text. A few libraries' worth, perhaps.

We're in the range of MBs at most, and text is easily compressed as well.

AFPJ [Endorsed Contributor] (6 children)

3GB (3 × 1024³) is only 3,221,225,472 bytes, which is at most that many characters, since UTF-8 uses one byte per ASCII character and two to four bytes for everything else. Now factor in indexing, XML/HTML encapsulation and XDATA overhead. The "Men Are Not Happy" post is among the largest on this sub, and its source contains 809,335 characters as of this comment; only 3,980 of those would fit into 3GB. A lot of space is saved via compression and by stripping superfluous/irrelevant HTML down to XML, but 3GB is a minuscule amount of data, even for plaintext.
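The figures in this comment can be reproduced directly. A quick sketch, treating the quoted 809,335 characters as 809,335 bytes (true only if that page source is pure ASCII), and showing why UTF-8 byte counts and character counts can differ:

```python
GIB = 1024 ** 3            # the "3GB" above is 3 GiB, i.e. 3 * 1024**3 bytes
budget = 3 * GIB           # 3,221,225,472 bytes
post_size = 809_335        # size of the "Men Are Not Happy" source, as quoted (assumed ASCII)

# How many whole copies of that post fit, and how many bytes are left over.
copies, leftover = divmod(budget, post_size)
print(f"{budget:,} bytes / {post_size:,} = {copies:,} copies, {leftover:,} bytes spare")

# UTF-8 stores ASCII in 1 byte per character but other characters in 2-4 bytes,
# so "bytes" and "characters" only match for pure-ASCII text.
word = "it\u2019s"         # "it's" written with a typographic apostrophe (U+2019)
print(len(word), "characters,", len(word.encode("utf-8")), "bytes")
```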

You'd have to squash the text into a fucked up encoding like ASCII, use draconian delimiters such as CDATA or straight-up line breaks, and then compress everything with a state-of-the-art algorithm like BWT, PPM or Dynamic Markov Compression in order to preserve this sub in "the range of MBs". Why would anyone ever want to write software to do this when a few gigs cost a few pennies? OP isn't trying to transmit TRP to the Mars rover.
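The two comments above disagree about how well text compresses, and it is easy to measure rather than argue. Python's standard library ships three general-purpose codecs, including `bz2`, which implements bzip2 and is built on the Burrows-Wheeler transform (BWT) mentioned above; PPM and Dynamic Markov Compression are not in the standard library, so they are not shown. A minimal sketch follows. The sample string is synthetic and far more repetitive than real posts, so its ratios say nothing about the actual sub.

```python
import bz2
import lzma
import zlib


def compression_report(text: str) -> dict:
    """Return the raw UTF-8 size and the compressed size (in bytes) under a few stdlib codecs."""
    raw = text.encode("utf-8")
    return {
        "raw": len(raw),
        "zlib (DEFLATE)": len(zlib.compress(raw, 9)),
        "bz2 (BWT-based)": len(bz2.compress(raw, 9)),
        "lzma": len(lzma.compress(raw)),
    }


# Synthetic, highly repetitive sample; real forum text compresses far less well.
sample = "Every post and comment ever made to this sub. " * 2000
for codec, size in compression_report(sample).items():
    print(f"{codec:>16}: {size:,} bytes")
```

Running `compression_report` on an actual export (say, the text of a few hundred posts) would settle whether the "MBs" or the "3GB" estimate is closer, without writing any custom compression software.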

jolly--roger (5 children)

> only 3,980 of those would fit into 3GB

And that's about 3,700 articles full of noise: actual reposts or semantic reposts covering, say, hypergamy over and over (and over again), like we didn't catch it the first time.

Please, don't try to tell me you'd actually need 3GB to save the essence of TRP. You would not.

I think it's pointless to back up the whole sub. I'm not saying it's not easy; it's just pointless.

AFPJ [Endorsed Contributor] (4 children)

> I think it's pointless to back up the whole sub. I'm not saying it's not easy; it's just pointless.

The thread's been removed, so it looks like OP found whatever he was looking for, but it's easier to archive the whole sub than to run PageRank or some other kind of fuzzy logic/AI or, even worse, to hand-pick posts.

jolly--roger (3 children)

It's not removed; it's just no longer stickied.

Thanks for regurgitating what I already said, i.e. making your comment and this reply extremely pointless.

Go ahead and pick quantity whenever it suits you. I pick quality.

AFPJ [Endorsed Contributor] (2 children)

I know it's 6AM and you probably haven't had coffee, but at least pretend to pay attention.

You said

> it's just pointless.

I said

> but it's easier

Because your logic of

> pick quantity whenever it suits you. I pick quality.

is faulty. With backups you get all the quantity without losing any of the quality.

It's an automated process that doesn't require human intervention or hand-picking.
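An automated, hands-off backup of the kind described here can be sketched as a loop over Reddit's public JSON listings. This is a minimal sketch, assuming the `https://www.reddit.com/r/<sub>/new.json` endpoint with its `limit` and `after` query parameters and the `{"data": {"children": [...], "after": ...}}` response shape. The `fetch` argument is a hypothetical hook you supply; passing it in keeps the paging logic testable offline. The sketch only collects post data: each thread's comments live at a separate `/comments/<id>.json` URL, and Reddit listings have historically stopped after roughly 1,000 items, so a complete history needs another source.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def listing_url(sub: str, after: Optional[str] = None, limit: int = 100) -> str:
    """Build the URL of one page of a subreddit's newest-posts JSON listing."""
    params = {"limit": limit}
    if after:
        params["after"] = after  # cursor: fullname of the last item on the previous page
    return f"https://www.reddit.com/r/{sub}/new.json?{urlencode(params)}"


def crawl(sub: str, fetch: Callable[[str], str]) -> Iterator[dict]:
    """Yield every post's data dict, following the `after` cursor page by page.

    `fetch` takes a URL and returns the response body as a string.
    """
    after = None
    while True:
        page = json.loads(fetch(listing_url(sub, after)))["data"]
        for child in page["children"]:
            yield child["data"]
        after = page.get("after")
        if not after:  # no cursor means this was the last page
            break
```

For live use, `fetch` could wrap `urllib.request.urlopen` with a descriptive User-Agent header and a pause between requests to stay within rate limits.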

jolly--roger (1 child)

It's 4 PM here.

You make assumptions and, apparently, wrong ones.

AFPJ [Endorsed Contributor] (0 children)

Bro, just stop it hahah. http://en.wikipedia.org/wiki/Straw_man

Why are you still replying?

1FrogTrainer (0 children)

Where did you pull the 3GB stat from?

[deleted] (0 children)

TheRedFile, I like the ring of that.

[deleted] (0 children)

If you're doing an archival service, could you please make a Syndie archive? That way, if we're blocked in the UK as a hate site, people can still access our material through the darknet. Essentially, Syndie lets you distribute your backup to multiple people, who can then synchronize it with their own copies and redistribute it ad nauseam. You can also post to it, and your posts propagate to everyone else's copies. Unfortunately, it's styled after regular forums. https://syndie.i2p2.de/

You can kill a man, but you can't kill an idea.
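The synchronization described in the Syndie suggestion (everyone holds a copy, copies are merged, new posts propagate) boils down to merging archives keyed by post ID. The sketch below is a generic illustration, not Syndie's actual protocol or data format: it assumes each copy is a hypothetical dict mapping a post ID to a `(last_modified, body)` pair and keeps the newer version on conflict.

```python
from typing import Dict, Tuple

# Hypothetical record format: (last-modified timestamp, body). Not Syndie's real format.
Record = Tuple[int, str]


def merge(mine: Dict[str, Record], theirs: Dict[str, Record]) -> Dict[str, Record]:
    """Union two archive copies keyed by post ID; on conflict keep the newer record.

    Neither input is modified; a new dict is returned.
    """
    merged = dict(mine)
    for post_id, record in theirs.items():
        if post_id not in merged or record[0] > merged[post_id][0]:
            merged[post_id] = record
    return merged


mine = {"p1": (1, "old text"), "p2": (5, "only in mine")}
theirs = {"p1": (3, "edited text"), "p3": (2, "only in theirs")}
print(merge(mine, theirs))
```

Because this merge is a union that keeps the newest record, peers end up with the same archive regardless of who syncs with whom first, as long as no two copies hold different bodies under the same timestamp; a real system would need a tie-breaker for that case.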

