ArXiv preprint server plans multimillion-dollar overhaul

karpathy · on July 2, 2016

I developed and maintain Arxiv Sanity Preserver (http://www.arxiv-sanity.com/), one of the Arxiv overlays the article mentions. I built it to try to address some of the pains that the "raw" arXiv introduces, such as being flooded by paper submissions without any support or tools for sifting through them.

I'm torn on how Arxiv should proceed in becoming more complex. I support what seems to be the cited poll consensus ("The message was more or less ‘stay focused on the basic dissemination task, and don’t get distracted by getting overextended or going commercial’") and I think the simplicity/rawness of arXiv was partly what made it succeed, but there is also a clear value proposition offered by more advanced search/filter/recommendation tools like Arxiv Sanity Preserver. It's not clear to me to what extent arXiv should strive to develop these kinds of features internally.

Whether they go a simple or more complex route, I really hope that they keep their API open and allow 3rd party developers such as myself to explore new ways of making the arXiv repository useful to researchers. Somewhat disappointingly, the arXiv poll they ran did not include any mention of their API, which in my opinion is critical, overlooked and somehow undervalued. For example, today their rate limits are very aggressive and make it tricky to pull down publication metadata for Arxiv Sanity Preserver, even though this undoubtedly costs minimal bandwidth. In the future, I'm concerned they will discontinue the API altogether and prevent similar 3rd party overlays from being built.
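To illustrate the kind of metadata pull described above, here is a minimal sketch of a throttled client for the public arXiv query endpoint. It is not the actual Arxiv Sanity Preserver code. The names `build_query_url`, `parse_entries` and `Throttle` are hypothetical, and the 3-second pause between requests is an assumption about what counts as polite usage.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # the API returns an Atom feed


def build_query_url(category, start=0, max_results=100):
    """Build a query URL for the newest submissions in one category."""
    params = {
        "search_query": "cat:" + category,
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_entries(atom_xml):
    """Extract id, title and publication date from an Atom response."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        entries.append({
            "id": entry.findtext(ATOM + "id", default="").strip(),
            "title": " ".join(title.split()),  # collapse line breaks
            "published": entry.findtext(ATOM + "published", default="").strip(),
        })
    return entries


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


if __name__ == "__main__":
    throttle = Throttle(3.0)  # assumed polite delay, in seconds
    for start in (0, 100):
        throttle.wait()
        with urllib.request.urlopen(build_query_url("cs.LG", start)) as resp:
            for paper in parse_entries(resp.read()):
                print(paper["published"], paper["title"])
```

Keeping the throttle separate from the fetch makes it easy to raise the delay if the server starts refusing requests.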

maus42 · on July 2, 2016

I can see many potential problems arising if arxiv itself implements recommender systems or in some other way tries to be a gigantic journal or the primary social media for scientists or something like that. Tools that are strictly about making access to the repository easier, like better word search, are fine.

I'd prefer if arxiv stays a repository, and the filtering and discussion and rating is left to other parties, for example, overlay journals such as Discrete Analysis (discreteanalysisjournal.com) [that ranked highly on HN when it launched]. Many separate journals can compete. A future in which arxiv is as neutral a ground as possible, one that disseminates papers freely on the internet (in a standardized format and maybe even with the API you mentioned) and allows a beneficial ecosystem of journals and other communities to develop, sounds much healthier than arxiv trying to become the science community that encompasses everything from hosting papers to discussing them and judging their quality.

Assuming they adopt an objective like that, I can easily imagine arxiv becoming a FB-of-science of sorts, and it might be very convenient ("I can just log in to Arxiv and everything will be there!") and maybe people would love it. (Everybody except lunatics and luddites and generally people who like to be inconvenient out of sheer malice to their fellow human beings is on FB, right? At least in my social circles. And that gives FB overwhelming ability to dictate terms for using FB, given that the ordinary user isn't their customer.) It could hand too much power to just one institution to define how the scientific community communicates.

Suppose they introduce comments. Sooner or later that would go the same way as any other internet community that has commenting and isn't 4chan, and they'd need actual comment moderation, maybe even to ban people. If they are the major science-on-the-internet platform, that could be a bit too much. I understand that currently they only do some basic sanity checks that submitted papers are not utter garbage?

visarga · on July 2, 2016

Hi Andrej, first of all let me thank you for creating my preferred interface to arXiv.

Do you select papers to be displayed on arXiv-sanity by hand or automatically? Does manual selection explain why there are sometimes 2-3 days with no publications, and then suddenly a bulk of papers?

karpathy · on July 2, 2016

I don't select anything by hand; it's listed by date, as it comes from the arXiv API. The 2-day lags are due to weekends, when arXiv does not update.

stared · on July 2, 2016

There are some arXiv overlays for voting and comments, like https://scirate.com/ (which didn't catch on for some reason).

While I was initially for having arXiv with more features, now I appreciate this unix philosophy of "doing one thing, but doing it well". Various voting, comment, recommendation or accreditation systems may catch on or not. So I am all in favor of providing separate services (even if by the same team), communicating by API.

jessriedel · on July 2, 2016

The major issue is just that arXiv may be able to gather a critical mass in a way that third parties like SciRate can't. Also, if a third party did get a foothold, it's possible they could go bad (in a way that's less likely for the arXiv), and the community would be stuck.

Still, all things considered I agree that I'd rather arXiv not take this on.

j2kun · on July 1, 2016

Readers commenting on papers would be a welcome addition. Hopefully, with a reputation system to go with it.

santaclaus · on July 2, 2016

I would love to have some kind of stack overflow type system for commenting on papers. If you find a bug in some paper chances are someone else will benefit from the knowledge. Oh, the authors didn't report some key step in the algorithm and you figure it out? Pay it forward and leave a comment! Crowd sourced errata, if you will.

Hell, integrate with GitHub so you can link to implementations, etc.

fuzzythinker · on July 2, 2016

http://gitxiv.com/

paulsutter · on July 2, 2016

The fact that reader comments are controversial (about as many people are opposed to this as in favor, see graph in article) suggests that it's a great idea, and that the design of the reputation system is critical.

Perhaps they can semi-hide the comments during an introductory phase where they develop the reputation system. A well-crafted reputation system could eventually take on a role resembling peer review, but should be approached very carefully.

argonaut · on July 2, 2016

I'm not an academic, but I think you'll find many academics would be against this. Things would get more complicated because there would obviously need to be a vetting process on who could comment. Things shouldn't be a popularity contest, so I think a reputation system would be counterproductive.

noobermin · on July 2, 2016

>Things shouldn't be a popularity contest

You clearly are not an academic. Publishing papers is a popularity contest.

star-trek-fleet · on July 2, 2016

But it's a different type of popularity contest. Commenting on publications is virtually non-existent. It's not that there is no venue for commenting; we do have plenty of academic conferences, but the number of comments received at those conferences is virtually nil. People do ask questions, but seldom give constructive (be it positive or negative) comments.

adpouasd · on July 2, 2016

I don't think this is true in general (though it might be so in the field you are working in). I've received many constructive comments from colleagues, and this has often led to subsequent collaboration.

GFK_of_xmaspast · on July 2, 2016

Pubpeer?

argonaut · on July 2, 2016

Are you an academic? Because peer review is almost always (at least in CS) blind, with authors anonymous to the reviewers.

santaclaus · on July 2, 2016

Well, I find CS venues (conferences and journals) to be about 50/50 in terms of blind or not. Also, while it will certainly vary subfield to subfield, once you are familiar enough with an area you can pretty confidently guess what group (if not what exact authors) is behind any given paper. I bet it would be pretty easy to cook up an automated paper un-anonymizer based on the body of published work out there already.
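As a toy illustration of that last idea, the sketch below guesses which group wrote an anonymous text by comparing word counts with cosine similarity. This is only one naive way to do it, swapped in for illustration. A real system would need far better features, and the function names and sample texts are made up.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase bag-of-words counts, letters only."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def guess_group(anonymous_text, known_texts):
    """Return the best-matching group and all scores.

    known_texts maps a group name to text that group has published;
    it must contain at least one group.
    """
    target = word_counts(anonymous_text)
    scores = {
        group: cosine(target, word_counts(text))
        for group, text in known_texts.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    known = {
        "group_a": "laser plasma laser pulse facility",
        "group_b": "neural network gradient descent training",
    }
    best, scores = guess_group("we fire a laser pulse at the plasma", known)
    print(best, scores)
```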

noobermin · on July 2, 2016

In my field of laser physics, there are some really well-known lasers, so if your paper talks about this or that facility and cites this "monumental work by X group" you can easily gather who the author is. Not to mention the story (yes, hearsay) of at least one person in my group who refereed a paper, harshly criticizing it, only to have the editor come down and say "accept this paper anyway." Academia has some good parts but it's as political as anything, if not more so.

smaddox · on July 2, 2016

Reputation systems are nice for providing a carnal motivation, but moderation is likely going to be necessary to maintain a high quality density. Crowd-sourced moderation a la slashdot is a reasonable place to start. I think there's a lot of room for improvement in this space, though. An academic publishing

I hope that in the future, something like ArXiv with a discussion system will be built on a fully distributed network a la IPFS. That's been one of my most ambitious dream projects since my third year of grad school. I haven't made any progress towards it. I think such a system could stave off some of the decline in academia. I'm glad that ArXiv is headed in that direction. They're the experts.

luaskldj · on July 2, 2016

I don't think this is a useful thing to have. There is no clear need for it, and it would likely reinforce group think a la HN or reddit.

slater · on July 1, 2016

I just hope they keep it Lucida Grande :)