Friday, August 03, 2007

The issue of finding those papers...

I read lots of academic papers in my field - though certainly not as many as I "should" - but how do I go about finding them? It sometimes strikes me that I don't really have a good strategy for keeping up to date, or for finding good references when I get a new idea.

I go to conferences, like others do. But obviously I don't go to every conference, I don't see every presentation at a conference, and I'm not mentally present during every presentation I do see. Anything else would be impossible. Worse, conference proceedings are usually only available as hard-to-search CDs or books, instead of for free on the conference website, which would be the sensible option.

There are a few repositories meant to contain papers, or links to papers, in particular research fields, and also to provide good means of finding the papers you want. Sadly, many of them are half-baked.

CoRR (arXiv) has never reached anywhere near the same popularity in Computer Science as it has in physics, probably partly due to the odd requirement to submit the LaTeX source of every paper, something that rarely works in practice. Cogprints has likewise failed to take off, even though the technical platform seems decent enough. Citeseer used to be good around 2002-2003, but seems to have been neglected by its administrators lately (I've had serious problems correcting missing or faulty metadata for my own papers). Bill Langdon's GP Bibliography is excellent, though for a limited domain.

In the best of all worlds, every paper would be easy to find through Google Scholar. A main obstacle to this is that so many researchers fail to make their papers available on their personal websites. Even in computer science! This is puzzling, and shameful.

I think it is every serious researcher's obligation to make his or her complete scientific output publicly available on his or her own home page, unless he or she has a very good excuse. Otherwise one would suspect that he or she has something to hide.

So if you are reading this, and still haven't made all your publications freely downloadable from your website, go and do it. Now. For the sake of science, and your own reputation as an honest scientist. Unless you have a very, very good reason why you shouldn't. And you probably don't.

(Yes, I do feel quite strongly about this...)


nojhan said...

One good reason not to put your papers online is that it may be forbidden by law.

Only a few good computer science journals allow self-archiving of postprints, though somewhat more allow preprints (SHERPA/RoMEO is your friend if you want to check).

One solution to the problem of papers availability lies in Open Access.

For example, PLoS has launched PLoS ONE, a really innovative way of publishing that uses a web-oriented paradigm instead of a paper-oriented one. And it has an (admittedly sparse) Computer Science page, among a large number of biology categories.

Incidentally, I find it really surprising that the Computer Science community doesn't seem to grasp the true advantage of using computers to share information...

Togelius said...

Yes, open access is excellent and we should have more of it. JMLR and JAIR are excellent examples of how it should be done.

Still, even if some publishers don't allow you to republish your papers, that is not a good excuse. In practice such bans are extremely unlikely to be enforced. The publishers have a lot to lose: a publisher that sues or prosecutes its own authors is likely to see a sharp drop in submitted manuscripts. After all, we do have a choice. E.g., in evolutionary computation there are major journals and/or conference proceedings published by IEEE, ACM, Springer and MIT Press.

So (hypothetically!) if ACM sent me threatening e-mails for making my own papers available, I would just never send a paper to GECCO again. And they know this.

Anonymous said...

Google Scholar happily indexes online versions of publications by major publishers.

E.g., GECCO 2007 is already indexed.

Of course, the institution still needs to have a subscription...

Nosophorus said...

Hi! :)

But GECCO does allow an author, under some conditions, to republish his paper in another journal, conference, etc.; see here:

Submit substantially new work: The material in a paper must represent substantially new work that has not been previously published by conferences, journals, or edited books in the genetic and evolutionary computation field. GECCO allows submissions of material that is substantially similar to a paper being submitted contemporaneously for review in another conference. However, if the submitted paper is accepted by GECCO, the authors agree that substantially the same material will not be published by another conference in the evolutionary computation field. Material may be later revised and submitted to a journal, if permitted by the journal.

Sometimes that kind of attitude is important to avoid the so-called "publish or perish" behaviour, but at the same time it can also limit the researcher's freedom over his own paper.

See you later!