Category Archive: Papers

Reading papers on an iPad

The iPad is a pretty good platform for reading Linguistics papers, not just novels. I actually did all of my grading on the iPad, using a program called iAnnotate, which I would recommend. It lets you add highlights, text notes, and drawings to PDFs. It’s a more sophisticated version of Aji Annotate for the iPhone. Getting PDFs in and out requires running a server on your computer to sync a folder over WiFi, and the annotations are saved not into the original file but into a copy marked as annotated. But it works pretty well, and it has so far been a nice way to grade and to read journal articles and so forth.

If you’re not trying to annotate, but just to read, I think I like GoodReader best of the PDF readers (universal for iPad and iPhone). Reading with iAnnotate is not bad, but it’s a bit better with GoodReader, and GoodReader has lots of ways of getting documents onto the device (including direct access to your Dropbox folder). It will also read Office and iWork documents, and even works as a video player.

Remove your PDF metadata


I get abstracts to review on a fairly regular basis, and conference organizers are often not particularly diligent about removing the PDF metadata. So, since I use EagleFiler to organize everything, it diligently picks up the metadata, displays it prominently in my list view, and quite often makes the abstracts very much non-anonymous.

At least in Mac OS X 10.5.x (Leopard), there’s a pretty quick fix. Here’s how to create a little droplet application that you can dump your PDF on to strip the metadata out. If you are organizing a conference: do this.

Open Automator, click Choose (to just open a blank workflow). From under Files & Folders, drag Get Selected Finder Items into the workflow. Then, from under PDF, drag Set PDF Metadata into the workflow below it. Check all the boxes. Save it, file format Application, as something like “RemovePDFmetadata”. Et voilà. Select a PDF (or a whole slew of PDFs) and drop them on the application, and it will remove the metadata.

Note: This relies on Automator, and therefore will only work on Mac OS X 10.5.x (not on previous versions of Mac OS X).

Even better BibDesk preview pane

Having been inspired by the preview pane template I just posted about, I created an even better one. It’s an HTML template, and it’s kind of a work in progress, but it does some interesting things. First: it will display the Skim notes. It will also pick up citations in the Cited-by and Cites fields (which I intend to use as “see also” links) and display them as clickable links that will bring up the linked-to BibDesk record. It’ll print the abstract if there is one and the keywords if there are any (although I could not figure out how to make the keywords clickable).

The image on the right shows what it does. The red thing at the top is the cite key, the link below is to the PDF itself (and any linked files will appear there), then the keywords, then the links to the BibDesk records mentioned in the Cites field (format is “citekey1,citekey2,citekey3”). Below that are the Skim notes.

There are two files that make up this template, one is the outer layer of the HTML page (betterHTML.html), and the other is the inner “partial” that is used for each item in the list (betterItemHTML.html). You can see how it works by reading through it. The code is below the fold (or click on the links just given to download them directly). To install, put these in ~/Library/Application\ Support/BibDesk/Templates/ (if you downloaded them, be sure to change the name so the extension is .html, not .txt) and then go to the Templates pane of the Preferences and add a new template. Give it a name, choose betterHTML.html as the file, and then select the new template and hit + to add a second file within the template. Add betterItemHTML.html as type Default Item. And, then, you should be able to select it as a display mode for one of your panes (I’d suggest the right-hand pane).
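The file-copying part of that install can be sketched in a few lines of Python. This is only an illustration, not something that ships with BibDesk: the function name `install_templates` is made up here, and the destination is simply the Templates folder named above. It copies each downloaded file into the folder and turns a trailing .txt into .html along the way.

```python
import shutil
from pathlib import Path

# The BibDesk templates folder named in the post.
TEMPLATES_DIR = Path.home() / "Library" / "Application Support" / "BibDesk" / "Templates"


def install_templates(downloads, dest=TEMPLATES_DIR):
    """Copy template files into dest, renaming a .txt extension to .html.

    install_templates is a hypothetical helper, not part of BibDesk.
    Returns the list of paths that were written.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    installed = []
    for src in map(Path, downloads):
        name = src.name
        if name.endswith(".txt"):
            # Drop ".txt"; add ".html" only if it is not already there,
            # so both "x.txt" and "x.html.txt" end up as "x.html".
            name = name[:-4]
            if not name.endswith(".html"):
                name += ".html"
        target = dest / name
        shutil.copyfile(src, target)
        installed.append(target)
    return installed
```

Adding the template in the Templates pane of the Preferences still has to be done by hand afterwards.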


BibDesk preview pane with Skim notes

I just came across this nice template for BibDesk that will display your Skim notes in a preview pane. My new BibDesk layout has the notes preview in the right-hand pane and the files preview in the lower pane. Also, note that in the files display, there is a hidden zoom control in the top center.

Get the “previewTemplate” here at Jim Harrison’s templates page. To install it, put previewTemplate.rtf in ~/Library/Application\ Support/BibDesk/Templates/ and then go into the BibDesk Templates preference pane, make sure nothing is selected, and hit the + button. Click on the new name twice slowly and give it a name like “previewTemplate”, and double-click on the file to choose the previewTemplate.rtf file you just put into your Library folder. Finally, select the previewTemplate from the template popup under the pane you want to put the preview template in.

Acrobat 9’s ClearScan is great, but.. er.. selective

In my efforts to convert my piles of photocopies into searchable PDFs, I’ve come across Acrobat 9’s ClearScan option. It’s really nice. It can make some of the most beautiful and small scans I’ve ever seen. I think what it must be doing is actually using its own internal representation of the fonts to denoise the resulting image, but the result looks great.

ClearScan OCR dialog

Here’s a side-by-side comparison, with the original on the right and the much much smaller ClearScan output on the left.

After and Before

Look how nice and clear the output is. Well, except… hmm…

I don’t know what the root cause is, but for some reason ClearScan will silently throw away bits of your text. This is very uncool. I think on balance, having all of the words is better than having a nice clean printout. I can’t seem to find any discussion of this on the web, though there should be some. This is a serious problem with ClearScan.

Update: It looks like the text isn’t exactly gone, it’s just off the page. Examine Document showed me this:

Examine Document

Furthermore it seems to choke specifically where it detects a period within a word. According to Examine Document, the text there is not “Kenneth” but “Ke.nneth”. I can understand that, there’s a dot on the scan, but it seems to throw the alignment totally out of whack. No idea yet how to fix this. I did however determine that the loss of text elsewhere on another page also had this same property of having a dot in the middle of the word before the “lost” text.
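Since the lost text seems to follow a word with a stray period in it, one rough way to find pages worth checking is to search the OCR text for such words. The sketch below assumes the text has already been copied or exported out of the PDF into a plain string; the pattern and the function name `find_stray_dots` are my own guesses at a useful heuristic, not anything Acrobat provides.

```python
import re

# Two or more letters, a period, then two or more lowercase letters.
# The length requirement keeps abbreviations like "e.g." and "U.S." out.
STRAY_DOT = re.compile(r"[A-Za-z]{2,}\.[a-z]{2,}")


def find_stray_dots(text):
    """Return tokens that look like a word with a period inside it.

    find_stray_dots is a hypothetical helper for illustration only.
    """
    return STRAY_DOT.findall(text)
```

This will also flag things like web addresses and sentences where the space after the period is missing, so the hits are only candidates to look at by eye.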

Someone on the Canon interface design team earns a D.

Why, pray tell, did anyone think this was a good idea?


I mean, I don’t have particularly fat fingers, but that “Stop” button is just a little bit too easy to hit, when I meant to hit “Start.” Why did it even need to be anywhere near the “Start” button?

Most Canon copiers have a very nice feature, for those of us who scan things, called “Job Build.” You get at it by hitting “Special Features,” and then selecting “Job Build.”


Once you’ve done that, you can flop the book onto the glass, hit “Start,” flip the page, flop the book back down, hit “Start” again, etc., etc., etc. It will happily collect the pages together into one big document. You can get into a pretty good rhythm: flop, Start, scan, flip, flop, Start, scan, flip, …

Here I am after having scanned 42 pages.


The instruction on the top says “Change the original and press the Start key.” To scan my 43rd page, I would do exactly that.

What it doesn’t say is that if you hit “Stop” it will instantly throw away everything you’ve scanned so far and cheerily wait for you to start something new.


Update: Plugin to push from Zotero to BibDesk

Of course, as soon as I finished writing up my commentary about how Zotero doesn’t play nice with BibDesk, I came across a newly released plugin to integrate Zotero and BibDesk, written by George MacKerron.

I haven’t stress tested it, but what it seems to do is change the behavior of the action of adding to your Zotero library by clicking on the button by the URL bar, causing the item to be added both to your Zotero library and simultaneously pushed over to your BibDesk library.

In my single test, it seemed to work. Currently, I can see a couple of downsides, but this is certainly a step forward. One is that there are still two databases, which can go out of sync. For example, if there is an error in the automatically captured information, I need to change it both in Zotero and in BibDesk. At some point I need to decide which database is the real one, but if it’s Zotero that I decide on, then I might as well just abandon BibDesk and re-export my library to BibTeX format every time I make a change. Right now, Zotero is not a contender because it doesn’t have a good way to handle citation keys, but there is ongoing discussion about the issue, and I expect that when Zotero reaches 2.0, it will support them.

I have a lot of things in my Zotero database already, and I don’t think there’s currently a way to push an existing entry over to BibDesk. Further, even if there were, if I made a change in Zotero and pushed it again, it would not update the BibDesk entry but would instead just create a new (mostly duplicate) entry.

I’m not sure where I want to see this all go. First, I would like to see Zotero become slightly quicker to use. The simple addition of a couple more shortcut keys would make a big difference: the less I have to move the pointer and click the better. It would be great if I could hit a key to get to the “related” tab, and another to add a new relation (or note, or keyword, etc.) without having to mouse over to and click an “Add” button. Second, I would like it to be easy and reliable to export from Zotero to a BibTeX file (perhaps one I set ahead of time, so it just keeps overwriting it).

One problem I see with getting this to work is that I have had to specially set the titles in a number of my BibTeX entries in order to preserve selected capitalizations. If I don’t set “LF” like “{LF}” and “Japanese” like “{J}apanese” within the title, the bibliography style file I use will change them to “lf” and “japanese”. This just isn’t information that Zotero has, and suggests to me that I might need to keep BibDesk as the authoritative database, using Zotero just for finding and pushing things to BibDesk. Another problem is, of course, the BibTeX citation key, which Zotero does not currently give access to (on export, Zotero creates its own, without any user control).
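The bracing itself is mechanical once you know which words need it, so it could be done by a small script over exported titles. Here is a minimal sketch, assuming you supply the list of words to protect yourself; `protect_title` is a name invented for this example, and the rule (wrap an all-caps word whole, otherwise wrap just the first letter) just mirrors the “{LF}” and “{J}apanese” examples above.

```python
import re


def protect_title(title, protected):
    """Brace the given words in a BibTeX title so their case is preserved.

    All-caps words are wrapped whole ("LF" -> "{LF}"); other words get
    only their first letter wrapped ("Japanese" -> "{J}apanese").
    protect_title is a hypothetical helper, not part of BibDesk or Zotero.
    """
    for word in protected:
        if word.isupper():
            replacement = "{" + word + "}"
        else:
            replacement = "{" + word[0] + "}" + word[1:]
        # Match the word only when it is not already braced
        # and not part of a longer word.
        pattern = r"(?<![\w{])" + re.escape(word) + r"(?![\w}])"
        title = re.sub(pattern, lambda m, r=replacement: r, title)
    return title
```

For example, `protect_title("Japanese wh-questions at LF", ["LF", "Japanese"])` gives `{J}apanese wh-questions at {LF}`, and running it a second time on that result leaves it unchanged.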

Anyway, one step forward on a path that seems still relatively long.

Organizing all those PDFs

I am a packrat, and I have been collecting and scanning PDFs for years. The trick is how to organize them in a way that allows me to find what I need quickly enough to be useful.


The way I do this now is with EagleFiler. I’ve tried a number of other options, but I haven’t found a better one. Here are some good things about EagleFiler:

  • The EagleFiler library is nothing more than a folder on your disk, with some associated metadata. This means that if, somewhere down the road, EagleFiler ceases to exist, the files are just as accessible as they were (minus the metadata).
  • EagleFiler will refuse to import a file into your library you already have, so you don’t wind up with a lot of duplicates if you just dump a folderful of PDFs on it.
  • Importing is very easy, a single keypress will send whatever you’re looking at (in a web browser, in the Finder, etc.) into the library.
  • It’s stable, and it’s quick.
  • It indexes the files in the library so you can easily search for “Japanese” and find all of the papers that mention Japanese.
  • It saves a checksum for all of the files that allows it to do integrity checking, so you can be confident that the files in your library have not gotten corrupted.
  • The developer is very responsive and the product is under active development.
  • The company that produces it is named C-Command Software.
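
The duplicate refusal and the integrity checking both come down to storing a checksum per file. I don’t know how EagleFiler actually implements this or which hash it uses, so the following is only a sketch of the general idea, assuming SHA-256 and a plain dictionary as the “library”; the function names are made up for the example.

```python
import hashlib
from pathlib import Path


def checksum(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_file(library, path):
    """Add path to library (a dict of checksum -> path) unless a file
    with the same contents is already there.

    Returns True if the file was added, False if it was a duplicate.
    """
    key = checksum(path)
    if key in library:
        return False
    library[key] = Path(path)
    return True


def verify(library):
    """Return the paths that are missing or whose contents have changed
    since they were imported."""
    return [p for key, p in library.items()
            if not p.exists() or checksum(p) != key]
```

Because the key is computed from the contents, the same paper saved under two different filenames counts as a duplicate.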

EagleFiler isn’t perfect, but it is being improved at a relatively rapid pace as well. One thing that could be better is that, although files can be tagged with keywords, the interface is not well suited for having large numbers of tags (so I tag things pretty conservatively). There are things that it doesn’t do now (like allow you to create smart folders from searches, or detect when a file is added within the library folder from outside EagleFiler) that are likely to be on the horizon for future versions.

EagleFiler is not very good for taking notes, and I’m not sure it will get better. You can easily create new RTF files and enter text into them from within EagleFiler, but they aren’t associated with anything, they are just their own standalone files. Each file in an EagleFiler library does have a “notes” field, but there’s just one, and the only way to see them is in the inspector window.

A lot of the programs I use allow linking to PDF files, and I find that the EagleFiler library is as good a place as any to link things in to. So, in my bibliography programs and notetaking programs, I just link to the file in EagleFiler.


Although I haven’t made a lot of use of it yet, another program that allows you to browse and tag is Yep. Yep is pretty, and it has good support for tagging with lots of keywords. It integrates with Spotlight, so it’s easy to just point it to the PDF folder inside my EagleFiler library, allowing me to use Yep to browse. However, a downside of using Yep this way is that it doesn’t pick up the metadata from EagleFiler (it has its own metadata), and so browsing it this way requires relying on the filename. (Also, if I’m browsing files inside an EagleFiler library, I can’t change the filename within Yep, or EagleFiler will no longer be able to find it.) Despite the fact that it’s pretty, I don’t think Yep really serves a necessary function for me. I can browse papers easily enough in EagleFiler itself. Yep’s note capabilities are no better than EagleFiler’s.


One program that I might actually use to browse papers, apart from EagleFiler, is BibDesk. It’s not really any better than EagleFiler for browsing (it’s pretty much the same, it will allow PDFs to be attached, and searched), but it is at its core a bibliography manager (for LaTeX bib files), and so I find myself with the program open often enough anyway. It has its own datafiles, but they serve a different purpose from EagleFiler’s metadata (although it is still the case that I’ve entered the title and author information for each paper twice). You can enter notes here, but they’ll go into your bib file (and, again, there’s just a single repository per paper for notes, and to see them you have to click over to the notes pane of the inspector window).


One thing that neither EagleFiler nor Yep allows for is linking one paper to another, which can be nice to do. This is probably something that EndNote can do, but I don’t really like EndNote very much. I did recently take a look at Zotero, a Firefox plugin for bibliography and web clippings management. There are things I like a lot about Zotero. You can add notes (either linked to a paper, or just by themselves), tag everything with keywords, and save searches as smart folders. And it lets you capture papers right off the web where you find them. It also indexes any attached PDF files and allows you to search them, and it’s relatively quick and responsive. And it does allow you to link papers to one another (only symmetrically, so any paper you link to is also linked back). You can also add as many notes as you like, which can themselves be linked to multiple papers or keywords.

The main downside I find to Zotero, which is enough to keep me from trying to use it seriously now, is that the way the library is organized requires more clicking around than I’d like. Each bibliographic entry is treated as a folder, to which notes, web pages, and PDF files can be attached. But to attach a file to an entry, you need to click on the Attachments tab and then click a button and find the file, and to view the attached file, you need to click on the paper, reveal its contents, click on the PDF file, and then click on the view button.

Zotero has a plugin for Microsoft Word that allows you to use it like EndNote, which is cool. In fact, I’ve found the Zotero plugin to be more flexible than either EndNote’s or Bookends’. For LaTeX, I have so far not discovered any way for it to be as seamlessly integrated as BibDesk is. As far as I can tell, you need to export the Zotero library to a bib file every time you make a change that needs to be reflected in your LaTeX documents. I will, however, keep trying occasionally with Zotero, because I really want to like it.

I’ll probably say something else later about bibliography management, but Bookends is a pretty good bibliography manager that has a Microsoft Word plugin (so, again, it can be used like EndNote), and it serves as a decent paper browser, much like BibDesk does. Because I’ve recently converted to LaTeX, I don’t think I will be using Bookends much anymore, however; BibDesk does pretty much all the same stuff.

Papers also looked like it would be promising, but it just hasn’t come up to speed quickly enough for me to be using it yet. It’s also very nearly bibliography management software itself, but it’s still a bit clumsy (like Zotero) for getting citation information into bib format (and apparently a bit buggy as well). I also found it not easy to figure out how to deal with manuscripts that are not published. If Papers evolves to play better with LaTeX and handle the online databases relevant for Linguistics better (it is right now kind of focused on medical research, although the options are expanding), I might consider it. It is a nicely designed application, and the user experience is pleasant. It also does not seem to be possible to relate papers to one another here, though maybe I’m just missing it somehow. Papers, too, is something I really want to like, and I’ll be keeping an eye on its future development.

For taking notes, Journler is not bad, although I still haven’t quite managed to figure out what the best workflow is. The developer of Journler is working on another project called Lex that might be better suited for this kind of thing, but as far as I know, Lex hasn’t even entered alpha testing. But for the little bit of notetaking that I have done in Journler, I’ve had pretty good luck linking PDF files from my EagleFiler libraries.

Ok, that’s enough commentary on this topic. Updates later as needed.