
Remove your PDF metadata

[Screenshot: the RemovePDFmetadata Automator workflow]

I get abstracts to review on a fairly regular basis, and conference organizers are often not particularly diligent about removing the PDF metadata. Since I use EagleFiler to organize everything, that metadata gets picked up diligently and displayed prominently in my list view, which quite often makes the abstracts very much non-anonymous.

At least in Mac OS X 10.5.x (Leopard), there’s a pretty quick fix. Here’s how to create a little droplet application that strips the metadata out of any PDF you drop onto it. If you are organizing a conference, do this before sending the abstracts out.

Open Automator and click Choose (to open a blank workflow). From under Files & Folders, drag Get Selected Finder Items into the workflow. Then, from under PDF, drag Set PDF Metadata into the workflow below it, and check all the boxes. Save it with the file format Application, under a name like “RemovePDFmetadata”. Et voilà. Select a PDF (or a whole slew of PDFs), drop it onto the application, and it will remove the metadata.

Note: Automator itself first appeared in Mac OS X 10.4, but this workflow relies on the Automator actions available in Mac OS X 10.5.x, so it may not work on earlier versions of Mac OS X.
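To confirm that the droplet actually cleared the fields, especially across a whole batch of files, a short script can scan the raw bytes for the usual document-information entries. The following is a minimal heuristic sketch in Python, not part of the Automator workflow above; the file name `check_pdf_metadata.py` and the functions `find_info_entries` and `has_xmp_packet` are hypothetical. It assumes the Info dictionary is stored uncompressed with literal (parenthesized) strings, so values written as hex strings or inside compressed object streams will be missed, and `/Title` can also match bookmark titles.

```python
import re
import sys
from pathlib import Path

# Standard document-information keys from the PDF specification.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords", "Creator", "Producer")


def find_info_entries(data: bytes) -> dict[str, str]:
    """Heuristically find Info-dictionary entries in raw PDF bytes.

    Only literal (parenthesized) strings with no nested or escaped
    parentheses are matched; hex strings and compressed object streams
    are missed, and /Title may also match bookmark (outline) titles.
    """
    found = {}
    for key in INFO_KEYS:
        # Matches e.g. b"/Author (Jane Doe)" and captures "Jane Doe".
        match = re.search(rb"/" + key.encode("ascii") + rb"\s*\(([^)]*)\)", data)
        if match:
            # latin-1 maps every byte to a character, so decoding never fails.
            found[key] = match.group(1).decode("latin-1")
    return found


def has_xmp_packet(data: bytes) -> bool:
    """Return True if an uncompressed XMP metadata packet appears in the file."""
    return b"<x:xmpmeta" in data


if __name__ == "__main__":
    # Hypothetical usage: python3 check_pdf_metadata.py abstract1.pdf abstract2.pdf
    for name in sys.argv[1:]:
        raw = Path(name).read_bytes()
        entries = find_info_entries(raw)
        xmp = "XMP packet present" if has_xmp_packet(raw) else "no XMP packet found"
        print(f"{name}: {entries or 'no Info entries found'}; {xmp}")
```

Running it before and after dropping a file on the droplet shows which entries changed. An empty result is encouraging but, given the limits above, not proof that nothing identifying remains.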

7 comments

1 ping

  1. Dr. H. S. Mallik says:

    http://www.acrobatusers.com/tutorials/using-examine-document-remove-sensitive-information

    or

    http://help.adobe.com/en_US/Acrobat/9.0/Standard/WS7E9FA147-10E3-4391-9CB6-6E44FBDA8856.w.html

    Examine a PDF for hidden content

    Use the Examine Document feature to find and remove content from a document that you don’t want, such as hidden text, metadata, comments, and attachments.
    If you want to examine every PDF for hidden content before you close it or send it in email, specify that option in the Documents preferences using the Preferences dialog box.

    1. Choose Document > Examine Document.

    If items are found, they are listed in the Examine Document panel with a selected check box beside each item.
    2. Make sure that the check boxes are selected only for the items that you want to remove from the document:

    Metadata
    Metadata includes information about the document and its contents, such as the author’s name, keywords, and copyright information, that can be used by search utilities. To view metadata, choose File > Properties.

    File Attachments
    Files of any format can be attached to the PDF as an attachment. To view attachments, choose View > Navigation Panel > Attachments.

    Annotations And Comments
    This item includes all comments that were added to the PDF using the comment and markup tools, including files attached as comments. To view comments, choose View > Navigation Panel > Comments.

    Form Fields
    This item includes form fields (including signature fields), and all actions and calculations associated with form fields. If you remove this item, all form fields are flattened and can no longer be filled out, edited, or signed.

    Hidden Text
    This item indicates text in the PDF that is either transparent, covered up by other content, or the same color as the background. To view hidden text, click Preview. Click the double-arrow buttons to navigate pages that contain hidden text, and select options to show hidden text, visible text, or both.

    Hidden Layers
    PDFs can contain multiple layers that can be shown or hidden. Removing hidden layers removes these layers from the PDF and flattens remaining layers into a single layer. To view layers, choose View > Navigation Panel > Layers.

    Bookmarks
    Bookmarks are links with representational text that open specific pages in the PDF. To view bookmarks, choose View > Navigation Panel > Bookmarks.

    Embedded Search Index
    An embedded search index speeds up searches in the file. To determine if the PDF contains a search index, choose Advanced > Document Processing > Manage Embedded Index. Removing indexes decreases file size but increases search time for the PDF.

    Deleted Hidden Page And Image Content
    PDFs sometimes retain content that has been removed and which is no longer visible, such as cropped or deleted pages, or deleted images.

    3. Click Remove to delete selected items from the file, and click OK.
    Note: When you remove checked items, additional items are automatically removed from the document: digital signatures; document information added by third-party plug-ins and applications; and special features that enable Adobe Reader users to review, sign, and fill in PDF documents.
    4. Choose File > Save, and specify a filename and location. If you don’t want to overwrite the original file, save the file to a different name, location, or both.

    The selected content is permanently removed when you save the file. If you close the file without saving it, you must repeat this process, making sure to save the file.


  2. Paul Hagstrom says:

    Thanks for the pointer to the elaborate walkthrough from Adobe. The Acrobat solution will of course work regardless of platform, and is probably more thorough than the Automator solution I suggested. My original intent with the post was to target the little 2-page conference abstracts that are supposed to be anonymously reviewed and probably don’t have a great deal of hidden metadata to begin with. Still, it’s good to have this list of places to look for potentially identifying information for more general purposes as well, so thanks for the link.

  3. Jill Beckman says:

    Paul, thanks for this. It does exactly what I need with a minimum of fuss, and it doesn’t require the paid version of Acrobat, which my grad students will appreciate.

  4. John says:

    This will also work as a service in SL (Snow Leopard). Just leave off the first Finder step and apply the service to PDF files.

  5. mark franke says:

    On a Mac, how can you check specific PDFs to confirm that the metadata has been removed? Open them in Preview, then look where?

    Thanks!

  6. Paul Hagstrom says:

    In Preview, you can see at least some of the metadata by opening the inspector (Command-I) and looking in the first two panes (General Info inspector and Keywords inspector).

  7. Juan says:

    Thanks for this quick help. Is there any way to also erase or overwrite the “PDF Producer” metadata?

  1. Daily links for 11/05/2010 | Blog | Bob Sutor says:

    [...] lingtech : Remove your PDF metadata [...]
