Working with Attachments in CouchDB

CouchDB documents can have attachments just like an email message can have attachments. (...) Attachments can be any data. It is easiest to think about attachments as files attached to a document. These files can be text, images, Word documents, music, or movie files.

CouchDB documentation

The CouchDB documentation doesn't describe attachments in much detail. In this post I give an overview of how attachments work in CouchDB and how to use them in a node.js application.

As the official documentation states, the main characteristic of an attachment is that it is unspecified data stored alongside a document. An attachment is identified by its parent document and a name. Its main properties are its length and content type. It can be binary data (e.g. image/jpeg, application/octet-stream) or text (text/plain, application/json). The content of an attachment cannot be queried, only retrieved by name. Attachments are therefore suitable for potentially large data that belongs to a document but does not need to be fetched every time the document is used.
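To make the metadata concrete, here is a minimal sketch of how attachment information appears on a fetched document: CouchDB lists attachments under the `_attachments` key as stubs carrying `content_type` and `length`. The document values and the `describeAttachments` helper are made up for illustration and are not part of any library.

```coffeescript
# Hypothetical document as CouchDB might return it; all values are made up.
doc =
    _id: "announcement-1"
    _rev: "2-abc"
    title: "Server maintenance"
    _attachments:
        message:
            content_type: "text/plain; charset=utf-8"
            length: 42
            stub: true   # only metadata is included, not the content itself

# Illustrative helper: list name, content type and length of each attachment.
describeAttachments = (doc) ->
    for name, meta of (doc._attachments ? {})
        {name, contentType: meta.content_type, length: meta.length}

console.log describeAttachments(doc)
```

The content itself is not part of the stub; it has to be requested separately by document id and attachment name.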

The lifecycle of an attachment is bound to its parent document. When a document is deleted, its attachments are deleted automatically. Likewise, attachments are replicated together with their documents. Attachments don't have revisions of their own; instead, the revision of the parent document is used. Adding an attachment to a document is in fact an update operation on that document: the document's current revision must be provided so that CouchDB can check it, and the operation results in a new revision.
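At the HTTP level, the revision check shows up as a `rev` query parameter on the request that uploads the attachment. The sketch below only builds a description of such a request and does not send anything. The helper name `attachmentPutRequest`, the base URL, and the revision value are assumptions chosen for illustration.

```coffeescript
# Illustrative helper: describe the HTTP request that adds an attachment.
# The parent document's current revision travels in the "rev" query parameter.
attachmentPutRequest = (baseUrl, db, docId, name, rev, contentType) ->
    path = [db, docId, name].map((part) -> encodeURIComponent(part)).join("/")
    method: "PUT"
    url: "#{baseUrl}/#{path}?rev=#{encodeURIComponent(rev)}"
    headers: {"Content-Type": contentType}

request = attachmentPutRequest(
    "http://localhost:5984", "announcements", "announcement-1",
    "message", "1-abc", "text/plain; charset=utf-8")
console.log request.method, request.url
```

If the supplied revision is no longer current, CouchDB rejects the request with a 409 Conflict; on success the response carries the document's new revision.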

Example: Announcements repository

Let's assume we are processing announcements from remote peers. The announcements are received in the form of text messages that must be parsed. The relevant parts are stored in structured form, and the original message must not be thrown away but stored alongside the data for further use, e.g. to display it to the user.

The following code is written in CoffeeScript and uses the nano library, a lightweight node.js wrapper around CouchDB's HTTP API. It shows a typical implementation of a repository class responsible for storing the JSON announcement together with a free-text message. The message is added to the document as an attachment named "message" with content type text/plain. The two fetch methods change CouchDB's semantics: they return null instead of an error if the data is not found.

nano = require('nano')("http://localhost:5984/")  
Buffer = require('buffer').Buffer  
uuid = require('node-uuid')

class AnnouncementsRepository  
    constructor: () ->
        @db = nano.use("announcements")

    store: (announcement, message, callback) ->
        if(not announcement._id?)
            announcement._id = uuid.v4()
        @db.insert announcement, (err, body) =>
            if(err?) then return callback(err)
            if(not message?)
                return callback(null, announcement._id)
            @db.attachment.insert body.id, "message", message, "text/plain; charset=utf-8", {rev: body.rev}, (err, body) ->
                if(err?) then return callback(err)
                callback(null, announcement._id)

    fetchAnnouncement: (id, callback) ->
        @db.get id, (err, body) ->
            if(err?)
                if(err.status_code == 404)
                    return callback(null, null) #if not found, return null without error
                return callback(err)
            callback(null, body)

    fetchMessage: (id, callback) ->
        @db.attachment.get id, "message", (err, body) ->
            if(err?)
                if(err.status_code == 404)
                    return callback(null, null) #if not found, return null without error
                return callback(err)
            message = Buffer.from(body).toString("utf8") # body holds the raw bytes of the attachment
            callback(null, message)

module.exports = AnnouncementsRepository  

Notes

The code above uses an API that accesses attachments separately from their documents. Alternatively, attachments can be inlined directly in the document. This saves database round trips but is more CPU-intensive, because the content must be Base64-encoded.
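As a rough sketch of the inline variant: the attachment goes into the document's `_attachments` field with a `content_type` and Base64-encoded `data`, so the document and the message can be stored in a single request. The helper name `withInlineMessage` and the sample announcement are made up for illustration.

```coffeescript
# Illustrative helper: return a copy of the announcement with the message
# inlined as a Base64-encoded attachment named "message".
withInlineMessage = (announcement, message) ->
    doc = Object.assign({}, announcement)
    doc._attachments =
        message:
            content_type: "text/plain; charset=utf-8"
            data: Buffer.from(message, "utf8").toString("base64")
    doc

doc = withInlineMessage({_id: "announcement-1", title: "Server maintenance"}, "Hello")
console.log JSON.stringify(doc, null, 2)
```

In the repository class above, the resulting document could be passed to a single `@db.insert` call instead of an insert followed by `@db.attachment.insert`.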

The size of an attachment is not limited by the data model, although a server can be configured with a maximum attachment size. CouchDB also supports HTTP range requests, which allow streaming of attachment data, resumable downloads, seekable audio/video streams, etc.
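A range request is an ordinary GET on the attachment URL with a standard HTTP `Range` header. The sketch below only builds such a request description for resuming a download. The helper names `rangeHeader` and `resumeRequest` and the URL are assumptions for illustration, and nothing is sent over the network.

```coffeescript
# Illustrative helper: build an HTTP Range header value.
# Byte positions are zero-based and inclusive; omit `last` to read to the end.
rangeHeader = (first, last) ->
    if last? then "bytes=#{first}-#{last}" else "bytes=#{first}-"

# Illustrative helper: describe a request that resumes an interrupted download.
resumeRequest = (url, bytesAlreadyReceived) ->
    method: "GET"
    url: url
    headers: {Range: rangeHeader(bytesAlreadyReceived)}

request = resumeRequest("http://localhost:5984/announcements/announcement-1/message", 500)
console.log request.headers.Range
```

A server that honors the range answers with 206 Partial Content and only the requested bytes.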