# Bulk imports with Datomic

This is a repost. You can find the original here.

I’ve been really happy with Datomic, but an initial bulk import isn’t as familiar a process as a SQL dump and restore. Here are some things I’ve learned from doing several imports.

The Datomic transactor handles concurrency by processing transactions serially, but that doesn’t mean it’s slow! In my experience, the bottleneck is actually reshaping the source data and building the transactions. I use core.async to parallelize just about everything in the import pipeline.

You can find one example of how I’ve used core.async for import jobs in my Kevin Bacon project repository.
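As a rough illustration (not the code from that repository), here is a minimal sketch of a parallel reshaping stage built on `clojure.core.async/pipeline`. It assumes a recent `org.clojure/core.async` (1.2 or later, for `to-chan!`) on the classpath; `row->entity`, `reshape-rows`, and the `:person/*` attributes are hypothetical names invented for this example.

```clojure
(ns bulk-import.sketch
  (:require [clojure.core.async :as async]))

(defn row->entity
  "Hypothetical reshaping step: turn one raw [id name] row into a Datomic
  entity map. The :person/* attributes are illustrative, not a real schema."
  [[id name]]
  {:person/id   (Long/parseLong id)
   :person/name name})

(defn reshape-rows
  "Reshape `rows` on `n` parallel workers with core.async/pipeline and
  return the entity maps grouped into batches of `batch-size`."
  [n batch-size rows]
  (let [in  (async/to-chan! rows)   ; emits every row, then closes
        out (async/chan 1024)]
    ;; pipeline applies the transducer on n workers, keeps results in
    ;; input order, and closes `out` once `in` is exhausted
    (async/pipeline n out (map row->entity) in)
    ;; collect all reshaped entities (blocking), then split into batches
    (->> (async/<!! (async/into [] out))
         (partition-all batch-size)
         (mapv vec))))

;; Assuming a datomic.api connection named conn, each batch could then be
;; submitted with something like (doseq [b batches] @(d/transact conn b)).
(def batches
  (reshape-rows 4 2 [["1" "Ada"] ["2" "Grace"] ["3" "Kevin"]]))
```

Running the CPU-bound reshaping through `pipeline` keeps every core busy, so the serial transactor only ever sees finished batches. For the blocking transact calls themselves, core.async’s `pipeline-blocking` is the usual counterpart.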

I use DynamoDB as my storage backend in production. I used to run my import tasks directly against the production transactor and storage. Lately, though, I’ve found it much more helpful to run them against a locally running transactor with the dev storage backend.

Running an import locally means I don’t have to worry about networking, which speeds up the whole process quite a bit. It also gives me much more freedom to iterate on the database design itself (I rarely get an import right the first time). And in the case of DynamoDB, I save some money, since I don’t have to keep my “write throughput” cranked way up for as long.

Bulk imports create some garbage, so it’s worth manually requesting an indexing job and collecting garbage before backing up. Here’s what a REPL session looks like:

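```clojure
(require '[datomic.api :as d])
(def conn (d/connect "datomic:dev://localhost:4334/database-name"))
;; ask the transactor to start an indexing job now
(d/request-index conn)
;; blocks until indexing has caught up to the current basis-t
(->> conn d/db d/basis-t (d/sync-index conn) deref)
;; reclaim storage garbage older than the current time
(d/gc-storage conn (java.util.Date.))
```
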
For more information on why this cleanup is important, see the relevant Datomic documentation.

Once the local database is indexed and cleaned up, move it to production:

1. Run the `bin/datomic backup-db` command against the local import.
2. Run the `bin/datomic restore-db` command to restore from the backup folder to the remote database.