This is a repost. You can find the original here
Sometimes, aggregating data can become overly complex in a normal ActiveRecord model. Because Rails works well with SQL views, we can create associations to SQL views that aggregate data for us, simplifying our models and potentially speeding up queries.
Modeling an inbox Link to heading
I’ve got an inbox. A cat inbox. For real.
There are many possible implementations for modeling an inbox. I’ve gone with a relatively simple approach. Two users participate in a conversation, sending messages back and forth to each other. The Conversation model has a subject, but the body of the initial message is part of the Message object.
# app/models/conversation.rb
class Conversation < ActiveRecord::Base
# fields: to_id, from_id, subject
belongs_to :to, class_name: "User"
belongs_to :from, class_name: "User"
has_many :messages, dependent: :destroy, inverse_of: :conversation
end
# app/models/message.rb
class Message < ActiveRecord::Base
# fields: user_id, conversation_id, body
belongs_to :conversation, inverse_of: :messages
belongs_to :user
end
After the initial message, the two participants on the conversation send messages back and forth. A user may have any number of conversations with other users. As such, the main inbox view must list the conversations a user is a participant on, as well as some summary information about that conversation.
For our purposes, we’ve decided on an HTML table view with the following columns:
- From - Who the original message was sent from
- To - The original recipient of the message
- Message - the Subject of the conversation, as well as the first line of the most recent message
- Last post - The date/time of the most recent message
- Replies - The number of replies on the conversation (excluding the first message)
Although the subject is part of the conversation itself, everything else comes from its various associations. This is the view, which reveals the expected interface each conversation object should have:
%table#inbox
%thead
%tr
%th From
%th To
%th Message
%th Last post
%th Replies
%tbody
- conversations.each do |conversation|
%tr
%td= conversation.from_name
%td= conversation.to_name
%td
%p
%strong= conversation.subject
= conversation.most_recent_message_body
%td
= time_ago_in_words(conversation.most_recent_message_sent_at)
ago
%td= conversation.reply_count
Let’s explore a typical way to model this in our model directly.
A typical Ruby implementation Link to heading
# app/models/conversation.rb
class Converation < ActiveRecord::Base
# associations, etc...
def most_recent_message_body
most_recent_message.body if most_recent_message
end
def most_recent_message_sent_at
most_recent_message.created_at if most_recent_message
end
def reply_count
messages.size - 1
end
def to_name
to.name
end
def from_name
from.name
end
private
def most_recent_message
@most_recent_message ||= messages.by_date.first
end
end
# app/models/message.rb
class Message < ActiveRecord::Base
# associations, etc...
def self.by_date
order("created_at DESC")
end
end
This approach is fairly straightforward. We obtain the
most_recent_message_body
and most_recent_message_sent_at
from the
most recent message, which is trivial after we’ve ordered the messages
association by date. The to_name
and from_name
methods are delegated
to their respective associations. And reply_count
is simple the total
number of messages, minus one (the initial message doesn’t count as a
“reply”).
This approach offers a number of advantages. For one, it is familiar.
I believe most Rails developers would be able to understand exactly
what’s going on above. It also locates all of the domain logic within
the Conversation
model, making it easy to find.
Having everything in the Conversation
model is actually a blessing
and a curse. Although everything is easy to find, the model is also
quickly becoming bloated. It may not seem like much right now, but as
more information is added to the inbox, it will become unruly.
The other problem with the above is the multitude of N+1 queries that it has introduced. With only 3 conversations in play, loading the inbox outputs a log like this:
Started GET "/" for 127.0.0.1 at 2013-02-11 09:49:02 -0600
Connecting to database specified by database.yml
Processing by InboxesController#show as HTML
User Load (12.8ms) SELECT "users".* FROM "users" LIMIT 1
Conversation Load (0.6ms) SELECT "conversations".* FROM "conversations" WHERE (1 IN (from_id, to_id))
User Load (18.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = 2 LIMIT 1
User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
Message Load (12.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 7 ORDER BY created_at DESC LIMIT 1
(0.6ms) SELECT COUNT(*) FROM "messages" WHERE "messages"."conversation_id" = 7
User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."id" = 3 LIMIT 1
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 8 ORDER BY created_at DESC LIMIT 1
(0.4ms) SELECT COUNT(*) FROM "messages" WHERE "messages"."conversation_id" = 8
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 2 LIMIT 1
Message Load (0.5ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 9 ORDER BY created_at DESC LIMIT 1
(0.4ms) SELECT COUNT(*) FROM "messages" WHERE "messages"."conversation_id" = 9
Rendered inboxes/show.html.haml within layouts/application (683.9ms)
Completed 200 OK in 691ms (Views: 272.5ms | ActiveRecord: 418.1ms)
Eager-loading Link to heading
We can definitely cut down on the N+1 query problem by introducing eager
loading. In our controller, the conversations
exposure is currently
defined thusly:
# app/controllers/inboxes_controller.rb
class InboxesController < ApplicationController
expose(:user) { User.first }
expose(:conversations) { user.conversations }
end
Let’s change that to eagerly load its associations:
expose(:conversations) { user.conversations.includes(:messages, :to, :from) }
With eager-loading in place, the log now looks slightly more reasonable:
Started GET "/" for 127.0.0.1 at 2013-02-11 09:55:24 -0600
Processing by InboxesController#show as HTML
User Load (0.3ms) SELECT "users".* FROM "users" LIMIT 1
Conversation Load (0.3ms) SELECT "conversations".* FROM "conversations" WHERE (1 IN (from_id, to_id))
Message Load (0.3ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" IN (7, 8, 9)
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (1, 2)
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (2, 3, 1)
Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 7 ORDER BY created_at DESC LIMIT 1
Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 8 ORDER BY created_at DESC LIMIT 1
Message Load (0.3ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 9 ORDER BY created_at DESC LIMIT 1
Rendered inboxes/show.html.haml within layouts/application (9.5ms)
Completed 200 OK in 13ms (Views: 10.0ms | ActiveRecord: 2.4ms)
There are more optimizations we could make here in Ruby land. But data transformation and aggregation is something that databases are good at. We can use a native feature of SQL to aggregate information for us: views.
SQL views Link to heading
A SQL view is essentially a virtual table. It can be queried just like a normal table, but does not physically store anything itself. Instead, a view has a query definition that it uses to represent its data.
In our case, SQL views can allow us to treat a complex SQL query as a table, abstracting away the complexity into view itself. SQL views are also read-only, and therefore are usually only used for querying, but not updating data directly.
ActiveRecord plays nicely with SQL views out of the box. It considers a SQL view a normal table, and all associations and querying methods work like they would with a normal table, with one exception: the records are read-only.
Add a migration for the view Link to heading
Let’s create a view to handle the to_name
and from_name
methods on
conversation. We can do this in a normal migration, but it needs to be
created with raw SQL:
class CreateConversationSummaries < ActiveRecord::Migration
def up
execute <<-SQL
CREATE VIEW conversation_summaries AS
SELECT ...
SQL
end
def down
execute 'DROP VIEW conversation_summaries'
end
end
This is the basic syntax for adding a view with ActiveRecord migrations.
Our view needs to incorporate to_name
and from_name
, so let’s add
those fields:
CREATE VIEW conversation_summaries AS
SELECT c.id,
f.name as from_name,
t.name as to_name
FROM conversations c
inner join users t on t.id = c.to_id
inner join users f on f.id = c.from_id
After we migrate our database, we can use our database console to verify that we see what we expect:
mailbox_development=# select * from conversation_summaries;
id | from_name | to_name
----+-----------------+-----------------
7 | Felionel Richie | Cat Stevens
8 | Nelly Purrtado | Cat Stevens
9 | Cat Stevens | Felionel Richie
(3 rows)
Cool. The id
corresponds to the conversation, and to_name
and
from_name
columns come from the users table, but it’s all displayed to
us as one table.
ActiveRecord associations for views Link to heading
Now that our view exists, we can integrate it into our application:
class Conversation < ActiveRecord::Base
class Summary < ActiveRecord::Base
self.table_name = "conversation_summaries"
self.primary_key = "id"
belongs_to :conversation, foreign_key: "id"
end
has_one :summary, foreign_key: "id"
end
Let’s break down what’s going on here.
I’ve chosen to nest the Summary model within the Conversation namespace, mostly to call out the fact that we’re doing something non-standard. Also, the Summary class only makes sense in the context of a Conversation. For that reason, we need to manually set the name of the table.
We must also choose a primary key, because Rails cannot infer it for SQL
views. The association itself should be familiar. It works like a normal
has_one
/belongs_to
relationship, except that we override the foreign
key.
Now that the relationships are set up, let’s actually take advantage
of our new view by changing the implementation of the to_name
and
from_name
methods.
class Conversation < ActiveRecord::Base
# ...
def to_name
# Used to be to.name
summary.to_name
end
def from_name
# Used to be from.name
summary.from_name
end
end
One the biggest benefits about this approach is that we can eager-load
a view assocation. We no longer need the to
or from
associations
eager-loaded, since we are no longer using any attributes from them in
the view. Let’s update our controller’s exposure to only eager-load the
necessary parts:
expose(:conversations) { user.conversations.includes(:summary, :messages) }
And when we visit the inbox again, the log looks like this:
Started GET "/" for 127.0.0.1 at 2013-02-11 14:26:12 -0600
Processing by InboxesController#show as HTML
User Load (0.5ms) SELECT "users".* FROM "users" LIMIT 1
Conversation Load (0.4ms) SELECT "conversations".* FROM "conversations" WHERE (1 IN (from_id, to_id))
Conversation::Summary Load (0.6ms) SELECT "conversation_summaries".* FROM "conversation_summaries" WHERE "conversation_summaries"."id" IN (7, 8, 9)
Message Load (0.3ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" IN (7, 8, 9)
Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 7 ORDER BY created_at DESC LIMIT 1
Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 8 ORDER BY created_at DESC LIMIT 1
Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 9 ORDER BY created_at DESC LIMIT 1
Rendered inboxes/show.html.haml within layouts/application (10.4ms)
Completed 200 OK in 13ms (Views: 9.5ms | ActiveRecord: 2.9ms)
That’s definitely an improvement, albeit a small one. We’ve pushed data from the user model into our SQL view, but we don’t need to stop there.
Push it down! Link to heading
Let’s update our view migration to include more aggregated information about each conversation.
class CreateConversationSummaries < ActiveRecord::Migration
def up
execute <<-SQL
CREATE VIEW conversation_summaries AS
SELECT c.id,
f.name as from_name,
t.name as to_name,
m.body as most_recent_message_body,
m.created_at as most_recent_message_sent_at,
(select count(*) from messages m2 where m2.conversation_id = c.id) - 1 as reply_count
FROM conversations c
inner join users t on t.id = c.to_id
inner join users f on f.id = c.from_id
left outer join (
select distinct on(conversation_id) conversation_id, body, created_at
from messages m1
order by conversation_id, created_at desc
) m ON m.conversation_id = c.id
SQL
end
def down
execute 'DROP VIEW conversation_summaries'
end
end
After running rake db:migrate:redo
, we can verify that everything is
still working as expect in the database console:
mailbox_development=# select * from conversation_summaries;
id | from_name | to_name | most_recent_message_body | most_recent_message_sent_at | reply_count
----+-----------------+-----------------+----------------------------------------+-----------------------------+-------------
7 | Felionel Richie | Cat Stevens | Say you. Say meow. | 2013-02-08 02:45:27.07712 | 2
8 | Nelly Purrtado | Cat Stevens | Except that I'm a cat | 2013-02-05 16:45:27.088292 | 0
9 | Cat Stevens | Felionel Richie | I'm sorry that you're feeling that way | 2013-01-30 16:45:27.092443 | 1
(3 rows)
That’s a lot of SQL! But actually, all I’ve added are one join to a subquery, and a subselect. Let’s review both of these changes.
There are many ways to grab the most recent message for a conversation in SQL, including using window functions. The method I’ve opted for here is a subquery in the table expression. The subquery alone would return rows for only the most recent messages for each conversation:
conversation_id | body | created_at
----------------+----------------------------------------+----------------------------
7 | Say you. Say meow. | 2013-02-08 02:45:27.07712
8 | Except that I'm a cat | 2013-02-05 16:45:27.088292
9 | I'm sorry that you're feeling that way | 2013-01-30 16:45:27.092443
By joining with only the most recent message per conversation, we avoid
duplicate rows and only get the body
and created_at
columns from
the most recent message. Then, joining against this subquery, we can
add the body
and created_at
to the list of projections, naming
them most_recent_message_body
and most_recent_message_sent_at
,
respectively.
The other thing we’ve added to the view this iteration is the
reply_count
column, which is a subselect to get the count. We also
subtract 1, just as before.
Let’s take a look at our Conversation model now:
# before
class Conversation < ActiveRecord::Base
belongs_to :to, class_name: "User"
belongs_to :from, class_name: "User"
has_many :messages, dependent: :destroy, inverse_of: :conversation
def most_recent_message_body
most_recent_message.body if most_recent_message
end
def most_recent_message_sent_at
most_recent_message.created_at if most_recent_message
end
def reply_count
[messages.size - 1, 0].max
end
def to_name
to.name
end
def from_name
from.name
end
private
def most_recent_message
@most_recent_message ||= messages.by_date.first
end
end
# after
class Conversation < ActiveRecord::Base
class Summary < ActiveRecord::Base
self.table_name = "conversation_summaries"
self.primary_key = "id"
belongs_to :conversation, foreign_key: "id"
end
belongs_to :to, class_name: "User"
belongs_to :from, class_name: "User"
has_many :messages, dependent: :destroy, inverse_of: :conversation
has_one :summary, foreign_key: "id"
delegate :most_recent_message_sent_at, :most_recent_message_body,
:reply_count, :to_name, :from_name, to: :summary
end
With much of our data transformation and aggregation in our SQL view, our model has become trivially simple. It literally only contains assocations and delegation now. We update our exposure to only eager-load the conversation summary:
expose(:conversations) { user.conversations.includes(:summary) }
Now, reloading the page yields the following log output:
Started GET "/" for 127.0.0.1 at 2013-02-11 15:37:49 -0600
Processing by InboxesController#show as HTML
User Load (1.0ms) SELECT "users".* FROM "users" LIMIT 1
Conversation Load (0.2ms) SELECT "conversations".* FROM "conversations" WHERE (1 IN (from_id, to_id))
Conversation::Summary Load (0.8ms) SELECT "conversation_summaries".* FROM "conversation_summaries" WHERE "conversation_summaries"."id" IN (7, 8, 9)
Rendered inboxes/show.html.haml within layouts/application (5.5ms)
Completed 200 OK in 8ms (Views: 6.0ms | ActiveRecord: 2.0ms)
Now we see some real improvement. All N+1 queries are gone, replaced instead with the eager-loading of the the Conversation::Summary model.
Real World Benefits Link to heading
I used this technique in a real-world application. It helped abstract some of the mundane details of the inbox and allowed us to think about each conversation at a higher level with a summary.
In fact the app included even more business rules than I’ve included here. Each conversation had to include a read/unread status that updated with each sent message. Although it was easily implemented in pure Ruby, it cluttered the model and created yet more N+1 queries in the app view.
The inbox also had to be sorted by the most recent message date,
so that the conversation with the most recent activity would
appear first in the list. This kind of sorting without SQL is both
cumbersome and inefficient in Ruby; you have to load all messages
for each conversation. With the SQL view, it was as simple as
changing the scope from user.conversations.include(:summary)
to
user.conversations.include(:summary).order("conversation_summaries.most _recent_message_sent_at DESC")
.
Conclusion Link to heading
Any time that we push stuff into the database, we make a tradeoff. In this case, when we move data transformation into the SQL view, we sacrifice the co-location of the conversation model and the definition of its summary. With the summary definition located in the database, there’s one extra layer of indirection.
The other tradeoff is that any time we’d like to make a non-trivial change the view, we actually have to create an entirely new view, replacing the old one. If for example, we knew that our inbox was likely to change or add fields, the SQL view approach might be too brittle.
On the other hand, we effectively removed N+1 queries from our application and simplified our model considerably. By abstracting the conversation’s summary into a model backed by a SQL view, we’re able to think of the Summary as an object in its own right. This provides a cognitive simplification, but also yields performance gains as the dataset grows.
It may not be right for every situation, but knowing and understanding how we can use SQL views in our Rails applications adds another tool to our toolbelt.
Example app Link to heading
As before, while writing this post, I created a sample Rails app to iterate quickly. I used TDD to write the pure-ruby approach, and reused the specs while I “refactored” the implementation to the subsequent approaches. Of particular note is the history of the Conversation model(https://github.com/jgdavey/tree-sql-example/commits/master/app/mo dels/category.rb), which mirrors the code above.