Ahoy deep dive part 2: Database Indexing Strategies

Facing performance bottlenecks with Ahoy in your Rails application? Keep reading for industry vetted optimization tips and tricks.
Alex McDermid
Alex McDermid
September 29, 2023

Once you've streamlined the application level, the next stop is the database. In a tracking-intensive application, your database can quickly become a bottleneck. This section focuses on strategies specific to optimizing Ahoy events and visits, helping your database keep pace with your growing needs.

Sharding

What: Breaking up large database tables into smaller, more manageable pieces.
Why: As your Ahoy events and visits tables grow, queries can become slower. Sharding can help by dividing the large table into smaller, faster, more easily managed tables ('shards').
How: This often involves partitioning tables based on certain criteria like user ID or time ranges.

Separate Databases (Highly Recommended)

What: Moving Ahoy-related tables into a separate database instance.
Why: This isolates Ahoy's load from your main application database, preventing any performance degradation in your primary database.
How: Utilize Rails' multiple databases feature or third-party solutions to route Ahoy queries to a different database.

Separate Databases Explained (Ruby on Rails 6+)

When your application grows, maintaining database performance becomes increasingly critical. With tools like Ahoy that gather valuable analytics, the database can quickly become bloated, affecting the entire application's performance. Rails 6+ offers a built-in feature for managing multiple databases, making it easier to separate concerns and enhance performance. Here's how to set it up:

Update Database Configuration: Open your config/database.yml file and configure a separate database for Ahoy:

development:
  primary:
    database: my_primary_database
    user: my_database_user
    password: password
    # ... other configurations
  ahoy:
    database: my_ahoy_database
    user: my_ahoy_database_user
    password: ahoy_password
    # ... other configurations

Run Migrations for Ahoy Database: Rails 6+ allows you to run migrations for a specific database. Use the following command:

rails db:migrate:up DATABASE=ahoy

Update Ahoy Models: In your Ahoy models (Ahoy::Event and Ahoy::Visit), you'll need to specify which database they should connect to. Update the models like so:

class Ahoy::Event < ApplicationRecord
  connects_to database: { writing: :ahoy, reading: :ahoy }
  # ... remaining code
end

class Ahoy::Visit < ApplicationRecord
  connects_to database: { writing: :ahoy, reading: :ahoy }
  # ... remaining code
end

By implementing these steps, you successfully isolate Ahoy data from your primary application data. This ensures that any heavy read or write operations related to analytics won't interfere with your main application's performance.

Data Warehousing

What: Transferring older records to a separate system optimized for read-heavy operations.
Why: This keeps the main database focused on current data, making it faster and more efficient.
How: Tools like Amazon Redshift or Google BigQuery can be used for this purpose.

Archiving and TTL (Time-To-Live)

What: Implement an automatic archival strategy for older records or remove them after a certain period.
Why: Maintains a leaner database by only keeping relevant data.
How: Use database triggers or scheduled tasks to move or delete old records based on certain conditions.

Database Indexing

What: Ensuring proper indexes on database tables to improve query performance.
Why: While indexes can dramatically speed up data retrieval times, they can also slow down write operations. It's crucial to strike a balance.
How: Evaluate existing indexes and use query analysis tools to identify potential new indexes.

Database Indexing Explained

For example, in our database Ahoy Event table we've added many indexes for various reasons. Below is an example of what our Ahoy Event schema looks like and a high level explanation of why we added certain indexes.

  create_table "ahoy_events", force: :cascade do |t|
    t.bigint "visit_id"
    t.bigint "user_id"
    t.string "name"
    t.jsonb "properties"
    t.datetime "time", precision: nil
    t.index ["name", "time"], name: "index_ahoy_events_on_name_and_time"
    t.index ["properties"], name: "index_ahoy_events_on_properties", opclass: :jsonb_path_ops, using: :gin
    t.index ["time"], name: "index_ahoy_events_on_time"
    t.index ["user_id"], name: "index_ahoy_events_on_user_id"
    t.index ["visit_id"], name: "index_ahoy_events_on_visit_id"
  end

name, time: For quicker searches and filters based on event names and their occurrence time.

Technical Insight: This is a composite index, meaning it's built on multiple columns. It works well when you have queries that filter by the event name and also have a time-based condition. For example, queries like SELECT * FROM ahoy_events WHERE name = 'Clicked Button' AND time BETWEEN '2021-01-01' AND '2021-12-31' would benefit immensely from this composite index

properties: We used a GIN index for faster queries on our JSONB columns.

Technical Insight: A GIN (Generalized Inverted Index) is particularly useful for indexing array-based values and full-text search. For JSONB columns, it's almost a go-to choice when the data stored in the properties field is unpredictable and can have nested objects or arrays. This index will enable us to efficiently query nested properties within the JSONB column, something which could be extremely slow otherwise.

time: To enable efficient time-based queries.

Technical Insight: A single-column index on the time field optimizes queries that have WHERE, ORDER BY, or JOIN clauses that make use of this field. For example, you might have queries that aggregate events on a time basis, such as SELECT COUNT(*) FROM ahoy_events WHERE time > '2021-01-01'. The index can quickly seek the relevant records without scanning the entire table, making the query more efficient.

user_id, visit_id: These indexes improve the performance of JOIN operations.

Technical Insight: These are what we call Foreign Key indexes. They are crucial when performing JOIN operations with other tables. For example, if you have a users table and you often perform queries that join ahoy_events and users on the user_id column, then an index on user_id will greatly speed up these JOIN queries. Similarly, the index on visit_id will optimize JOIN operations involving the ahoy_visits table.

Our Ahoy Visits table also has some indexes, I'll give a brief overview of the reasoning here:

create_table "ahoy_visits", force: :cascade do |t|
    t.string "visit_token"
    t.string "visitor_token"
    t.bigint "user_id"
    t.text "user_agent"
    t.text "referrer"
    t.string "referring_domain"
    t.text "landing_page"
    t.string "browser"
    t.string "os"
    t.string "device_type"
    t.string "country"
    t.string "region"
    t.string "city"
    t.string "utm_source"
    t.string "utm_medium"
    t.string "utm_term"
    t.string "utm_content"
    t.string "utm_campaign"
    t.string "app_version"
    t.string "os_version"
    t.string "platform"
    t.datetime "started_at", precision: nil
    t.string "ip", default: "127.0.0.1"
    t.float "latitude"
    t.float "longitude"
    t.index ["user_id"], name: "index_ahoy_visits_on_user_id"
    t.index ["visit_token"], name: "index_ahoy_visits_on_visit_token", unique: true
  end

user_id: Index on user identifier.

Technical Insight: Just like in the ahoy_events table, the user_id index is crucial for JOIN operations with a table that contains user information. This can be important for analytics queries that need to relate visit data to specific users, for example, to know how often a particular user visits the platform.

visit_token: Unique Index on the visit token.

Technical Insight: This is a unique index, which serves two purposes. First, it ensures data integrity by making sure that each visit_token is unique, meaning that you can't have two rows with the same visit token. Secondly, this index makes lookup queries based on visit_token extremely fast. This can be particularly useful when linking an event back to a visit, ensuring you're retrieving or associating the correct visit record.

Evaluating existing indexes and using query analysis tools to identify potential new indexes is an ongoing engineering task. As well as revisiting indexing strategies periodically to ensure they align with evolving data patterns and query needs.

Caching

What: Using caching to reduce the number of database reads.
Why: This offloads some of the stress on the database, especially for frequently-read data.
How: Utilize caching mechanisms like Redis or Memcached for this purpose.

Optimizing at the Infrastructure Level

The final frontier of performance optimization lies in the underlying infrastructure that hosts your application. There are several strategies to dynamically adapt to varying loads, distribute traffic, and handle data replication, ensuring that your Ahoy tracking capabilities are robust and scalable. Some of these strategies include auto-scaling, load balancing, and replication.

Setting the Stage for Optimized Performance

You've just explored the intricate landscape of optimizing Ahoy Events and Visits within a Rails application. While we haven't delved into the step-by-step setup of all these strategies, we've armed you with the knowledge and high-level insights you need to begin thinking critically about your application's performance.

Optimizing your application is not a one-off task but an ongoing process. The strategies discussed—from application-level tactics like rate limiting and batching to database and infrastructure-level approaches like sharding and load balancing—are conceptual tools to add to your optimization toolkit.

As you dive into these optimization strategies, you might find yourself curious about how to measure their impact and align them with your broader objectives. If so, don't miss our related article, Achieving Your Goals and Finding New Opportunities Using KPIs and Stats, which will give you actionable insights into tying your optimization efforts back to key performance indicators and objectives.

Keep in mind that while these best practices are widely applicable, every application has its own set of unique challenges and requirements. The real test comes in adapting and fine-tuning these approaches to meet the specific needs of your application.

So what's next? Armed with this newfound knowledge, you're well-prepared to delve deeper, whether that's into your own application's code, or into further resources to make these high-level concepts a reality.

Interested in joining a mission-driven team with a passion for Ruby on Rails? If so, please take a moment to look at our open positions!.

About the author

Alex McDermid

Alex is a Full Stack Developer based out of Surrey, BC.