Datasets - Custom Indexes for Custom Datasets

Introduction

Creating custom indexes for custom datasets in Corva’s Dev Center is a feature that can improve the performance of frequent queries on large databases. This document describes what custom indexes are, when custom indexes should be created, how to create a custom index and best practices on how to use a custom index in an API request.

What is a custom index?

Indexing is the process of associating a key with the location of a corresponding data record. Indexes support the efficient execution of queries. Without them, the database must scan every document in a dataset to select those that match the query statement. If an appropriate index exists for a query, the database can use the index to limit the number of documents it must inspect.

Custom indexes can be a single index or a compound index. Compound indexes are indexes composed of several different fields. For example, instead of having one index on “bit_size” and another on “bit_name”, it is typically most efficient to create an index that includes both “bit_size” and “bit_name” if you query against both of the names. A compound index can still be used to filter queries that specify the bit_name only.

Custom datasets that are created in the Dev Center come with default indexes. Here are examples of the three different types of custom datasets and the associated default indexes for time, depth and reference.

Time based datasets have default indexes of asset_id and timestamp. See image.

This is a time based dataset

Depth based datasets have default indexes of asset_id and log_identifier and measured_depth. See image.

This is a depth based dataset

Reference datasets have a default index of company_id. See image.

This is a reference based dataset.

When should I create a custom index?

You may consider creating a custom index when your API request queries are slow.

Here are some common reasons why a query could be slow:

  • The query is unsupported by your current indexes

  • Some documents in your collection have large array fields that are costly to search and index

  • One query retrieves information from multiple collections with $lookup

How do I create a custom index?

  1. Within Dataset Explorer, select the desired dataset from the left hand navigation list.

This is an image of dataset indexes in the Dataset Explorer.

  1. Click on the + NEW INDEX

This is an image of dataset indexes in the Dataset Explorer.

  1. Input the name of the custom index. In this example, I am using: asset_id: 1, data.wob_plus_hookload: 1, timestamp: 1.

This is an image of the dataset index creation wizard in the Dataset Explorer.

  1. Click on Save to save the newly created indexes.

This is an image of newly created custom dataset indexes in the Dataset Explorer.

How do I use a custom index in an API request?

Order matters when using a custom index within an API request query. When using the custom indexes in your API request query, be sure to use the exact same order as the order of the custom index was created.

For example, if the custom index that you created is asset_id: 1, data.wob_plus_hookload: 1, timestamp: 1, then your API request query will look like the following:

from corva import Api, Cache, ScheduledNaturalTimeEvent, scheduled


@scheduled  
def scheduled_app(event: ScheduledNaturalTimeEvent, api: Api, cache: Cache):
   
    asset_id = event.asset_id
    start_time = event.start_time
    end_time = event.end_time
   
    api.get_dataset(
        provider='corva',
        dataset='test-time-dataset',
        query={
            'asset_id': asset_id,
            'data.wob_plus_hookload': {
                '$gt': 1000
            },
            'timestamp': {
                '$gte': start_time,
                '$lte': end_time,
            }
        },
        sort={'timestamp': 1},
        limit=1000,
        fields='data.wob_plus_hookload'
    )

This is a Python example.

What does a bad custom index look like?

You can definitely make things worse by creating a bad index and conducting a bad query. Here is an example of a good index and query and a bad index and query.

Indexes

Good:
asset_id: 1, timestamp: 1

Bad:
timestamp: 1, asset_id: 1 (order of the fields matters, keep it consistent) 

-----

Queries

Good:
asset_id: 123456, timestamp: { $gt: 1621261407 }

Bad:
timestamp: { $gt: 1621261407 }, asset_id: 123456 (order of the query fields should match the order of the index fields)


This is an example of an index and query.

For more information on this topic please reference Performance Best Practices: Indexing and Monitor and Improve Slow Queries.