Version: Develop

Migrate a9s Elasticsearch To a9s Search 2

This document describes the process of migrating from a9s Elasticsearch to an a9s Search 2 service instance.

The main strategy to migrate the whole dataset from an a9s Elasticsearch service instance to an a9s Search 2 service instance is to use the Reindex Data Operation. The following migration paths have been tested and are supported by the a9s Data Services Framework:

  • Elasticsearch 6 -> a9s Search 2
  • Elasticsearch 7 -> a9s Search 2

Known Issues

  • Downtime during the migration: This will depend on the amount of data inside your a9s Elasticsearch service instance and the network connection throughput between your a9s Elasticsearch and a9s Search 2 service instances.

When To Migrate

It is highly recommended that you isolate the a9s Elasticsearch service instance before making the migration. This means that you should not write data to the a9s Elasticsearch service instance during the migration. This action prevents data inconsistency that might arise from the a9s Elasticsearch service instance receiving new data while the migration is taking place.

Due to this isolation, there will be a system downtime during the migration process; therefore we suggest planning your timeline in a way that you cover both minimizing the impact to your a9s Elasticsearch clients/users, and being able to respond to any unforeseen scenarios.

Migration Duration

While there is no comprehensive benchmark for the duration of the migration, we do have rough estimates based on internal and isolated tests with compact and simple data sets. On average, migrating 1 GB takes about 1-3 minutes. This estimate, however, does not consider factors such as the environment, its state and particularities, the network's throughput, etc.

Thus, we encourage you to take the estimate above with reservations, and to consider your own environment's capabilities when planning your migration timetable, as it may take some time to successfully execute.
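As a starting point for planning, the rough 1-3 minutes per GB figure above can be turned into a back-of-the-envelope calculation (the rate is only an internal rough estimate, not a guarantee; adjust it after timing a test reindex in your own environment):

```shell
# Rough duration estimate based on the 1-3 minutes per GB figure above.
# The rate is an internal rough estimate, not a guarantee.
dataset_gb=50

best=$((dataset_gb * 1))
worst=$((dataset_gb * 3))
echo "Expect roughly ${best}-${worst} minutes for ${dataset_gb} GB"
```

For a 50 GB dataset this suggests planning a maintenance window of roughly one to three hours, plus headroom for unforeseen scenarios.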

Prerequisites

Create New a9s Search 2 Service Instance

Create a new a9s Search 2 service instance; this instance will be the destination of the data migration from an existing a9s Elasticsearch service instance.

caution
  • Do not insert any data into the new service instance until the migration itself and the verification of a successful migration are done.
  • Create a service instance of the same size (or greater) with regard to CPU, memory, and disk space.
  1. Create an a9s Search 2 service instance.

  2. Find The Service Instance Access Credentials. Ensure that your a9s Search 2 service instance is up and running.

    a. First, create a service key.

    cf create-service-key <cf-service-instance> <your-key>

    b. Retrieve the service key information. The host, password, port, and username are important.

    cf service-key <cf-service-instance> <your-key>

    Outcome Example:

    (...)
    "host": [
      "https://vbd907998.service.dc1.dsf2.a9s:9200"
    ],
    "hosts": [
      "vbd907998-os-0.node.dc1.dsf2.a9s"
    ],
    "password": "8e045ba28abe2bb1f840a9cc25ad3eab3d5d8416",
    "port": 9200,
    "scheme": "https",
    "username": "c2ee7c1d7a566bdf6e9f0b84c345dcdcb34b5455"
    (...)

Create a Reverse SSH Tunnel

The goal is to give the application developer access to the a9s Search service instance. To do this, it is necessary to create SSH tunnels to the a9s Search 2 service instance.

In the end, we want to achieve the scenario below.

                    (port 9200)                             (port 9200)
/-----> * a9s Search 2 * <--- Reindex --- * a9s Elasticsearch *
/
* CF Application * (via SSH Tunnel) [infrastructure network]
-------/----------------------------------------------------------------------
/ [Developer network]
* Developer Machine * (local port 9200)
note

Use 9200 as the local SSH tunnel port to access the a9s Search service instance.

Moreover, it is necessary to have access to the a9s Elasticsearch service instance in order to be able to perform the following tasks:

We also want to achieve the scenario below.

                    (port 9200)
/--> * a9s Elasticsearch *
/
* CF Application * (via SSH Tunnel) [infrastructure network]
-------/----------------------------------------------------------------------
/ [Developer network]
* Developer Machine * (local port 9201)
note

Use 9201 for the local SSH tunnel port to access the a9s Elasticsearch service instance.

  1. Create the SSH Tunnels using the following information.
     a. Create the SSH Tunnel to access the a9s Search service instance. Use port 9200.
     b. Create the SSH Tunnel to access the a9s Elasticsearch service instance. Use port 9201.
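Assuming cf ssh is enabled and `<app-name>` is an application bound to the service instances, the two tunnels can be sketched as follows. The `<...-node-hostname>` values are placeholders for the node hostnames found in the respective service keys:

```shell
# Tunnel 1: local port 9200 -> a9s Search 2 node (port 9200)
cf ssh <app-name> -L 9200:<a9s-search-2-node-hostname>:9200 -N

# Tunnel 2: local port 9201 -> a9s Elasticsearch node (port 9200)
cf ssh <app-name> -L 9201:<a9s-elasticsearch-node-hostname>:9200 -N
```

Each command blocks while its tunnel is open, so run them in separate terminals.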

It is possible to access any of the a9s Data Services locally. That means you can connect via a local client to the service instance for any purpose, such as debugging. Cloud Foundry (CF) provides a smart way to create SSH forward tunnels via a pushed application. For more information about this feature see the Accessing Apps with SSH section of the CF documentation.

First of all you must have an application bound to the service. For further details on how to do this see Bind an application to a Service Instance.

info

cf ssh support must be enabled in the platform. You can ask your platform operator for further details.

You can find the steps to create the reverse tunnel in the section Create a Tunnel to The Service of our Using a9s Search document.

Guarantee the Existence of Backups Before The Migration (Optional)

Even though the Reindex operation used in this migration process only performs read operations and should not harm the source data, we recommend erring on the side of caution when performing any operation on the data, and strongly advise creating backups of the latest state of the data beforehand.

Isolate The Node Before The Migration

As mentioned before, it is recommended that you avoid client connections and stop writing data into the a9s Elasticsearch service instance during the migration.

You may achieve this by unbinding all applications from your a9s Elasticsearch service instance.
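Unbinding can be done with the cf CLI; `<app-name>` stands for each application currently bound to the instance:

```shell
# Repeat for every application bound to the a9s Elasticsearch service instance
cf unbind-service <app-name> <elasticsearch-service-instance-name>
```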

danger

Not isolating the node can cause data loss, as well as inconsistencies during the database verification.

Migration Steps

This section explains all of the migration steps, both the mandatory and optional ones. These steps include various examples with placeholders for the necessary values. For further details on these placeholder values please see the Placeholder Values section.

Before Performing The Migration (Optional)

The migration is done using the Reindex operation, which migrates one index at a time. As a result, the following steps must be repeated for each index in the destination a9s Search service instance.

Before the migration, the Application Developers may take some actions based on the particularities of their deployment environment. These actions are related to migration decisions and performance matters.

Migrate Mapping And Settings (Optional)

caution

Reindex does not copy the settings from the source or its associated template. So any mapping, template, shard count, alias, replica, and so on must be configured ahead of time.

1 - Retrieve the desired information from the a9s Elasticsearch (migration source) and save this information.

  • Settings
# Elasticsearch Request
# GET <target_index>/_settings?pretty

curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/<target_index>/_settings?pretty

Documentation Reference: Indices - Get Settings.

  • Mapping
# Elasticsearch Request
# GET <target_index>/_mapping?pretty

curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/<target_index>/_mapping?pretty

Documentation Reference: Indices - Get Mapping.

  • Templates
# Elasticsearch Request
# GET _template?pretty
# GET _template/<template_name>?pretty

curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/_template?pretty

# and

curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/_template/<template_name>?pretty

Documentation Reference: Get Templates.

  • Alias
# Elasticsearch Request
# GET _alias?pretty

curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/_alias?pretty

Documentation Reference: Get Alias

2 - Create the index with the settings and mapping in the a9s Search (migration destination).

For migrating from Elasticsearch 6 only:
Since mapping types are not supported in OpenSearch (like `tweet` in the example below), they have to be removed completely. As a result, the "properties" under a mapping type will be placed directly within "mappings", in order to preserve their content.

Additionally, incompatible keys inside "settings" (such as provided_name) should be removed too. These keys can be identified by the error messages from the OpenSearch PUT request in the following step.

More information about breaking changes between Elasticsearch 6 & 7 can be found here:

  • Documentation Reference: settings changes
  • Documentation Reference: mappings changes
  • Documentation Reference: removal of mapping types

Elasticsearch 6 data example:

{
  "settings" : {
    "index" : {
      "creation_date" : "1691071082956",
      "number_of_shards" : "5",
      "number_of_replicas" : "1",
      "uuid" : "8ggyfJXjTGGnHKyFTKs6UA",
      "version" : {
        "created" : "6082399"
      },
      "provided_name" : "twitter"
    }
  },
  "mappings" : {
    "tweet" : {
      "properties" : {
        "message" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Modified data for a successful OpenSearch PUT request:

{
  "settings" : {
    "index" : {
      "number_of_shards" : "5",
      "number_of_replicas" : "1"
    }
  },
  "mappings" : {
    "properties" : {
      "message" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  }
}
# Opensearch Request
# PUT <target_index>
#
# Data Example:
# {
#   "settings": {
#     "index": {
#       "number_of_shards": 2,
#       "number_of_replicas": 1
#     }
#   },
#   "mappings": {
#     "properties": {
#       "age": {
#         "type": "integer"
#       }
#     }
#   }
# }

curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XPUT https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/<target_index> -H 'Content-Type: application/json' -d '{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      }
    }
  }
}'

Documentation Reference: Create index.

3 - Set templates

# Opensearch Request
# PUT _index_template/<template_name>
#
# Data Example:
# {
#   "index_patterns": [
#     "logs-2020-01-*"
#   ],
#   "template": {
#     "aliases": {
#       "my_logs": {}
#     },
#     "settings": {
#       "number_of_shards": 2,
#       "number_of_replicas": 1
#     },
#     "mappings": {
#       "properties": {
#         "timestamp": {
#           "type": "date",
#           "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
#         },
#         "value": {
#           "type": "double"
#         }
#       }
#     }
#   }
# }

curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XPUT https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/_index_template/<template_name> -H 'Content-Type: application/json' -d '{
  "index_patterns": [
    "logs-2020-01-*"
  ],
  "template": {
    "aliases": {
      "my_logs": {}
    },
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "value": {
          "type": "double"
        }
      }
    }
  }
}'

Documentation Reference: Index Template.

Improve The Migration Performance (Optional)

caution

As reindexing can be an expensive operation, depending on the size of your source index, we recommend that you disable replicas in your destination index by setting number_of_replicas to 0, and to disable the refresh feature by setting refresh_interval to -1. Then re-enable them once the reindex process is complete.

The index.refresh_interval default value is "1s".

1 - Set the refresh_interval and number_of_replicas.

# Opensearch Request
# PUT /<target_index>/_settings

curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XPUT https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/<target_index>/_settings -H 'Content-Type: application/json' -d '{
  "index": {
    "refresh_interval": -1,
    "number_of_replicas": 0
  }
}'

Documentation Reference: Update Setting.

2 - After the reindex process is complete, you can set your desired replica count and refresh interval settings.
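As a sketch, restoring one replica and the "1s" default refresh interval afterwards looks like the request below; substitute your own desired values:

```shell
# Re-enable replicas and refresh once the reindex has completed
curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XPUT https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/<target_index>/_settings -H 'Content-Type: application/json' -d '{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}'
```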

Perform The Migration

  1. Use curl to perform the reindex request against the a9s Search service instance, to migrate the data from the a9s Elasticsearch service instance.
# Opensearch Request
# POST _reindex

curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XPOST https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/_reindex?wait_for_completion=false -H 'Content-Type: application/json' -d '{
  "source": {
    "remote": {
      "host": "https://<source-elasticsearch-service-instance-host>:<source-elasticsearch-service-instance-port>",
      "username": "<source-elasticsearch-service-instance-username>",
      "password": "<source-elasticsearch-service-instance-password>",
      "socket_timeout": "<socket-timeout>"
    },
    "index": "<target_index>",
    "size": <number-of-documents-to-reindex-by-batch>
  },
  "dest": {
    "index": "<target_index>"
  }
}'

Documentation Reference: Reindex.

Outcome Example:

{
  "task" : "_hF1RPPpTyaqOPHtH0ZAig:5214"
}
note
  • We will make this request asynchronously by using the parameter wait_for_completion=false. This is necessary because this operation can take a long time depending on the amount of data.
  • The task value will be named <task-id> for future usage.
  2. Monitor the reindex process, to verify if it finished successfully.
# Opensearch Request
# GET _tasks/<task-id>?pretty

curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XGET https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/_tasks/<task-id>?pretty

Documentation Reference: Tasks.

Outcome Example:

{
  "completed" : true,
  (...)
  "action" : "indices:data/write/reindex",
  (...)
  "total" : 101,
  "created" : 101,
  "batches" : 21,
  (...)
  "failures" : [],
  "errors" : []
}

We are looking to verify that:

  • The completed field is true. Also, the errors and failures fields are empty.
  • The action field is indices:data/write/reindex.
  • The total field has the total amount of documents existing in the source a9s Elasticsearch service instance.
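Besides the task status, the document totals can be cross-checked with the `_count` API on both sides (via the two SSH tunnels); the counts must match:

```shell
# Source (a9s Elasticsearch, tunneled on local port 9201)
curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/<target_index>/_count?pretty

# Destination (a9s Search 2, tunneled on local port 9200)
curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/<target_index>/_count?pretty
```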

Known Issues

The <number-of-documents-to-reindex-by-batch> must be chosen according to your indices' properties. If the size is bigger than what the service instance's capabilities support, the operation will fail.

As an example, for the small service instance (2GB RAM and 4GB Disk), we would suggest:

  • When the average size of the documents is around 10MB use the size 5.
  • When the average size of the documents is around 1MB use the size 50.
  • When the average size of the documents is around 100kb use the size 500.
  • When the average size of the documents is less than 50kb use the size 1000 or greater.

Please be aware that if the value <number-of-documents-to-reindex-by-batch> is set incorrectly, the following errors may happen:

"type" : "illegal_argument_exception",
"reason" : "Remote responded with a chunk that was too large. Use a smaller batch size.",

# or

"reason":"rejected execution of coordinating operation [coordinating_and_primary_bytes=0, replica_bytes=0, all_bytes=0, coordinating_operation_bytes=102278834, max_coordinating_and_primary_bytes=90282393]"}]
"type":"rejected_execution_exception"

# or

"reason":"Connection is closed"
"type":"connection_closed_exception"

If you encounter such an error, it is advisable to try to decrease <number-of-documents-to-reindex-by-batch>.

Also, if the <number-of-documents-to-reindex-by-batch> is too small the migration can take too long.
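The suggestions above can be summarized in a small helper function. The thresholds simply restate the table for a small (2 GB RAM / 4 GB disk) instance and are rough guidance, not exact limits:

```shell
# Pick a reindex batch size ("size" parameter) from the average document
# size in KB, following the rough guidance above for a small instance.
suggest_batch_size() {
  avg_doc_kb=$1
  if   [ "$avg_doc_kb" -ge 10240 ]; then echo 5     # ~10 MB documents
  elif [ "$avg_doc_kb" -ge 1024 ];  then echo 50    # ~1 MB documents
  elif [ "$avg_doc_kb" -ge 100 ];   then echo 500   # ~100 KB documents
  else                                   echo 1000  # under ~50 KB documents
  fi
}

suggest_batch_size 20   # small documents -> batch size 1000
```

Start from the suggested value and adjust downwards if you hit the errors above, or upwards if the migration is too slow.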

Check The Database Consistency

This section has some actions that can help to verify the data consistency after the migration. These steps include various examples with placeholders for the necessary values. For further details on these placeholder values please see the Placeholder Values section.

Check The Indices

  • a9s Elasticsearch (Source).

1 - Refresh the indices.

# Elasticsearch Request
# POST <target_index>/_refresh
# POST _refresh

curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/<target_index>/_refresh

Documentation Reference: Refresh.

2 - List all indices.

# Elasticsearch Request
# GET _cat/indices?v

curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/_cat/indices?v

Documentation Reference: Cat index.

Outcome Example:

health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index1 SGKtW0JgRimwh9yjhiU7Ig 1   2   101        0            671.6mb    671.6mb
green  open   index2 SGKtW0JgRimwh9yjhiU7Ig 1   2   50         0            232.6mb    232.6mb
  • a9s Search (Destination)

1 - Refresh the indices.

# Opensearch Request
# POST <target_index>/_refresh
# POST _refresh

curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/<target_index>/_refresh

Documentation Reference: Refresh.

2 - List all indices

# Opensearch Request
# GET _cat/indices?v

curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/_cat/indices?v

Documentation Reference: Cat index.

Outcome Example:

health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index1               SGKtW0JgRimwh9yjhiU7Ig 1   2   101        0            692.6mb    692.6mb
green  open   index2               SGKtW0JgRimwh9yjhiU7Ig 1   2   50         0            239.6mb    239.6mb
green  open   .opendistro_security RLj5S2PiSFG8eQ45T7El6A 1   0   10         0            59.7kb     59.7kb
green  open   .tasks               2My3SpdoRVKaPHR89DHDtQ 1   0   2          0            14.9kb     14.9kb
note

Note that the indices starting with ., such as .opendistro_security and .tasks, are reserved for the OpenSearch environment and must not be considered during the check.

Checking:

  • Verify if both service instances contain the same indices created by the application developer(s).
  • Verify if each index has the same amount of docs.count. It must have the same value since there should be no write operation during the migration.
  • Verify, for some sample documents of each index, that they are equivalent in both service instances. Note that this comparison depends on the database's particularities; therefore, the application developer must decide how to do it. For example, documents with a specific timestamp can be compared between indices.
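One way to sample-check documents is to fetch the same document by ID from both sides and compare the "_source" field; the reindex operation preserves document IDs. The `<document-id>` placeholder is illustrative, and on Elasticsearch 6 the index's mapping type may have to replace `_doc` in the path:

```shell
# Fetch the same document from both instances and compare the "_source"
curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> "https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/<target_index>/_doc/<document-id>?pretty"

curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> "https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/<target_index>/_doc/<document-id>?pretty"
```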
caution

It is known that the OpenSearch indices can be larger in disk space than the corresponding Elasticsearch indices. Please consider this when creating the a9s Search 2 service instance intended to be the destination of the data migration. To have OpenSearch using less disk space, change the index codec setting to best_compression, at the expense of slower stored fields performance. For more information see: Index Settings

Check The Settings and Mapping (Optional)

Follow the same steps as Point 1 of Migrate Mapping And Settings for both a9s Elasticsearch and a9s Search, and compare the results.

info

This is advisable if you have done the steps of the Before Performing The Migration section and care about the settings and mapping.

Glossary

Placeholder Values

| Placeholder | Description | Example | Required? |
| --- | --- | --- | --- |
| `<destination-opensearch-service-instance-host>` | The domain of the destination service instance | `localhost` or `127.0.0.1`, since the developer will use the reverse SSH tunnel | Yes |
| `<destination-opensearch-service-instance-port>` | The port of the destination service instance | `9200`, since the developer will use the reverse SSH tunnel | Yes |
| `<destination-opensearch-service-instance-password>` | The password of the destination service instance | --- | Yes |
| `<destination-opensearch-service-instance-username>` | The username of the destination service instance | --- | Yes |
| `<source-elasticsearch-service-instance-host>` | The domain of the source service instance | The service key's `host` | Yes |
| `<source-elasticsearch-service-instance-port>` | The port of the service instance that contains the data to be migrated | `9200` | Yes |
| `<source-elasticsearch-service-instance-password>` | The password of the source service instance | --- | Yes |
| `<source-elasticsearch-service-instance-username>` | The username of the source service instance | --- | Yes |
| `<number-of-documents-to-reindex-by-batch>` | The number of documents to reindex per batch. The default is 1000 | `100` | No |
| `<socket-timeout>` | The wait time for socket reads. The default is 30s | `"60m"` is a safe value | No |
| `<target_index>` | The index to migrate | `"my_index_from_source"` | Yes |
| `<template_name>` | The template name | --- | Yes |
| `<source-elasticsearch-service-instance-host-localhost>` | The domain of the source service instance | `localhost` or `127.0.0.1`, since the developer will use the reverse SSH tunnel | Yes |
| `<source-elasticsearch-service-instance-port-localhost>` | The port of the service instance that contains the data to be migrated | `9201`, since the developer will use the reverse SSH tunnel | Yes |
| `<source-elasticsearch-service-instance-password-localhost>` | The password of the source service instance | --- | Yes |
| `<source-elasticsearch-service-instance-username-localhost>` | The username of the source service instance | --- | Yes |