Migrate a9s Elasticsearch To a9s Search 2
This document describes the process of migrating from a9s Elasticsearch to an a9s Search 2 service instance.
The main strategy to migrate the whole dataset from an a9s Elasticsearch service instance to an a9s Search 2 service instance is to use the Reindex Data Operation. The following migration paths have been tested and are supported by the a9s Data Services Framework:
- Elasticsearch 6 -> a9s Search 2
- Elasticsearch 7 -> a9s Search 2
Known Issues
- Downtime during the migration: the duration depends on the amount of data inside your a9s Elasticsearch service instance and on the network throughput between your a9s Elasticsearch and a9s Search 2 service instances.
When To Migrate
It is highly recommended that you isolate the a9s Elasticsearch service instance before making the migration. This means that you should not write data to the a9s Elasticsearch service instance during the migration. This action prevents data inconsistency that might arise from the a9s Elasticsearch service instance receiving new data while the migration is taking place.
Due to this isolation, there will be system downtime during the migration process; we therefore suggest planning your timeline so that it both minimizes the impact on your a9s Elasticsearch clients/users and leaves room to respond to any unforeseen scenarios.
While there is no comprehensive benchmark for the duration of the migration, we do have rough estimates based on internal, isolated tests with compact and simple data sets. On average, migrating 1GB takes about 1-3 minutes. This estimate, however, does not account for factors such as the environment, its state and particularities, the network throughput, etc.
Thus, we encourage you to treat the estimate above as a rough guide and to consider your own environment's capabilities when planning your migration timetable, as the migration may take considerable time to complete.
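As a purely illustrative sketch of how the rough 1-3 minutes per GB figure translates into a planning window (Python is used here only for demonstration; the real duration depends on your environment):

```python
def estimate_minutes(dataset_gb):
    """Return a (best-case, worst-case) migration estimate in minutes,
    assuming the rough 1-3 minutes per GB figure quoted above."""
    return (dataset_gb * 1, dataset_gb * 3)

# e.g. a 50 GB dataset suggests a window of roughly 50-150 minutes,
# plus a safety margin for unforeseen scenarios.
low, high = estimate_minutes(50)
```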
Prerequisites
Create New a9s Search 2 Service Instance
Create a new a9s Search 2 service instance; this instance will be the destination of the data migration from an existing a9s Elasticsearch service instance.
- Do not insert any data into the new instance until the migration has been performed and verified as successful.
- Create a service instance of the same size (or greater) with regard to CPU, memory, and disk space.
Create an a9s Search 2 service instance.
Find The Service Instance Access Credentials. Ensure that you already have an a9s Search 2 service instance up and running.
a. First, create a service key.
cf create-service-key <cf-service-instance> <your-key>
b. Retrieve the service key information. The `host`, `password`, `port`, and `username` values are important.
cf service-key <cf-service-instance> <your-key>
Outcome Example:
(...)
"host": [
"https://vbd907998.service.dc1.dsf2.a9s:9200"
],
"hosts": [
"vbd907998-os-0.node.dc1.dsf2.a9s"
],
"password": "8e045ba28abe2bb1f840a9cc25ad3eab3d5d8416",
"port": 9200,
"scheme": "https",
"username": "c2ee7c1d7a566bdf6e9f0b84c345dcdcb34b5455"
(...)
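The relevant connection values can be pulled out of the service key JSON programmatically. A minimal Python sketch, using the example credentials shown above (note that `host` is a list):

```python
import json

# Example service-key payload, trimmed to the fields we need.
service_key = json.loads("""
{
  "host": ["https://vbd907998.service.dc1.dsf2.a9s:9200"],
  "password": "8e045ba28abe2bb1f840a9cc25ad3eab3d5d8416",
  "port": 9200,
  "scheme": "https",
  "username": "c2ee7c1d7a566bdf6e9f0b84c345dcdcb34b5455"
}
""")

username = service_key["username"]
password = service_key["password"]
port = service_key["port"]
# "host" is a list; take the first entry.
url = service_key["host"][0]
```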
Create a Reverse SSH Tunnel
The goal is to give the application developer access to the a9s Search service instance. To do this, it is necessary to create SSH tunnels to the a9s Search 2 service instance.
In the end, we want to achieve the scenario below.
(port 9200) (port 9200)
/-----> * a9s Search 2 * <--- Reindex --- * a9s Elasticsearch *
/
* CF Application * (via SSH Tunnel) [infrastructure network]
-------/----------------------------------------------------------------------
/ [Developer network]
* Developer Machine * (local port 9200)
Use `9200` as the local SSH tunnel port to access the a9s Search 2 service instance.
Moreover, it is necessary to have access to the a9s Elasticsearch service instance in order to be able to perform the following tasks:
In the end, we also want to achieve the scenario below.
(port 9200)
/--> * a9s Elasticsearch *
/
* CF Application * (via SSH Tunnel) [infrastructure network]
-------/----------------------------------------------------------------------
/ [Developer network]
* Developer Machine * (local port 9201)
Use `9201` as the local SSH tunnel port to access the a9s Elasticsearch service instance.
- Create the SSH tunnels using the following information.
a. Create the SSH tunnel to access the a9s Search 2 service instance. Use port `9200`.
b. Create the SSH tunnel to access the a9s Elasticsearch service instance. Use port `9201`.
It is possible to access any of the a9s Data Services locally. That means you can connect via a local client to the service instance for any purpose, such as debugging. Cloud Foundry (CF) provides a smart way to create SSH forward tunnels via a pushed application. For more information about this feature see the Accessing Apps with SSH section of the CF documentation.
First of all you must have an application bound to the service. For further details on how to do this see Bind an application to a Service Instance.
`cf ssh` support must be enabled in the platform. You can ask your platform operator for further details.
You can find the steps to create the reverse tunnel in the Create a Tunnel to The Service section of our Using a9s Search document.
Guarantee the Existence of Backups Before The Migration (Optional)
Even though the Reindex operation, the approach used in this migration process, only performs read operations and should not harm the source data, we recommend erring on the side of caution with any operation on your data. We strongly advise creating a backup of the latest state of the data before attempting the migration.
Isolate The Node Before The Migration
As mentioned before, it is recommended that you avoid client connections and stop writing data into the a9s Elasticsearch service instance during the migration.
You may achieve this by unbinding all applications from your a9s Elasticsearch service instance.
Not isolating the node can cause data loss, as well as inconsistencies during the database verification.
Migration Steps
This section explains all of the migration steps, both the mandatory and optional ones. These steps include various examples with placeholders for the necessary values. For further details on these placeholder values please see the Placeholder Values section.
Before Performing The Migration (Optional)
The migration is done using the Reindex operation, which must be performed one index at a time. As a result, the following steps need to be repeated for each index in the destination a9s Search 2 service instance.
Before the migration, the Application Developers may take some actions based on the particularities of their deployment environment. These actions are related to migration decisions and performance matters.
Migrate Mapping And Settings (Optional)
Reindex does not copy the settings from the source index or its associated template, so any mapping, template, shard count, alias, replica, and so on must be configured on the destination ahead of time.
1 - Retrieve the desired information from the a9s Elasticsearch (migration source) and save this information.
- Settings
# Elasticsearch Request
# GET <target_index>/_settings?pretty
curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/<target_index>/_settings?pretty
Documentation Reference: Indices - Get Settings.
- Mapping
# Elasticsearch Request
# GET <target_index>/_mapping?pretty
curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/<target_index>/_mapping?pretty
Documentation Reference: Indices - Get Mapping.
- Templates
# Elasticsearch Request
# GET _template?pretty
# GET _template/<template_name>?pretty
curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/_template?pretty
# and
curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/_template/<template_name>?pretty
Documentation Reference: Get Templates.
- Alias
# Elasticsearch Request
# GET _alias?pretty
curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> -XGET https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/_alias?pretty
Documentation Reference: Get Alias
2 - Create the index with the settings and mapping in the a9s Search (migration destination).
For migrating from Elasticsearch 6 only:
Incompatible keys inside "settings" (such as `provided_name`) must also be removed. These keys can be
identified by the error messages from the OpenSearch PUT request in the following step.
More information about breaking changes between Elasticsearch 6 & 7 can be found here:
- Documentation Reference: settings changes
- Documentation Reference: mappings changes
- Documentation Reference: removal of mapping types
Elasticsearch 6 data example:
{
"settings" : {
"index" : {
"creation_date" : "1691071082956",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "8ggyfJXjTGGnHKyFTKs6UA",
"version" : {
"created" : "6082399"
},
"provided_name" : "twitter"
}
},
"mappings" : {
"tweet" : {
"properties" : {
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
Modified data for a successful OpenSearch PUT request:
{
"settings" : {
"index" : {
"number_of_shards" : "5",
"number_of_replicas" : "1"
}
},
"mappings" : {
"properties" : {
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
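The transformation from the Elasticsearch 6 document above to the OpenSearch-compatible one can also be scripted. The following Python sketch covers only the incompatible keys visible in this example (your indices may contain others, which the PUT error messages will reveal):

```python
def to_opensearch_body(es6_body):
    """Strip ES6-only settings keys and flatten the single mapping type."""
    # Settings keys rejected by OpenSearch on index creation (example set only).
    incompatible = {"creation_date", "uuid", "version", "provided_name"}
    index_settings = {
        k: v for k, v in es6_body["settings"]["index"].items()
        if k not in incompatible
    }
    # ES6 wraps properties in a mapping type (e.g. "tweet"); remove that level.
    (mapping_type,) = es6_body["mappings"].keys()
    return {
        "settings": {"index": index_settings},
        "mappings": es6_body["mappings"][mapping_type],
    }
```

Applied to the Elasticsearch 6 data example above, this yields the modified body shown for the OpenSearch PUT request.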
# Opensearch Request
# PUT <target_index>
#
# Data Example:
# {
# "settings": {
# "index": {
# "number_of_shards": 2,
# "number_of_replicas": 1
# }
# },
# "mappings": {
# "properties": {
# "age": {
# "type": "integer"
# }
# }
# }
# }
curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XPUT https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/<target_index> -H 'Content-Type: application/json' -d '{
"settings": {
"index": {
"number_of_shards": 2,
"number_of_replicas": 1
}
},
"mappings": {
"properties": {
"age": {
"type": "integer"
}
}
}
}'
Documentation Reference: Create index.
3 - Set templates
# Opensearch Request
# PUT _index_template/<template_name>
#
# Data Example:
# {
# "index_patterns": [
# "logs-2020-01-*"
# ],
# "template": {
# "aliases": {
# "my_logs": {}
# },
# "settings": {
# "number_of_shards": 2,
# "number_of_replicas": 1
# },
# "mappings": {
# "properties": {
# "timestamp": {
# "type": "date",
# "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
# },
# "value": {
# "type": "double"
# }
# }
# }
# }
# }
curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XPUT https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/_index_template/<template_name> -H 'Content-Type: application/json' -d '{
"index_patterns": [
"logs-2020-01-*"
],
"template": {
"aliases": {
"my_logs": {}
},
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"value": {
"type": "double"
}
}
}
}
}'
Documentation Reference: Index Template.
Improve The Migration Performance (Optional)
As reindexing can be an expensive operation, depending on the size of your source index, we recommend that you disable
replicas in your destination index by setting `number_of_replicas` to `0`, and disable the refresh feature by setting
`refresh_interval` to `-1`. Re-enable both once the reindex process is complete.
The `index.refresh_interval` default value is `"1s"`.
1 - Set the `refresh_interval` and `number_of_replicas`.
# Opensearch Request
# PUT /<target_index>/_settings
curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XPUT https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/<target_index>/_settings -H 'Content-Type: application/json' -d '{
"index": {
"refresh_interval": -1,
"number_of_replicas": 0
}
}'
Documentation Reference: Update Setting.
2 - After the reindex process is complete, you can set your desired replica count and refresh interval settings.
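The two settings payloads (disable for the reindex, restore afterwards) can be sketched as follows. The restore values here are assumptions (`1` replica and the default `"1s"` refresh interval); use your own target values:

```python
def tuning_settings(disable):
    """Build the _settings request body for the reindex performance tuning."""
    if disable:
        # Speed up reindexing: no replicas, refresh disabled.
        return {"index": {"refresh_interval": -1, "number_of_replicas": 0}}
    # Restore after the reindex completes (adjust to your desired values).
    return {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}
```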
Perform The Migration
- Use `curl` to perform the reindex request against the a9s Search 2 service instance, migrating the data from the a9s Elasticsearch service instance.
# Opensearch Request
# POST _reindex
curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XPOST https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/_reindex?wait_for_completion=false -H 'Content-Type: application/json' -d '{
"source": {
"remote": {
"host": "https://<source-elasticsearch-service-instance-host>:<source-elasticsearch-service-instance-port>",
"username":"<source-elasticsearch-service-instance-username>",
"password":"<source-elasticsearch-service-instance-password>",
"socket_timeout": "<socket-timeout>"
},
"index": "<target_index>",
"size": <number-of-documents-to-reindex-by-batch>
},
"dest": {
"index": "<target_index>"
}
}'
Documentation Reference: Reindex.
Outcome Example:
{
"task" : "_hF1RPPpTyaqOPHtH0ZAig:5214"
}
- We make this request asynchronous by using the parameter `wait_for_completion=false`. This is necessary because the operation can take a long time, depending on the amount of data.
- The `task` value is referred to as `<task-id>` below.
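The reindex request body can also be assembled programmatically before being sent; a minimal sketch (host, credentials, and defaults below are placeholders, not prescribed values):

```python
def reindex_body(source_host, username, password, index,
                 batch_size=1000, socket_timeout="60m"):
    """Build the remote-reindex request body shown above."""
    return {
        "source": {
            "remote": {
                "host": source_host,
                "username": username,
                "password": password,
                "socket_timeout": socket_timeout,
            },
            "index": index,
            # Number of documents to reindex per batch.
            "size": batch_size,
        },
        "dest": {"index": index},
    }
```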
- Monitor the reindex process, to verify if it finished successfully.
# Opensearch Request
# GET _tasks/<task-id>?pretty
curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> -XGET https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/_tasks/<task-id>?pretty
Documentation Reference: Tasks.
Outcome Example:
{
  "completed" : true,
  (...)
  "action" : "indices:data/write/reindex",
  (...)
  "total" : 101,
  "created" : 101,
  "batches" : 21,
  (...)
  "failures" : [],
  "errors" : []
}
We are looking to verify that:
- The `completed` field is `true`, and the `errors` and `failures` fields are empty.
- The `action` field is `indices:data/write/reindex`.
- The `total` field matches the total number of documents in the source a9s Elasticsearch service instance.
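These checks can be automated against the parsed task response. A sketch operating on the flattened fields as shown in the outcome example above (in a real response some of these fields may be nested under `task.status`):

```python
def reindex_succeeded(task_response, expected_total):
    """Check the _tasks response for a successfully completed reindex."""
    return (
        task_response.get("completed") is True
        and task_response.get("action", "").endswith("write/reindex")
        and task_response.get("failures", []) == []
        and task_response.get("errors", []) == []
        and task_response.get("total") == expected_total
        and task_response.get("created") == expected_total
    )
```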
Known Issues
The `<number-of-documents-to-reindex-by-batch>` value must be chosen according to the properties of your indices. If the
batch size exceeds what the service instance can handle, the operation will fail.
As an example, for the small service instance (2GB RAM and 4GB Disk), we would suggest:
- When the average size of the documents is around `10MB`, use the size `5`.
- When the average size of the documents is around `1MB`, use the size `50`.
- When the average size of the documents is around `100kb`, use the size `500`.
- When the average size of the documents is less than `50kb`, use the size `1000` or greater.
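For a small service instance, the suggestions above can be condensed into a helper like the following (the thresholds are the rough figures from the list, not exact limits):

```python
def suggested_batch_size(avg_doc_bytes):
    """Suggested <number-of-documents-to-reindex-by-batch> for a small plan
    (2GB RAM / 4GB disk), based on the average document size."""
    mb = 1024 * 1024
    if avg_doc_bytes >= 10 * mb:
        return 5
    if avg_doc_bytes >= 1 * mb:
        return 50
    if avg_doc_bytes >= 100 * 1024:
        return 500
    # Below roughly 50kb per document, 1000 or greater is fine.
    return 1000
```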
Please be aware that if the value `<number-of-documents-to-reindex-by-batch>` is incorrectly set, the following errors may happen:
"type" : "illegal_argument_exception",
"reason" : "Remote responded with a chunk that was too large. Use a smaller batch size.",
# or
"reason":"rejected execution of coordinating operation [coordinating_and_primary_bytes=0, replica_bytes=0, all_bytes=0, coordinating_operation_bytes=102278834, max_coordinating_and_primary_bytes=90282393]"}]
"type":"rejected_execution_exception"
# or
"reason":"Connection is closed"
"type":"connection_closed_exception"
If you encounter such an error, it is advisable to decrease `<number-of-documents-to-reindex-by-batch>`.
Conversely, if `<number-of-documents-to-reindex-by-batch>` is too small, the migration can take a very long time.
Check The Database Consistency
This section has some actions that can help to verify the data consistency after the migration. These steps include various examples with placeholders for the necessary values. For further details on these placeholder values please see the Placeholder Values section.
Check The Indices
- a9s Elasticsearch (Source).
1 - Refresh the indices.
# Elasticsearch Request
# POST <target_index>/_refresh
# POST _refresh
curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/<target_index>/_refresh
Documentation Reference: Refresh.
2 - List all indices.
# Elasticsearch Request
# GET _cat/indices?v
curl -k -u <source-elasticsearch-service-instance-username-localhost>:<source-elasticsearch-service-instance-password-localhost> https://<source-elasticsearch-service-instance-host-localhost>:<source-elasticsearch-service-instance-port-localhost>/_cat/indices?v
Documentation Reference: Cat index.
Outcome Example:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open index1 SGKtW0JgRimwh9yjhiU7Ig 1 2 101 0 671.6mb 671.6mb
green open index2 SGKtW0JgRimwh9yjhiU7Ig 1 2 50 0 232.6mb 232.6mb
- a9s Search (Destination)
1 - Refresh the indices.
# Opensearch Request
# POST <target_index>/_refresh
# POST _refresh
curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/<target_index>/_refresh
Documentation Reference: Refresh.
2 - List all indices
# Opensearch Request
# GET _cat/indices?v
curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/_cat/indices?v
Documentation Reference: Cat index.
Outcome Example:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open index1 SGKtW0JgRimwh9yjhiU7Ig 1 2 101 0 692.6mb 692.6mb
green open index2 SGKtW0JgRimwh9yjhiU7Ig 1 2 50 0 239.6mb 239.6mb
green open .opendistro_security RLj5S2PiSFG8eQ45T7El6A 1 0 10 0 59.7kb 59.7kb
green open .tasks 2My3SpdoRVKaPHR89DHDtQ 1 0 2 0 14.9kb 14.9kb
Note that the indices starting with `.`, such as `.opendistro_security` and `.tasks`, are reserved for the OpenSearch environment. They must not be considered during the check.
Checking:
- Verify that both service instances contain the same indices created by the application developer(s).
- Verify that each index has the same `docs.count` value. The values must match, since there should be no write operations during the migration.
- Verify that samples of documents from each index are equivalent in both service instances. Note that this comparison depends on the particularities of the database, so the Application Developer must decide how to do it. For example, documents with a specific `timestamp` can be compared between indices.
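The `docs.count` comparison of the two `_cat/indices?v` outputs can be scripted. A minimal sketch that skips the reserved dot-prefixed indices:

```python
def doc_counts(cat_indices_output):
    """Parse `_cat/indices?v` output into {index: docs.count},
    ignoring reserved system indices (names starting with a dot)."""
    lines = cat_indices_output.strip().splitlines()
    header = lines[0].split()
    idx_col = header.index("index")
    count_col = header.index("docs.count")
    counts = {}
    for line in lines[1:]:
        cols = line.split()
        if cols[idx_col].startswith("."):
            continue  # e.g. .opendistro_security, .tasks
        counts[cols[idx_col]] = int(cols[count_col])
    return counts

def consistent(source_output, dest_output):
    """True when source and destination report the same per-index counts."""
    return doc_counts(source_output) == doc_counts(dest_output)
```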
It is known that OpenSearch indices can take up more disk space than the corresponding Elasticsearch indices. Please
consider this when creating the a9s Search 2 service instance intended to be the destination of the data migration.
To have OpenSearch use less disk space, change the index codec setting to `best_compression`, at the expense of
slower stored-field performance. For more information see:
Index Settings
Check The Settings and Mapping (Optional)
Follow the same steps as in point 1 of Migrate Mapping And Settings for both the a9s Elasticsearch and a9s Search 2 service instances, and compare the results.
This is advisable if you performed the steps of the Before Performing The Migration section and care about the settings and mapping.
Glossary
Placeholder Values
Placeholder | Description | Example | Required? |
---|---|---|---|
<destination-opensearch-service-instance-host> | The domain of the destination service instance | localhost or 127.0.0.1, since the developer will use the reverse SSH tunnel | Yes |
<destination-opensearch-service-instance-port> | The port of the destination service instance | 9200, since the developer will use the reverse SSH tunnel | Yes |
<destination-opensearch-service-instance-password> | The password of the destination service instance | --- | Yes |
<destination-opensearch-service-instance-username> | The username of the destination service instance | --- | Yes |
<source-elasticsearch-service-instance-host> | The domain of the source service instance | Service keys host | Yes |
<source-elasticsearch-service-instance-port> | The port of the service instance that contains data to be migrated | 9200 | Yes |
<source-elasticsearch-service-instance-password> | The password of the source service instance | --- | Yes |
<source-elasticsearch-service-instance-username> | The username of the source service instance | --- | Yes |
<number-of-documents-to-reindex-by-batch> | The number of documents to reindex per batch. The default is 1000 | 100 | No |
<socket-timeout> | The wait time for socket reads. The default is 30s | "60m" is a safe value | No |
<target_index> | The index to be migrated | "my_index_from_source" | Yes |
<template_name> | Template name | --- | Yes |
<source-elasticsearch-service-instance-host-localhost> | The domain of the source service instance | localhost or 127.0.0.1, since the developer will use the reverse SSH tunnel | Yes |
<source-elasticsearch-service-instance-port-localhost> | The port of the service instance that contains data to be migrated | 9201, since the developer will use the reverse SSH tunnel | Yes |
<source-elasticsearch-service-instance-password-localhost> | The password of the source service instance | --- | Yes |
<source-elasticsearch-service-instance-username-localhost> | The username of the source service instance | --- | Yes |