Data Migration
This document describes how to migrate the cloud-specific absolute paths stored in the database, either to relative paths or to absolute paths of a new cloud service provider (CSP).
It covers data migration for the following scenarios:
Replacing existing absolute paths in the database with relative paths.
Migrating data when changing the CSP.
Example: Moving from Azure to AWS.
The following data currently (up to release-5.2.0) store cloud-specific absolute URLs and must be migrated either to relative paths or to absolute URLs of the new CSP:
Neo4j fields based on objectType:
Cassandra data that will be migrated (keyspace.table: columns):
hierarchy_keystore.content_hierarchy: hierarchy
content_keystore.content_data: body
content_keystore.question_data: body, editorState, answer, solutions, instructions, media
questionset_keystore.questionset_hierarchy: hierarchy
dialcodes.dialcode_images: url
dialcodes.dialcode_batch: url
ECAR (requires republishing of Live nodes)
streamingUrl (requires regeneration with the new media service provider)
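The Cassandra columns listed above hold serialized text (hierarchies, question bodies, URLs) in which absolute storage URLs are embedded. The following is a minimal sketch of the two migration directions described at the top of this document. It assumes plain string replacement is sufficient and uses a hypothetical placeholder token 'CLOUD_STORAGE_BASE_PATH'; the platform's actual relative-path prefix and migration logic may differ. It does not cover ECAR files or streamingUrl, which need republishing and regeneration as noted above.

```python
# Hypothetical placeholder token; the platform's real relative-path prefix may differ.
RELATIVE_PATH_PREFIX = "CLOUD_STORAGE_BASE_PATH"


def to_relative(text: str, old_base_urls: list[str]) -> str:
    """Replace absolute storage base URLs in a column value with the placeholder."""
    # Try longer base URLs first so that a shorter base URL which is a prefix
    # of a longer one does not consume part of the longer match.
    for base in sorted(old_base_urls, key=len, reverse=True):
        text = text.replace(base.rstrip("/"), RELATIVE_PATH_PREFIX)
    return text


def to_absolute(text: str, new_base_url: str) -> str:
    """Resolve the placeholder against the new CSP (or CNAME) base URL."""
    return text.replace(RELATIVE_PATH_PREFIX, new_base_url.rstrip("/"))


if __name__ == "__main__":
    # Hypothetical Azure blob base URL and question body.
    old_bases = ["https://oldaccount.blob.core.windows.net/content"]
    body = '{"src": "https://oldaccount.blob.core.windows.net/content/assets/do_123/img.png"}'
    relative = to_relative(body, old_bases)
    print(relative)
    # {"src": "CLOUD_STORAGE_BASE_PATH/assets/do_123/img.png"}
    print(to_absolute(relative, "https://newbucket.s3.amazonaws.com/content"))
    # {"src": "https://newbucket.s3.amazonaws.com/content/assets/do_123/img.png"}
```

Storing the placeholder instead of a provider URL means a later CSP change only alters the value the placeholder resolves to, not the stored data.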
Reference diagram: migration of existing data with CNAME (relative paths stored in the DB)
Flink Jobs used for migration:
Migration Steps
Go to the Deploy/KnowledgePlatform/Neo4jElasticSearchSyncTool Jenkins job.
Select migratecspdata as the command.
Copy and paste the parameter sets below, one at a time, into the parameters section of the Jenkins deployment job, and run the job for each set.
1. Video Asset
   --graphId domain --objectType Asset --mimeType video/webm,video/mp4 --delay 2000
2. Other Asset
   --graphId domain --objectType Asset --delay 2000
3. Video Content
   --graphId domain --objectType Content,ContentImage --mimeType video/mp4,video/webm --delay 2000
4. Plugin, YouTube, PDF, and EPUB Content
   --graphId domain --objectType Content,ContentImage --mimeType application/vnd.ekstep.plugin-archive,video/x-youtube,application/pdf,application/epub --delay 2000
5. AssessmentItem
   --graphId domain --objectType AssessmentItem --delay 2000
6. ItemSet
   --graphId domain --objectType ItemSet --delay 2000
7. H5P Content
   --graphId domain --objectType Content,ContentImage --mimeType application/vnd.ekstep.h5p-archive --delay 2000
8. HTML
   --graphId domain --objectType Content,ContentImage --mimeType application/vnd.ekstep.html-archive --delay 2000
9. ECML
   --graphId domain --objectType Content,ContentImage --mimeType application/vnd.ekstep.ecml-archive --delay 2000
10. Remaining Contents
   --graphId domain --objectType Content,ContentImage --delay 2000
11. Collection
   --graphId domain --objectType Content,ContentImage,Collection,CollectionImage --mimeType application/vnd.ekstep.content-collection --delay 2000
12. dialcodes.dialcode_images
   Push the event below to the "{{env}}.cassandra.data.migration.request" Kafka topic.
13. dialcodes.dialcode_batch
   Push the event below to the "{{env}}.cassandra.data.migration.request" Kafka topic.
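The Neo4j runs 1 to 11 above differ only in their objectType and mimeType lists, so the parameter strings can be generated rather than typed. The sketch below reproduces them and also builds a URL for the Jenkins remote-access endpoint 'buildWithParameters' for a nested job. The Jenkins host and the job parameter names 'command' and 'params' are hypothetical; check the actual job definition before use. The request itself must be sent as an authenticated POST (for example with a user API token), which is not shown here.

```python
from urllib.parse import quote, urlencode

# (objectType list, mimeType list or None) for runs 1-11, in the order listed above.
MIGRATION_PLAN = [
    (["Asset"], ["video/webm", "video/mp4"]),
    (["Asset"], None),
    (["Content", "ContentImage"], ["video/mp4", "video/webm"]),
    (["Content", "ContentImage"], ["application/vnd.ekstep.plugin-archive", "video/x-youtube",
                                   "application/pdf", "application/epub"]),
    (["AssessmentItem"], None),
    (["ItemSet"], None),
    (["Content", "ContentImage"], ["application/vnd.ekstep.h5p-archive"]),
    (["Content", "ContentImage"], ["application/vnd.ekstep.html-archive"]),
    (["Content", "ContentImage"], ["application/vnd.ekstep.ecml-archive"]),
    (["Content", "ContentImage"], None),
    (["Content", "ContentImage", "Collection", "CollectionImage"],
     ["application/vnd.ekstep.content-collection"]),
]


def migratecspdata_params(object_types, mime_types=None, graph_id="domain", delay_ms=2000):
    """Build one parameter string for the migratecspdata command."""
    parts = ["--graphId", graph_id, "--objectType", ",".join(object_types)]
    if mime_types:
        parts += ["--mimeType", ",".join(mime_types)]
    parts += ["--delay", str(delay_ms)]
    return " ".join(parts)


def build_trigger_url(jenkins_base, job_path, params):
    """Jenkins remote-API URL for a parameterized build of a nested job."""
    # Each folder level of a Jenkins job path maps to a /job/<name> segment.
    segments = "".join(f"/job/{quote(part)}" for part in job_path.split("/"))
    return f"{jenkins_base.rstrip('/')}{segments}/buildWithParameters?{urlencode(params)}"


if __name__ == "__main__":
    for run, (object_types, mime_types) in enumerate(MIGRATION_PLAN, start=1):
        print(run, migratecspdata_params(object_types, mime_types))
```

Keeping the plan as an ordered list preserves the run order given above; the Kafka-driven steps 12 and 13 are intentionally left out.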
Migration status: the migrationVersion property of each node object records its migration status.
1. Neo4j and Cassandra data migration
no version => Data migration has started but is not yet recorded as complete for the Neo4j node. The migration processes each Neo4j node and migrates both its Neo4j data and its Cassandra data.
1.0 => Success: Neo4j and Cassandra data migration completed for the object (node).
0.1 => Failed: Data migration failed for the Neo4j data or the Cassandra data of the node. The node identifier shows which node failed; check the service logs for the reason.
0.5 => Skipped: Migration was skipped for the object.
2. ECAR generation (after the Neo4j and Cassandra data migration succeeds)
1.1 => Success: Neo4j and Cassandra data migration completed for the object, and the ECAR was republished.
0.2 => Failed: Neo4j and Cassandra data migration completed for the object, but ECAR republishing failed.
3. Video streaming URL generation (after ECAR generation succeeds)
1.2 => Success: Neo4j and Cassandra data migration completed for the video asset or content, and the streamingUrl was regenerated successfully.
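To track progress across many nodes, the migrationVersion values listed above can be decoded programmatically. The sketch below assumes node records have already been fetched (how is out of scope) as dictionaries with 'identifier' and an optional 'migrationVersion' key; that record shape, and accepting the version as either a number or a numeric string, are assumptions.

```python
from collections import Counter

# migrationVersion values and meanings from the status list above.
# None means the node has no migrationVersion property yet.
STATUS_BY_VERSION = {
    None: "not completed",
    1.0: "Neo4j/Cassandra migration succeeded",
    0.1: "Neo4j/Cassandra migration failed",
    0.5: "skipped",
    1.1: "ECAR republished",
    0.2: "ECAR republish failed",
    1.2: "streamingUrl regenerated",
}
FAILED_VERSIONS = {0.1, 0.2}


def _version(node: dict):
    """Read migrationVersion as a float, accepting numbers or numeric strings (assumption)."""
    value = node.get("migrationVersion")
    return None if value is None else float(value)


def status_of(node: dict) -> str:
    return STATUS_BY_VERSION.get(_version(node), "unknown")


def failed_identifiers(nodes: list[dict]) -> list[str]:
    """Identifiers of failed nodes, whose service logs should be checked."""
    return [n["identifier"] for n in nodes if _version(n) in FAILED_VERSIONS]


def summarize(nodes: list[dict]) -> Counter:
    """Count nodes per migration status."""
    return Counter(status_of(n) for n in nodes)
```

For example, failed_identifiers() on a batch of nodes gives the identifiers to search for in the service logs when investigating the 0.1 and 0.2 failures.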
Verification of migration steps:
More details on the verification steps are available in the Confluence wiki linked below.
CNAME URL Configuration:
Update the 'cloudstorage_base_path' private/public repo variable with the CNAME URL only.
Update the 'valid_cloudstorage_base_urls' private/public repo variable with the CNAME URL along with the cloud storage blob URL.
If the 'dial_service_schema_base_path' private repo variable contains a blob URL, update it with the CNAME URL.
Add the CNAME URL to the 'sunbird_cloud_storage_urls' public/private devops repo variable.
Restart all services and jobs (services: Assessment Service, Content Service, Taxonomy Service, Learning Service, DIAL Service; jobs: asset-enrichment, content-auto-creator, content-publish, qrcode-image-generator, search-indexer).
Run the sync tool with the 'syncbyobjectType' command to sync all assets, contents, and collections from Neo4j to Elasticsearch with CNAME URLs.
If CNAME is configured after the migration, trigger republishing of all Live contents and collections so that their ECARs are updated with CNAME URLs.
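The split between 'cloudstorage_base_path' (CNAME only) and 'valid_cloudstorage_base_urls' (CNAME plus blob URL) can be illustrated with a small sketch: a URL under any valid base URL is recognized as platform storage and rewritten to the CNAME base path, while external URLs are left alone. This is a hypothetical illustration of how the variables relate, not the services' actual code; the CNAME and blob URLs are made up.

```python
def to_cname_url(url: str, valid_base_urls: list[str], cloudstorage_base_path: str) -> str:
    """Rewrite a URL under any valid storage base URL to use the CNAME base path.

    URLs that do not start with a known base URL (for example external links)
    are returned unchanged.
    """
    # Longest base first, so the most specific matching base URL wins.
    for base in sorted(valid_base_urls, key=len, reverse=True):
        prefix = base.rstrip("/") + "/"
        if url.startswith(prefix):
            return cloudstorage_base_path.rstrip("/") + "/" + url[len(prefix):]
    return url


if __name__ == "__main__":
    cname = "https://cdn.example.org"  # hypothetical CNAME (cloudstorage_base_path)
    blob = "https://oldaccount.blob.core.windows.net"  # hypothetical blob URL
    valid = [cname, blob]  # mirrors valid_cloudstorage_base_urls
    print(to_cname_url(blob + "/content/do_1/artifact.png", valid, cname))
    # https://cdn.example.org/content/do_1/artifact.png
```

Keeping the blob URL in the valid list is what lets data that still carries the old blob URL be recognized during and after the switch to CNAME.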