Data Migration

This document describes how to migrate data that is stored in the database as cloud-specific absolute paths. It covers two cases:

  • Replacing existing absolute paths in the database with relative paths.

  • Migrating data when changing the cloud service provider (CSP), so that the old absolute paths are replaced with the new CSP's absolute paths.

    Example: Moving from Azure to AWS.

The following data currently (up to release-5.2.0) stores cloud-specific absolute URLs, which are to be migrated either to relative paths or to the new CSP's absolute URLs:

  • Neo4j fields, based on objectType:

"asset": ["artifactUrl", "thumbnail", "downloadUrl"],
"content": ["appIcon", "artifactUrl", "posterImage", "previewUrl", "thumbnail", "assetsMap", "certTemplate", "itemSetPreviewUrl", "grayScaleAppIcon", "sourceURL", "variants", "downloadUrl","streamingUrl"],
"contentimage": ["appIcon", "artifactUrl", "posterImage", "previewUrl", "thumbnail", "assetsMap", "certTemplate", "itemSetPreviewUrl", "grayScaleAppIcon", "sourceURL", "variants", "downloadUrl","streamingUrl"],
"collection": ["appIcon", "artifactUrl", "posterImage", "previewUrl", "thumbnail", "toc_url", "grayScaleAppIcon", "variants", "downloadUrl"],
"collectionimage": ["appIcon", "artifactUrl", "posterImage", "previewUrl", "thumbnail", "toc_url", "grayScaleAppIcon", "variants", "downloadUrl"],
"plugins": ["artifactUrl"],
"itemset": ["previewUrl", "downloadUrl"],
"assessmentitem": ["data", "question", "solutions", "editorState", "media"],
"question": ["appIcon","artifactUrl", "posterImage", "previewUrl","downloadUrl", "variants","pdfUrl"],
"questionimage": ["appIcon","artifactUrl", "posterImage", "previewUrl","downloadUrl", "variants","pdfUrl"],
"questionset": ["appIcon","artifactUrl", "posterImage", "previewUrl","downloadUrl", "variants","pdfUrl"],
"questionsetimage": ["appIcon","artifactUrl", "posterImage", "previewUrl","downloadUrl", "variants","pdfUrl"]

Note: The above data is available as the configuration 'neo4j_fields_to_migrate' in the 'csp-migrator' job.
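As an illustration of what replacing an absolute path with a relative path means for the fields listed above, here is a minimal Python sketch. It is not the 'csp-migrator' implementation: the base URL, the function names and the prefix-stripping approach are all assumptions made for this example, and only plain string fields are handled (nested fields such as 'variants' or 'assetsMap' would need their own handling).

```python
from typing import Sequence

# Hypothetical old storage base URL; real values depend on the deployment.
OLD_BASE_URLS = ["https://oldaccount.blob.core.windows.net/content/"]

# Subset of the neo4j_fields_to_migrate configuration shown above.
FIELDS_TO_MIGRATE = {
    "asset": ["artifactUrl", "thumbnail", "downloadUrl"],
    "plugins": ["artifactUrl"],
    "itemset": ["previewUrl", "downloadUrl"],
}


def to_relative(value: str, base_urls: Sequence[str]) -> str:
    """Strip the first matching base URL prefix; leave other values untouched."""
    for base in base_urls:
        if value.startswith(base):
            return value[len(base):]
    return value


def migrate_node(object_type: str, node: dict, base_urls: Sequence[str]) -> dict:
    """Return a copy of the node with the configured string fields made relative."""
    migrated = dict(node)
    for field in FIELDS_TO_MIGRATE.get(object_type.lower(), []):
        value = migrated.get(field)
        if isinstance(value, str):
            migrated[field] = to_relative(value, base_urls)
    return migrated
```

Values that do not start with a known base URL are left unchanged, so running the sketch twice on the same node gives the same result.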

  • Cassandra data that will be migrated:

| Keyspace | Table | Column |
|---|---|---|
| hierarchy_keystore | content_hierarchy | hierarchy |
| content_keystore | content_data | body |
| content_keystore | question_data | body, editorState, answer, solutions, instructions, media |
| questionset_keystore | questionset_hierarchy | hierarchy |
| dialcodes | dialcode_images | url |
| dialcodes | dialcode_batch | url |

  • ECAR (needs live nodes republishing)

  • streamingUrl (needs regeneration based on new Media service provider)

Refer to the migration flow diagram to see how existing data is migrated with CNAME (storing relative paths in the DB).

Notes on the Flink jobs used for migration:

  • The Jenkins job 'Neo4jElasticSearchSyncTool' is used to insert events into the input topic of the 'csp-migrator' job. The 'csp-migrator' job in turn pushes events to the 'live-node-publisher' and 'live-video-stream-generator' jobs, based on conditions. Jenkins job command: migratecspdata

  • The 'cassandra-data-migration' job is to be used for migrating the 'dialcode_images' and 'dialcode_batch' Cassandra tables in the 'dialcodes' keyspace.

  • Run the migration Flink jobs in a separate Kafka setup with increased processing capacity and enough storage for all Kafka events and logs.

  • Increase the infra for Neo4j. Also, increase the Neo4j max heap size in the Neo4j conf file.

  • Increase the infra for Logstash, the search-indexer Flink job and ElasticSearch to handle the Neo4j transaction data sync.

Migration Steps

  • Go to the Deploy/KnowledgePlatform/Neo4jElasticSearchSyncTool Jenkins job.

  • Select migratecspdata as the command.

  • Copy and paste the parameters below, one sequence at a time, into the parameter section of the Jenkins deployment job.

| Sequence | Type | Sync Tool Jenkins Parameters |
|---|---|---|
| 1 | Video Asset | --graphId domain --objectType Asset --mimeType video/webm,video/mp4 --delay 2000 |
| 2 | Other Asset | --graphId domain --objectType Asset --delay 2000 |
| 3 | Video Content | --graphId domain --objectType Content,ContentImage --mimeType video/mp4,video/webm --delay 2000 |
| 4 | Plugin, Youtube Content, PDF Content, EPUB Content | --graphId domain --objectType Content,ContentImage --mimeType application/vnd.ekstep.plugin-archive,video/x-youtube,application/pdf,application/epub --delay 2000 |
| 5 | AssessmentItem | --graphId domain --objectType AssessmentItem --delay 2000 |
| 6 | ItemSet | --graphId domain --objectType ItemSet --delay 2000 |
| 7 | H5P Content | --graphId domain --objectType Content,ContentImage --mimeType application/vnd.ekstep.h5p-archive --delay 2000 |
| 8 | HTML | --graphId domain --objectType Content,ContentImage --mimeType application/vnd.ekstep.html-archive --delay 2000 |
| 9 | ECML | --graphId domain --objectType Content,ContentImage --mimeType application/vnd.ekstep.ecml-archive --delay 2000 |
| 10 | Remaining Contents | --graphId domain --objectType Content,ContentImage --delay 2000 |
| 11 | Collection | --graphId domain --objectType Content,ContentImage,Collection,CollectionImage --mimeType application/vnd.ekstep.content-collection --delay 2000 |
| 12 | dialcodes.dialcode_images | Push below event to "{{env}}.cassandra.data.migration.request" kafka topic |
| 13 | dialcodes.dialcode_batch | Push below event to "{{env}}.cassandra.data.migration.request" kafka topic |

Note:

  • If the 'migratecspdata' command stops before reaching 100%, wait for the 'csp-migrator' job lag to reach 0 before triggering the same 'migratecspdata' command again.

  • ECML migration can be triggered only after Asset, AssessmentItem and ItemSet migration is completed.

  • For Collection migration to be triggered, pre-requisites are:

a. Question, QuestionSet migration should be completed (as part of Inquiry BB).

b. All assets and contents are to be migrated successfully.

c. All migrated contents data should be synced to ElasticSearch.
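The ordering constraints in these notes can be expressed as a small prerequisite check. The Python sketch below is only an illustration: the step labels, the `PREREQUISITES` mapping and the function name are choices made for this example, and the external prerequisites for Collection (Question/QuestionSet migration and the ElasticSearch sync) are not modelled.

```python
# Step labels follow the Type column of the migration sequence (steps 1 to 11).
SEQUENCE = [
    "Video Asset",
    "Other Asset",
    "Video Content",
    "Plugin, Youtube Content, PDF Content, EPUB Content",
    "AssessmentItem",
    "ItemSet",
    "H5P Content",
    "HTML",
    "ECML",
    "Remaining Contents",
    "Collection",
]

# Prerequisites derived from the notes above.
PREREQUISITES = {
    # ECML needs Asset, AssessmentItem and ItemSet migration to be completed.
    "ECML": {"Video Asset", "Other Asset", "AssessmentItem", "ItemSet"},
    # Collection needs all assets and contents to be migrated first.
    "Collection": set(SEQUENCE[:-1]),
}


def missing_prerequisites(step: str, completed: list) -> list:
    """Return the prerequisite steps for `step` that are not yet completed."""
    return sorted(PREREQUISITES.get(step, set()) - set(completed))
```

An empty list means the step can be triggered as far as this sketch is concerned; for example, `missing_prerequisites("ECML", ["Video Asset", "Other Asset"])` reports that AssessmentItem and ItemSet are still pending.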

Migration status: migrationVersion of the node object

1. Neo4j & Cassandra data migration

  • no version => Data migration was started for the Neo4j node, but no final status was recorded (for example, the Neo4j data was migrated and the Cassandra data migration failed for the object)

  • 1.0 => Success: Neo4j data and Cassandra data migration completed for the object(node)

  • 0.1 => Fail: Data migration failed for the Neo4j data or Cassandra data of the specific node (the identifier is the key to know which node failed; check the service logs to find the reason for the failure)

  • 0.5 => Skipped: migration skipped for the object

2. ECAR generation (after the previous step of Neo4j & Cassandra data migration succeeds)

  • 1.1 => Success: Neo4j data and Cassandra data migration completed for the object and ECAR is republished.

  • 0.2 => Fail: Neo4j data and Cassandra data migration completed for the object. But, ECAR republish has failed.

3. Video streaming generation (after the previous step of ECAR generation succeeds)

  • 1.2 => Success: Neo4j data and Cassandra data migration completed for video type of asset/content and streamingUrl is regenerated successfully.
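When checking nodes after a run, the migrationVersion values above can be decoded with a simple lookup. This Python sketch is a convenience illustration only; the function names and the choice to normalise values to one decimal place are assumptions, not part of the platform.

```python
# Status descriptions taken from the migrationVersion list above.
MIGRATION_STATUS = {
    "1.0": "success: Neo4j and Cassandra data migrated",
    "0.1": "fail: Neo4j or Cassandra data migration failed",
    "0.5": "skipped: migration skipped for the object",
    "1.1": "success: data migrated and ECAR republished",
    "0.2": "fail: data migrated but ECAR republish failed",
    "1.2": "success: data migrated and streamingUrl regenerated",
}


def describe_migration_version(version) -> str:
    """Map a migrationVersion (number, string or None) to its documented meaning."""
    if version is None:
        return "no version: migration started but no final status recorded"
    try:
        key = f"{float(version):.1f}"
    except (TypeError, ValueError):
        return f"unknown migrationVersion: {version}"
    return MIGRATION_STATUS.get(key, f"unknown migrationVersion: {version}")


def is_failure(version) -> bool:
    """True for the documented failure values (0.1 and 0.2)."""
    return describe_migration_version(version).startswith("fail")
```

For example, `describe_migration_version(0.2)` identifies a node whose data was migrated but whose ECAR republish failed, which is a candidate for republishing rather than for a full re-migration.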

Verification of migration steps:

More details on the verification steps are available in the Confluence wiki.

CNAME URL Configuration:

  • Update the 'cloudstorage_base_path' private/public repo variable with the CNAME URL only.

  • Update the 'valid_cloudstorage_base_urls' private/public repo variable with the CNAME URL along with the cloud storage BLOB URL.

  • Update the 'dial_service_schema_base_path' private repo variable with the CNAME URL if it currently holds a blob URL.

  • Add the CNAME URL to the 'sunbird_cloud_storage_urls' public/private devops repo variable.

  • Restart all services and jobs (Services: Assessment service, Content Service, Taxonomy Service, Learning Service, DIAL Service. Jobs: asset-enrichment, content-auto-creator, content-publish, qrcode-image-generator, search-indexer)

  • Run the Sync tool with the 'syncbyobjectType' command to sync all assets, contents and collections from Neo4j to ElasticSearch with CNAME URLs.

  • If CNAME is configured after the migration, trigger republishing of all Live Contents and Collections so that the ECARs are updated with CNAME URLs.

Last updated 2 years ago

Flink jobs used for migration:

  • csp-migrator: For migration of data in Neo4j and Cassandra tables.

  • live-node-publisher: For republishing of live nodes (Content and Collection).

  • live-video-stream-generator: For regeneration of streamingUrl using the new Media service.

  • cassandra-data-migration: For migration of data in any Cassandra table, column-wise.

The content migration must be executed only in the order given in the Migration Steps sequence. Otherwise, the migration may fail because dependent content has not yet been migrated.

Before running the migration steps, run all the queries and keep the output to compare after the migration.

Reference diagram: migration flow diagram (migration-job flow & migration-process workflow): https://app.diagrams.net/#G1HzUob6a9_TrVWhcoo1nWoKPVN0PKzpmQ

Event for sequence 12 (dialcodes.dialcode_images):

{"eid":"BE_JOB_REQUEST","ets":1619527882745,"mid":"LP.1619527882745.32dc378a-430f-49f6-83b5-bd73b767ad36","actor":{"id":"cassandra-data-migration","type":"System"},"context":{"channel":"ORG_001","pdata":{"id":"org.sunbird.platform","ver":"1.0"},"env":"dev"},"edata":{"column":"url", "columnType":"String", "table": "dialcode_images", "keyspace": "dialcodes", "primaryKeyColumn": "filename", "primaryKeyColumnType": "String", "action":"migrate-cassandra","iteration":1}}

Event for sequence 13 (dialcodes.dialcode_batch):

{"eid":"BE_JOB_REQUEST","ets":1619527882745,"mid":"LP.1619527882745.32dc378a-430f-49f6-83b5-bd73b767ad36","actor":{"id":"cassandra-data-migration","type":"System"},"context":{"channel":"ORG_001","pdata":{"id":"org.sunbird.platform","ver":"1.0"},"env":"dev"},"edata":{"column":"url", "columnType":"String", "table": "dialcode_batch", "keyspace": "dialcodes", "primaryKeyColumn": "processid", "primaryKeyColumnType": "UUID", "action":"migrate-cassandra","iteration":1}}
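The two sample events for the 'cassandra-data-migration' job differ only in the table name and the primary key column details. The Python sketch below builds such an event from those values. It is an illustration under assumptions: the function name is hypothetical, generating a fresh 'ets' timestamp and 'mid' for each event is assumed to be acceptable, and publishing the event to the Kafka topic is not shown.

```python
import json
import time
import uuid


def build_migration_event(table: str, primary_key_column: str,
                          primary_key_type: str, channel: str = "ORG_001",
                          env: str = "dev") -> dict:
    """Build a BE_JOB_REQUEST event shaped like the samples for the dialcodes keyspace."""
    now_ms = int(time.time() * 1000)
    return {
        "eid": "BE_JOB_REQUEST",
        "ets": now_ms,
        # Assumption: a new message id per event, in the same LP.<ets>.<uuid> shape.
        "mid": f"LP.{now_ms}.{uuid.uuid4()}",
        "actor": {"id": "cassandra-data-migration", "type": "System"},
        "context": {
            "channel": channel,
            "pdata": {"id": "org.sunbird.platform", "ver": "1.0"},
            "env": env,
        },
        "edata": {
            "column": "url",
            "columnType": "String",
            "table": table,
            "keyspace": "dialcodes",
            "primaryKeyColumn": primary_key_column,
            "primaryKeyColumnType": primary_key_type,
            "action": "migrate-cassandra",
            "iteration": 1,
        },
    }


if __name__ == "__main__":
    # Print one JSON line per table, ready to be pushed to the request topic.
    for table, pk, pk_type in [("dialcode_images", "filename", "String"),
                               ("dialcode_batch", "processid", "UUID")]:
        print(json.dumps(build_migration_event(table, pk, pk_type)))
```

The 'channel' and 'env' defaults mirror the sample events and would need to match the target environment.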