Drupal-to-Drupal data migration - Part 2: The architecture


In a previous post, I described the basics of using the Drupal-to-Drupal data migration module (migrate_d2d) to import data into a Drupal 7 site from a separate Drupal 5, Drupal 6, or Drupal 7 site. Now that the first full release of migrate_d2d is available, I'd like to dig deeper, explaining the architecture of migrate_d2d and showing some more advanced examples of extending it.

Basic class hierarchy

At the root of our class hierarchy is DrupalMigration, which provides support for the common arguments available to all d2d migrations. From DrupalMigration we derive abstract classes for each kind of data supported - DrupalUserMigration, DrupalNodeMigration, etc. Then, for each supported version of Drupal we derive a class for each object - DrupalUser6Migration, DrupalNode7Migration, etc. These classes can be instantiated directly, or extended for application-specific purposes.

This type-centric hierarchy allows mappings and behavior common to a given type of data among all Drupal versions to be shared, minimizing the version-specific code necessary to implement the migrations. However, there are operations whose implementations are specific to a version (in particular, dealing with fields) but common among multiple types. In a programming language supporting multiple inheritance, we could mix our type-centric hierarchy directly with a version-specific hierarchy to achieve this:

class DrupalNode6Migration extends DrupalNodeMigration, DrupalVersion6 {...}

PHP does not support multiple inheritance, however. So, to make the version-specific operations available to our type-based hierarchy, we embed a version-specific object in our migration class. Its class derives from the abstract class DrupalVersion, which has an implementation for each supported version - DrupalVersion5, DrupalVersion6, and DrupalVersion7. These classes implement services that cut across multiple types of content for a given version of Drupal - most importantly, discovery of custom field data and metadata.
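To make the composition idea concrete, here is a minimal standalone sketch. The Sketch* classes, the fieldNames() method, and the canned field lists are hypothetical stand-ins invented for illustration, not the real migrate_d2d classes, and the way the version class is chosen from a version number is an assumption about one reasonable wiring.

```php
<?php
// Hypothetical, simplified stand-ins; not the real migrate_d2d classes.

abstract class SketchVersion {
  // Return the custom field names attached to a bundle in this Drupal version.
  abstract public function fieldNames($entityType, $bundle);
}

class SketchVersion6 extends SketchVersion {
  public function fieldNames($entityType, $bundle) {
    // A real implementation would inspect the source database; this is canned data.
    return array('field_location');
  }
}

class SketchVersion7 extends SketchVersion {
  public function fieldNames($entityType, $bundle) {
    return array('field_city', 'field_state');
  }
}

abstract class SketchMigration {
  // The embedded version-specific helper object.
  protected $version;

  public function __construct($sourceVersion) {
    // Pick the helper class from the version number (assumed wiring).
    $class = 'SketchVersion' . $sourceVersion;
    $this->version = new $class();
  }
}

class SketchNodeMigration extends SketchMigration {
  protected $bundle;

  public function __construct($sourceVersion, $bundle) {
    parent::__construct($sourceVersion);
    $this->bundle = $bundle;
  }

  // Type-specific code supplies base fields; the version object supplies the rest.
  public function sourceFields() {
    $base = array('nid', 'title');
    return array_merge($base, $this->version->fieldNames('node', $this->bundle));
  }
}

$d6 = new SketchNodeMigration(6, 'article');
$d7 = new SketchNodeMigration(7, 'article');
print implode(', ', $d6->sourceFields()) . "\n";
print implode(', ', $d7->sourceFields()) . "\n";
```

The node class contains no version checks of its own. Everything that differs between versions lives behind the embedded object, which is the role the text above assigns to DrupalVersion.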

Modifying queries

In most migrations, your constructor defines a query directly and passes it to MigrateSourceSQL:

<?php
$query = db_select('migrate_example_beer_account', 'mea')
          ->fields('mea', array('aid', 'status', 'posted', 'name', 'nickname',
                   'password', 'mail', 'sex', 'beers'));
$this->source = new MigrateSourceSQL($query);
?>

The pattern in migrate_d2d, however, is to always define the query in a query() method:

<?php
public function __construct(array $arguments) {
  ...
  $this->source = new MigrateSourceSQL($this->query(), $this->sourceFields,
    NULL, $this->sourceOptions);
  ...
}

protected function query() {
  $query = Database::getConnection('default', $this->sourceConnection)
           ->select('users', 'u')
           ->fields('u')
           ->condition('u.uid', 1, '>');
  return $query;
}
?>

This enables us to easily modify (or entirely replace) the query in a derived class. Suppose we have a node migration from Drupal 6 where we want to exclude nodes with a particular term assigned to them, which we have identified as having term ID 2. We can fetch the basic node query from the DrupalNode6Migration class (our parent) and add a join and condition to it:

<?php
protected function query() {
   $query = parent::query();
   $query->leftJoin('term_node', 'tn', 'n.vid=tn.vid AND tn.tid=2');
   $query->isNull('tn.tid');
   return $query;
}
?>
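It can help to see why the tn.tid=2 test sits in the join condition rather than in a WHERE clause. The following standalone sketch runs the equivalent SQL against an in-memory SQLite database. It assumes the pdo_sqlite extension is available, and the two tiny tables and their rows are invented for illustration, borrowing only the column names used in the query above.

```php
<?php
// Standalone illustration of the LEFT JOIN ... IS NULL exclusion (anti-join).
// Assumes the pdo_sqlite extension; tables and rows are invented sample data.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE node (nid INTEGER, vid INTEGER, title TEXT)');
$pdo->exec('CREATE TABLE term_node (nid INTEGER, vid INTEGER, tid INTEGER)');

$pdo->exec("INSERT INTO node VALUES
  (1, 1, 'No terms at all'),
  (2, 2, 'Only term 5'),
  (3, 3, 'Terms 2 and 5')");
$pdo->exec('INSERT INTO term_node VALUES (2, 2, 5), (3, 3, 2), (3, 3, 5)');

// Because tn.tid = 2 is part of the ON clause, only term-2 rows can match.
// Nodes without such a row get NULL for tn.tid and pass the IS NULL test.
$sql = 'SELECT n.nid
        FROM node n
        LEFT JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 2
        WHERE tn.tid IS NULL
        ORDER BY n.nid';

// Normalize to integers, since older PHP versions return strings from PDO.
$nids = array_map('intval', $pdo->query($sql)->fetchAll(PDO::FETCH_COLUMN));
print implode(', ', $nids) . "\n";
```

Nodes 1 and 2 survive, while node 3 is dropped because it carries term 2. Moving tn.tid = 2 into the WHERE clause alongside IS NULL would make the two tests contradict each other and return nothing.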

Adding source fields

Often there isn't a one-to-one correspondence between fields in our source and destination sites, and we need to add fields to the source row in prepareRow(). When defining a migration from scratch, we pass an array as the second argument to MigrateSourceSQL, but when that call is in the base class we're deriving from we can't do that directly. The migrate_d2d classes maintain a member array sourceFields, which each base class populates with source fields (both base fields for the type, plus CCK/custom fields added to it), and we can add to this array. Notably, we must do that before calling our parent constructor, because that's where those fields will be passed to MigrateSourceSQL:

<?php
class ArticleMigration extends DrupalNode6Migration {
  public function __construct($arguments) {
    // Add the computed fields before the parent constructor passes
    // sourceFields to MigrateSourceSQL.
    $this->sourceFields['state'] = t('State extracted from location');
    $this->sourceFields['city'] = t('City extracted from location');
    parent::__construct($arguments);
    $this->addFieldMapping('field_state', 'state');
    $this->addFieldMapping('field_city', 'city');
    $this->addFieldMapping(NULL, 'field_location')
         ->description(t('Split this into city and state for D7'));
    ...
  }

  public function prepareRow($row) {
    // Always include this snippet, in case our parent class decides to ignore the row
    if (parent::prepareRow($row) === FALSE) {
      return FALSE;
    }
    // Split "City, State"; array_pad() covers values that contain no comma.
    $location = array_pad(explode(',', $row->field_location, 2), 2, '');
    $row->city = trim($location[0]);
    $row->state = trim($location[1]);
  }
}
?>
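The ordering requirement is easy to get wrong, so here is a minimal standalone sketch of why it matters. The classes below are hypothetical stand-ins that mimic only the constructor ordering. They assume, as described above, that the base constructor hands the field list to the source object, which then holds its own copy.

```php
<?php
// Hypothetical stand-ins that mimic only the constructor ordering.

class SketchSource {
  public $fields;

  public function __construct(array $fields) {
    // PHP arrays are copied by value, so later changes to the
    // migration's own array do not reach this copy.
    $this->fields = $fields;
  }
}

class SketchBaseMigration {
  protected $sourceFields = array();
  public $source;

  public function __construct() {
    // The base class adds its own fields, then hands the list to the source.
    $this->sourceFields += array('title' => 'Node title');
    $this->source = new SketchSource($this->sourceFields);
  }
}

class EarlyMigration extends SketchBaseMigration {
  public function __construct() {
    // Added before the parent constructor: the source sees it.
    $this->sourceFields['city'] = 'City extracted from location';
    parent::__construct();
  }
}

class LateMigration extends SketchBaseMigration {
  public function __construct() {
    parent::__construct();
    // Added after the parent constructor: too late, the source was already built.
    $this->sourceFields['city'] = 'City extracted from location';
  }
}

$early = new EarlyMigration();
$late = new LateMigration();
var_dump(isset($early->source->fields['city'])); // bool(true)
var_dump(isset($late->source->fields['city']));  // bool(false)
```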

Migrating between base types

The built-in classes are designed to migrate like-to-like - Drupal 6 nodes to Drupal 7 nodes, Drupal 5 users to Drupal 7 users, etc. What if you are changing an implementation entirely? This is an issue I faced in one project, where photo gallery nodes in a Drupal 6 site needed to be converted to photo category terms in the Drupal 7 site. In this case, you need to derive your own migration directly from DrupalMigration. This takes more work, since you set up your source, destination, and map classes yourself, but you still get some help from the migrate_d2d framework (for example, in setting up your source connection).

<?php
class Gallery6Migration extends DrupalMigration {
/**
  * The machine name of the node type we're migrating from.
  *
  * @var string
  */
protected $sourceType;

/**
  * The machine name of the Drupal 7 vocabulary we're migrating into.
  *
  * @var string
  */
protected $destinationVocabulary;

public function __construct($arguments) {
   parent::__construct($arguments);
   $this->sourceType = $arguments['source_type'];
   $this->destinationVocabulary = $arguments['destination_vocabulary'];

   $this->source = new MigrateSourceSQL($this->query(), $this->sourceFields,
     NULL, $this->sourceOptions);

   $this->destination = new MigrateDestinationTerm($this->destinationVocabulary);

   $this->map = new MigrateSQLMap($this->machineName,
     array(
       'nid' => array('type' => 'int',
                         'unsigned' => TRUE,
                         'not null' => TRUE,
                         'description' => 'Source node ID',
                       ),
     ),
      MigrateDestinationTerm::getKeySchema()
   );

   $this->addFieldMapping('name', 'title');

   $this->addUnmigratedDestinations(array(
     'description',
     'format',
     'parent',
     'parent_name',
     'path',
     'pathauto',
     'weight',
   ));
}

/**
  * Query the gallery nodes - all we need is the title.
  *
  * @return SelectQueryInterface
  */
protected function query() {
   $query = Database::getConnection('default', $this->sourceConnection)
            ->select('node', 'n')
            ->fields('n', array('nid', 'title'))
            ->condition('type', $this->sourceType);
   return $query;
}
}
?>