Migrate from a CSV to content entities with Paragraphs

Submitted by christophe on Tue, 25/06/2019 - 18:13


This article explains how to use migration templates with a CSV file in which the Paragraphs data of a host entity is spread over several lines.

The Paragraphs data could follow a first, inline structure, with every paragraph of a host entity on the same line. This case is covered by the excellent article "Migration of CSV Data into Paragraphs".

ID | Host entity title | Paragraph1 field1 | Paragraph1 field2 | Paragraph2 field1 | Paragraph2 field2
1 | Jimi Hendrix | Axis: Bold as Love | https://www.deezer.com/fr/album/454044 | Live At The Fillmore East | https://www.deezer.com/fr/album/454045
2 | The Doors | Strange Days | https://www.deezer.com/fr/album/340880 | L.A. Woman | https://www.deezer.com/fr/album/6415260

In our case, we assume that the Paragraphs data is spread over several lines, so the structure looks more like this:

ID | Host entity title | Paragraph field 1 | Paragraph field 2
1 | Jimi Hendrix | Axis: Bold as Love | https://www.deezer.com/fr/album/454044
2 | Jimi Hendrix | Live At The Fillmore East | https://www.deezer.com/fr/album/454045
3 | The Doors | Strange Days | https://www.deezer.com/fr/album/340880
4 | The Doors | L.A. Woman | https://www.deezer.com/fr/album/6415260

The first structure covers most use cases, but it fits less well once we extend the discography example with more Albums or with Tracks, because every additional paragraph needs its own set of columns. The second one is more readable, especially if the list needs a round of manual editing or review before the import.

We assume here that we want to import a list of Albums with their Tracks.

So our CSV file looks like:

id,album_title,track_title,track_url
1,Axis: Bold As Love,Exp,https://www.deezer.com/fr/track/4952828
2,Axis: Bold As Love,Up From The Skies,https://www.deezer.com/fr/track/4952829
3,Axis: Bold As Love,Spanish Castle Magic,https://www.deezer.com/fr/track/4952830
4,Axis: Bold As Love,Wait Until Tomorrow,https://www.deezer.com/fr/track/4952832
5,Axis: Bold As Love,Aint No Telling,https://www.deezer.com/fr/track/4952831
...

And we have this Drupal model:

Album media

  • Track (Paragraphs)
  • Name
  • (...)

Track paragraph

  • Link
  • Title
  • (...)

First thought: we might use a custom process plugin. This is not the best approach here because the migration happens in two steps: first the Track paragraphs, then the Album media.
It would probably mean creating a second file for the Albums, and we want to avoid this.

Second approach: re-use the same CSV for the Albums, but transform it with a data parser.

We will still use the Migrate Source CSV module to create the Tracks Paragraphs in a first template, as the original structure perfectly matches our use case.

migrate_plus.migration.track_paragraphs.yml

id: track_paragraphs
label: Track Paragraphs
migration_group: discography

source:
  plugin: csv
  path: modules/custom/migrate_discography/data/album_tracks.csv
  header_row_count: 1
  keys:
    - id

process:
  field_title: track_title
  field_link:
    plugin: urlencode
    source: track_url

destination:
  plugin: entity_reference_revisions:paragraph
  default_bundle: track

migration_dependencies:
  required: {}
  optional: {}

dependencies:
  enforced:
    module:
      - migrate_discography

Then, with a data parser, we will:

  1. Deduplicate the albums, to create one Media entity per album title.
  2. Change the structure so that we can provide associative arrays matching what the Migrate Plus template expects.
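
The two steps above can be sketched outside of Drupal as plain PHP. This is only a minimal illustration under stated assumptions: the function name group_tracks_by_album() is hypothetical and not part of Migrate Plus, and the sample rows are the first two lines of the CSV shown earlier.

```php
<?php

/**
 * Groups flat CSV rows (id, album_title, track_title, track_url) by album.
 *
 * Hypothetical helper for illustration; not part of Migrate Plus.
 */
function group_tracks_by_album(string $csv): array {
  // Split on any newline style and drop the header row.
  $lines = preg_split('/\R/', trim($csv));
  array_shift($lines);

  $albums = [];
  foreach ($lines as $line) {
    $row = str_getcsv($line, ',', '"', '\\');
    // Skip blank or incomplete lines.
    if (empty($row[1])) {
      continue;
    }
    [$id, $album_title] = $row;
    // Step 1: one entry per album title (deduplication).
    $albums[$album_title]['album_title'] = $album_title;
    // Step 2: collect the CSV ids of the tracks of this album.
    $albums[$album_title]['tracks'][] = ['id' => $id];
  }
  // Reindex numerically so the result is not keyed by album title.
  return ['albums' => array_values($albums)];
}

$csv = <<<CSV
id,album_title,track_title,track_url
1,Axis: Bold As Love,Exp,https://www.deezer.com/fr/track/4952828
2,Axis: Bold As Love,Up From The Skies,https://www.deezer.com/fr/track/4952829
CSV;

print_r(group_tracks_by_album($csv));
```

Both rows end up in a single album entry whose tracks list holds the CSV ids 1 and 2. Only the ids are kept for the tracks, because the album migration uses them to look up paragraphs that were already created.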

We will extend the Json data parser from Migrate Plus for that.

migrate_plus.migration.album_media.yml

id: album_media
label: Album Media
migration_group: discography

source:
  plugin: url
  data_fetcher_plugin: file
  # Make use of a custom parser here, to convert the CSV
  # into associative arrays.
  data_parser_plugin: album_parser
  track_changes: true
  urls: modules/custom/migrate_discography/data/album_tracks.csv
  item_selector: /albums
  fields:
    -
      name: album_title
      label: Album title
      selector: album_title
    -
      # This field does not exist as is in the CSV
      # and is provided by the data parser.
      name: tracks
      label: Tracks
      selector: tracks
  ids:
    album_title:
      type: string

process:
  # Media name.
  name: album_title
  # Paragraphs field.
  field_tracks:
    plugin: sub_process
    source: tracks
    process:
      temporary_ids:
        plugin: migration_lookup
        migration: track_paragraphs
        # The id is the one from the CSV,
        # used to get the right paragraph.
        source: id
      target_id:
        plugin: extract
        source: '@temporary_ids'
        index:
          - 0
      target_revision_id:
        plugin: extract
        source: '@temporary_ids'
        index:
          - 1

destination:
  plugin: entity:media
  default_bundle: album

migration_dependencies:
  required:
    - track_paragraphs
  optional: {}

dependencies:
  enforced:
    module:
      - migrate_discography
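
To make the field_tracks process above easier to follow, here is a rough plain PHP sketch of what happens for each track. It assumes that the lookup on track_paragraphs gives back a pair (paragraph id, revision id), which is what the two extract steps rely on. The $lookup array and its numeric values are made up for illustration; in Drupal they come from the migration map.

```php
<?php

// Hypothetical map of CSV id => [paragraph id, paragraph revision id],
// standing in for the migration_lookup on track_paragraphs.
// The numeric values are invented for this example.
$lookup = [
  '1' => [101, 201],
  '2' => [102, 202],
];

// The "tracks" value provided by the data parser for one album.
$tracks = [['id' => '1'], ['id' => '2']];

$field_tracks = [];
foreach ($tracks as $track) {
  // temporary_ids: lookup with the CSV id as source.
  $temporary_ids = $lookup[$track['id']];
  $field_tracks[] = [
    // extract, index 0.
    'target_id' => $temporary_ids[0],
    // extract, index 1.
    'target_revision_id' => $temporary_ids[1],
  ];
}

print_r($field_tracks);
```

The sketch keeps only the two keys that the Paragraphs field needs: a Paragraphs reference points to a specific revision, so the revision id is required as well as the entity id.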

AlbumParser.php

<?php

namespace Drupal\migrate_discography\Plugin\migrate_plus\data_parser;

use Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Json;

/**
 * Builds relations between Albums and Tracks
 * and dedupes Album entities from a flat CSV.
 * Then delegates to the Json data parser for the selectors.
 *
 * @DataParser(
 *   id = "album_parser",
 *   title = @Translation("Album parser")
 * )
 */
class AlbumParser extends Json {

  /**
   * {@inheritdoc}
   */
  protected function getSourceData($url) {
    // Get the CSV.
    $response = $this->getDataFetcherPlugin()->getResponseContent($url);
    // Convert the flat CSV into associative arrays.
    // 0 = Id
    // 1 = Album title
    // 2 = Track title
    // 3 = Track url
    $source_data = [
      'albums' => [],
    ];
    $lines = explode("\n", $response);
    // Exclude the first (header) row. Could be moved to configuration.
    array_shift($lines);
    $albumDetails = [];
    foreach ($lines as $line) {
      $csvLine = str_getcsv($line);
      if (!empty($csvLine[1])) {
        if (!array_key_exists($csvLine[1], $albumDetails)) {
          $albumDetails[$csvLine[1]] = [
            'album_title' => $csvLine[1],
            'tracks' => [],
          ];
        }
        $albumDetails[$csvLine[1]]['tracks'][] = [
          'id' => $csvLine[0],
        ];
      }
    }
    // Second pass: reindex numerically, so that the results
    // are not keyed by album title.
    $source_data['albums'] = array_values($albumDetails);

    // Section from parent class.

    // Backwards-compatibility for depth selection.
    if (is_int($this->itemSelector)) {
      return $this->selectByDepth($source_data);
    }

    // Otherwise, we're using xpath-like selectors.
    $selectors = explode('/', trim($this->itemSelector, '/'));
    foreach ($selectors as $selector) {
      if (!empty($selector)) {
        $source_data = $source_data[$selector];
      }
    }
    return $source_data;
  }

}
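
The last part of getSourceData() is what makes item_selector: /albums work. As a minimal sketch, the same loop can be run outside of Drupal on the structure built by the parser. The function name select_items() is hypothetical and only used for this illustration.

```php
<?php

/**
 * Walks an xpath-like selector such as "/albums" through nested arrays.
 *
 * Hypothetical standalone helper; it repeats the loop at the end of
 * getSourceData() outside of Drupal.
 */
function select_items(array $data, string $item_selector): array {
  foreach (explode('/', trim($item_selector, '/')) as $selector) {
    if (!empty($selector)) {
      $data = $data[$selector];
    }
  }
  return $data;
}

// Shape built by the parser before the selector is applied.
$source_data = [
  'albums' => [
    [
      'album_title' => 'Axis: Bold As Love',
      'tracks' => [['id' => '1'], ['id' => '2']],
    ],
  ],
];

// Each returned item is one source row for the album_media migration.
$items = select_items($source_data, '/albums');
echo count($items), ' album(s), first: ', $items[0]['album_title'], PHP_EOL;
```

Each item returned this way exposes the album_title and tracks keys, which are the selectors declared under fields in the album_media template.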

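Then we can check the status (the Drush commands below are provided by the Migrate Tools module):

drush migrate:status --group=discography

and run the import:

drush migrate:import --group=discography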

Here is the repository containing this example.

Photo by Dan Stark on Unsplash
