Incorporating Translations into Resource Files from Excel

In a previous post we showed how to provide simple globalization support for multiple cultures/languages in a Blazor application. In this post, we will walk through a process that converts client-provided translations into the resource file format (.resx) used by our application.

Introduction

The goals are:

  1. Provide an easy way for our client to provide initial translations.
  2. Move towards an approach that allows our client to self-maintain translations.
  3. Position the application to move away from embedded resource files and towards dynamic translations provided by database records.

To meet the first goal, we provide our client with a spreadsheet containing the following columns:

  • KeyName
  • English
  • German
  • French
  • Portuguese
  • Spanish

What the client initially receives may look like this:

The completed file we receive back in return may look like this:

The steps we will take to get this into the format we want include:

  1. Convert the spreadsheet data into a comma-separated-value format file.
  2. Create database tables to support our needs.
  3. Import data from our CSV file into a staging table in our database.
  4. Transform the data into a more robust table in our database.
  5. Extract data, one language at a time in a format compatible with our .resx resource files.

Converting Spreadsheet data to CSV format

The first step in our process is to save this file in a comma-separated-value format. Using Excel, this is as easy as using the File | Save As command, then selecting the CSV format.

It is worth noting from the screenshot that Excel will save the CSF file in UTF-8 format. There is a minor tweak we will need to do in a moment to ensure this file’s data is properly imported. First, though, let’s prepare two database tables – one for staging and one with a more robust schema for storing our translation data.

Create database tables

The example SQL provided below is compatible with SQL Server. The first table we create is designed to mirror the format of data in our CSV file.

The dbo.Translations table

CREATE TABLE [dbo].[Translations](
	[KeyName] [nvarchar](450) NOT NULL,
	[EnglishValue] [nvarchar](2000) NULL,
	[GermanValue] [nvarchar](2000) NULL,
	[FrenchValue] [nvarchar](2000) NULL,
	[PortugeseValue] [nvarchar](2000) NULL,
	[SpanishValue] [nvarchar](2000) NULL
) ON [PRIMARY]

Next, we will create a table in SQL server that is suitable for storing data in a slightly different format we desire for support of known use cases of our project’s upcoming phase 2.

The dbo.LocalizationRecords table

In a future project phase, the application we are writing will support dynamic changes to translations. In other words, we will eventually replace the embedded .resx resource files with database lookups. So, we are operating with the premise that phase 1 of this application will use the classic .resx approach, but we also want to position the application to support:

  • Persistence and retrieval of translation data in a database.
  • A user interface that allows our client to self-maintain translations.

There are other complexities, like a requirement to support offline use of the application, but those topics are beyond the scope of this blog post.

Continuing on, we can achieve our goal by creating a second, more robust table schema, borrowed from Blazor Boilerplate. It is worth noting that both of our tables use the ‘N’ character to establish ‘nvarchar’ type fields.

CREATE TABLE [dbo].[LocalizationRecords](
	[Id] [bigint] IDENTITY(1,1) NOT NULL,
	[MsgId] [nvarchar](450) NULL,
	[MsgIdPlural] [nvarchar](max) NULL,
	[Translation] [nvarchar](max) NULL,
	[Culture] [nvarchar](450) NULL,
	[ContextId] [nvarchar](450) NULL,
 CONSTRAINT [PK_LocalizationRecords] PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Import data from CSV to Staging

With our staging table and end-target tables created, let’s use SQL Server’s Bulk Insert command to import data from our CSV file. The command we run looks like this:

BULK INSERT Translations FROM 'C:\Temp\data.csv'
   WITH (
      CODEPAGE = '65001',
      DATAFILETYPE ='char',
      FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n'
);

This is the point where we remind ourselves that our source CSV file was in UTF-8 format. The CODEPAGE parameter in the above statement basically corresponds to the codepage number for UTF-8. Technically, ‘char’ is the default for the DATAFILETYPE parameter, so it may not be necessary. However, it is included for awareness. Also, there are other possible approaches such as opening our saved UTF-8 CSV file in Notepad++, and re-saving with the encoding changed to UCS-2 Big (or little) Endian. In any case, the above should work in most situations. The important thing to avoid is not being aware of the CODEPAGE parameter and then end up wondering why your imported data looks corrupted:

Compared to:

For more information on the Bulk Insert command, click here.

With our data now imported into a staging table, let’s see how to migrate it to the end target table.

Populate localization records using staging data

Given the schema of our two tables, we essentially want to rotate the columns from the Translations staging table using a SQL query and insert them into our target LocalizationRecords table. We will do this using combinations of SQL Server’s CROSS APPLY operator and common table expressions, or CTE.

There are three queries we are going to run to achieve our desired end result:

  1. Update records where they already exist
  2. Insert records that do not yet exist
  3. Delete records where the translation is null
-- Update existing records using CTE
;WITH cte (MsgId, Translation, [Culture], ContextId) AS (
    SELECT c.MsgId, c.Translation, c.[Culture], 'Global'
	FROM [dbo].[Translations]
	CROSS APPLY
	(
	  values
	  ([KeyName], [EnglishValue], 'en-US'),
	  ([KeyName], [GermanValue], 'de-DE'),
	  ([KeyName], [FrenchValue], 'fr-CA'),
	  ([KeyName], [PortugeseValue], 'pt-PT'),
	  ([KeyName], [SpanishValue], 'es-MX')
	) c (MsgId, Translation, Culture)
	INNER JOIN [dbo].[LocalizationRecords] existing
	ON existing.MsgId = c.MsgId
		AND existing.Culture = c.Culture
		AND existing.ContextId = 'Global'
)
UPDATE l
   SET [Translation] = cte.Translation
FROM [dbo].[LocalizationRecords] l
JOIN cte
  ON l.MsgId = cte.MsgId
    AND l.Culture = cte.Culture
    AND l.ContextId = cte.ContextId
-- Insert new records
INSERT INTO [dbo].[LocalizationRecords]
           ([MsgId]
           ,[Translation]
           ,[Culture]
           ,[ContextId])
(
	SELECT c.MsgId, c.Translation, c.[Culture], 'Global'
	FROM [dbo].[Translations]
	CROSS APPLY
	(
	  values
	  ([KeyName], [EnglishValue], 'en-US'),
	  ([KeyName], [GermanValue], 'de-DE'),
	  ([KeyName], [FrenchValue], 'fr-CA'),
	  ([KeyName], [PortugeseValue], 'pt-PT'),
	  ([KeyName], [SpanishValue], 'es-MX')
	) c (MsgId, Translation, Culture)
	LEFT JOIN [dbo].[LocalizationRecords] existing
	ON existing.MsgId = c.MsgId
		AND existing.Culture = c.Culture
		AND existing.ContextId = 'Global'
	WHERE existing.Id IS NULL
)
-- Delete useless records
DELETE FROM [LocalizationRecords] WHERE Translation IS NULL

Now that we have our data stored in our target table, we will next query it to produce results that can be used for our final stage.

Extract data for resource files

We can now use our LocalizationRecords table data to produce text that can be copied and pasted into our individual resource files. If you open up a .resx file using your favorite text editor, you will find that it is simply an XML document.  All we need to do is write some SQL to replicate the format of the <data> elements that follow the <resheader> elements and typically start around line 120.

The following query can be adapted and used for each .resx file that requires updating. We can use ASCII codes in our query to preserve exact spacing:

  • Char(10) – New Line / Line Break
  • Char(13) – Carriage Return
  • Char(9) – Tab
SELECT 
	'  <data name="' 
	+ [MsgId] 
	+ '" xml:space="preserve">' + CHAR(13) + CHAR(10)
	+ '    <value>'
	+ [Translation]
	+ '</value>' + CHAR(13) + CHAR(10)
	+ '  </data>'
FROM [dbo].[LocalizationRecords]
WHERE [Culture] = 'de-DE' 
  AND [ContextId] = 'Global'
ORDER BY MsgId

Once we have our results, we will want to save them to a text file using SQL Server Management Studio. However, before doing so, be sure to enable an option to retain the carriage return and line feed characters in our output. This can be accessed through the File | Options menu and then expanding Query Results | SQL Server | Results to Grid.

With the ‘Retain CR/LF on copy or save’ option enabled, we can now save our results.

It is easiest if we use a text file format as opposed to a CSV format for this step.

Open the output in the saved text file using your favorite text editor. Finally, copy/paste the results of the query into the appropriate place within the XML of your open resource file. Repeated this process for each culture and .resx file.

Congratulations! You now have a relatively painless way to convert those initial translations provided in a spreadsheet into resource files your application can use. Also, by storing these values in a structured database table, your application is well-positioned for its next phase. With a little added UI functionality, users will be able to maintain the application’s translated values. Correspondingly, the application can be adapted to access values in our database table instead of using embedded .resx files.