This whitepaper covers the experience of a database migration from Oracle to PostgreSQL carried out by Intelligent Converters for a large cargo delivery company. The key reasons for the migration were reducing the cost of the database management system and moving to an open-source system with sufficient scalability, security and customization capabilities.
Here is a brief specification of the source database:
- Database server: Oracle 12c
- 190 tables, 50 GB of data
- 120,000 lines of PL/SQL in stored procedures, functions and triggers
The project took 2 months: half a month of assessment and planning, one month for the migration itself and half a month for testing. Three specialists were involved in the migration: 2 developers and 1 database administrator.
This Oracle to PostgreSQL database migration included six basic phases:
- Find all Oracle-specific methods of storing and processing data in the source database and determine their scope of use (investigation and planning phase)
- Select the appropriate tools for schema migration and carry it out
- Choose the method of data migration that minimizes Oracle system downtime
- Run the data migration, handling all transformations required by the PostgreSQL DBMS
- Convert all PL/SQL code into PostgreSQL format (using tools for partial automation plus manual post-processing)
- Run performance and functional tests and fine-tune the resulting database
Migration of Table Definitions
Some Oracle data types have no direct equivalent in PostgreSQL. One such type is DATE, which contains both date and time parts.
PostgreSQL supports:
- date – pure date without a time part
- time – pure time without a date part, with an optional time zone specification
- timestamp – date and time, with an optional time zone specification
There are two options for mapping Oracle dates into PostgreSQL: either use TIMESTAMP, or set up an extension (such as orafce) that provides the Oracle-style date type oracle.date.
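For example, a column declared as DATE in Oracle can be mapped as follows (a sketch; the table and column names are illustrative):
-- Oracle
CREATE TABLE shipments (delivered_at DATE);
-- PostgreSQL: TIMESTAMP keeps both the date and the time parts
CREATE TABLE shipments (delivered_at TIMESTAMP);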
Spatial types also require special attention. Oracle has the built-in type SDO_GEOMETRY, while PostgreSQL needs the PostGIS extension installed to work with spatial data types.
The table below illustrates safe Oracle to PostgreSQL type mapping:

| Oracle | PostgreSQL |
|---|---|
| BINARY_FLOAT | REAL |
| BINARY_INTEGER | INTEGER |
| BINARY_DOUBLE | DOUBLE PRECISION |
| BLOB, RAW(n), LONG RAW | BYTEA (1 GB limit) |
| CLOB, LONG | TEXT (1 GB limit) |
| DATE | TIMESTAMP |
| NUMBER, NUMBER(*) | DOUBLE PRECISION, or BIGINT if it is part of a primary key |
| NUMBER(n,0), NUMBER(n) | n<5 – SMALLINT; 5<=n<9 – INT; 9<=n<19 – BIGINT; n>=19 – DECIMAL(n) |
| NUMBER(p,s) | DECIMAL(p,s) |
| REAL | DOUBLE PRECISION |
For numeric types it is important to understand how they are used in the database. If the focus is on accuracy, Oracle numeric types must be mapped to PostgreSQL NUMERIC. If the top priority is calculation speed, the best mapping is REAL or DOUBLE PRECISION.
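For example, the same Oracle column amount NUMBER(10,2) can be mapped either way (a sketch; table and column names are illustrative):
-- accuracy first: exact decimal arithmetic
CREATE TABLE invoices (amount NUMERIC(10,2));
-- speed first: binary floating point, rounding errors are possible
-- CREATE TABLE invoices (amount DOUBLE PRECISION);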
We used a dedicated conversion tool to automate the migration of table definitions, indexes and constraints in this project. It maps Oracle types to the most appropriate PostgreSQL equivalents and allows customizing the mapping of particular types.
Data Migration
Since data migration can consume much time for large databases, it is extremely important to choose the right strategy and tools for this step.
There are three common approaches to data migration:
- snapshot – migrate all data in one step
- piecewise snapshot – migrate data in chunks using parallel threads or processes
- change data replication – load data continuously by tracking incremental changes
The snapshot method requires substantial downtime of the source database for the entire period of reading the data, to avoid data loss or corruption during migration. The downtime for the piecewise snapshot approach is much shorter but still required. In this migration project we used the piecewise snapshot method for large tables containing millions of rows.
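As a sketch of the piecewise approach on the loading side, each worker process loads the key range it exported from Oracle into a separate CSV chunk (file names and ranges are illustrative; server-side COPY requires file-read privileges on the PostgreSQL server):
-- worker 1 loads rows with id between 1 and 1000000
COPY orders FROM '/data/orders_chunk_1.csv' WITH (FORMAT csv);
-- worker 2 loads rows with id between 1000001 and 2000000 in parallel
COPY orders FROM '/data/orders_chunk_2.csv' WITH (FORMAT csv);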
So, what about the change data replication (CDR) technique?
It has two major implementations, and each has its weakness. The first is known as trigger-based CDR. It requires creating insert, update and delete triggers for each table being replicated in the source database. Those triggers track all changes by storing information about the change events in a special 'history' table. Based on this data, the CDR tool replicates all changes to the target database. Trigger-based CDR causes extra overhead on the source database because every data update is also written to the 'history' table.
The second method is called transaction log CDR. It uses the transaction logs created by the Oracle DBMS to track and replicate changes to the target PostgreSQL database. The advantage of this approach over trigger-based CDR is that it does not modify the source database.
However, transaction log CDR has some weaknesses as well:
- Oracle constantly archives the transaction log files, so the CDR tool may lose changes if it does not read a transaction log before it is archived.
- If the connection to the target database is lost while changes are being replicated from the transaction log (or any other error occurs), data may be lost or damaged due to the lack of control over the transaction log.
The most challenging aspects of data migration from Oracle to PostgreSQL are data formats specific to the source DBMS that have no direct equivalent in the target, and external data.
In the previous section of this article, BYTEA was recommended as the most suitable PostgreSQL data type for binary data. However, when migrating large binary data (an average field size of 10 MB or more), BYTEA is not recommended. The reason is the way BYTEA data is read: a value can only be extracted as a single fragment, piecewise reading is not possible, and this may cause significant RAM overhead. The PostgreSQL type LARGE OBJECT may be used as a workaround for this issue. All values of LARGE OBJECT type are stored in the system table 'pg_largeobject' that is part of each database. The table 'pg_largeobject' can hold up to 4 billion rows, the maximum size of a LARGE OBJECT is 4 TB, and piecewise reading is available.
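A minimal sketch of the LARGE OBJECT approach (the table, file path and fragment size are illustrative; lo_import here is the server-side function, which reads files on the database server and requires appropriate privileges):
CREATE TABLE documents (id integer PRIMARY KEY, content oid);
-- import the file into pg_largeobject and store its OID in the table
INSERT INTO documents VALUES (1, lo_import('/data/waybill_scan.pdf'));
-- piecewise reading: fetch only the first 64 KB of the value
SELECT lo_get(content, 0, 65536) FROM documents WHERE id = 1;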
It is also important to properly migrate ROWID, a pseudo-column identifying the physical address of a record in the table. PostgreSQL has a similar service field called ctid, but it is not a direct equivalent of ROWID: the PostgreSQL documentation states that ctid may change due to the vacuuming procedure.
There are three basic methods of emulating ROWID functionality in PostgreSQL:
- Use existing primary key (or create new one) to identify rows instead of ROWID
- Add a serial or bigserial column with auto-generated values and make it a primary/unique key to replace ROWID functionality (see the sketch after this list)
- When it is not possible to define a single column primary key, use unique index built over multiple columns (for this purpose determine minimal unique set of fields for every row)
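A sketch of the second option, assuming an illustrative table cargo_log that lacks a natural key:
ALTER TABLE cargo_log ADD COLUMN row_id BIGSERIAL;
ALTER TABLE cargo_log ADD CONSTRAINT cargo_log_row_id_key UNIQUE (row_id);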
Oracle can link external data that is stored outside the database and process it as a regular table through the 'external table' feature of the DBMS. PostgreSQL uses the Foreign Data Wrapper mechanism for the same purpose. For example, the extension 'file_fdw' allows working with external CSV files as regular tables.
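A minimal file_fdw setup might look like this (the server name, column list and file path are illustrative):
CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE ext_rates (code text, rate numeric)
SERVER csv_files
OPTIONS (filename '/data/rates.csv', format 'csv', header 'true');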
Challenges of PL/SQL Migration
This section explores issues that were resolved during the migration of Oracle packages, stored procedures, functions and triggers to PostgreSQL.
Packages. PostgreSQL does not have packages. This missing feature can be emulated by grouping all entries belonging to one package inside a PostgreSQL schema with the same name; global package variables can be stored in a service table.
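For example, an Oracle package billing can be emulated like this (a sketch; all names are illustrative):
CREATE SCHEMA billing;
-- service table replacing the package's global variables
CREATE TABLE billing.globals (name text PRIMARY KEY, value text);
-- a former package function becomes a function inside the schema
CREATE FUNCTION billing.get_global(p_name text) RETURNS text
LANGUAGE sql
AS $$ SELECT value FROM billing.globals WHERE name = p_name $$;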
Empty strings. Unlike Oracle, PostgreSQL does not treat empty strings as equal to NULL. For example, in Oracle:
SQL> select NVL('','This is null') as null_test from dual;
NULL_TEST
------------
This is null
The same in PostgreSQL:
testdb=# select coalesce('','This is null') as null_test;
null_test
-----------

(1 row)
Also, there is a different behavior of string concatenation with NULL in Oracle and PostgreSQL. In Oracle:
SQL> Select 'String' || NULL as cs from dual;
CS
--
String
In PostgreSQL:
synctest=# Select 'String' || NULL as concat_str;
concat_str
------------

(1 row)
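A common way to get Oracle-like behavior during porting is to wrap nullable operands in coalesce (a sketch):
synctest=# Select 'String' || coalesce(NULL, '') as concat_str;
concat_str
------------
String
(1 row)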
Type casting. PostgreSQL requires strict type casting when calling functions or operators, and when INSERT and UPDATE statements place the results of expressions into a table.
The workaround is to use the pseudo-type 'anyelement':
create or replace function my_concat(str1 anyelement, str2 anyelement)
returns varchar
language plpgsql
as $$
begin
return str1::varchar || str2::varchar;
end;
$$;
Note: a function may declare either a single anyelement argument or several; in the latter case, all anyelement arguments must resolve to the same actual type in every call.
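For example (the casts are needed because quoted literals have type unknown, which cannot resolve a polymorphic argument):
select my_concat(10, 25);                   -- both arguments resolve to integer, returns '1025'
select my_concat('ab'::text, 'cd'::text);   -- both arguments resolve to text, returns 'abcd'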
Sequences. The syntax of sequence declaration is similar in Oracle and PostgreSQL. For example:
CREATE SEQUENCE SEQ1 START WITH 10 INCREMENT BY 2 MAXVALUE 1000000 CACHE 20 NO CYCLE;
However, referencing the next value of a sequence differs: in Oracle it is sequence.nextval, while in PostgreSQL it is nextval('sequence').
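For the sequence declared above, fetching the next value looks like this:
-- Oracle
SELECT SEQ1.NEXTVAL FROM dual;
-- PostgreSQL
SELECT nextval('seq1');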
Triggers. In Oracle, the source code of the trigger body is included in the CREATE TRIGGER statement. PostgreSQL requires the trigger source code to be composed as a standalone function referenced from the CREATE TRIGGER statement:
CREATE OR REPLACE FUNCTION store_changes()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
BEGIN
IF (NEW.first_name <> OLD.first_name) OR (NEW.last_name <> OLD.last_name) OR (NEW.email <> OLD.email)
THEN
INSERT INTO changes_log(id,changed_on)
VALUES(OLD.id,now());
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER make_changes
BEFORE UPDATE
ON employees
FOR EACH ROW
EXECUTE PROCEDURE store_changes();
Autonomous Transactions. Autonomous transactions in Oracle allow a subprogram to commit or roll back SQL operations without committing or rolling back the main transaction. For example, some data is inserted into a table as part of an insert trigger.
This insert operation must succeed even if the main transaction of the insert trigger fails. In that case the corresponding INSERT statement must be enclosed in an autonomous transaction:
CREATE OR REPLACE PROCEDURE insert_critical_data(v_data varchar2)
IS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
INSERT INTO info_table (data) VALUES (v_data);
commit;
END;
Autonomous transactions are not supported in PostgreSQL directly. One possible workaround is to refactor the code so that it does not need autonomous transactions.
Otherwise, you can use the PostgreSQL module 'dblink'. With dblink, a new connection is established, and a query executed through this connection is committed immediately, irrespective of the main transaction. For example, the following function inserts a row into a table, and the insert will be committed even if the calling transaction is rolled back:
CREATE OR REPLACE FUNCTION insert_critical_data(v_data TEXT)
RETURNS void
AS
$BODY$
DECLARE
v_sql text;
BEGIN
PERFORM dblink_connect('myconn',
'dbname=mydbname user=… password=… host=… port=…');
v_sql := format('INSERT INTO info_table (data)
VALUES (%L)', v_data);
PERFORM dblink_exec('myconn', v_sql);
PERFORM dblink_disconnect('myconn');
END;
$BODY$
LANGUAGE plpgsql;
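A quick way to verify the behavior (assuming the dblink extension is installed via CREATE EXTENSION dblink and the connection parameters in the function are filled in):
BEGIN;
SELECT insert_critical_data('test row');
ROLLBACK;
-- the row is present although the calling transaction was rolled back
SELECT * FROM info_table;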
Built-in Functions. Oracle and PostgreSQL provide similar but not identical sets of built-in functions. The table below lists Oracle functions that require porting to PostgreSQL equivalents:

| Oracle | PostgreSQL |
|---|---|
| ADD_MONTHS($date, $n_month) | $date + $n_month * interval '1 month' |
| DECODE($exp, $when, $then, ...) | CASE $exp WHEN $when THEN $then ... END |
| INSTR($str1, $str2) | POSITION($str2 in $str1) * |
| ROWNUM | row_number() over () ** |
| SYSDATE | CURRENT_TIMESTAMP |
| SYS_GUID() | uuid_generate_v1() (uuid-ossp extension) |
* A complete port of the Oracle INSTR function to PostgreSQL is provided in the 'Porting from Oracle PL/SQL' appendix of the PostgreSQL documentation.
** An Oracle condition 'where rownum < N' must be converted into 'limit N' in PostgreSQL.
We used Oracle-to-PostgreSQL Code Converter to partially automate the conversion of triggers, stored procedures and functions. It converts Oracle built-in functions into PostgreSQL equivalents and handles most PL/SQL syntax patterns. However, complex source code may require manual post-processing of the output.
Useful Modules for Oracle to PostgreSQL Migration
This section contains brief descriptions of PostgreSQL modules that can help migrate from Oracle to PostgreSQL and test the results of the migration.
One of the most important PostgreSQL modules for migrating from Oracle is 'orafce'. It emulates specific types, functions and operators of the Oracle database management system.
The extensions 'pgTAP' and 'pg_prove' can be used for functional testing of PostgreSQL functions. Installing 'pgTAP' creates a large set of stored functions used for writing tests. A test function must be declared as 'returns setof text'. It can then be run from the psql console client:
psql -h $db_host -p $db_port -U $db_user $db_name -f tests/name_of_test.sql
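A minimal test file might look like this (a sketch reusing the my_concat function from above; plan, is and finish are pgTAP functions):
BEGIN;
SELECT plan(1);
SELECT is(my_concat(1, 2), '12'::varchar, 'my_concat joins two integers');
SELECT * FROM finish();
ROLLBACK;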
Detailed information about pgTAP can be found in its documentation.
The PostgreSQL extension 'plpgsql_check' helps to validate PL/pgSQL code. For example, a missing variable declaration or a misspelled column name inside a PostgreSQL function will make it fail at execution time. Use the capabilities of 'plpgsql_check' to diagnose such issues before going to production:
select * from plpgsql_check_function_tb('{name of function}');
The 'plprofiler' module helps to analyze the performance of PL/pgSQL code in functions and stored procedures.
The PostgreSQL extension 'oracle_fdw' provides connectivity to Oracle databases through the Foreign Data Wrapper feature of the DBMS. For example, if the Oracle database 'ORCL' is running on the server 'server.mydomain.com', oracle_fdw can be configured as follows:
CREATE EXTENSION oracle_fdw;
CREATE SERVER oraserver FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '//server.mydomain.com:1521/ORCL');
GRANT USAGE ON FOREIGN SERVER oraserver TO pguser;
CREATE USER MAPPING FOR pguser SERVER oraserver OPTIONS (user 'oracle user', password 'oracle password');
CREATE FOREIGN TABLE oratable ( { column definitions } ) SERVER oraserver OPTIONS (schema 'ORAUSER', table 'ORATAB');
Then you can use 'oratable' as a regular PostgreSQL table.