Discussion:
[GTALUG] looking for a tool to transform table relationships
Alex Volkov via talk
2018-07-20 17:42:27 UTC
Hey Everyone,

I'm looking for a tool to transform (possibly migrate) data from one
postgres db to another.

I have two postgres databases -- old (not normalized) has all the data
and new (normalized with some major schema changes) has no data.

The new db has more tables and its table relationships are structured
differently; some data residing in the old db is in a text column in CSV
format, whereas in the new database it's a separate table, and so on.

I've been thinking of writing a script that would just transform data
from one format to the other, but before writing a bunch of code that's
going to be run exactly once, I'm wondering if there is a tool out there
which I can use to express one-way transformation rules for these databases.


Thanks,


Alex.

---
Talk Mailing List
***@gtalug.org
https://gtalug.org/mailman/listinfo/talk
D. Hugh Redelmeier via talk
2018-07-20 19:35:21 UTC
| From: Alex Volkov via talk <***@gtalug.org>

| I'm looking for a tool to transform (possibly migrate) data from one postgres
| db to another.

Super naive question:

Could you not do this transformation in SQL?
Alex Volkov via talk
2018-07-20 19:40:50 UTC
Yes, now that I've done a bit more research, just doing SQL with dblink
might be the way to go. The thing I don't like is that SQL can be pretty
awkward for text transformation.
Post by D. Hugh Redelmeier via talk
| I'm looking for a tool to transform (possibly migrate) data from one postgres
| db to another.
Could you not do this transformation in SQL?
Clifford Ilkay via talk
2018-07-20 20:28:17 UTC
Post by Alex Volkov via talk
Yes, now that I've done a bit more research, just doing SQL with dblink
might be the way to go. The thing I don't like is that SQL can be pretty
awkward for text transformation.
PostgreSQL has great text manipulation functions actually. What is an
example of a text field you want to transform?

You might find something useful here: <https://github.com/pawl/awesome-etl>,
though if this is a one-off, it may be quicker to write your own scripts to
do this. If your dump file is a comma or tab delimited file, you can easily
open it in your spreadsheet of choice and write a formula to transform
data. I've done this many times.
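For the CSV-in-a-text-column case from the original post, a sketch of what that split can look like in plain SQL (every table and column name below is invented for illustration):

```sql
-- Hypothetical schema: old_orders(id, items), where items is a text column
-- holding CSV like 'apple, banana'; new_order_items(order_id, item) is the
-- normalized child table in the new schema.
INSERT INTO new_order_items (order_id, item)
SELECT o.id, trim(x.item)
FROM old_orders AS o
CROSS JOIN LATERAL unnest(string_to_array(o.items, ',')) AS x(item)
WHERE o.items IS NOT NULL AND o.items <> '';
```

regexp_split_to_table(o.items, ',') does the same split in one call if the delimiter needs to be a pattern rather than a fixed string.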

Regards,

Clifford Ilkay

+1 647-778-8696
D. Hugh Redelmeier via talk
2018-07-20 20:44:32 UTC
| From: Alex Volkov via talk <***@gtalug.org>

| On 2018-07-20 03:35 PM, D. Hugh Redelmeier via talk wrote:

| > Super naive question:

I should disclose just how naive. I've essentially never used SQL.
OK, perhaps a query or two when puzzled by MythTV.

| > Could you not do this transformation in SQL?
|
| Yes, now that I've done a bit more research, just doing SQL with dblink might
| be the way to go. The thing I don't like is that SQL can be pretty awkward
| for text transformation.

Surely there is some way to use these functions
<https://www.postgresql.org/docs/9.4/static/functions-string.html>

Or PL/pgSQL or PL/Python or PL/Java, etc.
Christopher Browne via talk
2018-07-20 20:34:17 UTC
I have done this a few times ;-)

The first time, it was pretty much "scripts manipulating text files"...

These days, I seek to get the data out in some sort of "logical form",
where, as much as possible, internal details such as object IDs get
thrown out, and the data then gets loaded into some tables, and then
cleansed and loaded into the target schema.

Often, what happens is that the data files represent data that I
receive from someone else (that used to operate a domain name
registry...), and so the notion of using dblink isn't ever in my head.

But then, in effect, the data flows...

- Loaded into raw_this, and raw_that, and raw_other...

- Some cleansing is done to make sure that invalid data is
fixed or thrown out, giving me tables cleansed_this, cleansed_that,
cleansed_other...

Finally, the data from the "cleansed" tables gets loaded into the
new schema essentially via a series of INSERT statements.

By having that intermediate stage, you have some ability to
audit the data, to know that all the data loaded was in good
form, or to have specific identification of data discarded because
it was bad, or to identify all the data that got modified to
clean it up (e.g. - I have lots of postal addresses, so have to
clean up country codes and the like).
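As a concrete sketch of that raw -> cleansed -> target flow (every name below is invented; the \copy line is a psql meta-command):

```sql
-- Stage 1: load the dump verbatim -- everything as text, no constraints.
CREATE TABLE raw_customers (id text, name text, country text);
-- \copy raw_customers FROM 'customers.csv' CSV HEADER    (run in psql)

-- Stage 2: cleanse -- normalize values, keep only rows that parse.
CREATE TABLE cleansed_customers AS
SELECT id::integer   AS id,
       trim(name)    AS name,
       upper(country) AS country
FROM raw_customers
WHERE id ~ '^[0-9]+$';   -- rows failing this stay in raw_* for auditing

-- Stage 3: load the target schema from the cleansed tables.
INSERT INTO customers (name, country)
SELECT name, country FROM cleansed_customers;
```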
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
Alex Volkov via talk
2018-07-20 22:17:52 UTC
Hey Chris,

Thank you for your reply; this is the way to do this task properly. In
my case I don't need to clean any data, it already comes validated from
the db, so I might get away with using dblink and some functions; if
that fails, I'll do dump/restore into temporary tables.

So far I've figured out the right package that has dblink.so (it used to
be dblink.sql), then I activated it as the postgres user with CREATE
EXTENSION dblink;
Now I've figured out the syntax of the dblink command that allows me to
connect via socket, it's --  FROM dblink('dbname=fmcc host=/var/run/postgresql
port=5432 password=test'...
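For what it's worth, a complete dblink call also has to name the result columns with a column-definition list; a minimal sketch with made-up connection parameters and table names:

```sql
-- Assumes CREATE EXTENSION dblink; has already been run.
-- Connection string, table, and columns are examples only.
SELECT t.id, t.name
FROM dblink('dbname=olddb host=/var/run/postgresql port=5432 user=migrator password=secret',
            'SELECT id, name FROM old_customers')
       AS t(id integer, name text);
```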

The only problem is I'm getting this interesting error; for some reason
dblink *really wants to do password verification*:

ERROR:  password is required
SQL state: 2F003
DETAIL:  Non-superuser cannot connect if the server does not request a
password.
HINT:  Target server's authentication method must be changed.
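The hint is about the target server's pg_hba.conf: for a non-superuser, dblink insists that the remote server actually verify the supplied password, so a peer- or trust-authenticated socket connection is rejected. One common fix, assuming a local socket connection, is:

```sql
-- In the target server's pg_hba.conf, change the matching line from e.g.
--     local   all   all   peer
-- to
--     local   all   all   md5
-- then reload the configuration (as a superuser):
SELECT pg_reload_conf();
-- and keep password=... in the dblink connection string.
```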

Alex.
William Park via talk
2018-07-21 04:05:37 UTC
This is an open-ended question, where 5 people will give you 7 answers.
I would say,
- extract old data
- see what you have to do to reformat/trim/merge them into the new
structure. You said the data structure is different, not just the
data format.
- insert new data
But, you already know that.
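Those three steps, sketched with COPY (table names invented; \copy is the client-side psql variant):

```sql
-- 1. Extract from the old database (run in psql connected to the old db):
--      \copy (SELECT id, name, items FROM old_orders) TO 'orders.csv' CSV HEADER
-- 2. Reformat/trim/merge orders.csv with whatever tool you like
--    (script, spreadsheet, etc.), writing orders_clean.csv.
-- 3. Load into the new database (psql connected to the new db):
--      \copy new_orders (id, name) FROM 'orders_clean.csv' CSV HEADER
```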
Steve Petrie, P.Eng. via talk
2018-07-21 10:55:13 UTC
You could also consult the PostgreSQL users email forum pgsql-***@lists.postgresql.org

Needs registration, but when I was receiving the forum messages, I noticed that participants were always promptly helpful to all comers. But sometimes there is a LOT of email traffic.

I dropped out of this PostgreSQL forum a while ago, just because I chose another direction for my database application. Nothing wrong with PostgreSQL at all -- it's a great product.

Steve

----- Original Message -----
From: Alex Volkov via talk
To: GTALUG Talk
Sent: Friday, July 20, 2018 1:42 PM
Subject: [GTALUG] looking for a tool to transform table relationships


