EasyCFM.COM ColdFusion Forums / MySQL Related Issues / splitting varchar field data to 2 fields

   Reply to Discussion | New Discussion << previous || next >> 
Posted By Discussion Topic: splitting varchar field data to 2 fields

book mark this topic Printer-friendly Version  send this discussion to a friend  new posts last

asmostad
02-27-2008 @ 3:45 AM
Reply
Edit
Profile
Send P.M.
My Gravatar!
Powered by Gravatar
New Member
Posts: 21
Joined: Nov 2005

I have a database of 66 million consumer addresses. Unfortunately, I have the first and last name in one field. I need to seperate this into 2 fields; first_name and last_name. If there is any way to do this please help!!! I was thinking of somehow looping through the field and taking any text after the first space and putting that into a seperate field (last_name). I understand that not all of my data is formatted FIRSTNAME LASTNAME (ie some of it is FIRSTNAME MIDDLEINITIAL LASTNAME), but I could accept the inconsistencies that would arrise from that or even losing some of the records as a result. Any ideas?

Lossed
02-27-2008 @ 3:38 PM
Reply
Edit
Profile
Send P.M.
My Gravatar!
Powered by Gravatar
Senior Member
Posts: 1095
Joined: Apr 2004

Hi asmostad,

The sheer weight of data any solution needs to lift  probably necessitates not only something that will work but that it's as efficient as possible.

You would probably get the best solution by asking this question on the MySQL forum.

When you have nutted out the best solution, please can you post back in here or link to any discussion of it? It would be great to see what the experts came up with.


**edit** my first thought was using substring index with positive and negative counts around an empty space delimiter, but there is bound to be better ways than that.


Lossed
---------------------------
When the only tool you have is a hammer, everything looks like a nail Smile
-----------------------------

This message was edited by Lossed on 2-27-08 @ 3:42 PM

waynobweno
02-27-2008 @ 5:06 PM
Reply
Edit
Profile
Send P.M.
My Gravatar!
Powered by Gravatar
Junior Member
Posts: 77
Joined: Oct 2007

Here is what i would do... though i have never dealt with 66 million entries,...

set the name to a list and replace the spaces with commas...

<cfset thename = "#replace(nametbl, " ",",","all")#">

Get the first name: #listgeat(thename, 1)# - this will select the first name always...

Now get the length of the list: #listlen(thename)#

if length equals 2: there is only a first and last name
if length equals 3: there is a first middle and last
if length equals 4 and above: I have no clue what country they are from, you gotta figure this one out..

<cfif listlen(thename) eq 2>
   #listgetat(thename, 2)# - this will give you the last name...
<cfelseif listlen(thename) eq 3>
   #listgetat(thename, 3)# - this will give you the last name... and by pass the middle name which would be position two in the list...
</cfif>

Hope that help...

Real Estate CPR
www.realestatecpr.com
Demo: www.realestatecpr.com/demo/
Content=Presentation=Results!!!

This message was edited by waynobweno on 2-27-08 @ 5:07 PM

asmostad
03-14-2008 @ 3:24 PM
Reply
Edit
Profile
Send P.M.
My Gravatar!
Powered by Gravatar
New Member
Posts: 21
Joined: Nov 2005

sorry this took me so long to review, test, and respond. this worked fabulously!!! I was really under the gun with this and you have helped tremendously. really, if i can do anything to recipricate just let me know. i run a list company (and also do all the tech crap with our site and dbs). if you have any marketing needs let me know!

thanks,
adam johnson
president
green apple marketing
866-345-9455 x103
www.greenapplemarketing.net

asmostad
03-20-2008 @ 7:19 PM
Reply
Edit
Profile
Send P.M.
My Gravatar!
Powered by Gravatar
New Member
Posts: 21
Joined: Nov 2005

if anyone is still rading these posts, both solutions above work. i have been using the cf solution, looping through the field as a list. i did replace all the spaces with commas, but i believe that you can also simply set the list delimiter to a space within the cf code, so that step may not be necessary. however, after review the first proposed solution (mysql substring index, which can be read about at http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_substring-index), i think that would be more efficient because it can all be done within mysql and leave the cf server out of it. the problem i ran into using cf was that paging more than 1 million records or so locks up the cf server, so i have been running the code in blocks of 1 million and it takes about an hour per million. i am assuming that it would go faster running it all in mysql with the substring index code.

at any rate, they both work and it has helped me to complete my project.

Lossed
03-21-2008 @ 4:27 PM
Reply
Edit
Profile
Send P.M.
My Gravatar!
Powered by Gravatar
Senior Member
Posts: 1095
Joined: Apr 2004

Tks for posting back, asmostad.
That's a serious amount of heavy lifting Smile
I hope it all works out well.
This is OT, but I am hoping you may be able to help:

I am trying to learn what issues may crop up when I need to email, say, 10000 opt-in subscribers an email alert or two, all at the same time. Most of the subscriber emails will be gmail/hotmail and I'm worried about them thinking it's spam and blacklisting the site and subscribers complaining they never got their email alerts.

Or what if some subscribers mark their alerts as spam even though they opted into the subscription? Will this lead to a blacklisting and ruin it for all the other subscribers using the same email service?

With 66 million consumer records, you may have come across this sort of thing before. If you haven't, or haven't the time, that's fine. I just thought I'd ask.

The link to the forum question is here:
http://www.easycfm.com/forums/viewmessages.cfm?Forum=8&Topic=13229

TIA.

Lossed
---------------------------
When the only tool you have is a hammer, everything looks like a nail Smile
-----------------------------

This message was edited by Lossed on 3-21-08 @ 4:29 PM

cfSearchin
03-24-2008 @ 5:55 PM
Reply
Edit
Profile
Send P.M.
My Gravatar!
Powered by Gravatar
Senior Member
Posts: 608
Joined: Feb 2008

quote:


however, after review the first proposed solution (mysql substring index, which can be read about at http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_substring-index), i think that would be more efficient because it can all be done within mysql and leave the cf server out of it.



Yes.  The basic logic is sound, but updating that amount of data should definitely take place in the database itself, not using CF or looping.

Breaking the updates into smaller chunks (even smaller than 1 million) is also a good idea.




Website Designed and Developed by Pablo Varando.