I have a hash named hsh whose values are UTF-8 encoded. For example:
hsh = {:name => some_utf_8_string, :text => some_other_utf_8_string}
I am currently doing the following:
$KCODE = "UTF8"
File.open("save.tsv", "w") do |file|
  # Replace embedded tabs with spaces so they cannot be read as column separators
  file.puts hsh.values.map { |x| x.to_s.gsub("\t", ' ') }.join("\t")
end
But this croaks randomly, I think because some of the multibyte contents sort of match "\t". Is there a recommended string I can use instead of "\t", and is there a better way of doing the above?
Thanks
-
If your data is valid UTF-8, there is no way for a tab character to "sort of" match part of a multibyte sequence (this is one of the advantages of UTF-8 over some other multibyte encodings). Can you go into more detail about what you mean by "croak"?
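Since everything above hinges on the data actually being valid UTF-8, it may be worth checking that first. Here is a minimal sketch, assuming Ruby 1.9 or later (String#valid_encoding? does not exist in 1.8, where $KCODE is used). The helper name invalid_utf8_keys is made up for illustration:

```ruby
# Hypothetical helper: returns the keys whose values are not valid UTF-8.
# Assumes Ruby 1.9+, where strings carry an encoding.
def invalid_utf8_keys(hash)
  hash.select { |_key, value|
    # dup so the caller's string is left untouched by force_encoding
    !value.to_s.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  }.keys
end

good = { :name => "caf\u00E9", :text => "plain ascii" }
bad  = { :name => "caf\u00E9", :text => "bad\xFFbyte" } # 0xFF never occurs in UTF-8

good_result = invalid_utf8_keys(good)
bad_result  = invalid_utf8_keys(bad)
```

If this reports any keys for your real hash, the failure is more likely caused by invalid input bytes than by the tab separator.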
rampion: Logan's right. In UTF-8 there are three kinds of bytes: the ones covering 7-bit ASCII (0XXXXXXX), the first byte of a multi-byte character (110XXXXX, 1110XXXX, 11110XXX), and a follow-up byte of a multi-byte character (10XXXXXX). Tab (00001001 = 0x09) only matches itself, not any part of a multi-byte character.
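To make the three byte kinds concrete, here is a rough sketch that classifies each byte of a sample string by its leading bits. The method name utf8_byte_kind and the sample string are invented for illustration, and the ranges follow only the bit patterns above (a few bytes such as 0xC0 and 0xC1 fit a lead pattern but never appear in well-formed UTF-8):

```ruby
# Classify one byte (0..255) by the role its leading bits give it in UTF-8.
def utf8_byte_kind(byte)
  case byte
  when 0x00..0x7F then :ascii         # 0XXXXXXX
  when 0x80..0xBF then :continuation  # 10XXXXXX
  when 0xC0..0xF7 then :lead          # 110XXXXX, 1110XXXX, 11110XXX
  else :invalid                       # 11111XXX is not used by UTF-8
  end
end

# Two three-byte characters, a tab, then one two-byte character.
sample = "\u65E5\u672C\t\u00E9"
bytes  = sample.bytes.to_a

kinds     = bytes.map { |b| utf8_byte_kind(b) }
tab_count = bytes.count(0x09)
```

The only byte equal to 0x09 in the sample is the tab itself. Every byte belonging to a multi-byte character is 0x80 or higher, so gsub("\t", ' ') cannot hit the middle of one.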