Character Encodings in Java and the Causes of Garbled Text (Mojibake)

2015-07-11 10:23:24 为者常成 Essays and Miscellany

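Four character encodings:
1. ISO-8859-1: a single-byte encoding covering the values 0-255; it mainly represents English and other Western European text.
2. GBK/GB2312: the Chinese national standard (guobiao) encodings. Chinese characters take two bytes, while ASCII characters still take one; GBK is an extension of GB2312.
3. Unicode: the character set Java uses internally, stored as 16-bit UTF-16 code units. Its first 256 code points match ISO-8859-1, but its two-byte form is not byte-compatible with ISO-8859-1, and it takes more space for English text, which makes it less convenient for transmission and storage.
4. UTF (UTF-8): created to address those drawbacks of fixed-width Unicode. It is byte-compatible with ASCII (the 0-127 range of ISO-8859-1), can represent characters of every language, and is variable-length: 1-4 bytes per character under the current standard (the original design allowed up to 6). It is widely used for web pages, including Chinese ones, because the ASCII markup stays at one byte per character.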
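The width differences listed above can be checked directly by counting the bytes each encoding produces. The following is a minimal sketch, not part of the original post: the class name EncodingWidths and the helper byteLength are made up for illustration, and GBK is assumed to be available in the running JDK (the loop skips it otherwise).

```java
import java.nio.charset.Charset;

public class EncodingWidths
{
        // Hypothetical helper: number of bytes the named charset needs for the text.
        static int byteLength(String text, String charsetName)
        {
                return text.getBytes(Charset.forName(charsetName)).length;
        }

        public static void main(String[] args)
        {
                String latin = "A";
                String chinese = "\u4e2d";        // the character 中, escaped so the source file encoding does not matter
                String[] charsets = {"ISO-8859-1", "GBK", "UTF-16BE", "UTF-8"};
                for (String name : charsets)
                {
                        if (!Charset.isSupported(name))
                        {
                                // Extended charsets such as GBK may be missing from a trimmed-down runtime.
                                System.out.println(name + ": not supported by this JVM");
                                continue;
                        }
                        System.out.println(name + ": 'A' -> " + byteLength(latin, name)
                                        + " byte(s), '\u4e2d' -> " + byteLength(chinese, name) + " byte(s)");
                }
        }
}
```

Note that ISO-8859-1 reports one byte for the Chinese character only because the unmappable character is replaced by a single '?' byte, which is exactly how information gets lost.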
---- Reading the platform default encoding:
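package com.mldn;
public class EncodeTest
{
        public static void main(String[] args)
        {
                // file.encoding is the system property that holds the JVM's default charset name.
                System.out.println("Default system encoding: " + System.getProperty("file.encoding"));
        }
}
/*
administrator@xu-desktop:~$ java com.mldn.EncodeTest
Default system encoding: UTF-8
*/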
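Besides the file.encoding property, java.nio.charset.Charset.defaultCharset() returns the charset the JVM uses when none is specified. Because that default differs from machine to machine, one common way to avoid surprises is to name the charset explicitly on both the encoding and the decoding side. The following is a minimal sketch under that assumption; the class name DefaultCharsetDemo and the helper roundTripUtf8 are illustrative and do not come from the original post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo
{
        // Hypothetical helper: encode and decode with the same explicitly named charset,
        // so the result does not depend on the platform default.
        static String roundTripUtf8(String text)
        {
                byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
                return new String(bytes, StandardCharsets.UTF_8);
        }

        public static void main(String[] args)
        {
                System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
                System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset().name());

                String original = "\u4e2d\u56fd";        // 中国, escaped so the source file encoding does not matter
                System.out.println("Round trip intact: " + original.equals(roundTripUtf8(original)));
        }
}
```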
------ Garbled text: the result of incompatible encodings:
package com.mldn;
import java.io.UnsupportedEncodingException;
public class CharsetDemo
{
        public static void main(String[] args)
        {
                String str = "中国北京！";
                try
                {
                        // Each line encodes str with the named charset, then decodes the bytes
                        // with the platform default charset (new String(byte[]) uses the default).
                        // The text survives only when the two charsets are compatible.
                        System.out.println("ISO-8859-1:" + new String(str.getBytes("ISO-8859-1")));        // forced re-encoding
                        System.out.println("GBK:" + new String(str.getBytes("GBK")));
                        System.out.println("UNICODE:" + new String(str.getBytes("UNICODE")));
                        System.out.println("GB2312:" + new String(str.getBytes("GB2312")));
                        System.out.println("UTF-8:" + new String(str.getBytes("UTF-8")));
                        System.out.println("US-ASCII:" + new String(str.getBytes("US-ASCII")));
                        System.out.println("UTF-16:" + new String(str.getBytes("UTF-16")));
                        System.out.println("UTF-16BE:" + new String(str.getBytes("UTF-16BE")));
                        System.out.println("UTF-16LE:" + new String(str.getBytes("UTF-16LE")));
                }
                catch (UnsupportedEncodingException e)
                {
                        // Thrown when the JVM does not recognize a charset name.
                        e.printStackTrace();
                }
        }
}
/*
administrator@xu-desktop:~$ java com.mldn.CharsetDemo
ISO-8859-1:?????
GBK: