Encode non-ASCII characters to HTML using Javascript/Windows batch file hybrid

Question

I need to replicate exactly what this website, http://www.unicodetools.com/unicode/convert-to-html.php, does in a hybrid Javascript/Windows batch script. I have zero knowledge of Javascript, but it seems to be the easiest way (for those who know it) to replace non-ASCII characters with their HTML entity equivalents within text files, for example "têxt" to "t&#234;xt", using input and output text files instead of web forms. I've seen the wonders JREPL.bat (a regex find-and-replace tool) does, so I thought this could be achieved.

Pardon me for asking this question, but it is part of a problem I have not been able to wrap my head around for days. It relates to this unanswered question: https://stackoverflow.com/questions/35121949/curl-data-urlencode-posts-broken-non-english-characters. I figured out that the Japanese and other UTF-8 characters in the text file can be passed through a CURL POST request without being garbled if they are first encoded as HTML entities before the --data-urlencode part.

That said, would someone be kind enough to create a simple JScript/Windows batch hybrid script, incorporating the Javascript code the above-mentioned website uses, that encodes only the non-ASCII characters in a text file as HTML entities? I would like to call it from another batch file with a line like this:

CALL EncodetoHTML.bat -i "input.txt" -o "output.txt"

Recommended answer

I wrote my own script. It took me a whole day of scouring the Internet for useful pieces of code and combining them to achieve the effect I wanted.

Save the code below to tohtmlent.bat. Use it from CMD like tohtmlent.bat filename.txt, or call it from another batch file like call tohtmlent.bat filename.txt, where "filename.txt" is the input file. Output is written to the console, so use > to redirect it to a file. The input file must be encoded in UTF-8. The output is ANSI (in practice plain ASCII, since every character above 127 is replaced). The script converts all Unicode characters with code 128 and higher to their numeric HTML entity equivalents.

The code is nowhere near elegant considering I am not a programmer and it still has a lot more room for improvement. But hey, it does its job!

@if (@X)==(@Y) @end /*
@echo off
cscript //E:JScript //nologo "%~f0" %*
exit /b %errorlevel%
*/

// Require the input file path as the first argument.
if (WScript.Arguments.Length < 1) {
    WScript.Echo("No file specified.");
    WScript.Quit(1);
}

var inputFile = WScript.Arguments.Item(0);
var fso = new ActiveXObject("Scripting.FileSystemObject");

if (!fso.FileExists(inputFile)){
    WScript.Echo(inputFile + " does not exist.");
    WScript.Quit(1);
}

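// Read the whole file as UTF-8 text (Type 2 = adTypeText).
var objAdoS = WScript.CreateObject("ADODB.Stream");
objAdoS.Type = 2;
objAdoS.CharSet = "utf-8";
objAdoS.Open();
objAdoS.LoadFromFile(inputFile);
var strInput = objAdoS.ReadText();
objAdoS.Close();

// Replace every UTF-16 code unit above 127 with a numeric entity.
// Note: charCodeAt returns UTF-16 code units, so a character beyond U+FFFF
// is emitted as two surrogate entities rather than one code point.
var strOutput = '';
for (var i = 0; i < strInput.length; i++) {
    var code = strInput.charCodeAt(i);
    if (code > 127) {
        strOutput += '&#' + code + ';';
    } else {
        strOutput += strInput.charAt(i);
    }
}
WScript.Echo(strOutput);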
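
The conversion loop above works on UTF-16 code units, so characters outside the Basic Multilingual Plane (many emoji, some rare CJK characters) would come out as two surrogate entities. The following is a minimal sketch of one way to handle that, not part of the original answer: the function name toHtmlEntities is hypothetical, and the code sticks to old-style syntax on the assumption that it should also run under JScript.

```javascript
// Hypothetical helper (not from the original answer): converts every
// character above U+007F to a decimal numeric HTML entity, joining a
// UTF-16 surrogate pair into a single code point first.
function toHtmlEntities(text) {
    var out = '';
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        // High surrogate followed by a low surrogate: combine the pair.
        if (code >= 0xD800 && code <= 0xDBFF && i + 1 < text.length) {
            var low = text.charCodeAt(i + 1);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                code = (code - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
                i++; // the low surrogate has been consumed
            }
        }
        // ASCII passes through unchanged; everything else becomes &#N;
        out += code > 127 ? '&#' + code + ';' : text.charAt(i);
    }
    return out;
}

// Example: toHtmlEntities("t\u00EAxt") returns "t&#234;xt"
```

If this behaviour is wanted in the script, the loop could be swapped for a single call such as WScript.Echo(toHtmlEntities(strInput)); whether the receiving server accepts entities above 65535 is something to check separately.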
