PinInCS 1.1.0
dotnet add package PinInCS --version 1.1.0
PinInCS
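A C# port of the Towdium/PinIn Java library. It addresses a range of Chinese pinyin matching problems, offering NFA-based on-the-fly matching and suffix-tree-like indexed matching. It can also convert Chinese characters to pinyin strings (ASCII, Unicode with tone marks, or Zhuyin symbols).
Features
- Highly flexible abbreviated-pinyin combinations
- 7 fuzzy-sound options (zh↔z, sh↔s, ch↔c, ang↔an, ing↔in, eng↔en, u↔v)
- Supports full pinyin, double pinyin (8 layouts, including Xiaohe, Ziranma, and Sogou), and Zhuyin (Daqian)
- Three searchers: TreeSearcher (fastest), CachedSearcher (balanced), SimpleSearcher (lowest memory use)
- Configuration (fuzzy sounds, keyboard layout) can be switched at runtime
- No third-party dependencies
For "中国", the accepted search strings include, but are not limited to:
"中国", "中guo", "zhongguo", "zhong国", "zhong1国", "zh1国", "zh国". With fuzzy sounds enabled, "zong国", "z国", and similar strings are accepted as well.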
Installation
Reference src/PinIn/PinIn.csproj from your project:
<ProjectReference Include="path/to/PinIn.csproj" />
Target frameworks: .NET Standard 2.1 / .NET 5+
Quick start
using PinInCS;
using PinInCS.Elements;
using PinInCS.Searchers;
using PinInCS.Utils;
var p = new PinInCS.PinInCS();
// ═══════════════════════════════════════
// 1. On-the-fly matching
// ═══════════════════════════════════════
// Substring matching
p.Contains("测试文本", "ceshi"); // true
p.Contains("测试文本", "ce4shi4wb"); // true (tones + abbreviated pinyin)
p.Contains("合金炉", "hejinlu"); // true
p.Contains("石头", "stou"); // true (abbreviated pinyin)
// Prefix matching
p.Begins("测试文本", "ceshi"); // true
p.Begins("测试文本", "wenben"); // false (not at the start)
// Exact matching
p.Matches("测试", "ce4shi4"); // true
p.Matches("测试", "ceshi"); // false (without tones it does not count as an exact match)
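As a minimal sketch, the search strings listed for "中国" in the feature list can be checked with the same Contains call. It uses only calls that appear in this README (Contains, CreateConfig, FZh2Z, Commit); the variable names are illustrative, and the comments restate the README's claims rather than output verified here.

```csharp
using System;
using PinInCS;

var p = new PinInCS.PinInCS();

// Search strings the feature list says are accepted for "中国".
string[] queries = { "中国", "中guo", "zhongguo", "zhong国", "zhong1国", "zh1国", "zh国" };
foreach (var query in queries)
{
    bool accepted = p.Contains("中国", query);
    Console.WriteLine($"{query}: {accepted}");
}

// "zong国" relies on the zh <-> z fuzzy sound, so compare the result
// before and after enabling it (the change applies only after Commit()).
bool beforeFuzzy = p.Contains("中国", "zong国");
p.CreateConfig().FZh2Z(true).Commit();
bool afterFuzzy = p.Contains("中国", "zong国");
Console.WriteLine($"zong国: {beforeFuzzy} -> {afterFuzzy}");
```

Printing the before/after pair makes it easy to see whether a given fuzzy-sound option is what allows a particular query.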
Indexed searchers
When you search repeatedly across a large number of entries, using a searcher is tens of times faster than calling Contains on each entry.
var p = new PinInCS.PinInCS();
// ═══════════════════════════════════════
// 2. TreeSearcher: fastest search, suited to large datasets
// ═══════════════════════════════════════
var tree = new TreeSearcher<int>(Logic.CONTAIN, p);
tree.Put("测试文本", 1);
tree.Put("合金炉", 2);
tree.Put("洗矿场", 3);
tree.Put("流体", 4);
List<int> results = tree.Search("ceshi"); // [1]
results = tree.Search("hejinlu"); // [2]
results = tree.Search("liu"); // [4]
// ═══════════════════════════════════════
// 3. SimpleSearcher: lowest memory use
// ═══════════════════════════════════════
var simple = new SimpleSearcher<string>(Logic.CONTAIN, p);
simple.Put("铁锭", "iron_ingot");
simple.Put("铁块", "iron_block");
List<string> items = simple.Search("tie"); // ["iron_ingot", "iron_block"]
// ═══════════════════════════════════════
// 4. CachedSearcher: balances speed and memory
// ═══════════════════════════════════════
var cached = new CachedSearcher<int>(Logic.CONTAIN, p);
cached.Put("铜锭", 0);
cached.Put("铜块", 1);
List<int> r = cached.Search("tong"); // [0, 1]
Search logic (Logic)
| Enum value | Meaning | Equivalent |
|---|---|---|
| `Logic.CONTAIN` | Partial match (substring) | `string.Contains` |
| `Logic.BEGIN` | Prefix match | `string.StartsWith` |
| `Logic.EQUAL` | Exact match | `string.Equals` |
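To see how the three values differ, the sketch below runs one query against the same two entries under each logic. It uses only the TreeSearcher constructor, Put, and Search shown earlier; the helper name Run and the sample entries are illustrative, and the using directives are copied from the quick start because this README does not say which namespace declares Logic. By analogy with string.Contains and string.StartsWith, one would expect CONTAIN to return both identifiers and BEGIN only 2; no output is shown because the snippet has not been run here.

```csharp
using System;
using System.Collections.Generic;
using PinInCS;
using PinInCS.Elements;
using PinInCS.Searchers;
using PinInCS.Utils;

var p = new PinInCS.PinInCS();

// Hypothetical helper: builds a searcher with the given logic over the
// same two entries and runs a single query against it.
List<int> Run(Logic logic, string query)
{
    var searcher = new TreeSearcher<int>(logic, p);
    searcher.Put("测试文本", 1); // "wenben" appears in the middle
    searcher.Put("文本", 2);     // "wenben" is the whole entry
    return searcher.Search(query);
}

foreach (var logic in new[] { Logic.CONTAIN, Logic.BEGIN, Logic.EQUAL })
{
    string hits = string.Join(", ", Run(logic, "wenben"));
    Console.WriteLine($"{logic}: [{hits}]");
}
```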
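Searcher performance comparison
| Searcher | Build | Search (partial match) | Memory | Best suited to |
|---|---|---|---|---|
| `TreeSearcher` | 210 ms | 0.19 ms | 9.5 MB | Large datasets, high-frequency searches |
| `SimpleSearcher` | 27 ms | 9.1 ms | 1.8 MB | Small datasets, memory-sensitive use |
| `CachedSearcher` | 28 ms | 0.55 ms | Varies | Workloads with many repeated queries |
These figures are based on a test sample of about 37k entries and 400k characters.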
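The figures above depend on the dataset, so it can be worth timing a searcher on your own entries. The following is a rough sketch using System.Diagnostics.Stopwatch, not the benchmark that produced the table: the four sample entries are placeholders taken from the earlier example, and a single-shot measurement like this is only indicative.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using PinInCS;
using PinInCS.Elements;
using PinInCS.Searchers;
using PinInCS.Utils;

var p = new PinInCS.PinInCS();

// Placeholder data; substitute your real entry names here.
string[] names = { "测试文本", "合金炉", "洗矿场", "流体" };

// Time index construction.
var stopwatch = Stopwatch.StartNew();
var tree = new TreeSearcher<int>(Logic.CONTAIN, p);
for (int i = 0; i < names.Length; i++)
{
    tree.Put(names[i], i); // the identifier is the index into names
}
stopwatch.Stop();
Console.WriteLine($"build:  {stopwatch.Elapsed.TotalMilliseconds:F3} ms");

// Time a single partial-match query.
stopwatch.Restart();
List<int> hits = tree.Search("liu");
stopwatch.Stop();
Console.WriteLine($"search: {stopwatch.Elapsed.TotalMilliseconds:F3} ms, {hits.Count} hit(s)");
```

Swapping TreeSearcher for SimpleSearcher or CachedSearcher in the constructor line gives a like-for-like comparison on the same data.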
Configuration
Fuzzy sounds
var p = new PinInCS.PinInCS();
// Enable the sh ↔ s fuzzy sound
p.CreateConfig().FSh2S(true).Commit();
p.Contains("测试", "cesi"); // true (shi → si)
// Enable several fuzzy sounds at once
p.CreateConfig()
.FZh2Z(true) // zh ↔ z
.FSh2S(true) // sh ↔ s
.FCh2C(true) // ch ↔ c
.FAng2An(true) // ang ↔ an
.FIng2In(true) // ing ↔ in
.FEng2En(true) // eng ↔ en
.FU2V(true) // u ↔ v (ü)
.Commit();
⚠️ Configuration changes take effect only after .Commit() is called. Searchers that already exist pick up the change automatically.
Keyboard layouts
var p = new PinInCS.PinInCS();
// Xiaohe double pinyin
p.CreateConfig().Keyboard(Keyboard.XIAOHE).Commit();
p.Contains("测试文本", "ceuiwfbf"); // true
// Ziranma
p.CreateConfig().Keyboard(Keyboard.ZIRANMA).Commit();
// Zhuyin (Daqian)
p.CreateConfig().Keyboard(Keyboard.DAQIAN).Commit();
p.Contains("测试文本", "hk4g4jp61p3"); // true
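Available keyboard layouts:
| Constant | Layout | Type |
|---|---|---|
| `Keyboard.QUANPIN` | Full pinyin (default) | Full pinyin |
| `Keyboard.XIAOHE` | Xiaohe double pinyin | Double pinyin |
| `Keyboard.ZIRANMA` | Ziranma | Double pinyin |
| `Keyboard.SOUGOU` | Sogou double pinyin | Double pinyin |
| `Keyboard.GUOBIAO` | Guobiao double pinyin | Double pinyin |
| `Keyboard.MICROSOFT` | Microsoft double pinyin | Double pinyin |
| `Keyboard.PINYINPP` | Pinyin Jiajia (拼音加加) | Double pinyin |
| `Keyboard.ZIGUANG` | Ziguang double pinyin | Double pinyin |
| `Keyboard.DAQIAN` | Daqian (Zhuyin) | Zhuyin |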
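Accelerated mode
When the same query is matched against many different strings (for example, when filtering a list), enabling accelerated mode can speed things up significantly:
p.CreateConfig().Accelerate(true).Commit();
// Scenario: the same query "ceshi" is passed to Contains for 1000 different strings
foreach (var item in items)
{
if (p.Contains(item, "ceshi")) { /* ... */ }
}
This helps only in the scenario above; when queries vary from call to call, accelerated mode is actually slower. It is off by default.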
Pinyin formatting
Convert the pinyin of a Chinese character to strings in different formats:
var p = new PinInCS.PinInCS();
var ch = p.GetChar('圆');
var py = ch.Pinyins()[0];
PinyinFormat.NUMBER.Format(py); // "yuan2" (numeric tone)
PinyinFormat.RAW.Format(py); // "yuan" (no tone)
PinyinFormat.UNICODE.Format(py); // "yuán" (Unicode tone mark)
PinyinFormat.PHONETIC.Format(py); // "ㄩㄢˊ" (Zhuyin symbols)
// Formatting can also go through the PinIn context
p.CreateConfig().Format(PinyinFormat.PHONETIC).Commit();
p.Format(p.GetPinyin("le0")); // "˙ㄌㄜ" (the neutral-tone mark comes first)
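The per-character calls above can be combined to convert a whole string. The sketch below is one possible way to do that: ToPinyin is a hypothetical helper, not part of the library, and it assumes that Pinyins() can be enumerated with foreach and yields nothing for characters that have no reading (such characters are passed through unchanged). It takes only the first reading of each character, so polyphonic characters may not get the reading that fits the context.

```csharp
using System;
using System.Collections.Generic;
using PinInCS;
using PinInCS.Elements;
using PinInCS.Searchers;
using PinInCS.Utils;

var p = new PinInCS.PinInCS();

// Hypothetical helper: formats the first reading of every character
// with Unicode tone marks and joins the results with spaces.
string ToPinyin(string text)
{
    var parts = new List<string>();
    foreach (char c in text)
    {
        string part = c.ToString(); // fallback when the character has no reading
        foreach (var py in p.GetChar(c).Pinyins())
        {
            part = PinyinFormat.UNICODE.Format(py);
            break; // keep only the first reading
        }
        parts.Add(part);
    }
    return string.Join(" ", parts);
}

Console.WriteLine(ToPinyin("测试文本"));
```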
Custom dictionaries
The default dictionary covers standard CJK characters. To add Private Use Area characters or custom readings:
// Option 1: inherit from Default and append entries
public class MyDictLoader : IDictLoader.Default
{
public override void Load(Action<char, string[]> feed)
{
base.Load(feed); // load the built-in dictionary
feed('\uE900', new[] { "lu2" }); // append a custom character
feed('圆', new[] { "yuan2", "huan2" }); // override or add readings for a polyphonic character
}
}
var p = new PinInCS.PinInCS(new MyDictLoader());
// Option 2: fully custom (the built-in dictionary is not loaded)
public class MinimalLoader : IDictLoader
{
public void Load(Action<char, string[]> feed)
{
feed('你', new[] { "ni3" });
feed('好', new[] { "hao3" });
}
}
Searchers and configuration changes
Searchers respond automatically to configuration changes on the PinIn context:
var p = new PinInCS.PinInCS();
var tree = new TreeSearcher<int>(Logic.CONTAIN, p);
tree.Put("测试文本", 0);
int count = tree.Search("ce4siw").Count; // 0 (no fuzzy sounds by default)
p.CreateConfig().FSh2S(true).Commit(); // enable sh↔s
count = tree.Search("ce4siw").Count; // 1 (si matches shi)
p.CreateConfig()
.FSh2S(false)
.Keyboard(Keyboard.DAQIAN)
.Commit(); // switch to the Zhuyin keyboard
count = tree.Search("hk4g4jp61p3").Count; // 1 (Zhuyin key sequence matches)
count = tree.Search("ce4shi4wb").Count; // 0 (full pinyin does not match in Zhuyin mode)
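API quick reference
PinInCS.PinInCS
| Method | Description |
|---|---|
| `Contains(s1, s2)` | Whether s1 contains the pinyin s2 |
| `Begins(s1, s2)` | Whether s1 begins with the pinyin s2 |
| `Matches(s1, s2)` | Whether s1 exactly matches the pinyin s2 |
| `GetChar(c)` | Gets the pinyin information for a character |
| `GetPinyin(s)` | Gets a pinyin object |
| `GetPhoneme(s)` | Gets a phoneme object |
| `Format(pinyin)` | Returns the pinyin as a string in the current format |
| `CreateConfig()` | Creates a configuration builder |
| `Keyboard()` | Gets the current keyboard layout |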
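PinInCS.PinInCS.Config
| Method | Description |
|---|---|
| `Keyboard(kb)` | Sets the keyboard layout |
| `FZh2Z(bool)` | zh ↔ z fuzzy sound |
| `FSh2S(bool)` | sh ↔ s fuzzy sound |
| `FCh2C(bool)` | ch ↔ c fuzzy sound |
| `FAng2An(bool)` | ang ↔ an fuzzy sound |
| `FIng2In(bool)` | ing ↔ in fuzzy sound |
| `FEng2En(bool)` | eng ↔ en fuzzy sound |
| `FU2V(bool)` | u ↔ v (ü) fuzzy sound |
| `Accelerate(bool)` | Accelerated mode |
| `Format(fmt)` | Sets the pinyin output format |
| `Commit()` | Commits the configuration and returns the PinIn instance |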
ISearcher<T>
| Method | Description |
|---|---|
| `Put(name, id)` | Adds an entry |
| `Search(query)` | Searches and returns the list of matching identifiers |
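Because Put and Search are listed on ISearcher<T>, calling code can be written against the interface and the concrete searcher chosen in one place. The sketch below assumes that TreeSearcher<T> and SimpleSearcher<T> implement ISearcher<T> and that the interface is reachable through the quick-start using directives; neither is stated explicitly in this README. Create and Fill are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;
using PinInCS;
using PinInCS.Elements;
using PinInCS.Searchers;
using PinInCS.Utils;

var p = new PinInCS.PinInCS();

// Hypothetical factory: picks an implementation based on the speed/memory trade-off.
ISearcher<string> Create(bool preferSpeed)
{
    if (preferSpeed)
    {
        return new TreeSearcher<string>(Logic.CONTAIN, p);
    }
    return new SimpleSearcher<string>(Logic.CONTAIN, p);
}

// Hypothetical helper: loads name -> identifier pairs into any searcher.
void Fill(ISearcher<string> target, Dictionary<string, string> data)
{
    foreach (var pair in data)
    {
        target.Put(pair.Key, pair.Value);
    }
}

var entries = new Dictionary<string, string>
{
    ["铁锭"] = "iron_ingot",
    ["铁块"] = "iron_block",
};

var searcher = Create(preferSpeed: false);
Fill(searcher, entries);
var hits = searcher.Search("tie");
Console.WriteLine(string.Join(", ", hits));
```

Keeping the choice inside Create means the rest of the code does not change if a different searcher turns out to fit the workload better.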
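Acknowledgements
- Original Java implementation: Towdium/PinIn
- The built-in pinyin data comes from 地球拼音 (Terra Pinyin) and pinyin-data
- Background on the approach: the 再谈拼音搜索 series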