reblGreen.AI.JohnE5
1.0.0
dotnet add package reblGreen.AI.JohnE5 --version 1.0.0
NuGet\Install-Package reblGreen.AI.JohnE5 -Version 1.0.0
<PackageReference Include="reblGreen.AI.JohnE5" Version="1.0.0" />
paket add reblGreen.AI.JohnE5 --version 1.0.0
#r "nuget: reblGreen.AI.JohnE5, 1.0.0"
// Install reblGreen.AI.JohnE5 as a Cake Addin #addin nuget:?package=reblGreen.AI.JohnE5&version=1.0.0 // Install reblGreen.AI.JohnE5 as a Cake Tool #tool nuget:?package=reblGreen.AI.JohnE5&version=1.0.0
reblGreen.AI.JohnE5
JohnE5 is a text/document classification or taxonomization classifier library for .NET and is written in C#. While JohnE5 is primarily designed for text classification, it is quite versatile in the sense that strings passed to the classifier as tokens can be in any form and will be categorized using standard methods. This means that you can train a classification model on integers, hashes, etc... Experimentation is key!
This project is open source so please feel free to contribute suggestions, make modifications and pull requests back to the repository.
What and why is reblGreen.AI.JohnE5?
reblGreen.AI.JohnE5 is an MIT license .NET Standard 2.0 C# Class Library which offers an open-source system for creating/training classification models and categorizing documents or text objects.
Its name is derrived from the original project developer John Earnshaw, and is based on the Johnny 5 character from the Short Circuit movie series. The reasoning behind this is that text classification works similarly to the movie clip where Johhny 5 reads books to learn, "input, more input!".
Weighted Classification
Neive Bayes classification algorithm uses a binary approach usually to train only 2 cattegories. This may work well for classifying in a boolean mannor such as true or false, good or bad, male or female, happy or sad, etc...
The concept behind the weighted classification algorithm used in JohnE5 works with floating point values rather a 1/0 binary approach. It could be similar to the way in which quantum computing works, in the sense that there are more than 2 states.
The more a token is found in a category, the stronger the token becomes when calculating classification. Common words are calculated with a diluted ranking for appearing across many categories which automatically creates a "stop word" system where common tokens have less weight.
A good example of the way that this works is to suggest "Soccer" and "Football", much of the content used to train these categories will most deffinately contain the words "football", "goal", "stadium", etc... Due to this, these tokens will be diluted to 50% of their weight across both categories and would require the existence of a unique or series of unique words to more accurately classify between these two categories. For example, unique words to "Soccer" category may be things like "saved", "goalkeeper", "penalty", etc.. and subsequently words like "helmet" for "Football" category.
How do I get set up?
Take a look at the reblGreen.AI.JohnE5.TestGUI project in the source code. Set this project as the "start up porject" in Visual studio for a GUI which enables you to see how training and classification works.
- Input text into the text box in the application.
- Press the "Classify" button.
- Press the "Add Category" button and create a new classification category.
- Ensure the "train" checkbox is selected on the category you which to train on the input text.
- Press the "Train" button.
- Repeat for more classification categories changing the input text each time.
If you press the "Classify" button a second time on the same input text, you should see the trained category appear in the categories list. This will display the accuracy probability converted into a percentatge to the top right of the category box.
As you train the model with more categories, the text at the top of the example training application may change to tell you which categories have a less token association count and will suggest that you add text to the category containing the least classified tokens.
Since this is a multi-category training model, it is designed for categories to be added in series and in a linear fashion. This means that training a single category to the maximum level before moving on to train a new category will impact the probability of accuracy for the categories trained prior. It is much more accurate to train a category with a single piece of text and then move on to the next category, itterating back to train the first category once all required categories are added.
The core project and examples will be well documented and if you get stuck or have any questions, please contact us and we'll be glad to help out.
For further documentation please see the repository wiki.
Contribution guidelines
- Fork reblGreen.AI.JohnE5, make some changes, make a pull request. Simple!
- Code will be reviewed when a pull request is made.
Who do I talk to?
- reblGreen.AI.JohnE5 repo owner via message or the issues board.
- Or contact us via our website at reblgreen.com.
License
- The MIT License (MIT) - You are free to use reblGreen.AI.JohnE5 in commercial projects and modify/redistribute the source code provided the copyright notice is not removed.
- If you use reblGreen.AI.JohnE5 in your own project we would love to hear about it, so drop us a line (and even a credit to reblGreen.AI.JohnE5 in your project if you feel like being really generous). We would be very happy to hear about your experiences using our reblGreen.AI.JohnE5 class library in your projects and any suggestions you may have for us to make it better.
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
-
.NETStandard 2.0
- No dependencies.
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
1.0.0 | 540 | 11/27/2019 |