DataLens 0.11.0
dotnet add package DataLens --version 0.11.0
NuGet\Install-Package DataLens -Version 0.11.0
<PackageReference Include="DataLens" Version="0.11.0" />
<PackageVersion Include="DataLens" Version="0.11.0" />
<PackageReference Include="DataLens" />
paket add DataLens --version 0.11.0
#r "nuget: DataLens, 0.11.0"
#:package DataLens@0.11.0
#addin nuget:?package=DataLens&version=0.11.0
#tool nuget:?package=DataLens&version=0.11.0
DataLens
A .NET library for exploratory data analysis and statistical profiling.
Overview
DataLens answers the question: "What's in my data?" — before you clean it, before you model it.
Given a CSV/JSON dataset, DataLens produces comprehensive statistical analysis that helps you understand distributions, relationships, patterns, and anomalies. It combines FilePrepper for data ingestion with UInsight (Rust FFI) for high-performance computation.
Where DataLens Fits
CSV / JSON
│
├── "Understand" → DataLens → Analysis result + JSON
│
├── "Clean" → FilePrepper → Cleaned CSV
│
└── "Predict" → MLoop → Models, predictions
| Tool | Purpose | Input | Output |
|---|---|---|---|
| DataLens | Understand your data | CSV / JSON | Analysis result objects, JSON |
| FilePrepper | Clean & transform data | CSV | Cleaned CSV |
| MLoop | Train & deploy ML models | CSV | ML model, predictions |
DataLens is not a replacement for FilePrepper or MLoop — it's the first step before either of them.
Quick Start
Installation
dotnet add package DataLens
One-Line Analysis
using DataLens;
// Run the full analysis pipeline and write the result as JSON
var analysis = await DataLensEngine.Analyze("manufacturing_data.csv");
await analysis.ToJsonAsync("results.json");
HTML / chart rendering is intentionally out of scope. Pair the
AnalysisResult with a renderer of your choice (Plotly.NET, ScottPlot, OxyPlot) or a future companion package such as DataLens.Reports.Plotly.
POCO Collections (no file required)
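using DataLens;
var sales = new List<Sale>
{
new(new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc), 1000m, "갑"),
new(new DateTime(2026, 1, 2, 0, 0, 0, DateTimeKind.Utc), 2500m, "을"),
// ...
};
var analysis = await DataLensEngine.Analyze(sales);
// Property names become column names, so non-ASCII identifiers work as-is:
// 주문일자 = order date, 금액 = amount, 고객명 = customer name.
// In a top-level program, type declarations must follow the top-level statements.
record Sale(DateTime 주문일자, decimal 금액, string 고객명);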
IEnumerable<T> is a first-class input. POCO properties are extracted via
reflection ([JsonIgnore] and [DataLensIgnore] are honored). Dictionary-like
inputs (IDictionary<string, object?>, ExpandoObject) are also supported —
keys become column names. Custom selectors and header aliases are available
via EnumerableSourceOptions<T>.
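The column-extraction rule described above can be pictured with a small standalone sketch. It is not DataLens's implementation: ColumnNameSketch and SampleRow are hypothetical names, only System.Text.Json's [JsonIgnore] is checked (DataLens's own [DataLensIgnore] is left out because its namespace is not shown here), and dictionary rows are assumed to contribute the union of their keys.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text.Json.Serialization;

// Hypothetical sketch of POCO / dictionary column extraction; not the DataLens implementation.
public static class ColumnNameSketch
{
    // POCO rows: public, readable, non-indexer instance properties, skipping [JsonIgnore].
    // Reflection does not guarantee property order, so treat the result as a set.
    public static List<string> FromPoco<T>() =>
        typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .Where(p => p.GetCustomAttribute<JsonIgnoreAttribute>() is null)
            .Select(p => p.Name)
            .ToList();

    // Dictionary-like rows (e.g. ExpandoObject): the union of keys, in first-seen enumeration order.
    public static List<string> FromDictionaries(IEnumerable<IDictionary<string, object?>> rows)
    {
        var seen = new HashSet<string>();
        var names = new List<string>();
        foreach (var row in rows)
            foreach (var key in row.Keys)
                if (seen.Add(key)) names.Add(key);
        return names;
    }
}

// Example row type for the sketch: Secret is excluded from the columns.
public sealed class SampleRow
{
    public int Id { get; set; }
    public double Value { get; set; }
    [JsonIgnore] public string? Secret { get; set; }
}
```

Treating heterogeneous dictionary rows as a key union means a key missing from some rows simply becomes a null cell in those rows.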
Programmatic Access
using DataLens;
var analysis = await DataLensEngine.Analyze("manufacturing_data.csv");
// Profile (row/column counts, per-column null %, type, basic stats)
Console.WriteLine($"Rows: {analysis.Profile!.RowCount}, Cols: {analysis.Profile.ColumnCount}");
foreach (var col in analysis.Profile.Columns)
{
Console.WriteLine($"{col.Name}: type={col.DataType}, null={col.NullPercentage:F1}%");
}
// Descriptive statistics (mean, std, skew, kurtosis, ...)
foreach (var col in analysis.Descriptive!.Columns)
{
Console.WriteLine($"{col.Name}: mean={col.Mean:F3}, skew={col.Skewness:F3}");
}
// Correlation — high pairs already filtered by AnalysisOptions.CorrelationThreshold
foreach (var pair in analysis.Correlation!.HighCorrelationPairs)
{
Console.WriteLine($"{pair.Column1} ~ {pair.Column2}: r={pair.Value:F3}");
}
// Clusters
var kmeans = analysis.Clusters!.KMeans;
if (kmeans is not null)
{
Console.WriteLine($"K={kmeans.K}, WCSS={kmeans.Wcss:F3}");
foreach (var cluster in kmeans.ClusterSizes)
{
Console.WriteLine($" Cluster {cluster.ClusterId}: {cluster.Size} rows ({cluster.Percentage:F1}%)");
}
}
// Outliers
Console.WriteLine($"Outliers: {analysis.Outliers!.OutlierCount} rows ({analysis.Outliers.OutlierPercentage:F1}%)");
Selecting analyses
Use AnalysisOptions to enable/disable specific analyzers:
var options = new AnalysisOptions
{
IncludeProfiling = true,
IncludeDescriptive = true,
IncludeCorrelation = true,
IncludeClustering = false,
IncludeOutliers = false,
IncludeFeatures = false,
IncludePca = false,
IncludeChangepoints = false,
CorrelationThreshold = 0.8
};
var analysis = await DataLensEngine.Analyze("data.csv", options);
var json = analysis.ToJson(Section.Correlation); // Single-section JSON
Analysis Modules
1. Data Profiling
Per-column overview: type detection, null counts, basic numeric summary.
var profile = await DataLensEngine.Profile("data.csv");
Console.WriteLine($"Rows: {profile.RowCount}, Columns: {profile.ColumnCount}");
foreach (var col in profile.Columns)
{
Console.WriteLine($"{col.Name}: type={col.DataType}, null={col.NullPercentage:F1}%");
}
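The two profiling quantities shown above (null percentage and detected type) can be approximated as follows. This is a standalone sketch, not the FilePrepper/DataLens type detector: ProfileSketch is a hypothetical name, cells are assumed to arrive as raw strings, and the type labels ("integer", "float", and so on) are illustrative rather than DataLens's DataType values.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical per-column profiling sketch; DataLens's actual type detection may differ.
public static class ProfileSketch
{
    // Share of cells that are null, empty, or whitespace, as a percentage.
    public static double NullPercentage(IReadOnlyList<string?> cells) =>
        cells.Count == 0 ? 0.0 : 100.0 * cells.Count(c => string.IsNullOrWhiteSpace(c)) / cells.Count;

    // Narrowest type that every non-null cell parses as (invariant culture).
    public static string InferType(IReadOnlyList<string?> cells)
    {
        var values = cells.Where(c => !string.IsNullOrWhiteSpace(c)).Select(c => c!.Trim()).ToList();
        if (values.Count == 0) return "empty";
        if (values.All(v => long.TryParse(v, NumberStyles.Integer, CultureInfo.InvariantCulture, out _))) return "integer";
        if (values.All(v => double.TryParse(v, NumberStyles.Float, CultureInfo.InvariantCulture, out _))) return "float";
        if (values.All(v => bool.TryParse(v, out _))) return "boolean";
        if (values.All(v => DateTime.TryParse(v, CultureInfo.InvariantCulture, DateTimeStyles.None, out _))) return "datetime";
        return "string";
    }
}
```

Checking from narrowest to widest type means a column of whole numbers is reported as integer rather than float, and a single unparseable cell demotes the whole column to string.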
2. Descriptive Statistics
Full numeric summary per column: count, mean, median, std, variance, Q1/Q3/IQR, skewness, kurtosis.
var analysis = await DataLensEngine.Analyze("data.csv");
foreach (var col in analysis.Descriptive!.Columns)
{
Console.WriteLine($"{col.Name}: mean={col.Mean:F3}, std={col.Std:F3}, skew={col.Skewness:F3}");
}
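For reference, the listed statistics can be computed as below. This standalone sketch is not UInsight's code: DescriptiveSketch is a hypothetical name, quartiles use linear interpolation (the "type 7" rule that R and NumPy use by default), variance is the sample (n - 1) estimator, and skewness/kurtosis are the population moment ratios. The README does not say which conventions DataLens uses, or whether its kurtosis is excess kurtosis, so values may differ slightly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical descriptive-statistics sketch; estimator conventions are assumptions.
public static class DescriptiveSketch
{
    public sealed record Summary(int Count, double Mean, double Median, double Variance, double Std,
                                 double Q1, double Q3, double Iqr, double Skewness, double ExcessKurtosis);

    // Linear-interpolated quantile on sorted data ("type 7").
    static double Quantile(double[] sorted, double p)
    {
        double h = (sorted.Length - 1) * p;
        int lo = (int)Math.Floor(h);
        int hi = Math.Min(lo + 1, sorted.Length - 1);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    public static Summary Describe(IEnumerable<double> data)
    {
        var x = data.Where(v => !double.IsNaN(v)).OrderBy(v => v).ToArray();
        int n = x.Length;
        if (n < 2) throw new ArgumentException("Need at least two non-missing values.");
        double mean = x.Average();
        // Central moments about the mean.
        double m2 = x.Sum(v => Math.Pow(v - mean, 2)) / n;
        double m3 = x.Sum(v => Math.Pow(v - mean, 3)) / n;
        double m4 = x.Sum(v => Math.Pow(v - mean, 4)) / n;
        double variance = m2 * n / (n - 1);            // sample variance
        double q1 = Quantile(x, 0.25), q3 = Quantile(x, 0.75);
        double skewness = m3 / Math.Pow(m2, 1.5);      // population skewness g1
        double excessKurtosis = m4 / (m2 * m2) - 3.0;  // population excess kurtosis g2
        return new Summary(n, mean, Quantile(x, 0.5), variance, Math.Sqrt(variance),
                           q1, q3, q3 - q1, skewness, excessKurtosis);
    }
}
```

For the values 1 to 5 this gives mean 3, sample variance 2.5, IQR 2 (Q1 = 2, Q3 = 4), skewness 0 and excess kurtosis -1.3.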
3. Correlation Analysis
- Pearson correlation matrix over numeric columns
- High-correlation pairs auto-filtered by AnalysisOptions.CorrelationThreshold
- Cramér's V for categorical associations
var corr = analysis.Correlation!;
foreach (var pair in corr.HighCorrelationPairs)
{
Console.WriteLine($"{pair.Column1} ~ {pair.Column2}: r={pair.Value:F3}");
}
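The three correlation outputs can be sketched independently of DataLens. CorrelationSketch is a hypothetical name, and the threshold filter assumes the sign of r is ignored (|r| >= threshold); the README does not state whether strong negative correlations are included. Cramér's V is computed from the chi-square statistic of the contingency table as sqrt(chi2 / (n * (min(rows, cols) - 1))).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical correlation sketch; not the UInsight implementation.
public static class CorrelationSketch
{
    // Pearson r between two numeric columns (NaN if either column is constant).
    public static double Pearson(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2) throw new ArgumentException("Need two equal-length series.");
        double mx = x.Average(), my = y.Average();
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
        }
        return sxy / Math.Sqrt(sxx * syy);
    }

    // Every column pair whose |r| meets the threshold (assumption: sign ignored).
    public static List<(string Column1, string Column2, double Value)> HighPairs(
        IReadOnlyDictionary<string, double[]> columns, double threshold)
    {
        var names = columns.Keys.ToList();
        var pairs = new List<(string Column1, string Column2, double Value)>();
        for (int i = 0; i < names.Count; i++)
            for (int j = i + 1; j < names.Count; j++)
            {
                double r = Pearson(columns[names[i]], columns[names[j]]);
                if (Math.Abs(r) >= threshold) pairs.Add((names[i], names[j], r));
            }
        return pairs;
    }

    // Cramér's V for two categorical columns, from their contingency table.
    public static double CramersV(string[] a, string[] b)
    {
        if (a.Length != b.Length || a.Length == 0) throw new ArgumentException("Need two equal-length columns.");
        var rows = a.Distinct().ToList();
        var cols = b.Distinct().ToList();
        var counts = new double[rows.Count, cols.Count];
        for (int i = 0; i < a.Length; i++) counts[rows.IndexOf(a[i]), cols.IndexOf(b[i])]++;
        int n = a.Length;
        double chi2 = 0;
        for (int r = 0; r < rows.Count; r++)
            for (int c = 0; c < cols.Count; c++)
            {
                double rowSum = 0, colSum = 0;
                for (int k = 0; k < cols.Count; k++) rowSum += counts[r, k];
                for (int k = 0; k < rows.Count; k++) colSum += counts[k, c];
                double expected = rowSum * colSum / n;
                chi2 += Math.Pow(counts[r, c] - expected, 2) / expected;
            }
        int minDim = Math.Min(rows.Count, cols.Count);
        return minDim < 2 ? 0.0 : Math.Sqrt(chi2 / (n * (minDim - 1)));
    }
}
```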
4. Regression Analysis
Per-feature simple regression against AnalysisOptions.TargetColumn — one
RegressionEntry per feature.
var options = new AnalysisOptions { TargetColumn = "S_OutputPower", IncludeRegression = true };
var analysis = await DataLensEngine.Analyze("data.csv", options);
var regression = analysis.Regression!;
foreach (var entry in regression.Entries)
{
Console.WriteLine($"{entry.FeatureColumn}: slope={entry.Slope:F4}, R²={entry.RSquared:F4}");
}
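Each RegressionEntry reports a slope and R² from a one-feature fit. The sketch below shows ordinary least squares for y = slope * x + intercept, assuming that is what "simple regression" means here; RegressionSketch and its methods are hypothetical and not part of the DataLens API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simple-regression sketch (one feature vs. the target).
public static class RegressionSketch
{
    public sealed record Fit(double Slope, double Intercept, double RSquared);

    // Ordinary least squares: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).
    public static Fit SimpleOls(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2) throw new ArgumentException("Need two equal-length series.");
        double mx = x.Average(), my = y.Average();
        double sxy = 0, sxx = 0, sst = 0;
        for (int i = 0; i < x.Length; i++)
        {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            sst += (y[i] - my) * (y[i] - my);
        }
        double slope = sxy / sxx;
        double intercept = my - slope * mx;
        double sse = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double residual = y[i] - (slope * x[i] + intercept);
            sse += residual * residual;
        }
        return new Fit(slope, intercept, 1.0 - sse / sst); // R² = 1 - SSE / SST
    }

    // One fit per feature column against the same target column.
    public static Dictionary<string, Fit> PerFeature(IReadOnlyDictionary<string, double[]> features, double[] target) =>
        features.ToDictionary(kv => kv.Key, kv => SimpleOls(kv.Value, target));
}
```

Because each feature is fitted on its own, the R² values are not additive and say nothing about joint effects between features.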
5. Cluster Analysis
K-Means (with auto-K via Gap statistic), DBSCAN, Hierarchical, HDBSCAN.
var clusters = analysis.Clusters!;
Console.WriteLine($"Optimal K={clusters.OptimalK}");
if (clusters.KMeans is { } km)
{
foreach (var cluster in km.ClusterSizes)
{
Console.WriteLine($"Cluster {cluster.ClusterId}: {cluster.Size} rows");
}
}
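The K and WCSS values reported for K-Means come from Lloyd's algorithm: alternate between assigning each row to its nearest centroid and moving each centroid to the mean of its rows. The sketch below is a simplified, hypothetical KMeansSketch. It takes k as input instead of choosing it with the Gap statistic and uses the first k rows as initial centroids (no k-means++), so its results are not guaranteed to match DataLens's.

```csharp
using System;
using System.Linq;

// Hypothetical K-Means (Lloyd's algorithm) sketch; k is given, not chosen via the Gap statistic.
public static class KMeansSketch
{
    public static (int[] Labels, double[][] Centroids, double Wcss) Run(double[][] points, int k, int maxIter = 100)
    {
        if (k < 1 || k > points.Length) throw new ArgumentOutOfRangeException(nameof(k));
        int dim = points[0].Length;
        // Deterministic init: copy the first k points as starting centroids.
        var centroids = points.Take(k).Select(p => (double[])p.Clone()).ToArray();
        var labels = new int[points.Length];
        for (int iter = 0; iter < maxIter; iter++)
        {
            // Assignment step: each point joins its nearest centroid.
            bool changed = false;
            for (int i = 0; i < points.Length; i++)
            {
                int best = Nearest(points[i], centroids);
                if (iter == 0 || labels[i] != best) { labels[i] = best; changed = true; }
            }
            if (!changed) break;
            // Update step: move each centroid to the mean of its members (empty clusters stay put).
            for (int c = 0; c < k; c++)
            {
                var members = points.Where((_, i) => labels[i] == c).ToList();
                if (members.Count == 0) continue;
                for (int d = 0; d < dim; d++) centroids[c][d] = members.Average(p => p[d]);
            }
        }
        // WCSS: total squared distance from each point to its assigned centroid.
        double wcss = points.Select((p, i) => SqDist(p, centroids[labels[i]])).Sum();
        return (labels, centroids, wcss);
    }

    static int Nearest(double[] p, double[][] centroids)
    {
        int best = 0;
        for (int c = 1; c < centroids.Length; c++)
            if (SqDist(p, centroids[c]) < SqDist(p, centroids[best])) best = c;
        return best;
    }

    static double SqDist(double[] a, double[] b)
    {
        double s = 0;
        for (int d = 0; d < a.Length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return s;
    }
}
```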
6. Outlier Detection
Isolation Forest, LOF, and Mahalanobis distance.
var outliers = analysis.Outliers!;
Console.WriteLine($"Outliers: {outliers.OutlierCount} rows ({outliers.OutlierPercentage:F1}%)");
if (outliers.IsolationForest is { } iso)
{
Console.WriteLine($" IsolationForest: {iso.AnomalyCount} anomalies (threshold={iso.Threshold:F3})");
}
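Of the three detectors, Mahalanobis distance is the simplest to show: it measures how far a row lies from the column means after accounting for the covariance between columns. This hypothetical MahalanobisSketch handles exactly two numeric features so the 2x2 covariance matrix can be inverted in closed form; Isolation Forest and LOF are not shown, and the cutoff rule is an assumption. A common choice is a chi-square quantile with 2 degrees of freedom, which equals -2 ln(1 - p), about 7.38 for p = 0.975.

```csharp
using System;
using System.Linq;

// Hypothetical two-feature Mahalanobis outlier sketch; not the UInsight implementation.
public static class MahalanobisSketch
{
    // Squared Mahalanobis distance of each row from the mean, using the sample covariance.
    public static double[] SquaredDistances2D(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 3) throw new ArgumentException("Need at least three paired rows.");
        int n = x.Length;
        double mx = x.Average(), my = y.Average();
        double sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++)
        {
            double dx = x[i] - mx, dy = y[i] - my;
            sxx += dx * dx; syy += dy * dy; sxy += dx * dy;
        }
        sxx /= n - 1; syy /= n - 1; sxy /= n - 1;
        double det = sxx * syy - sxy * sxy;
        if (det <= 0) throw new InvalidOperationException("Covariance matrix is singular.");
        // Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
        var d2 = new double[n];
        for (int i = 0; i < n; i++)
        {
            double dx = x[i] - mx, dy = y[i] - my;
            d2[i] = (dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det;
        }
        return d2;
    }

    // Indices of rows whose squared distance exceeds the cutoff.
    public static int[] Outliers(double[] d2, double cutoff) =>
        Enumerable.Range(0, d2.Length).Where(i => d2[i] > cutoff).ToArray();
}
```

With the sample covariance, the squared distances always sum to (n - 1) times the number of features, which is a handy sanity check.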
7. Feature Importance
ANOVA F-test, mutual information, and permutation importance against a target column.
var report = await DataLensEngine.FeatureImportance("data.csv", target: "Machining_Process");
foreach (var feat in report.Importance!.Scores)
{
Console.WriteLine($" {feat.Name}: {feat.Score:F4}");
}
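The ANOVA F-test score asks how well a numeric feature separates the classes of a categorical target: between-group variance divided by within-group variance. The hypothetical AnovaSketch below computes the one-way F statistic for a single feature; ranking features would mean calling it once per column. Mutual information and permutation importance are not shown.

```csharp
using System;
using System.Linq;

// Hypothetical one-way ANOVA F statistic for one feature vs. a categorical target.
public static class AnovaSketch
{
    // F = (between-group sum of squares / (k - 1)) / (within-group sum of squares / (n - k)).
    public static double FStatistic(double[] feature, string[] target)
    {
        if (feature.Length != target.Length) throw new ArgumentException("Lengths differ.");
        var groups = feature.Zip(target)
            .GroupBy(pair => pair.Second, pair => pair.First)
            .Select(g => g.ToArray())
            .ToList();
        int n = feature.Length, k = groups.Count;
        if (k < 2 || n <= k) throw new ArgumentException("Need at least two groups and more rows than groups.");
        double grandMean = feature.Average();
        double ssBetween = groups.Sum(g => g.Length * Math.Pow(g.Average() - grandMean, 2));
        double ssWithin = groups.Sum(g =>
        {
            double m = g.Average();
            return g.Sum(v => (v - m) * (v - m));
        });
        return (ssBetween / (k - 1)) / (ssWithin / (n - k));
    }
}
```

For the feature 1, 2, 3, 7, 8, 9 split into two classes of three, the group means are 2 and 8, giving F = 54 / (4 / 4) = 54.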
8. Dimensionality Reduction (PCA)
var pca = analysis.Pca!;
Console.WriteLine($"Components: {pca.NComponents}, total variance explained: {pca.TotalExplainedVariance:P1}");
for (int i = 0; i < pca.ExplainedVariance.Length; i++)
{
Console.WriteLine($" PC{i + 1}: {pca.ExplainedVariance[i]:P1}");
}
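Explained variance per component is each covariance-matrix eigenvalue divided by the sum of all eigenvalues. For two columns the eigenvalues have a closed form, which keeps this hypothetical PcaSketch short. Two assumptions apply: the columns are not standardized first (the README does not say whether DataLens scales before PCA), and only the two-feature case is covered.

```csharp
using System;
using System.Linq;

// Hypothetical two-feature PCA sketch: explained-variance ratios from a 2x2 covariance matrix.
public static class PcaSketch
{
    public static double[] ExplainedVarianceRatio2D(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2) throw new ArgumentException("Need two equal-length series.");
        int n = x.Length;
        double mx = x.Average(), my = y.Average();
        double a = 0, b = 0, c = 0; // covariance matrix [[a, b], [b, c]]
        for (int i = 0; i < n; i++)
        {
            double dx = x[i] - mx, dy = y[i] - my;
            a += dx * dx; b += dx * dy; c += dy * dy;
        }
        a /= n - 1; b /= n - 1; c /= n - 1;
        // Eigenvalues of a symmetric 2x2 matrix: mid +/- sqrt(((a - c) / 2)^2 + b^2).
        double mid = (a + c) / 2;
        double radius = Math.Sqrt(Math.Pow((a - c) / 2, 2) + b * b);
        double l1 = mid + radius, l2 = mid - radius; // l1 >= l2 >= 0 (up to rounding)
        double total = l1 + l2;
        return new[] { l1 / total, l2 / total };
    }
}
```

Perfectly collinear columns put all variance on PC1 (ratios near 1 and 0), while uncorrelated columns with equal spread split it 50/50.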
9. Changepoint Detection
PELT-based changepoint detection (multivariate, configurable cost function).
var options = new AnalysisOptions
{
IncludeChangepoints = true,
ChangepointCost = 1, // 0=L2 mean, 1=Normal mean+variance
ChangepointMinSegmentLength = 10
};
var analysis = await DataLensEngine.Analyze("timeseries.csv", options);
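PELT finds the segmentation that minimizes total segment cost plus a penalty per segment. It keeps the dynamic-programming recursion exact while pruning split points that can never become optimal. The hypothetical PeltSketch below is univariate and uses only the L2 mean cost (roughly the ChangepointCost = 0 option). The penalty value is an assumption because the README does not show how DataLens sets it. To keep pruning exact under a minimum segment length, a dominated candidate is dropped only once the point that dominates it can itself close a full-length segment.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical univariate PELT sketch with L2 (mean-shift) cost; not the UInsight implementation.
public static class PeltSketch
{
    // Returns segment start indices (changepoints) minimizing sum(cost) + penalty * segments.
    public static List<int> Detect(double[] y, double penalty, int minSegment = 2)
    {
        int n = y.Length;
        if (minSegment < 1 || n < minSegment) throw new ArgumentException("Series shorter than minSegment.");
        // Prefix sums give O(1) L2 cost: sum((y - mean)^2) = S2 - S1^2 / length.
        var s1 = new double[n + 1];
        var s2 = new double[n + 1];
        for (int i = 0; i < n; i++) { s1[i + 1] = s1[i] + y[i]; s2[i + 1] = s2[i] + y[i] * y[i]; }
        double Cost(int a, int b) => (s2[b] - s2[a]) - (s1[b] - s1[a]) * (s1[b] - s1[a]) / (b - a);

        var f = new double[n + 1];   // f[t]: best penalized cost of y[0..t)
        var last = new int[n + 1];   // last[t]: start of the final segment in that optimum
        f[0] = -penalty;
        var candidates = new List<int> { 0 };
        var expiry = new Dictionary<int, int>(); // candidate -> step from which it is pruned
        for (int t = 1; t <= n; t++)
        {
            f[t] = double.PositiveInfinity;
            foreach (int tau in candidates)
            {
                if (t - tau < minSegment) continue; // final segment would be too short
                double value = f[tau] + Cost(tau, t) + penalty;
                if (value < f[t]) { f[t] = value; last[t] = tau; }
            }
            // PELT pruning (K = 0 for L2 cost): tau can never beat t once t is usable.
            foreach (int tau in candidates)
                if (t - tau >= minSegment && f[tau] + Cost(tau, t) >= f[t] && !expiry.ContainsKey(tau))
                    expiry[tau] = t + minSegment;
            candidates.Add(t);
            candidates.RemoveAll(tau => expiry.TryGetValue(tau, out int e) && e <= t + 1);
        }
        // Backtrack from the end of the series.
        var changepoints = new List<int>();
        for (int t = last[n]; t > 0; t = last[t]) changepoints.Add(t);
        changepoints.Reverse();
        return changepoints;
    }
}
```

A larger penalty yields fewer changepoints. The Normal mean+variance cost (option 1) would replace Cost with a log-variance term and would need its own pruning constant.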
Output
DataLens emits results as JSON. Chart / HTML rendering is delegated to renderer packages (see Out of Scope).
// Full result
var json = analysis.ToJson();
await analysis.ToJsonAsync("results.json");
// Section-scoped JSON
var corrJson = analysis.ToJson(Section.Correlation);
Section members: Profile, Descriptive, Correlation, Regression,
Clusters, Outliers, Distribution, Features, Pca, Changepoints.
Architecture
┌──────────────────────────────────────────┐
│ DataLens (C# .NET) │
│ │
│ ┌──────────┐ ┌───────────────────────┐ │
│ │ Analysis │ │ JSON Serializer │ │
│ │ Pipeline │ │ (renderer-agnostic) │ │
│ │ │ │ │ │
│ └─────┬────┘ └───────────┬───────────┘ │
│ │ │ │
├────────┴───────────────────┴─────────────┤
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ FilePrepper │ │ UInsight (C#) │ │
│ │ (C# native) │ │ ↓ FFI │ │
│ │ │ │ UInsight (Rust) │ │
│ │ • CSV / JSON │ │ │ │
│ │ • DataFrame │ │ • Statistics │ │
│ │ • Type detect │ │ • Correlation │ │
│ │ │ │ • Clustering │ │
│ └──────────────┘ │ • PCA │ │
│ │ • Outlier detect │ │
│ │ • Regression │ │
│ │ • Changepoints │ │
│ └──────────────────┘ │
└──────────────────────────────────────────┘
Integration with iyulab Tools
FilePrepper → DataLens
DataLens uses FilePrepper internally for CSV/JSON ingestion via CsvBridge.
For pre-cleaning, run a FilePrepper pipeline and feed the resulting CSV to
DataLens (or pass a DataFrame directly):
using DataLens;
using FilePrepper.Pipeline;
var pipeline = await DataPipeline.FromCsvAsync("raw_data.csv");
// ... apply FilePrepper transforms ...
var df = pipeline.ToDataFrame();
var analysis = await DataLensEngine.Analyze(df);
DataLens → MLoop
DataLens analysis results can guide MLoop training decisions:
using DataLens;
var options = new AnalysisOptions { TargetColumn = "target_column", IncludeFeatures = true };
var analysis = await DataLensEngine.Analyze("train.csv", options);
// Top features by ANOVA F-score
var topByAnova = analysis.Features!.Anova!.Features
.OrderByDescending(f => f.FStatistic)
.Take(15);
// High-correlation pairs (multicollinearity hints)
foreach (var pair in analysis.Correlation!.HighCorrelationPairs)
{
Console.WriteLine($"{pair.Column1} ~ {pair.Column2}: r={pair.Value:F3}");
}
// Then proceed to MLoop with confidence:
// mloop train datasets/train.csv target_column --time 120
Scope & Non-Goals
In Scope:
- Exploratory data analysis (EDA)
- Statistical profiling and summaries
- Relationship and pattern discovery (correlation, clustering, PCA)
- Outlier and changepoint detection
- JSON output for programmatic consumption
- CSV / JSON ingestion via FilePrepper
Out of Scope:
- Data cleaning / transformation (→ FilePrepper)
- ML model training / prediction (→ MLoop)
- Deep learning (CNN, LSTM, Autoencoder)
- Real-time streaming analysis
- Interactive notebook environments
- HTML / chart rendering: pair
AnalysisResult with a renderer of your choice (Plotly.NET, ScottPlot, OxyPlot, or a future companion package such as DataLens.Reports.Plotly). The core stays JSON-first.
Available:
- Encoding auto-detection in
CsvBridge (FilePrepper 0.7.0+): new CsvLoadOptions { Encoding = "auto" } (the default) detects a BOM and otherwise falls back to a CP949/EUC-KR/UTF-8 heuristic. Override it with an explicit codepage name (e.g., "cp949", "euc-kr", "utf-8", "utf-8-bom"). CLI: pass --encoding cp949 to any command. JSON inputs are UTF-8 per RFC 8259.
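The "BOM first, then heuristic" order can be illustrated with a minimal sketch. It is not CsvBridge's logic. EncodingSniffSketch is hypothetical, it only recognizes the UTF-8 BOM (UTF-16 BOMs are ignored), and its heuristic is reduced to "valid UTF-8, otherwise CP949" with no separate EUC-KR case. Decoding the bytes as CP949 in .NET would also require registering CodePagesEncodingProvider.Instance via Encoding.RegisterProvider.

```csharp
using System.Text;

// Hypothetical encoding sniffer; returns labels in the style of CsvLoadOptions.Encoding values.
public static class EncodingSniffSketch
{
    public static string Detect(byte[] bytes)
    {
        // 1. A UTF-8 byte-order mark wins.
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) return "utf-8-bom";
        // 2. Strict UTF-8 decode: throws on any invalid byte sequence.
        try
        {
            new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true).GetString(bytes);
            return "utf-8";
        }
        catch (DecoderFallbackException)
        {
            // 3. Not valid UTF-8: assume the Korean legacy codepage.
            return "cp949";
        }
    }
}
```

Strict UTF-8 validation works well as the first check because legacy double-byte Korean text almost never forms valid UTF-8 sequences by accident.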
Every code block in this README is exercised by samples/DataLens.Sample:
if a snippet here drifts from the actual API, that sample stops compiling
and CI fails.
Requirements
- .NET 10.0+
- Dependencies: FilePrepper, UInsight
License
MIT License — Built by iyulab
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies (net10.0):
- FilePrepper (>= 0.7.0)
- UInsight (>= 0.9.1)