DataLens 0.11.0


DataLens


A .NET library for exploratory data analysis and statistical profiling.

Overview

DataLens answers the question: "What's in my data?" — before you clean it, before you model it.

Given a CSV/JSON dataset, DataLens produces a comprehensive statistical analysis that helps you understand distributions, relationships, patterns, and anomalies. It uses FilePrepper for data ingestion and UInsight (Rust, called over FFI) for high-performance computation.

Where DataLens Fits

CSV / JSON
  │
  ├── "Understand" → DataLens    → Analysis result + JSON
  │
  ├── "Clean"      → FilePrepper → Cleaned CSV
  │
  └── "Predict"    → MLoop       → Models, predictions
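
| Tool        | Purpose                  | Input      | Output                        |
|-------------|--------------------------|------------|-------------------------------|
| DataLens    | Understand your data     | CSV / JSON | Analysis result objects, JSON |
| FilePrepper | Clean & transform data   | CSV        | Cleaned CSV                   |
| MLoop       | Train & deploy ML models | CSV        | ML model, predictions         |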

DataLens is not a replacement for FilePrepper or MLoop — it's the first step before either of them.

Quick Start

Installation

dotnet add package DataLens

One-Line Analysis

using DataLens;

// Run the full analysis pipeline and write the result as JSON
var analysis = await DataLensEngine.Analyze("manufacturing_data.csv");
await analysis.ToJsonAsync("results.json");

HTML / chart rendering is intentionally out of scope. Pair the AnalysisResult with a renderer of your choice (Plotly.NET, ScottPlot, OxyPlot) or a future companion package such as DataLens.Reports.Plotly.

POCO Collections (no file required)

using DataLens;

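// Non-ASCII property names are valid and become column names:
// 주문일자 = order date, 금액 = amount, 고객명 = customer name
record Sale(DateTime 주문일자, decimal 금액, string 고객명);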

var sales = new List<Sale>
{
    new(new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc), 1000m, "갑"),
    new(new DateTime(2026, 1, 2, 0, 0, 0, DateTimeKind.Utc), 2500m, "을"),
    // ...
};

var analysis = await DataLensEngine.Analyze(sales);

IEnumerable<T> is a first-class input. POCO properties are extracted via reflection ([JsonIgnore] and [DataLensIgnore] are honored). Dictionary-like inputs (IDictionary<string, object?>, ExpandoObject) are also supported — keys become column names. Custom selectors and header aliases are available via EnumerableSourceOptions<T>.
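To make the reflection step concrete, here is a minimal sketch of how a sequence of POCOs could be pivoted into named columns while skipping properties marked [JsonIgnore]. This is not DataLens code: ToColumns and the Reading record are hypothetical names used only for illustration, and the real extraction (including [DataLensIgnore] and EnumerableSourceOptions<T>) may work differently.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text.Json.Serialization;

// Hypothetical helper (an assumption, not part of DataLens): one column per
// public readable property, except those carrying [JsonIgnore].
static Dictionary<string, List<object?>> ToColumns<T>(IEnumerable<T> rows)
{
    var props = typeof(T)
        .GetProperties(BindingFlags.Public | BindingFlags.Instance)
        .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
        .Where(p => p.GetCustomAttribute<JsonIgnoreAttribute>() is null)
        .ToArray();

    var result = props.ToDictionary(p => p.Name, _ => new List<object?>());
    foreach (var row in rows)
    {
        foreach (var p in props)
        {
            result[p.Name].Add(p.GetValue(row));
        }
    }
    return result;
}

var table = ToColumns(new[]
{
    new Reading("A-1", 20.5, "internal"),
    new Reading("A-2", 21.0, "internal"),
});

// Note is excluded because of [JsonIgnore]; Sensor and Temperature remain.
Console.WriteLine($"Columns: {string.Join(", ", table.Keys)}");

// Hypothetical sample type; [property: JsonIgnore] targets the generated property.
record Reading(string Sensor, double Temperature, [property: JsonIgnore] string Note);
```

The sketch keeps values as object so that type detection can happen in a later pass, which is one plausible design; the library may instead infer column types directly from the property types.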

Programmatic Access

using DataLens;

var analysis = await DataLensEngine.Analyze("manufacturing_data.csv");

// Profile (row/column counts, per-column null %, type, basic stats)
Console.WriteLine($"Rows: {analysis.Profile!.RowCount}, Cols: {analysis.Profile.ColumnCount}");
foreach (var col in analysis.Profile.Columns)
{
    Console.WriteLine($"{col.Name}: type={col.DataType}, null={col.NullPercentage:F1}%");
}

// Descriptive statistics (mean, std, skew, kurtosis, ...)
foreach (var col in analysis.Descriptive!.Columns)
{
    Console.WriteLine($"{col.Name}: mean={col.Mean:F3}, skew={col.Skewness:F3}");
}

// Correlation — high pairs already filtered by AnalysisOptions.CorrelationThreshold
foreach (var pair in analysis.Correlation!.HighCorrelationPairs)
{
    Console.WriteLine($"{pair.Column1} ~ {pair.Column2}: r={pair.Value:F3}");
}

// Clusters
var kmeans = analysis.Clusters!.KMeans;
if (kmeans is not null)
{
    Console.WriteLine($"K={kmeans.K}, WCSS={kmeans.Wcss:F3}");
    foreach (var cluster in kmeans.ClusterSizes)
    {
        Console.WriteLine($"  Cluster {cluster.ClusterId}: {cluster.Size} rows ({cluster.Percentage:F1}%)");
    }
}

// Outliers
Console.WriteLine($"Outliers: {analysis.Outliers!.OutlierCount} rows ({analysis.Outliers.OutlierPercentage:F1}%)");

Selecting analyses

Use AnalysisOptions to enable/disable specific analyzers:

var options = new AnalysisOptions
{
    IncludeProfiling     = true,
    IncludeDescriptive   = true,
    IncludeCorrelation   = true,
    IncludeClustering    = false,
    IncludeOutliers      = false,
    IncludeFeatures      = false,
    IncludePca           = false,
    IncludeChangepoints  = false,
    CorrelationThreshold = 0.8
};

var analysis = await DataLensEngine.Analyze("data.csv", options);
var json = analysis.ToJson(Section.Correlation); // Single-section JSON

Analysis Modules

1. Data Profiling

Per-column overview: type detection, null counts, basic numeric summary.

var profile = await DataLensEngine.Profile("data.csv");
Console.WriteLine($"Rows: {profile.RowCount}, Columns: {profile.ColumnCount}");
foreach (var col in profile.Columns)
{
    Console.WriteLine($"{col.Name}: type={col.DataType}, null={col.NullPercentage:F1}%");
}

2. Descriptive Statistics

Full numeric summary per column: count, mean, median, std, variance, Q1/Q3/IQR, skewness, kurtosis.

var analysis = await DataLensEngine.Analyze("data.csv");
foreach (var col in analysis.Descriptive!.Columns)
{
    Console.WriteLine($"{col.Name}: mean={col.Mean:F3}, std={col.Std:F3}, skew={col.Skewness:F3}");
}
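
For readers who want to see what two of these summary numbers mean, the following sketch computes Q1/Q3/IQR and skewness by hand. It assumes the linear-interpolation quantile definition and the population skewness formula (third central moment divided by the 1.5 power of the second); DataLens and UInsight may use different estimators, so small differences from the library's output are possible.

```csharp
using System;
using System.Linq;

// Linear-interpolation quantile over an already sorted array (an assumed definition).
static double Quantile(double[] sorted, double p)
{
    double pos = p * (sorted.Length - 1);
    int lo = (int)Math.Floor(pos);
    int hi = (int)Math.Ceiling(pos);
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
}

// Population skewness: m3 / m2^(3/2), where m2 and m3 are central moments.
static double Skewness(double[] xs)
{
    double mean = xs.Average();
    double m2 = xs.Average(x => Math.Pow(x - mean, 2));
    double m3 = xs.Average(x => Math.Pow(x - mean, 3));
    return m3 / Math.Pow(m2, 1.5);
}

double[] values = { 2, 4, 4, 5, 7, 9, 30 };
double[] ordered = values.OrderBy(v => v).ToArray();

double q1 = Quantile(ordered, 0.25);
double q3 = Quantile(ordered, 0.75);
double skew = Skewness(values);

// The single large value (30) pulls the right tail out, so skewness is positive.
Console.WriteLine($"Q1={q1}, Q3={q3}, IQR={q3 - q1}, skew={skew:F3}");
```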

3. Correlation Analysis

  • Pearson correlation matrix over numeric columns
  • High-correlation pairs auto-filtered by AnalysisOptions.CorrelationThreshold
  • Cramér's V for categorical associations

var corr = analysis.Correlation!;
foreach (var pair in corr.HighCorrelationPairs)
{
    Console.WriteLine($"{pair.Column1} ~ {pair.Column2}: r={pair.Value:F3}");
}
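
As a rough illustration of what a high-correlation pair is, the sketch below computes Pearson's r for every pair of numeric columns and keeps the pairs whose absolute value reaches a threshold. The column names and data are made up, and comparing |r| with >= is an assumption; the library's exact filtering rule is not specified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Pearson correlation coefficient of two equally long columns.
static double Pearson(double[] x, double[] y)
{
    double mx = x.Average(), my = y.Average();
    double sxy = 0, sxx = 0, syy = 0;
    for (int idx = 0; idx < x.Length; idx++)
    {
        double dx = x[idx] - mx, dy = y[idx] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / Math.Sqrt(sxx * syy);
}

// Made-up sample columns.
var data = new Dictionary<string, double[]>
{
    ["temp"]  = new[] { 1.0, 2.0, 3.0, 4.0, 5.0 },
    ["power"] = new[] { 2.0, 4.0, 6.0, 8.0, 10.5 },
    ["noise"] = new[] { 3.0, 1.0, 4.0, 1.0, 5.0 },
};

const double threshold = 0.8; // plays the role of CorrelationThreshold
var names = data.Keys.ToArray();
var highPairs = new List<(string A, string B, double R)>();

for (int i = 0; i < names.Length; i++)
{
    for (int j = i + 1; j < names.Length; j++)
    {
        double r = Pearson(data[names[i]], data[names[j]]);
        if (Math.Abs(r) >= threshold) highPairs.Add((names[i], names[j], r));
    }
}

foreach (var pair in highPairs)
{
    Console.WriteLine($"{pair.A} ~ {pair.B}: r={pair.R:F3}");
}
```

Only temp and power clear the threshold in this data; both pairs involving noise stay well below it.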

4. Regression Analysis

Per-feature simple regression against AnalysisOptions.TargetColumn — one RegressionEntry per feature.

var options = new AnalysisOptions { TargetColumn = "S_OutputPower", IncludeRegression = true };
var analysis = await DataLensEngine.Analyze("data.csv", options);
var regression = analysis.Regression!;
foreach (var entry in regression.Entries)
{
    Console.WriteLine($"{entry.FeatureColumn}: slope={entry.Slope:F4}, R²={entry.RSquared:F4}");
}
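
The slope and R² reported per feature can be understood through ordinary least squares on a single predictor. The sketch below is a from-scratch version under that assumption; FitLine is a hypothetical helper, and the library may report additional fields or use a different fitting routine.

```csharp
using System;
using System.Linq;

// Ordinary least squares for y = intercept + slope * x.
static (double Slope, double Intercept, double RSquared) FitLine(double[] x, double[] y)
{
    double mx = x.Average(), my = y.Average();
    double sxy = 0, sxx = 0, syy = 0;
    for (int idx = 0; idx < x.Length; idx++)
    {
        double dx = x[idx] - mx, dy = y[idx] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    double slope = sxy / sxx;
    double intercept = my - slope * mx;
    // For simple regression, R² equals the squared Pearson correlation.
    double rSquared = (sxy * sxy) / (sxx * syy);
    return (slope, intercept, rSquared);
}

double[] feature = { 1, 2, 3, 4 };
double[] target = { 3, 5, 7, 9 }; // exactly 2 * feature + 1

var fit = FitLine(feature, target);
Console.WriteLine($"slope={fit.Slope:F4}, intercept={fit.Intercept:F4}, R²={fit.RSquared:F4}");
```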

5. Cluster Analysis

K-Means (with auto-K via Gap statistic), DBSCAN, Hierarchical, HDBSCAN.

var clusters = analysis.Clusters!;
Console.WriteLine($"Optimal K={clusters.OptimalK}");
if (clusters.KMeans is { } km)
{
    foreach (var cluster in km.ClusterSizes)
    {
        Console.WriteLine($"Cluster {cluster.ClusterId}: {cluster.Size} rows");
    }
}
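
The WCSS value printed in the earlier K-Means example is the within-cluster sum of squares: the total squared distance from each row to the centroid of its assigned cluster. A minimal sketch of that definition follows, assuming every cluster has at least one row; Wcss is a hypothetical helper and does not perform the clustering itself.

```csharp
using System;

// Within-cluster sum of squares for given points and cluster assignments.
// Assumes labels are in [0, k) and no cluster is empty.
static double Wcss(double[][] points, int[] labels, int k)
{
    int dims = points[0].Length;
    var centroids = new double[k][];
    var counts = new int[k];
    for (int c = 0; c < k; c++) centroids[c] = new double[dims];

    // Accumulate per-cluster sums, then divide to get centroids.
    for (int p = 0; p < points.Length; p++)
    {
        counts[labels[p]]++;
        for (int d = 0; d < dims; d++) centroids[labels[p]][d] += points[p][d];
    }
    for (int c = 0; c < k; c++)
    {
        for (int d = 0; d < dims; d++) centroids[c][d] /= counts[c];
    }

    // Sum squared distances from each point to its own centroid.
    double total = 0;
    for (int p = 0; p < points.Length; p++)
    {
        for (int d = 0; d < dims; d++)
        {
            double diff = points[p][d] - centroids[labels[p]][d];
            total += diff * diff;
        }
    }
    return total;
}

double[][] rows =
{
    new[] { 0.0, 0.0 }, new[] { 0.0, 2.0 },   // cluster 0, centroid (0, 1)
    new[] { 10.0, 0.0 }, new[] { 10.0, 2.0 }, // cluster 1, centroid (10, 1)
};
int[] assignment = { 0, 0, 1, 1 };

double wcss = Wcss(rows, assignment, k: 2);
Console.WriteLine($"WCSS={wcss:F3}");
```

Each row sits one unit from its centroid, so the four squared distances add up to 4. Lower WCSS for the same K means tighter clusters.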

6. Outlier Detection

Isolation Forest, LOF, and Mahalanobis distance.

var outliers = analysis.Outliers!;
Console.WriteLine($"Outliers: {outliers.OutlierCount} rows ({outliers.OutlierPercentage:F1}%)");
if (outliers.IsolationForest is { } iso)
{
    Console.WriteLine($"  IsolationForest: {iso.AnomalyCount} anomalies (threshold={iso.Threshold:F3})");
}

7. Feature Importance

ANOVA F-test, mutual information, and permutation importance against a target column.

var report = await DataLensEngine.FeatureImportance("data.csv", target: "Machining_Process");
foreach (var feat in report.Importance!.Scores)
{
    Console.WriteLine($"  {feat.Name}: {feat.Score:F4}");
}
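
To show what the ANOVA F-test measures, the sketch below computes a one-way F-statistic for a single numeric feature grouped by a categorical target: between-group variance divided by within-group variance. The data and the AnovaF helper are illustrative assumptions; mutual information and permutation importance are not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One-way ANOVA F = (SSB / (k - 1)) / (SSW / (n - k)).
static double AnovaF(IReadOnlyList<double[]> groups)
{
    int k = groups.Count;
    int n = groups.Sum(g => g.Length);
    double grand = groups.SelectMany(g => g).Average();

    // Between-group sum of squares: group means against the grand mean.
    double ssb = groups.Sum(g => g.Length * Math.Pow(g.Average() - grand, 2));

    // Within-group sum of squares: values against their own group mean.
    double ssw = groups.Sum(g =>
    {
        double m = g.Average();
        return g.Sum(v => (v - m) * (v - m));
    });

    return (ssb / (k - 1)) / (ssw / (n - k));
}

// Made-up feature values and class labels.
double[] feature = { 1, 2, 3, 2, 3, 4, 5, 6, 7 };
string[] labels = { "a", "a", "a", "b", "b", "b", "c", "c", "c" };

var byClass = feature
    .Zip(labels, (value, label) => (value, label))
    .GroupBy(t => t.label)
    .Select(g => g.Select(t => t.value).ToArray())
    .ToList();

double f = AnovaF(byClass);
Console.WriteLine($"F={f:F4}");
```

A larger F means the feature separates the target classes more cleanly, which is why it can serve as a ranking score.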

8. Dimensionality Reduction (PCA)

var pca = analysis.Pca!;
Console.WriteLine($"Components: {pca.NComponents}, total variance explained: {pca.TotalExplainedVariance:P1}");
for (int i = 0; i < pca.ExplainedVariance.Length; i++)
{
    Console.WriteLine($"  PC{i + 1}: {pca.ExplainedVariance[i]:P1}");
}
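
The explained-variance figures are each component's eigenvalue divided by the sum of all eigenvalues of the covariance matrix. The sketch below works this out for just two columns, where the eigenvalues have a closed form. It assumes the sample covariance (n - 1 denominator) and no standardization; the library may scale columns first, which would change the ratios.

```csharp
using System;
using System.Linq;

// Explained-variance ratios for a two-column dataset, via the eigenvalues of
// its 2x2 sample covariance matrix [[a, b], [b, c]].
static double[] ExplainedVarianceRatio2D(double[] x, double[] y)
{
    int n = x.Length;
    double mx = x.Average(), my = y.Average();
    double sxx = 0, syy = 0, sxy = 0;
    for (int idx = 0; idx < n; idx++)
    {
        double dx = x[idx] - mx, dy = y[idx] - my;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    double a = sxx / (n - 1), c = syy / (n - 1), b = sxy / (n - 1);

    // Eigenvalues of a symmetric 2x2 matrix: trace/2 ± sqrt((trace/2)² - det).
    double half = (a + c) / 2;
    double disc = Math.Sqrt(half * half - (a * c - b * b));
    double l1 = half + disc, l2 = half - disc;

    return new[] { l1 / (l1 + l2), l2 / (l1 + l2) };
}

double[] colA = { 2, 4, 6, 8 };
double[] colB = { 1, 3, 2, 4 };

double[] ratios = ExplainedVarianceRatio2D(colA, colB);
for (int i = 0; i < ratios.Length; i++)
{
    Console.WriteLine($"  PC{i + 1}: {ratios[i]:P1}");
}
```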

9. Changepoint Detection

PELT-based changepoint detection (multivariate, configurable cost function).

var options = new AnalysisOptions
{
    IncludeChangepoints = true,
    ChangepointCost = 1, // 0=L2 mean, 1=Normal mean+variance
    ChangepointMinSegmentLength = 10
};
var analysis = await DataLensEngine.Analyze("timeseries.csv", options);
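
To give a feel for the "L2 mean" cost (option 0), the sketch below scores a segment by its sum of squared deviations from the segment mean and then searches exhaustively for the single split that minimizes the total cost. This is a simplified stand-in, not PELT: it finds at most one changepoint, works on one series, and has no penalty term or pruning. L2Cost and BestSingleSplit are hypothetical names.

```csharp
using System;

// L2 cost of the half-open segment [start, end): squared deviations from its mean.
static double L2Cost(double[] xs, int start, int end)
{
    double mean = 0;
    for (int t = start; t < end; t++) mean += xs[t];
    mean /= end - start;

    double cost = 0;
    for (int t = start; t < end; t++) cost += (xs[t] - mean) * (xs[t] - mean);
    return cost;
}

// Returns the split index that lowers total cost the most, or -1 if no
// split beats treating the whole series as one segment.
static int BestSingleSplit(double[] xs, int minSegmentLength)
{
    int best = -1;
    double bestCost = L2Cost(xs, 0, xs.Length);
    for (int t = minSegmentLength; t <= xs.Length - minSegmentLength; t++)
    {
        double cost = L2Cost(xs, 0, t) + L2Cost(xs, t, xs.Length);
        if (cost < bestCost)
        {
            bestCost = cost;
            best = t;
        }
    }
    return best;
}

double[] series = { 1, 1, 1, 1, 5, 5, 5, 5 };
int split = BestSingleSplit(series, minSegmentLength: 2);
Console.WriteLine($"Mean shift detected at index {split}");
```

The minimum segment length plays the same role as ChangepointMinSegmentLength: it keeps the search from placing a changepoint so close to an edge that one segment contains too few rows to estimate a mean.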

Output

DataLens emits results as JSON. Chart / HTML rendering is delegated to renderer packages (see Out of Scope).

// Full result
var json = analysis.ToJson();
await analysis.ToJsonAsync("results.json");

// Section-scoped JSON
var corrJson = analysis.ToJson(Section.Correlation);

Section members: Profile, Descriptive, Correlation, Regression, Clusters, Outliers, Distribution, Features, Pca, Changepoints.

Architecture

┌──────────────────────────────────────────┐
│           DataLens (C# .NET)             │
│                                          │
│  ┌──────────┐  ┌───────────────────────┐ │
│  │ Analysis  │  │ JSON Serializer       │ │
│  │ Pipeline  │  │  (renderer-agnostic)  │ │
│  │           │  │                       │ │
│  └─────┬────┘  └───────────┬───────────┘ │
│        │                   │             │
├────────┴───────────────────┴─────────────┤
│  ┌──────────────┐  ┌──────────────────┐  │
│  │ FilePrepper   │  │ UInsight (C#)    │  │
│  │ (C# native)   │  │ ↓ FFI            │  │
│  │               │  │ UInsight (Rust)  │  │
│  │ • CSV / JSON  │  │                  │  │
│  │ • DataFrame   │  │ • Statistics     │  │
│  │ • Type detect │  │ • Correlation    │  │
│  │               │  │ • Clustering     │  │
│  └──────────────┘  │ • PCA            │  │
│                    │ • Outlier detect │  │
│                    │ • Regression     │  │
│                    │ • Changepoints   │  │
│                    └──────────────────┘  │
└──────────────────────────────────────────┘

Integration with iyulab Tools

FilePrepper → DataLens

DataLens uses FilePrepper internally for CSV/JSON ingestion via CsvBridge. For pre-cleaning, run a FilePrepper pipeline and feed the resulting CSV to DataLens (or pass a DataFrame directly):

using FilePrepper.Pipeline;

var pipeline = await DataPipeline.FromCsvAsync("raw_data.csv");
// ... apply FilePrepper transforms ...
var df = pipeline.ToDataFrame();

var analysis = await DataLensEngine.Analyze(df);

DataLens → MLoop

DataLens analysis results can guide MLoop training decisions:

var options = new AnalysisOptions { TargetColumn = "target_column", IncludeFeatures = true };
var analysis = await DataLensEngine.Analyze("train.csv", options);

// Top features by ANOVA F-score
var topByAnova = analysis.Features!.Anova!.Features
    .OrderByDescending(f => f.FStatistic)
    .Take(15);

// High-correlation pairs (multicollinearity hints)
foreach (var pair in analysis.Correlation!.HighCorrelationPairs)
{
    Console.WriteLine($"{pair.Column1} ~ {pair.Column2}: r={pair.Value:F3}");
}

// Then proceed to MLoop with confidence:
// mloop train datasets/train.csv target_column --time 120

Scope & Non-Goals

In Scope:

  • Exploratory data analysis (EDA)
  • Statistical profiling and summaries
  • Relationship and pattern discovery (correlation, clustering, PCA)
  • Outlier and changepoint detection
  • JSON output for programmatic consumption
  • CSV / JSON ingestion via FilePrepper

Out of Scope:

  • Data cleaning / transformation (→ FilePrepper)
  • ML model training / prediction (→ MLoop)
  • Deep learning (CNN, LSTM, Autoencoder)
  • Real-time streaming analysis
  • Interactive notebook environments
  • HTML / chart rendering — pair AnalysisResult with a renderer of your choice (Plotly.NET, ScottPlot, OxyPlot, or a future companion package such as DataLens.Reports.Plotly). The core stays JSON-first.

Available:

  • Encoding auto-detection in CsvBridge (FilePrepper 0.7.0+) — new CsvLoadOptions { Encoding = "auto" } (default) detects BOM and falls back to a CP949/EUC-KR/UTF-8 heuristic. Override with explicit codepage names (e.g., "cp949", "euc-kr", "utf-8", "utf-8-bom"). CLI: pass --encoding cp949 to any command. JSON inputs are UTF-8 per RFC 8259.
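
The detection order described above (BOM first, then a heuristic) can be sketched roughly as follows. This is an assumption-laden illustration, not the FilePrepper implementation: GuessEncoding is a hypothetical helper, it only checks the UTF-8 BOM, and its fallback simply treats anything that is not valid UTF-8 as CP949, whereas the real heuristic is likely more careful.

```csharp
using System;
using System.Text;

// True when data begins with the given prefix bytes.
static bool StartsWith(byte[] data, params byte[] prefix)
{
    if (data.Length < prefix.Length) return false;
    for (int i = 0; i < prefix.Length; i++)
    {
        if (data[i] != prefix[i]) return false;
    }
    return true;
}

// Hypothetical detector: UTF-8 BOM, then strict UTF-8 validation, then a
// legacy Korean codepage as the assumed fallback.
static string GuessEncoding(byte[] head)
{
    if (StartsWith(head, 0xEF, 0xBB, 0xBF)) return "utf-8-bom";

    try
    {
        // A strict decoder throws on byte sequences that are not valid UTF-8.
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true)
            .GetString(head);
        return "utf-8";
    }
    catch (DecoderFallbackException)
    {
        return "cp949"; // assumption: invalid UTF-8 means legacy Korean text
    }
}

byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x41 }; // BOM followed by "A"
byte[] utf8Hangul = { 0xEA, 0xB0, 0x80 };    // "가" encoded as UTF-8
byte[] eucKrHangul = { 0xB0, 0xA1 };         // "가" encoded as EUC-KR / CP949

Console.WriteLine(GuessEncoding(withBom));
Console.WriteLine(GuessEncoding(utf8Hangul));
Console.WriteLine(GuessEncoding(eucKrHangul));
```

One caveat of validating only a prefix of the file: a multi-byte character cut off at the end of the sample would be misread as invalid UTF-8, so a real detector needs to tolerate a truncated final sequence.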

Every code block in this README is exercised by samples/DataLens.Sample. If a snippet here drifts from the actual API, the sample stops compiling and CI fails.

Requirements

License

MIT License — Built by iyulab

Target frameworks

Compatible: net10.0
Computed: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows


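| Version | Downloads | Last Updated |
|---------|-----------|--------------|
| 0.13.0  | 92        | 4/30/2026    |
| 0.11.0  | 95        | 4/29/2026    |
| 0.8.0   | 98        | 4/27/2026    |
| 0.7.1   | 103       | 4/27/2026    |
| 0.5.1   | 92        | 4/27/2026    |
| 0.4.0   | 120       | 3/23/2026    |
| 0.3.0   | 278       | 2/21/2026    |
| 0.2.1   | 111       | 2/21/2026    |
| 0.2.0   | 102       | 2/21/2026    |
| 0.1.0   | 116       | 2/13/2026    |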