LumenWorksCsvReader2 4.4.0

.NET 6.0 .NET Standard 2.0 .NET Framework 4.8

dotnet add package LumenWorksCsvReader2 --version 4.4.0

NuGet\Install-Package LumenWorksCsvReader2 -Version 4.4.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="LumenWorksCsvReader2" Version="4.4.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

<PackageVersion Include="LumenWorksCsvReader2" Version="4.4.0" />

Directory.Packages.props

<PackageReference Include="LumenWorksCsvReader2" />

Project file

For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution Directory.Packages.props file to version the package, and the PackageReference node into the project file.

paket add LumenWorksCsvReader2 --version 4.4.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: LumenWorksCsvReader2, 4.4.0"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package LumenWorksCsvReader2@4.4.0

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

#addin nuget:?package=LumenWorksCsvReader2&version=4.4.0

Install as a Cake Addin

#tool nuget:?package=LumenWorksCsvReader2&version=4.4.0

Install as a Cake Tool

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

CSV Reader

The CsvReader library is an extended version of Sébastien Lorion's fast CSV Reader project and provides fast parsing and reading of CSV files.

It is a straight drop-in replacement for the existing NuGet packages LumenWorks.Framework.IO and LumenWorksCsvReader, but with additional capabilities. The other rationale for the project is that the original code is not available elsewhere in a public source repository, making it difficult to extend or contribute to.

Contributions are welcome from anyone.

You can see the version history here.

Build the project

1. Install Fake.
2. From the command line, run dotnet fake build.

Library License

The library is available under the MIT License; for more information, see the License file in the GitHub repository.

Getting Started

A good starting point is to look at Sébastien's article on Code Project.

A basic use of the reader looks something like this:

    using System.IO;
    using LumenWorks.Framework.IO.Csv;

    void ReadCsv()
    {
        // open the file "data.csv" which is a CSV file with headers
        using (var csv = new CachedCsvReader(new StreamReader("data.csv"), true))
        {
            // Field headers will automatically be used as column names
            myDataGrid.DataSource = csv;
        }
    }

Having said that, there are some extensions built into this version of the library that are worth mentioning.

Additional Features

Columns

One addition is a Columns list, which holds the names and types of the data in the CSV file. If there are no headers present, the column names default to Column1, Column2, etc. This can be overridden by setting the DefaultColumnHeader property, e.g.

    using System.IO;
    using LumenWorks.Framework.IO.Csv;

    void ReadCsv()
    {
        // open the file "data.csv", which is a CSV file without headers
        using (var csv = new CachedCsvReader(new StreamReader("data.csv"), false))
        {
            csv.DefaultColumnHeader = "Fred";

            // Field headers will now be Fred1, Fred2, etc
            myDataGrid.DataSource = csv;
        }
    }

You can specify the columns yourself if the file has no headers, and also specify the expected type of each column. This is especially important when using the reader with SqlBulkCopy, which we will come back to later.

    using System;
    using System.IO;
    using LumenWorks.Framework.IO.Csv;

    void ReadCsv()
    {
        // open the file "data.csv", which is a CSV file without headers
        using (var csv = new CachedCsvReader(new StreamReader("data.csv"), false))
        {
            csv.Columns.Add(new Column { Name = "PriceDate", Type = typeof(DateTime) });
            csv.Columns.Add(new Column { Name = "OpenPrice", Type = typeof(decimal) });
            csv.Columns.Add(new Column { Name = "HighPrice", Type = typeof(decimal) });
            csv.Columns.Add(new Column { Name = "LowPrice", Type = typeof(decimal) });
            csv.Columns.Add(new Column { Name = "ClosePrice", Type = typeof(decimal) });
            csv.Columns.Add(new Column { Name = "Volume", Type = typeof(int) });

            // Field headers will now be picked from the Columns collection
            myDataGrid.DataSource = csv;
        }
    }

SQL Bulk Copy

One use of CSV Reader is to have a nice .NET way of using SQL Bulk Copy (SBC) rather than bcp for bulk loading of data into SQL Server.

A couple of issues arise when using SBC:

1. SBC wants the data presented as the correct type rather than as string.
2. You need to map between the table destination columns and the CSV if the order does not match exactly.
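To make the first issue concrete, here is a minimal sketch, independent of this library, of turning raw CSV text into the CLR types a destination table expects. The class name TypedFieldConverter and the sample values are hypothetical; the library does its own conversion driven by the Columns collection, and this sketch only uses the standard Convert.ChangeType call.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of LumenWorksCsvReader2.
public static class TypedFieldConverter
{
    // Converts one raw CSV field to the requested CLR type.
    // The invariant culture keeps "101.25" from being misread on machines
    // whose regional settings use a comma as the decimal separator.
    public static object ConvertField(string raw, Type targetType)
    {
        return Convert.ChangeType(raw, targetType, CultureInfo.InvariantCulture);
    }
}
```

For example, ConvertField("101.25", typeof(decimal)) yields a decimal and ConvertField("1500", typeof(int)) yields an int, which is the shape of data SBC wants instead of plain strings.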

Below is an example using the Columns collection to set up the correct metadata for SBC:

	public void Import(string fileName, string connectionString)
	{
		using (var reader = new CsvReader(new StreamReader(fileName), false))
		{
			reader.Columns = new List<LumenWorks.Framework.IO.Csv.Column>
			{
				new LumenWorks.Framework.IO.Csv.Column { Name = "PriceDate", Type = typeof(DateTime) },
				new LumenWorks.Framework.IO.Csv.Column { Name = "OpenPrice", Type = typeof(decimal) },
				new LumenWorks.Framework.IO.Csv.Column { Name = "HighPrice", Type = typeof(decimal) },
				new LumenWorks.Framework.IO.Csv.Column { Name = "LowPrice", Type = typeof(decimal) },
				new LumenWorks.Framework.IO.Csv.Column { Name = "ClosePrice", Type = typeof(decimal) },
				new LumenWorks.Framework.IO.Csv.Column { Name = "Volume", Type = typeof(int) },
				new LumenWorks.Framework.IO.Csv.Column { Name = "IsActive", Type = typeof(bool) },
			};

			// With the help of CustomBooleanReplacer you can define a mapping between string values in the CSV file and boolean values
			// In this example, 'Y' and 'Yes' will be treated as true; 'N' and 'No' - as false value.
			reader.CustomBooleanReplacer = new Dictionary<string, bool>
			{
				{"Y", true},
				{"N", false},
				{"Yes", true},
				{"No", false},
			};

			// Now use SQL Bulk Copy to move the data
			using (var sbc = new SqlBulkCopy(connectionString))
			{
				sbc.DestinationTableName = "dbo.DailyPrice";
				sbc.BatchSize = 1000;

				sbc.AddColumnMapping("PriceDate", "PriceDate");
				sbc.AddColumnMapping("OpenPrice", "OpenPrice");
				sbc.AddColumnMapping("HighPrice", "HighPrice");
				sbc.AddColumnMapping("LowPrice", "LowPrice");
				sbc.AddColumnMapping("ClosePrice", "ClosePrice");
				sbc.AddColumnMapping("Volume", "Volume");
				sbc.AddColumnMapping("IsActive", "IsActive");

				sbc.WriteToServer(reader);
			}
		}
	}

The method AddColumnMapping is an extension I wrote to simplify adding mappings to SBC

	public static class SqlBulkCopyExtensions
	{
		public static SqlBulkCopyColumnMapping AddColumnMapping(this SqlBulkCopy sbc, int sourceColumnOrdinal, int targetColumnOrdinal)
		{
			var map = new SqlBulkCopyColumnMapping(sourceColumnOrdinal, targetColumnOrdinal);
			sbc.ColumnMappings.Add(map);

			return map;
		}

		public static SqlBulkCopyColumnMapping AddColumnMapping(this SqlBulkCopy sbc, string sourceColumn, string targetColumn)
		{
			var map = new SqlBulkCopyColumnMapping(sourceColumn, targetColumn);
			sbc.ColumnMappings.Add(map);

			return map;
		}
	}

One other issue recently arose where we wanted to use SBC but some of the data was not in the file itself, but metadata that needed to be included on every row. The solution was to amend the CSV reader and Columns collection to allow default values to be provided that are not in the data.

The additional columns should be added at the end of the Columns collection to avoid interfering with the parsing. See the amended example below:

	public void Import(string fileName, string connectionString)
	{
		using (var reader = new CsvReader(new StreamReader(fileName), false))
		{
			reader.Columns = new List<LumenWorks.Framework.IO.Csv.Column>
			{
				...
				new LumenWorks.Framework.IO.Csv.Column { Name = "Volume", Type = typeof(int) },
				// NB Fake column so bulk import works
				new LumenWorks.Framework.IO.Csv.Column { Name = "Ticker", Type = typeof(string) },
			};

			// Fix up the column defaults with the values we need
			reader.UseColumnDefaults = true;
			reader.Columns[reader.GetFieldIndex("Ticker")].DefaultValue = Path.GetFileNameWithoutExtension(fileName);

			// Now use SQL Bulk Copy to move the data
			using (var sbc = new SqlBulkCopy(connectionString))
			{
				...
				sbc.AddColumnMapping("Ticker", "Ticker");

				sbc.WriteToServer(reader);
			}
		}
	}

VirtualColumns

The database table into which you want to import a CSV may contain more or different columns than the CSV file. As SqlBulkCopy requires you to define column mappings for all columns of the target table, you can use the VirtualColumns functionality:

    csv.VirtualColumns.Add(new Column { Name = "SourceTypeId", Type = typeof(int), DefaultValue = "1", NumberStyles = NumberStyles.Integer });
    csv.VirtualColumns.Add(new Column { Name = "DataBatchId", Type = typeof(int), DefaultValue = dataBatchId.ToString(), NumberStyles = NumberStyles.Integer });

In this case you define two additional columns that do not exist in the source CSV file but do exist in the target table. You can also set a DefaultValue, which will be bulk-copied to the target table together with the CSV file content. Do not forget to include the virtual columns in the SqlBulkCopy column mappings!

ExcludeFilter

If your CSV file is large and you want to import only a subset of the data rather than the whole file, you can set the ExcludeFilter action:

    csv.ExcludeFilter = () => ((csv["Fmly"] ?? "") + (csv["Group"] ?? "") + (csv["Type"] ?? "")).ToUpperInvariant() == "EQDEQUIT";

In this case all rows that fit the defined criteria will not be imported to the database.
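To see which rows the expression above would match, the same comparison can be pulled out into a standalone function. This is only a sketch of the predicate logic; the class and method names are hypothetical, and the three parameters stand in for the csv["Fmly"], csv["Group"] and csv["Type"] lookups.

```csharp
using System;

// Hypothetical helper for illustration only; not part of LumenWorksCsvReader2.
public static class ExcludeFilterSketch
{
    // Concatenates the three fields (treating null as empty), upper-cases the
    // result, and reports whether it equals the key "EQDEQUIT".
    public static bool ShouldExclude(string fmly, string group, string type)
    {
        var key = ((fmly ?? "") + (group ?? "") + (type ?? "")).ToUpperInvariant();
        return key == "EQDEQUIT";
    }
}
```

Because the fields are joined without a separator, the comparison is case-insensitive and does not care where the field boundaries fall: ("Eqd", "Equ", "it") and ("EQD", "EQUIT", null) both match, while ("Bond", null, "Govt") does not.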

MapDataToDto<T>

Calling this method returns an IEnumerable<T>, where T is the type of the entity/DTO you want to map your CSV file to. Before calling this method, define Columns with the names and data types of all columns in the CSV file.

    var expected = new List<SampleData3>
    {
        new SampleData3("John", "Doe", "120 jefferson st.", "Riverside", "NJ", 8075, true, null),
        new SampleData3("Jack", "McGinnis", "220 hobo Av.", "Phila", "PA", 9119, false, null),
        new SampleData3("John \"Da Man\"", "Repici", "120 Jefferson St.", "Riverside", "NJ", 8075, false, null),
        new SampleData3("Stephen", "Tyler", "7452 Terrace \"At the Plaza\" road", "SomeTown", "SD", 91234, false, null),
        new SampleData3(null, "Blankman", null, "SomeTown", "SD", 298, false, null),
        new SampleData3("Joan \"the bone\", Anne", "Jet", "9th, at Terrace plc", "Desert City", "CO", 123, false, null),
    };
    // Using the propertyToColumnMapping parameter, you can map column names from the CSV file to property names of your entity/DTO
    var propertyToColumnMapping = new Dictionary<string, string>
    {
        { "FirstName", "First Name" },
        { "LastName", "Last Name" },
        { "ZipCode", "Zip Code" }
    };
    using (CsvReader csv = new CsvReader(new StringReader(CsvReaderSampleData.SampleData1), true))
    {
        // You should define Columns before calling MapDataToDto<T>!
        csv.Columns = new ColumnCollection
        {
            {"First Name", typeof(string)},
            {"Last Name", typeof(string)},
            {"Address", typeof(string)},
            {"City", typeof(string)},
            {"State", typeof(string)},
            {"Zip Code", typeof(int)},
            {"IsActive", typeof(bool)},
        };
        csv.CustomBooleanReplacer = new Dictionary<string, bool> { { "Y", true }, { "N", false } };
        var result = csv.MapDataToDto<SampleData3>(propertyToColumnMapping).ToList();
        result.Should().BeEquivalentTo(expected);
    }
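The idea behind a property-to-column mapping can be sketched with plain reflection. This is not the library's implementation of MapDataToDto<T>; PersonDto, DtoMappingSketch and MapRow are hypothetical names, and a row is modelled here as a simple dictionary from column name to raw text.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical DTO for illustration only.
public class PersonDto
{
    public string FirstName { get; set; }
    public string City { get; set; }
    public int ZipCode { get; set; }
}

// Hypothetical helper for illustration only; not part of LumenWorksCsvReader2.
public static class DtoMappingSketch
{
    // Builds a T from one row. Each writable property reads from the column
    // named in propertyToColumn, or from the column sharing its own name.
    public static T MapRow<T>(IDictionary<string, string> row,
                              IDictionary<string, string> propertyToColumn) where T : new()
    {
        var dto = new T();
        foreach (var prop in typeof(T).GetProperties())
        {
            if (!prop.CanWrite) continue;

            string column;
            if (!propertyToColumn.TryGetValue(prop.Name, out column)) column = prop.Name;

            string raw;
            if (!row.TryGetValue(column, out raw) || raw == null) continue;

            // Convert the raw text to the property's declared type.
            var value = Convert.ChangeType(raw, prop.PropertyType, CultureInfo.InvariantCulture);
            prop.SetValue(dto, value);
        }
        return dto;
    }
}
```

With a row containing "First Name", "City" and "Zip Code" and a mapping of FirstName to "First Name" and ZipCode to "Zip Code", City is matched by name and the other two through the mapping, which is why only properties whose names differ from the column headers need an entry.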

Performance

To give an idea of performance, switching a naive sample app from an ORM to SBC took the import from 2m 27s to 1.37s, and the full import of 9.8m records took just over 11m.

One of the main reasons for using this library is its excellent performance when reading and parsing raw data. Here is a recent run of the benchmark (which is in the source):

Test	.NET 4.8	.NET 6	.NET 7	.NET 8
Test pass #1 - All fields
CsvReader - No cache	62.7333	80.1117	63.8059	37.0477
CachedCsvReader - Run 1	35.5160	41.7880	38.2042	39.8874
CachedCsvReader - Run 2	61426.7765	73999.3273	117302.0528	91973.2441
TextFieldParser	9.8574	13.9199	14.6264	16.4390
Regex	10.2663	17.9696	18.1544	20.7676

Test pass #1 - Field #72 (middle)
CsvReader - No cache	67.7263	95.0044	86.0328	87.4732
CachedCsvReader - Run 1	29.2253	36.9564	42.5468	44.7421
CachedCsvReader - Run 2	792792.7928	646108.6637	512820.5128	639534.8837
TextFieldParser	9.9626	13.1838	14.5921	18.2339
Regex	22.5060	37.9651	46.1343	50.7253

Test pass #2 - All fields
CsvReader - No cache	75.8756	88.6553	93.0768	109.0782
CachedCsvReader - Run 1	28.8864	38.4666	45.3834	40.4266
CachedCsvReader - Run 2	948275.8621	781527.5311	852713.1783	461699.8951
TextFieldParser	9.4858	13.8782	15.3555	17.4682
Regex	9.6566	18.4976	20.6475	22.6479

Test pass #2 - Field #72 (middle)
CsvReader - No cache	72.6275	98.7495	107.8248	111.1179
CachedCsvReader - Run 1	28.4391	35.7626	36.8091	52.7800
CachedCsvReader - Run 2	830188.6792	765217.3913	827067.6692	995475.1131
TextFieldParser	8.6734	14.6872	15.4038	18.2108
Regex	22.1135	44.0567	46.4395	50.7668

Test pass #3 - All fields
CsvReader - No cache	74.3428	90.2397	92.8137	111.8334
CachedCsvReader - Run 1	30.5301	35.6446	43.8796	49.9862
CachedCsvReader - Run 2	817843.8662	737018.4255	766550.5226	820895.5224
TextFieldParser	9.3366	14.4030	15.0148	17.9641
Regex	10.1904	19.1660	20.0524	21.7854

Test pass #3 - Field #72 (middle)
CsvReader - No cache	76.5840	104.5209	105.9584	113.8155
CachedCsvReader - Run 1	35.5272	38.0744	43.4385	37.0724
CachedCsvReader - Run 2	932203.3898	766550.5226	634005.7637	748299.3197
TextFieldParser	9.7928	14.4643	13.6437	17.7131
Regex	22.5506	44.6435	45.5831	49.3559

Average of all test passes	.NET 4.8	.NET 6	.NET 7	.NET 8
CsvReader - No cache	71.6483	92.8803	91.5854	95.0610
CachedCsvReader - Run 1	31.3540	37.7821	41.7103	44.1491
CachedCsvReader - Run 2	730455.2278	628403.6436	618409.9499	626312.9964
TextFieldParser	9.5181	14.0894	14.7727	17.6715
Regex	16.2139	30.3831	32.8352	36.0082

As you can see, average performance increases slightly from .NET Framework 4.8 to .NET 8. This was run on a Core i5-8400 (6 cores) with 32 GB RAM and a 2 TB SSD.
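Each figure in the averages table is the arithmetic mean of the six corresponding results in the pass tables (three passes, each with an "All fields" and a "Field #72" run). As a worked check, the sketch below recomputes the .NET 4.8 average for "CsvReader - No cache" from the values listed above; the class and member names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only.
public static class BenchmarkAverage
{
    // "CsvReader - No cache" on .NET 4.8, in table order:
    // pass 1 (all fields, field #72), pass 2 (both), pass 3 (both).
    public static readonly double[] CsvReaderNet48 =
    {
        62.7333, 67.7263, 75.8756, 72.6275, 74.3428, 76.5840
    };

    // Arithmetic mean of the supplied results.
    public static double Mean(double[] values)
    {
        return values.Average();
    }
}
```

The six values sum to 429.8895, so the mean works out to about 71.65.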

Product	Compatible and additional computed target framework versions.
.NET	net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.
.NET Core	netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed.
.NET Standard	netstandard2.0 is compatible. netstandard2.1 was computed.
.NET Framework	net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 is compatible. net481 was computed.
MonoAndroid	monoandroid was computed.
MonoMac	monomac was computed.
MonoTouch	monotouch was computed.
Tizen	tizen40 was computed. tizen60 was computed.
Xamarin.iOS	xamarinios was computed.
Xamarin.Mac	xamarinmac was computed.
Xamarin.TVOS	xamarintvos was computed.
Xamarin.WatchOS	xamarinwatchos was computed.


.NETFramework 4.8
- No dependencies.
.NETStandard 2.0
- NETStandard.Library (>= 2.0.3)
net6.0
- No dependencies.
net7.0
- No dependencies.
net8.0
- No dependencies.

NuGet packages (2)

Showing the top 2 NuGet packages that depend on LumenWorksCsvReader2:

Package	Downloads
GnossApiWrapper.NetCore Gnoss Api Wrapper, for using the Gnoss API.	68.0K
GnossApiWrapper Gnoss Api Wrapper, for using the Gnoss API.	39.6K

GitHub repositories (1)

Showing the top 1 popular GitHub repositories that depend on LumenWorksCsvReader2:

Repository	Stars
RaythaHQ/raytha Raytha is a powerful CMS with an easy-to-use interface and fast performance. It offers custom content types, a template engine, and various access controls. It supports multiple storage providers and an automatically generated REST API. Upgrade your development workflow with Raytha.	238

Version	Downloads	Last Updated
4.4.0	127,986	9/12/2024
4.3.0	155,650	5/19/2022
4.2.1	3,102	4/29/2022
4.2.0	584	4/29/2022
4.1.2	38,896	7/12/2021
4.1.1	491	7/12/2021
4.1.0	6,294	11/4/2020

Added MapDataToDto<T> method to map CSV file to an IEnumerable<T> where T is a type of an entity/DTO
Removed support of .NET Framework 4.7.2, .NET Core 3.1 and .NET 5.0. Added support of .NET 7.0 and 8.0
