CopyCatcher 1.0.0-alpha

This is a prerelease version of CopyCatcher.

Copy Catcher

<a name="overview"></a> Overview

Copy Catcher is a NuGet package that identifies and lists duplicate files within a specified directory. It combines buffered reading, early byte comparison, chunk hashing, and parallel processing to detect files with identical content efficiently.

<a name="keybenefits"></a> Key Benefits & Features

  • Buffered Reading: Copy Catcher uses buffered reading to efficiently read large files in chunks, reducing memory usage and enhancing performance.

  • Asynchronous Operations: File I/O runs asynchronously so that reads do not block the calling thread. This keeps the host application responsive when scanning large directories or files.

  • Early Byte Exiting: Before hashing the entire file, Copy Catcher checks the initial bytes of files. If two files have different initial bytes, they are immediately identified as distinct, saving computational resources.

  • Chunk Hashing: Instead of hashing the entire file in one go, Copy Catcher hashes files in chunks. This approach is more memory-efficient and allows for faster identification of large duplicate files.

  • Parallelism: The package scans and hashes multiple files concurrently. This takes advantage of multi-core processors and reduces the time required to identify duplicates in large directories.
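
The early-exit and chunked-hashing ideas above can be illustrated with a minimal sketch. It is not the package's implementation: the class name ChunkedCompareSketch, the PrefixLength and ChunkSize values, and the choice of SHA-256 are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of the CopyCatcher API.
public static class ChunkedCompareSketch
{
    // Assumed sizes; the values the package really uses are not documented here.
    private const int PrefixLength = 4096;
    private const int ChunkSize = 81920;

    // Cheap pre-check: returns true only when the leading bytes of both files match.
    public static bool PrefixesMatch(string pathA, string pathB)
    {
        byte[] a = ReadPrefix(pathA);
        byte[] b = ReadPrefix(pathB);
        return a.AsSpan().SequenceEqual(b);
    }

    private static byte[] ReadPrefix(string path)
    {
        using FileStream stream = File.OpenRead(path);
        byte[] buffer = new byte[PrefixLength];
        int total = 0;
        int read;
        // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
        while (total < buffer.Length &&
               (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += read;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }

    // Feeds the file to the hash one chunk at a time, so at most ChunkSize
    // bytes of file content are held in memory.
    public static string HashInChunks(string path)
    {
        using IncrementalHash hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        using FileStream stream = File.OpenRead(path);
        byte[] buffer = new byte[ChunkSize];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            hasher.AppendData(buffer, 0, read);
        }
        return Convert.ToHexString(hasher.GetHashAndReset());
    }
}
```

In a design like this, the full hash is computed only for files whose prefixes match, so most distinct files are rejected after a single small read.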

<a name="gettingstarted"></a> Getting Started

<a name="prerequisites"></a>

Prerequisites

  • .NET SDK installed on your machine.
  • A .NET project where you want to use Copy Catcher.

<a name="installation"></a>

Installation

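Copy Catcher is currently published only as a prerelease (1.0.0-alpha), so specify the version explicitly. Using the Package Manager Console in Visual Studio:

Install-Package CopyCatcher -Version 1.0.0-alpha

Or using the .NET CLI:

dotnet add package CopyCatcher --version 1.0.0-alpha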

Usage

<a name="integration"></a>

Integration

In your .NET project, add the following using directive:

using CopyCatcher.Shared;

Create an instance of the DuplicateFinderService:

var service = new DuplicateFinderService("path/to/directory");

Call the FindDuplicates method:

var duplicates = service.FindDuplicates();

<a name="output"></a>

Output

The FindDuplicates method returns a dictionary in which each key is a hash value and each value is the list of file paths whose content produced that hash:

{
    "abc123def456": ["path/to/duplicate1.txt", "path/to/duplicate2.txt"],
    ...
}
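
A short sketch of how such a result might be post-processed follows. It assumes the result can be treated as an IDictionary<string, List<string>>, which the description above suggests but does not confirm, and DuplicateReportSketch is a hypothetical helper, not part of the package. Whether FindDuplicates already omits single-file groups is not stated, so the filter is written defensively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the CopyCatcher API.
public static class DuplicateReportSketch
{
    // Keeps only hash groups that contain two or more paths.
    public static Dictionary<string, List<string>> OnlyRealDuplicates(
        IDictionary<string, List<string>> groups)
    {
        return groups
            .Where(group => group.Value.Count > 1)
            .ToDictionary(group => group.Key, group => group.Value);
    }

    // Counts the files that could be removed while keeping one copy per group.
    public static int RedundantFileCount(IDictionary<string, List<string>> groups)
    {
        return groups.Values.Sum(paths => Math.Max(0, paths.Count - 1));
    }
}
```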

<a name="console-app-example"></a>

Console App Example

A simple .NET Console app using Copy Catcher would look like this:

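using CopyCatcher.Shared;

Console.WriteLine("Enter the directory path:");
var directoryPath = Console.ReadLine();

// Console.ReadLine returns null at end of input, so validate before scanning
if (string.IsNullOrWhiteSpace(directoryPath) || !Directory.Exists(directoryPath))
{
    Console.WriteLine("Directory not found.");
    return;
}

// Initialize the service and find duplicates
var duplicateFinderService = new DuplicateFinderService(directoryPath);
var duplicates = duplicateFinderService.FindDuplicates();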

// Display results
foreach (var duplicate in duplicates)
{
    Console.WriteLine($"Hash: {duplicate.Key}");
    foreach (var filePath in duplicate.Value)
    {
        Console.WriteLine($" - {filePath}");
    }
}

How It Works

<a name="components"></a>

Components

  • FileReader: Reads files from the file system.
  • FileHasher: Computes a hash value for each file to determine duplicates.
  • DirectoryScanner: Scans the specified directory and retrieves a list of all files. It uses the DirectoryProvider to access the file system, ensuring better testability and separation of concerns.
  • DirectoryProvider: Provides direct access to the file system, used by DirectoryScanner.
  • DuplicateFinderService: The main service that ties all components together and provides an easy-to-use interface for finding duplicates.
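
The testability benefit of placing a provider between the scanner and the file system can be sketched as follows. Every type name ending in Sketch is hypothetical, and the package's actual DirectoryProvider and DirectoryScanner signatures may differ. The recursive search is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical abstraction over file listing; not the package's real interface.
public interface IDirectoryProviderSketch
{
    IEnumerable<string> GetFiles(string directory);
}

// Production-style provider that touches the real file system.
public sealed class FileSystemDirectoryProviderSketch : IDirectoryProviderSketch
{
    // Assumption: subdirectories are included in the scan.
    public IEnumerable<string> GetFiles(string directory) =>
        Directory.EnumerateFiles(directory, "*", SearchOption.AllDirectories);
}

// Test double that serves a fixed list of paths without any disk access.
public sealed class InMemoryDirectoryProviderSketch : IDirectoryProviderSketch
{
    private readonly IReadOnlyList<string> _files;

    public InMemoryDirectoryProviderSketch(params string[] files) => _files = files;

    public IEnumerable<string> GetFiles(string directory) =>
        _files.Where(file => file.StartsWith(directory, StringComparison.Ordinal));
}

// The scanner depends only on the interface, so either provider can be injected.
public sealed class DirectoryScannerSketch
{
    private readonly IDirectoryProviderSketch _provider;

    public DirectoryScannerSketch(IDirectoryProviderSketch provider) => _provider = provider;

    public IReadOnlyList<string> Scan(string directory) =>
        _provider.GetFiles(directory).ToList();
}
```

A unit test can then construct DirectoryScannerSketch with the in-memory provider and assert on the returned paths without creating any files.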

<a name="workflow"></a>

Workflow

  1. The user specifies a directory to be scanned.
  2. DirectoryScanner retrieves a list of all files in the directory.
  3. FileHasher computes a hash for each file.
  4. Duplicate files are identified based on their hash values and returned in a dictionary.
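
The workflow above can be approximated with standard library calls alone. This is a simplified sketch under stated assumptions, not the package's code: WorkflowSketch is a hypothetical name, SHA-256 stands in for whatever hash FileHasher uses, PLINQ stands in for the package's parallelism, and the scan is assumed to be recursive.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical end-to-end sketch; not the CopyCatcher implementation.
public static class WorkflowSketch
{
    public static Dictionary<string, List<string>> FindDuplicates(string directory)
    {
        return Directory
            // Step 2: list every file under the directory.
            .EnumerateFiles(directory, "*", SearchOption.AllDirectories)
            // Step 3: hash files concurrently across available cores.
            .AsParallel()
            .Select(filePath => (Hash: HashFile(filePath), FilePath: filePath))
            // Step 4: group paths by hash and keep groups with more than one file.
            .GroupBy(item => item.Hash, item => item.FilePath)
            .Where(group => group.Count() > 1)
            .ToDictionary(
                group => group.Key,
                group => group.OrderBy(p => p, StringComparer.Ordinal).ToList());
    }

    private static string HashFile(string path)
    {
        using SHA256 sha = SHA256.Create();
        using FileStream stream = File.OpenRead(path);
        return Convert.ToHexString(sha.ComputeHash(stream));
    }
}
```

Unlike the package as described, this sketch hashes every file in full and skips the early byte check, so it shows the overall flow rather than the optimizations.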

Target Frameworks

  • Included in package: net7.0
  • Computed as compatible: net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows

Dependencies

  • net7.0: No dependencies.

Version Downloads Last updated
1.0.0-alpha 182 10/31/2023