CopyCatcher 1.0.0-alpha

This is a prerelease version of CopyCatcher.

dotnet add package CopyCatcher --version 1.0.0-alpha

NuGet\Install-Package CopyCatcher -Version 1.0.0-alpha

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="CopyCatcher" Version="1.0.0-alpha" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

paket add CopyCatcher --version 1.0.0-alpha

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: CopyCatcher, 1.0.0-alpha"

The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy it into the interactive tool or the script's source code to reference the package.

// Install CopyCatcher as a Cake Addin
#addin nuget:?package=CopyCatcher&version=1.0.0-alpha&prerelease

// Install CopyCatcher as a Cake Tool
#tool nuget:?package=CopyCatcher&version=1.0.0-alpha&prerelease

Copy Catcher

<a name="overview"></a> Overview

Copy Catcher is a NuGet package that identifies and lists duplicate files within a specified directory. It combines buffered reading, early byte comparison, chunked hashing, and parallel processing to detect files with identical content efficiently and accurately.

<a name="keybenefits"></a> Key Benefits & Features

Buffered Reading: Copy Catcher uses buffered reading to efficiently read large files in chunks, reducing memory usage and enhancing performance.
Asynchronous Operations: The package is designed around asynchronous, non-blocking I/O, so the calling application stays responsive while large directories or files are processed.
Early Byte Exiting: Before hashing the entire file, Copy Catcher checks the initial bytes of files. If two files have different initial bytes, they are immediately identified as distinct, saving computational resources.
Chunk Hashing: Instead of hashing the entire file in one go, Copy Catcher hashes files in chunks. This approach is more memory-efficient and allows for faster identification of large duplicate files.
Parallelism: The package scans and hashes multiple files concurrently. This makes use of multi-core processors and can substantially reduce the time required to identify duplicates in large directories.
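
The early byte check and chunk hashing described above can be illustrated with a minimal sketch. This is not Copy Catcher's actual implementation: the class name `ContentComparison`, the 4096-byte prefix length, the 80 KB chunk size, and the choice of SHA-256 are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ContentComparison
{
    private const int PrefixLength = 4096; // assumed: bytes compared before any hashing
    private const int ChunkSize = 81920;   // assumed: buffer size for chunked reads

    // Reads at most PrefixLength bytes from the start of the file.
    private static byte[] ReadPrefix(string path)
    {
        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, ChunkSize);
        var buffer = new byte[PrefixLength];
        int total = 0, read;
        while (total < buffer.Length && (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += read;
        }
        return buffer.AsSpan(0, total).ToArray();
    }

    // Feeds the file to the hash one chunk at a time, so the whole
    // file is never held in memory.
    public static string HashInChunks(string path)
    {
        using var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, ChunkSize);
        var buffer = new byte[ChunkSize];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            hash.AppendData(buffer, 0, read);
        }
        return Convert.ToHexString(hash.GetHashAndReset());
    }

    // Early exit: files whose initial bytes differ are distinct,
    // so the full hash is only computed when the prefixes match.
    public static bool HaveSameContent(string pathA, string pathB)
    {
        if (!ReadPrefix(pathA).AsSpan().SequenceEqual(ReadPrefix(pathB)))
        {
            return false;
        }
        return HashInChunks(pathA) == HashInChunks(pathB);
    }
}
```

In this sketch the prefix comparison costs one small read per file, while the full hash cost grows with file size, which is why the cheap check runs first.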

<a name="gettingstarted"></a> Getting Started

Prerequisites

.NET 7 SDK or later installed on your machine (the package targets net7.0).
A .NET project where you want to use Copy Catcher.

Installation

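The only published version, 1.0.0-alpha, is a prerelease, so specify it explicitly when installing. Using the NuGet Package Manager Console:

Install-Package CopyCatcher -Version 1.0.0-alpha

Or using the .NET CLI:

dotnet add package CopyCatcher --version 1.0.0-alpha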

Usage

Integration

In your .NET project, add the following using directive:

using CopyCatcher.Shared;

Create an instance of the DuplicateFinderService:

var service = new DuplicateFinderService("path/to/directory");

Call the FindDuplicates method:

var duplicates = service.FindDuplicates();

Output

The FindDuplicates method returns a dictionary whose keys are hash values and whose values are lists of the file paths that share that hash, shown here in JSON-like notation:

{
    "abc123def456": ["path/to/duplicate1.txt", "path/to/duplicate2.txt"],
    ...
}
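
As one way to act on this result, the sketch below picks out the redundant copies in each group while keeping one file per hash. It assumes the returned value is compatible with `IReadOnlyDictionary<string, List<string>>`, which the package documentation does not state explicitly; `DuplicateReport` and `RedundantCopies` are hypothetical names, not part of Copy Catcher.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateReport
{
    // Returns every path except one "keeper" per hash group.
    // The keeper is the first path in ordinal sort order (an arbitrary choice).
    public static List<string> RedundantCopies(IReadOnlyDictionary<string, List<string>> duplicates)
    {
        return duplicates.Values
            .Where(paths => paths.Count > 1)
            .SelectMany(paths => paths.OrderBy(p => p, StringComparer.Ordinal).Skip(1))
            .ToList();
    }
}
```

The returned list could then be reviewed, or passed to `File.Delete` after confirmation.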

Console App Example

A simple .NET Console app using Copy Catcher would look like this:

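// The Integration section imports CopyCatcher.Shared instead; use whichever
// namespace exposes DuplicateFinderService in your installed version.
using CopyCatcher;

Console.WriteLine("Enter the directory path:");
var directoryPath = Console.ReadLine();

// Console.ReadLine can return null, so validate the input before scanning
if (string.IsNullOrWhiteSpace(directoryPath) || !Directory.Exists(directoryPath))
{
    Console.WriteLine("Directory not found.");
    return;
}

// Initialize the service and find duplicates
var duplicateFinderService = new DuplicateFinderService(directoryPath);
var duplicates = duplicateFinderService.FindDuplicates();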

// Display results
foreach (var duplicate in duplicates)
{
    Console.WriteLine($"Hash: {duplicate.Key}");
    foreach (var filePath in duplicate.Value)
    {
        Console.WriteLine($" - {filePath}");
    }
}

How It Works

Components

FileReader: Reads files from the file system.
FileHasher: Computes a hash value for each file to determine duplicates.
DirectoryScanner: Scans the specified directory and retrieves a list of all files. It uses the DirectoryProvider to access the file system, ensuring better testability and separation of concerns.
DirectoryProvider: Provides direct access to the file system, used by DirectoryScanner.
DuplicateFinderService: The main service that ties all components together and provides an easy-to-use interface for finding duplicates.
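
The testability benefit of routing file system access through a provider can be sketched as follows. The interface and class names here (`IDirectoryProvider`, `FileSystemDirectoryProvider`, `InMemoryDirectoryProvider`, `DirectoryScannerSketch`) are hypothetical and do not reflect Copy Catcher's real types; whether the real scanner recurses into subdirectories is also an assumption.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical abstraction over file system access.
public interface IDirectoryProvider
{
    IEnumerable<string> GetFiles(string directory);
}

// Production implementation: talks to the real file system.
public sealed class FileSystemDirectoryProvider : IDirectoryProvider
{
    public IEnumerable<string> GetFiles(string directory) =>
        Directory.EnumerateFiles(directory, "*", SearchOption.AllDirectories); // recursion is assumed
}

// Test double: returns a fixed list without touching the disk.
public sealed class InMemoryDirectoryProvider : IDirectoryProvider
{
    private readonly List<string> _files;
    public InMemoryDirectoryProvider(IEnumerable<string> files) => _files = files.ToList();
    public IEnumerable<string> GetFiles(string directory) => _files;
}

// The scanner depends only on the interface, so either provider can be injected.
public sealed class DirectoryScannerSketch
{
    private readonly IDirectoryProvider _provider;
    public DirectoryScannerSketch(IDirectoryProvider provider) => _provider = provider;
    public IReadOnlyList<string> Scan(string directory) => _provider.GetFiles(directory).ToList();
}
```

With this split, unit tests can exercise the scanner against `InMemoryDirectoryProvider` and never create files on disk.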

Workflow

The user specifies a directory to be scanned.
DirectoryScanner retrieves a list of all files in the directory.
FileHasher computes a hash for each file.
Duplicate files are identified based on their hash values and returned in a dictionary.
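
The steps above can be sketched end to end using only the .NET base class library. This is an illustration under stated assumptions, not the package's code: `WorkflowSketch` is a hypothetical name, SHA-256 is an assumed hash algorithm, subdirectories are assumed to be included, and groups containing a single file are assumed to be omitted from the result.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class WorkflowSketch
{
    public static Dictionary<string, List<string>> FindDuplicates(string directory)
    {
        // Step 2: retrieve all files in the directory (recursively, by assumption).
        var files = Directory.EnumerateFiles(directory, "*", SearchOption.AllDirectories);

        // Step 3: hash each file, spreading the work across cores with PLINQ.
        var hashed = files
            .AsParallel()
            .Select(path => (Hash: HashFile(path), Path: path))
            .ToList();

        // Step 4: group paths by hash and keep only groups with more than one file.
        return hashed
            .GroupBy(entry => entry.Hash, entry => entry.Path)
            .Where(group => group.Count() > 1)
            .ToDictionary(group => group.Key, group => group.ToList());
    }

    // Streams the file through SHA-256 rather than loading it fully into memory.
    private static string HashFile(string path)
    {
        using var stream = File.OpenRead(path);
        using var sha = SHA256.Create();
        return Convert.ToHexString(sha.ComputeHash(stream));
    }
}
```

Calling `WorkflowSketch.FindDuplicates("path/to/directory")` yields a dictionary of the same shape as the one shown in the Output section.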

Product	Compatible and additional computed target framework versions.
.NET	net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed.

net7.0
- No dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last updated
1.0.0-alpha	182	10/31/2023