Import Data from CSV and Excel 🚀 #2480

Open
tnaum-ms opened this issue Dec 2, 2024 · 0 comments

tnaum-ms commented Dec 2, 2024

Import Data from CSV and Excel: Feedback and Collaboration Welcome! 🚀

We’re excited to propose a new feature enabling seamless import of data from CSV and Excel files into the Azure Databases extension for VS Code. This feature aims to simplify data management workflows while ensuring flexibility and accuracy during the import process. Here's what we envision and how your input can help shape this functionality.


Proposed Feature Overview

The data import feature will allow users to upload structured data from CSV and Excel files, providing options for handling complex data structures, data type mismatches, and defaults. It will cater to a range of use cases, from flat data structures to deeply nested documents.

Core Features

  1. Column Header Mapping

    • Automatically read column headers to detect field names.
    • Allow users to verify, adjust, or rename columns during the import process.
  2. Dot Notation Detection for Embedded Structures

    • Infer potential embedded structures when column headers use dot notation (e.g., address.street, address.city).
    • Provide an option for users to specify whether such fields should be treated as nested or top-level attributes.
    • Visual indicators and previews to guide user choices.
  3. Default Values for Missing Data

    • Specify default values for fields when no value is provided in the file.
    • Options for treating missing values as null, using a default based on the data type, or leaving the field blank.
  4. Data Type Specification

    • Allow users to define expected data types for each field (e.g., string, number, date).
    • Preview and validate data types to detect mismatches before import.
  5. Error Handling and Data Type Conversion

    • Options to manage data type mismatches:
      • Skip Invalid Rows: Log issues but continue the import.
      • Convert Where Possible: Attempt automatic conversion for minor mismatches (e.g., "123" to a number).
      • Prompt for Action: Stop the import and prompt the user to address errors manually.
  6. Data Preview and Mapping Interface

    • Interactive interface for previewing the first few rows of the file.
    • Enable users to map columns to database fields and set rules for handling discrepancies.
  7. Nested Document Support

    • Automatically detect and support importing nested structures for NoSQL databases.
    • Preview how nested objects will be created from the data.
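The dot-notation expansion described in features 2 and 7 could be sketched roughly as follows. This is an illustrative draft only; `expandRow` and its types are hypothetical names, not part of the extension's API.

```typescript
type FlatRow = Record<string, string>;
type Doc = { [key: string]: unknown };

/**
 * Expand a flat CSV row whose headers use dot notation into a nested document,
 * e.g. { "address.street": "Main St" } becomes { address: { street: "Main St" } }.
 */
function expandRow(row: FlatRow): Doc {
  const doc: Doc = {};
  for (const [header, value] of Object.entries(row)) {
    const path = header.split(".");
    let node: Doc = doc;
    // Walk/create intermediate objects for all but the last path segment.
    for (let i = 0; i < path.length - 1; i++) {
      const key = path[i];
      if (typeof node[key] !== "object" || node[key] === null) {
        node[key] = {};
      }
      node = node[key] as Doc;
    }
    node[path[path.length - 1]] = value;
  }
  return doc;
}
```

A user opting to treat such fields as top-level attributes would simply skip the expansion and keep the dotted header as a literal key.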

We Need Your Feedback!

Discussion Areas

Your feedback will help refine this feature to ensure it meets community needs. Here are some areas where your input would be especially valuable:

  • How should we handle dot notation ambiguities?
  • What’s the best default behavior for missing values (e.g., null vs. inferred defaults)?
  • How strict should data type validation be? Should we allow leniency or enforce strict rules?
  • Would you prefer skipping problematic rows, halting the import, or being prompted for every issue?

Join the Conversation

We’d love to hear your thoughts and suggestions! Share your ideas in the comments or contribute directly to the issue. Every insight helps us create a better tool for everyone.


How It Will Work

  1. Header Reading

    • Automatically parse the column headers from the imported file.
    • Validate and confirm headers with user input to handle mismatched or missing names.
  2. Field Mapping and Type Configuration

    • Interactive mapping interface allows users to assign columns to database fields.
    • Users can set data types and default values, ensuring data consistency.
  3. Data Inference and Preview

    • Analyze column headers for dot notation to infer nested structures.
    • Display a preview of the transformed data for review before importing.
  4. Error Management

    • Provide options for resolving data mismatches and other issues:
      • Replace invalid values with defaults.
      • Skip or flag problematic rows.
      • Allow manual correction through an error summary interface.
  5. Data Ingestion

    • Validate and upload data into the target database.
    • Support incremental imports for large files to ensure smooth performance.
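The type configuration and error-management steps above might combine roughly like this. Names such as `coerce` and `importRows` are assumptions for the sketch; here, conversion is always attempted ("Convert Where Possible"), while the policy decides whether a failure skips the row or halts, as in "Prompt for Action".

```typescript
type FieldType = "string" | "number" | "boolean" | "date";
type MismatchPolicy = "skip" | "prompt";

/** Convert a raw cell to the configured type, throwing on a mismatch. */
function coerce(value: string, type: FieldType): unknown {
  switch (type) {
    case "string":
      return value;
    case "number": {
      const n = Number(value);
      if (Number.isNaN(n)) throw new Error(`"${value}" is not a number`);
      return n;
    }
    case "boolean": {
      if (/^(true|1|yes)$/i.test(value)) return true;
      if (/^(false|0|no)$/i.test(value)) return false;
      throw new Error(`"${value}" is not a boolean`);
    }
    case "date": {
      const d = new Date(value);
      if (Number.isNaN(d.getTime())) throw new Error(`"${value}" is not a date`);
      return d;
    }
    default:
      throw new Error("unknown field type");
  }
}

/** Apply the schema to every row, handling mismatches per the chosen policy. */
function importRows(
  rows: Record<string, string>[],
  schema: Record<string, FieldType>,
  policy: MismatchPolicy,
): { imported: Record<string, unknown>[]; errors: string[] } {
  const imported: Record<string, unknown>[] = [];
  const errors: string[] = [];
  for (const [i, row] of rows.entries()) {
    try {
      const out: Record<string, unknown> = {};
      for (const [field, type] of Object.entries(schema)) {
        out[field] = coerce(row[field], type);
      }
      imported.push(out);
    } catch (e) {
      if (policy === "skip") {
        // "Skip Invalid Rows": log the issue, keep importing.
        errors.push(`row ${i}: ${(e as Error).message}`);
      } else {
        // "Prompt for Action": stop here and surface a dialog to the user.
        throw e;
      }
    }
  }
  return { imported, errors };
}
```

The returned `errors` list would feed the error summary interface described in step 4.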

Draft Development Plan

  1. Header and Field Mapping Logic

    • Develop a parser to read headers and infer field names, including nested fields.
    • Build an interactive UI for field mapping and type configuration.
  2. Default Handling and Type Conversion

    • Implement logic for assigning default values or handling nulls for missing data.
    • Add robust type conversion and error logging mechanisms.
  3. Nested Structure Detection

    • Use dot notation to infer nested objects and arrays.
    • Provide tools for users to override inferred structures.
  4. Preview and Validation Interface

    • Design a preview panel showing how the data will be imported.
    • Enable users to confirm mappings and resolve conflicts before import.
  5. Error Reporting and Resolution

    • Include detailed error summaries for issues like type mismatches or missing data.
    • Allow configurable error-handling strategies (e.g., skip rows, replace values).
  6. Testing and Quality Assurance

    • Test with diverse datasets to ensure reliability and performance.
    • Validate compatibility across SQL and NoSQL databases.
  7. Documentation and User Guide

    • Provide clear instructions for mapping, previewing, and importing data.
    • Include best practices for managing errors and nested structures.
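The default handling in step 2 of the plan (and Core Feature 3) could take a shape like the sketch below, where a policy chooses between `null`, a type-based default, or omitting the field entirely. All names here are hypothetical, for illustration only.

```typescript
type FieldType = "string" | "number" | "boolean";
type MissingPolicy = "null" | "typeDefault" | "omit";

// Per-type fallbacks used by the "typeDefault" policy.
const TYPE_DEFAULTS: Record<FieldType, unknown> = {
  string: "",
  number: 0,
  boolean: false,
};

/** Fill in missing cells according to the schema and chosen missing-value policy. */
function applyDefaults(
  row: Record<string, string | undefined>,
  schema: Record<string, FieldType>,
  policy: MissingPolicy,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [field, type] of Object.entries(schema)) {
    const value = row[field];
    if (value !== undefined && value !== "") {
      out[field] = value;
    } else if (policy === "null") {
      out[field] = null;
    } else if (policy === "typeDefault") {
      out[field] = TYPE_DEFAULTS[type];
    }
    // "omit": leave the field out of the document entirely.
  }
  return out;
}
```

Whether an empty string should count as "missing" for string fields is exactly the kind of ambiguity we'd like feedback on.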

What’s Next?

This is the initial concept for the import feature. We expect multiple iterations based on your feedback. Let’s collaborate to build a flexible, user-friendly import tool for the VS Code Azure Databases extension. Together, we can make data management smoother and more intuitive! 🌟

@tnaum-ms tnaum-ms modified the milestones: 0.27.0, 0.26.0 Dec 2, 2024
@tnaum-ms tnaum-ms changed the title Import Data from CSV and Excel: Feedback Welcome! 🚀 Import Data from CSV and Excel 🚀 Dec 3, 2024
@tnaum-ms tnaum-ms modified the milestones: 0.26.0, 0.27.0 Dec 3, 2024
@tnaum-ms tnaum-ms modified the milestones: 0.27.0, 0.28.0 Dec 5, 2024
@tnaum-ms tnaum-ms self-assigned this Dec 5, 2024