-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat (cli): initialize the CLI of GraphAr and support baseETL functions #616
Conversation
…to graphar-cli
…to graphar-cli
…to graphar-cli
cli/import.full.json
Outdated
@@ -0,0 +1,223 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put these import configuration files to testing repo? I think the configuration files are better to put together with the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will move it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can refer to 6a13df3 to fix the ORC support problem in CI.
cli/src/util.h
Outdated
|
||
#pragma once | ||
|
||
#include <arrow/adapters/orc/adapter.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
cli/src/util.h
Outdated
#include <graphar/graph_info.h> | ||
#include <parquet/arrow/reader.h> | ||
|
||
#include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the std library header should put at the begin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a loot! It looks like a very big step forward for GraphAr. The main question in my mind at the moment is why are we going to re-define basic structures as pyantic models instead of using the code, generated from the protobuf messages? In my understanding the whole idea of proto was to avoid multiple definitions of the same things across the project.
cli/src/graphar_cli/config.py
Outdated
|
||
logger = getLogger("graphar_cli") | ||
|
||
# TODO: convert to constants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect to see these variables in UPPER_CASE
cli/src/graphar_cli/logging.py
Outdated
# under the License. | ||
|
||
import logging | ||
from typing import Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thing it should be moved to the TYPE_CHECKING
block.
|
||
from logging import getLogger | ||
|
||
from .config import ImportConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thing it can be moved to the TYPE_CHECKING
block, isn't? I do not see runtime calls to this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide some examples of using TYPE_CHECKING
? It would help me a lot, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.python.org/3/library/typing.html#typing.TYPE_CHECKING
Thanks! I will try it with the protobuf
changes.
.github/workflows/cli.yml
Outdated
# macos: | ||
# name: ${{ matrix.architecture }} macOS ${{ matrix.macos-version }} CLI | ||
# runs-on: macos-${{ matrix.macos-version }} | ||
# if: ${{ !contains(github.event.pull_request.title, 'WIP') && github.event.pull_request.draft == false }} | ||
# strategy: | ||
# fail-fast: false | ||
# matrix: | ||
# include: | ||
# - architecture: AMD64 | ||
# macos-version: "12" | ||
# - architecture: ARM64 | ||
# macos-version: "14" | ||
# steps: | ||
# - uses: actions/checkout@v3 | ||
# with: | ||
# submodules: true | ||
|
||
# - name: Install dependencies | ||
# run: | | ||
# brew bundle --file=cpp/Brewfile | ||
|
||
|
||
# - name: Build GraphAr And Run Tests | ||
# working-directory: "cli" | ||
# run: | | ||
# pip install ./ | ||
# graphar --help | ||
# graphar check -p ../testing/neo4j/MovieGraph.graph.yml | ||
# graphar show -p ../testing/neo4j/MovieGraph.graph.yml -v Person | ||
# graphar show -p ../testing/neo4j/MovieGraph.graph.yml -es Person -e ACTED_IN -ed Movie | ||
# graphar import -c import.full.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can disable the job with if: false
, which is considered better to me, rather than ignore the code.
similar to example in this doc:
Ps. there should also be a note to explain with why we skip this job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion! Thanks very much!
.github/workflows/cli.yml
Outdated
macos: | ||
name: ${{ matrix.architecture }} macOS ${{ matrix.macos-version }} CLI | ||
runs-on: macos-${{ matrix.macos-version }} | ||
if: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note to explain why we skip this job?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM~
Thank you!
Hi, Sam, The CI currently is based on the C++ library and the protobuf now not full integrate into c++ library yet, so this PR has a little re-define of the base structure but will update when the protobuf is ready. |
cli/import.full.yml
Outdated
@@ -0,0 +1,138 @@ | |||
graphar: # Global configuration of imported data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the import congfig files should move to testing repo too and it's better to use.the yaml as the only format to specify configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the import congfig files should move to testing repo too and it's better to use.the yaml as the only format to specify configurations.
cli/src/graphar_cli/config.py
Outdated
name: str | ||
vertex_chunk_size: Optional[int] = 100 | ||
edge_chunk_size: Optional[int] = 1024 | ||
file_type: Literal["parquet", "orc", "csv", "json"] = DEFAULT_FILE_TYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just curious what was the reason to use strings instead of creating a enumeration? I the file_type
in at least four places and it may become challenging to support it in the future... What do you think about something like this:
from enum import Enum
class FileType(str, Enum):
PARQUET = "parquet"
ORC = "orc"
CSV = "csv"
JSON = "json"
and use it instead of the string literals?
That also allows to avoid checking it each time like this if file_type not in SUPPORT_FILE_TYPES:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileType(str, Enum)
with recent versions of pydantic
will raise a warning. I think it is necessary to pin a minor version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, I left a minor comment
…ns (apache#616) * init cli * fix include * add ci * add vertex info * change config * finish * license * fix license * remove conda recipe * enbale ci * fix typo * fix dependency * update ci * add data * fix ci * fix cmake * add arrow in cmake * add dependency * fix review * fix ci * fix ci * fix ci * fix ci * Update cli.yml * Update cli.yml * fix config type * fix ci with new testing * use enum * use enum * pin pydantic version
As title.