dp.show() no longer lists all files #35

Open
aecorn opened this issue Nov 29, 2022 · 4 comments

@aecorn
Collaborator

aecorn commented Nov 29, 2022

Dict seems to be empty, and underdeveloped:

out[trimmed_name] = {}

Consider something like this, or maybe just return a plain list. Either way, it should mimic the old, backported behavior.
[image attached in the original issue]

On request from
@ohvssb
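
For illustration, a minimal sketch of the two return shapes mentioned above: a nested dict that is actually filled in, and a plain sorted list. The function name `paths_to_tree` and the sample paths are hypothetical and not part of the library; the input is assumed to be a flat list of slash-separated paths.

```python
from typing import List, Optional


def paths_to_tree(paths: List[str]) -> dict:
    """Build a nested dict from slash-separated paths.

    Folders become dicts; leaf entries (assumed to be files) map to None.
    Hypothetical helper, not the library's implementation.
    """
    tree: dict = {}
    for path in paths:
        parts = [p for p in path.strip("/").split("/") if p]
        if not parts:
            continue
        node = tree
        for part in parts[:-1]:
            child: Optional[dict] = node.get(part)
            if not isinstance(child, dict):
                # First time we see this folder, or it was listed
                # earlier as a leaf: turn it into a folder node.
                child = {}
                node[part] = child
            node = child
        # Keep an existing folder dict if this leaf name was already seen.
        node.setdefault(parts[-1], None)
    return tree


# Made-up sample paths, purely for demonstration.
sample = [
    "data/2022/a.parquet",
    "data/2022/b.parquet",
    "data/c.parquet",
]
as_tree = paths_to_tree(sample)
as_list = sorted(sample)  # the "plain list" alternative
```

The nested form keeps the folder hierarchy visible, while the plain list is easier to feed into a dataframe or a loop.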

@bjornandre
Contributor

dp.show() was originally written to recursively show all folders below a given path. So you are suggesting that it should recursively show all folders and files?

@aecorn
Collaborator Author

aecorn commented Nov 30, 2022

I might have remembered it wrong: I thought it used to return a dataframe of all files in the respective folders as well.
I've tried to dig through the old deprecated code to confirm, but the trail ended at an old microservice that I think has been shut down.
os.environ["CATALOG_URL"] is no longer available.
https://github.com/statisticsnorway/dapla-ipython-kernels/blob/2e1b4924b45ac0ead6ac09fc7ad6e46f7903f150/dapla/services/clients.py#L20

If my memory of what "show" should do does not match the actual history, disregard this issue.
But I'd like to confirm in some way that the function really did not list files, since listing files is what I remember.
Were "files" actually "folders" before? That might also be the cause of the mismatch here.
Did dp.read_pandas() on a folder previously read the parquet file from inside that folder, while saving with dp.write_pandas() created a folder? And do we now instead just operate on "flat" files in the folder?

@bjornandre
Contributor

Yep, "datasets" were actually folders in the old "catalog" structure. In a way that makes sense, since a dataset may consist of several files. The method dp.read_pandas() still works for both files and folders; in the latter case it reads all files inside the folder into a single dataframe. This is the way Spark does it.

Now we tend to operate on files directly, just like you said.

So the question is, what do we need the show method for? Maybe it should list both files and folders recursively? I've created a PR for this.

@aecorn
Collaborator Author

aecorn commented Dec 1, 2022

@ohvssb and I both expected it to list files and folders a couple of days ago, before I started digging into the history and old codebases. When a method has a name as simple as "show", you would perhaps expect broader behavior?

I would expect "show()" to list both.
Alternatively, you could support multiple behaviors:
show() / show(type="all")
show_files() / show(type="files")
show_folders() / show(type="folders")
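
A rough sketch of how the variants above could fit together. This is only an illustration on the local filesystem using `os.walk`; the real `dp.show()` works against remote storage, and the signatures here (a `root` argument, a list of strings as return value) are assumptions, not the library's API.

```python
import os
from typing import List


def show(root: str, type: str = "all") -> List[str]:
    """Recursively list entries below root.

    type="all" lists files and folders, "files" only files,
    "folders" only folders. The parameter name mirrors the proposal
    in this thread (it shadows the builtin, which is not used here).
    """
    if type not in ("all", "files", "folders"):
        raise ValueError(f"unknown type: {type!r}")
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        if type in ("all", "folders"):
            found.extend(os.path.join(dirpath, d) for d in dirnames)
        if type in ("all", "files"):
            found.extend(os.path.join(dirpath, f) for f in filenames)
    return sorted(found)


def show_files(root: str) -> List[str]:
    """Convenience wrapper: files only."""
    return show(root, type="files")


def show_folders(root: str) -> List[str]:
    """Convenience wrapper: folders only."""
    return show(root, type="folders")
```

With this shape, plain `show(path)` gives the broad behavior, and the narrower variants are thin wrappers around the same traversal.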
