Shorten project structure #82

Open
1 task
hugolpz opened this issue Mar 1, 2021 · 3 comments

Comments

@hugolpz

hugolpz commented Mar 1, 2021

Related to #80. Suggestion: move the core code up a level so it is more visible, and keep the crawlers in their own folder.

  • Reorganize the project structure from:
corpuscrawler
├─ README.md
├─ LICENSE
├─ LICENSE.md
├─ CONTRIBUTING.md
├─ corpuscrawler
└─ Lib
   └─ corpuscrawler
      ├─ *.py : utilities
      └─ crawl_{iso}.py : crawlers

to

corpuscrawler
├─ README.md
├─ LICENSE
├─ LICENSE.md
├─ CONTRIBUTING.md
├─ corpuscrawler
├─ *.py : utilities
└─ crawlers
   └─ crawl_{iso}.py : crawlers

Would such changes break any complementary toolchain?

hugolpz changed the title from "Reorganize project structure" to "Shorten project structure" on Mar 1, 2021.
@hugolpz
Author

hugolpz commented Feb 15, 2024

Hello @sffc. I noticed you made a Python change (10adaec) and are active on this project, so allow me to cc you on this minor issue.

@sffc
Collaborator

sffc commented Feb 26, 2024

The project is currently structured as a pip package, and it should stay one. However, I would support reorganizing the utilities and crawlers into separate directories, more along the lines of:

corpuscrawler
├─ README.md
├─ LICENSE
├─ LICENSE.md
├─ CONTRIBUTING.md
├─ corpuscrawler
└─ Lib
   └─ corpuscrawler
      ├─ util
      │  └─ *.py : utilities
      └─ crawlers
         └─ crawl_{iso}.py : crawlers
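To illustrate one thing a dedicated `crawlers` directory makes easy, here is a minimal sketch that lists the available crawlers by scanning that directory for `crawl_{iso}` modules with the standard-library `pkgutil.iter_modules`. The helper name `list_crawlers` and the assumption that every crawler is a flat module named `crawl_{iso}` inside a single directory are hypothetical and not part of the current codebase.

```python
import pkgutil
from typing import Dict

CRAWLER_PREFIX = "crawl_"  # assumed naming convention: crawl_{iso}.py


def list_crawlers(crawlers_dir: str) -> Dict[str, str]:
    """Map each ISO code to the module name of its crawler.

    Hypothetical helper: assumes all crawlers live as plain modules
    named crawl_{iso}.py directly inside `crawlers_dir`.
    """
    found = {}
    # iter_modules yields one ModuleInfo per importable module or
    # package found directly in the given directory.
    for info in pkgutil.iter_modules([crawlers_dir]):
        # Skip subpackages and any non-crawler helper modules.
        if info.ispkg or not info.name.startswith(CRAWLER_PREFIX):
            continue
        iso = info.name[len(CRAWLER_PREFIX):]
        found[iso] = info.name
    return found
```

Because utilities would live in `util` rather than next to the crawlers, a scan like this would not need to filter out helper modules at all; the prefix check is kept only as a safeguard.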

@hugolpz
Author

hugolpz commented Feb 27, 2024

Yes, this would add clarity. The project currently lacks clear onboarding documentation and pointers. A clean structure that separates the few utility files from the 1,000+ crawler files would improve both clarity and onboarding.
