Organise the categories into groups #70
Comments
Organizing categories into macro-categories, or "groups", is something I have been thinking about for a long time, but I have not tackled it yet because it is a relatively big change in the organization of the data. I completely agree with your analysis. But since it would require some reorganization of the README file too, and of course some changes to the Python script that generates it, it is a change that has to be carefully planned. One idea would be to start incrementally, by introducing the groups in the CSV while still using the categories in the README. In practice, there are a number of decisions to take here :-) Your suggested groups sound good: thanks for the contribution! That said, any such "incremental" approach may look like a half-baked solution. Overall, I/we should find enough time to work on it.
@toolleeo Yeah, your plan of incremental changes, adding a column just to the CSV first and then updating the Python script, is indeed a good one. As for the script change, I agree that it will not be exactly trivial to implement. However, I wouldn't say it would be hard to do, because the main aspects of the script are legibility and maintainability. IMO the performance of the script is not a critical aspect: if the script takes 500 ms or even 1 s to run, that is totally fine, because it only needs to run when the data changes.
I definitely agree that the performance of the script is not relevant. The main point is to find a suitable assignment for all, or almost all, of the categories.
Definitely, the main problem is the groups. I was going to talk about it, but I was a bit busy. Here we go:
Gradual group assigning
I think it's fine if some categories in the CSV are left with no group for some time. IMO it would look good enough if, in the index, we display the categories without a group outside of any group.
Choosing groups
Nowadays, IMO we don't need to spend so much time thinking about which groups to use, because we have extremely good LLMs (Large Language Models) such as ChatGPT, and even some quite impressive open source ones, given that OpenAI has one of the largest public token datasets, if not the largest one.
Prompting approach
We can take just the
To create some groups, all we need is good use of the best prompting techniques (if anyone has more knowledge on this, please help me). From that point, we can just adjust the LLM result a little bit to make the "final" version (sure, it can change later if someone finds an issue or if the categories change).
Techniques
We can combine different techniques to get the exact groups that we want:
Conclusion
What about doing this? In my opinion, LLMs (at least ChatGPT, which is what I often use in my daily life) are currently very good at generating content for open-ended, non-specific tasks. Although they can make many mistakes on specific tasks like programming and math, for example, their ability to connect distant concepts is quite impressive. That's why I think they would do an excellent job on this task if we know how to use them.
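As a rough illustration of the prompting idea above, here is a minimal sketch in Python. It only builds a prompt from a list of category names and checks a reply for categories that were left without a group. The function names, the prompt wording, and the JSON reply format are all assumptions made for this example; nothing here calls a real LLM API, so the reply would have to be pasted in by hand or supplied by whatever client is used.

```python
import json


def build_grouping_prompt(categories, max_groups=10):
    """Build a plain-text prompt asking a model to group the categories.

    The wording and the JSON reply format are hypothetical choices.
    """
    listing = "\n".join(f"- {name}" for name in categories)
    return (
        f"Organize the following categories into at most {max_groups} groups.\n"
        "Assign every category to exactly one group and do not rename categories.\n"
        "Answer only with a JSON object mapping each group name "
        "to a list of categories.\n\n"
        f"{listing}\n"
    )


def parse_groups(reply, categories):
    """Parse a JSON reply and list the categories that got no group."""
    groups = json.loads(reply)
    assigned = {name for members in groups.values() for name in members}
    missing = [name for name in categories if name not in assigned]
    return groups, missing
```

Categories reported in `missing` could simply stay ungrouped for a while, in line with the gradual assignment described earlier, and the suggested groups can still be adjusted by hand.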
Wow, honestly I admit that I did not think about using an LLM for the automatic generation / suggestion of the groups :-) It sounds like topic modeling applied to the dataset made up of our categories. A test would probably not require too much effort. Do you have a chance to try it out? BTW, one thing I thought about in the past was the automatic generation of the description, starting from the README or similar, using an LLM. But this is probably a topic for another thread :-)
I made a quick test, I think last month, just feeding it the categories, and its response was not bad at all, so I think it will be good :) But yeah, I agree that, at the moment, it's not really a huge deal, but an LLM can help us with maybe new ideas and other perspectives.
Wow, that is a good idea for sure. I think a Python script that takes the README using the GitHub API, or even one that uses the GitHub website itself, would work quite well. Maybe it could even have an option to pass additional context like documentation or even code. We definitely have to discuss this 🚀
I'll try it out in a few days, or maybe hours; let's see what we get.
Maybe you could use GitHub Docs, which would let you include md files within the main README? This would potentially allow you to have pages for sections and provide a table of contents.
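For the README-fetching part, a minimal sketch using only the Python standard library could look like the following. It relies on the GitHub REST endpoint `GET /repos/{owner}/{repo}/readme`, whose JSON response carries the file as base64 text in a `content` field. The function names are made up for this example, and authentication and rate-limit handling are left out, so treat it as a starting point rather than a finished tool.

```python
import base64
import json
import urllib.request

API_ROOT = "https://api.github.com"


def readme_request(owner, repo):
    """Build the request for the repository README endpoint."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/readme"
    headers = {"Accept": "application/vnd.github+json"}
    return urllib.request.Request(url, headers=headers)


def decode_readme(payload):
    """Decode the base64 'content' field of the JSON response."""
    if payload.get("encoding") != "base64":
        raise ValueError("unexpected encoding in README response")
    return base64.b64decode(payload["content"]).decode("utf-8")


def fetch_readme(owner, repo):
    """Fetch and decode a README (unauthenticated network call)."""
    with urllib.request.urlopen(readme_request(owner, repo), timeout=10) as resp:
        return decode_readme(json.load(resp))
```

The decoded text could then be passed to an LLM, together with any extra context, to draft a description.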
I don't know if @toolleeo has interest in this, but I'll describe my proposal.
TL;DR
There are too many categories now, so I propose adding a "group" column in categories.csv to group the existing categories, just in the index. This would make the management and use of this list much better.
Motivation
Current approach
The tools are organized into one level of hierarchy, in which the categories include multiple tools. An app can only belong to one category.
My proposal
To address the above cons, which are a consequence of the project's growth, I propose adding groups that include multiple categories.
Implementation
IMO it seems relatively easy to implement:
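To make the idea concrete, here is a minimal sketch of how the index could be generated from the CSV. The column names `name` and `group`, the sample rows, and the `build_index` function are assumptions for illustration only; the real categories.csv layout and the existing generator script may differ. Categories with an empty group are listed at the top level, so groups can be filled in gradually.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the real categories.csv may use other columns.
SAMPLE_CSV = """name,group
Text editors,Editing
Hex editors,Editing
File managers,Files
Games,
"""


def build_index(csv_text):
    """Return markdown index lines with categories nested under groups."""
    grouped = defaultdict(list)
    ungrouped = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["name"].strip()
        group = (row.get("group") or "").strip()
        if group:
            grouped[group].append(name)
        else:
            ungrouped.append(name)  # no group assigned yet
    lines = []
    for group in sorted(grouped):
        lines.append(f"- {group}")
        lines.extend(f"  - {name}" for name in sorted(grouped[group]))
    # Categories without a group are shown outside of any group.
    lines.extend(f"- {name}" for name in sorted(ungrouped))
    return lines


if __name__ == "__main__":
    print("\n".join(build_index(SAMPLE_CSV)))
```

Only the index changes in this sketch; the category sections themselves would stay as they are in the README.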
Pros and cons
IMO the pros of the new approach and the cons of the current approach outweigh, by far, the pros of the current approach and the cons of the new approach.
Groups suggestions
I'm not exactly sure which groups we could use, but some suggestions are: