Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct answers #45

Merged
merged 9 commits into from
Apr 15, 2024
Merged

Direct answers #45

merged 9 commits into from
Apr 15, 2024

Conversation

JKlueber
Copy link
Collaborator

@JKlueber JKlueber commented Apr 2, 2024

No description provided.

archive_query_log/orm.py Outdated Show resolved Hide resolved
archive_query_log/orm.py Outdated Show resolved Hide resolved
archive_query_log/parsers/warc_direct_answers.py Outdated Show resolved Hide resolved
archive_query_log/parsers/warc_direct_answers.py Outdated Show resolved Hide resolved
archive_query_log/orm.py Outdated Show resolved Hide resolved
@@ -190,6 +190,18 @@ class Snippet(SnippetId):
text: str | None = Text()


class DirectAnswerId(InnerDocument):
id: str = Keyword()
rank: int = Integer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a direct answer really have an interpretable "rank" (i.e., are the direct answers ordered in some way)?

- removed boxes and added url and text in DirectAnswer
- changed xpath str to xpaths List[str]
Copy link

codecov bot commented Apr 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.68%. Comparing base (cf8300e) to head (14ddc9a).

Additional details and impacted files
@@           Coverage Diff            @@
##           elastic      #45   +/-   ##
========================================
  Coverage    89.68%   89.68%           
========================================
  Files           61       61           
  Lines         2724     2724           
========================================
  Hits          2443     2443           
  Misses         281      281           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -461,9 +459,8 @@ class WarcDirectAnswerParser(BaseDocument):
url_pattern_regex: str | None = Keyword()
priority: float | None = RankFeature(positive_score_impact=True)
parser_type: WarcDirectAnswerParserType = Keyword()
xpath: str | None = Keyword()
xpaths: List[str] | None = Keyword()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned on Discord, we can just use one XPath with alternatives.

archive_query_log/parsers/warc_direct_answers.py Outdated Show resolved Hide resolved
Copy link
Contributor

@janheinrichmerker janheinrichmerker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice Julian! I think this is almost ready to merge 👍
Only two very minor fixes left.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Julian, actually, we don't need to have an importer for direct answer parsers as there are no such parsers in the "old" AQL. So you can just revert the changes in this file (which should fix most of the type-safety issues here).

else:
raise ValueError(f"Invalid parser type: {parser_type}")
WarcDirectAnswersParser.init(using=config.es.client)
add_warc_direct_answers_parser(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type-checker complains that a parameter is missing here.

@JKlueber JKlueber merged commit 218ae44 into elastic Apr 15, 2024
10 of 11 checks passed
@JKlueber JKlueber deleted the direct-answers branch April 15, 2024 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants