diff --git a/README.rst b/README.rst
index 52098b6..d1e79d1 100644
--- a/README.rst
+++ b/README.rst
@@ -12,9 +12,9 @@ testing and evaluation of Gen AI applications.
 Getting started
 ---------------
 
-- See the :doc:`ARTKIT Documentation ` for our User Guide, Examples, API reference, and more.
-- See `Contributing `_ or visit our :doc:`Contributor Guide ` for information on contributing.
-- We have an :doc:`FAQ ` for common questions. For anything else, please reach out to ARTKIT@bcg.com.
+- See the `ARTKIT Documentation `_ for our User Guide, Examples, API reference, and more.
+- See `Contributing `_ or visit our `Contributor Guide `_ for information on contributing.
+- We have an `FAQ `_ for common questions. For anything else, please reach out to ARTKIT@bcg.com.
 
 .. _Introduction:
 
@@ -29,30 +29,30 @@ readily adapted to meet the testing and evaluation needs of a wide variety of Ge
 .. image:: sphinx/source/_images/artkit_pipeline_schematic.png
    :alt: ARTKIT pipeline schematic
 
-ARTKIT also supports automated :doc:`multi-turn conversations `
+ARTKIT also supports automated `multi-turn conversations `_
 between a challenger bot and a target system. Issues and vulnerabilities are more likely to arise after extended interactions
 with Gen AI systems, so multi-turn testing is critical for interactive applications.
 
-We recommend starting with our :doc:`User Guide `
+We recommend starting with our `User Guide `_
 to learn the core concepts and functionality of ARTKIT.
-Visit our :doc:`Examples ` to see how
+Visit our `Examples `_ to see how
 ARTKIT can be used to test and evaluate Gen AI systems for:
 
 1. Q&A Accuracy:
     - Generate a *Q&A golden dataset* from ground truth documents, augment questions to simulate variation in user inputs,
-      and evaluate system responses for :doc:`faithfulness, completeness, and relevancy `.
+      and evaluate system responses for `faithfulness, completeness, and relevancy `_.
 
 2. Upholding Brand Values:
     - Implement *persona-based testing* to simulate diverse users interacting with your system and evaluate system responses for
-      :doc:`brand conformity `.
+      `brand conformity `_.
 
 3. Equitability:
     - Run a *counterfactual experiment* by systematically modifying demographic indicators across a set of documents and statistically
-      evaluate system responses for :doc:`undesired demographic bias `.
+      evaluate system responses for `undesired demographic bias `_.
 
 4. Safety:
     - Use *adversarial prompt augmentation* to strengthen adversarial prompts drawn from a prompt library and evaluate system responses for
-      :doc:`refusal to engage with adversarial inputs ` .
+      `refusal to engage with adversarial inputs `_ .
 
 5. Security:
     - Use *multi-turn attackers* to execute multi-turn strategies for extracting the system prompt from a chatbot, challenging the system's
@@ -95,7 +95,7 @@ ARTKIT provides out-of-the-box support for the following model providers:
 - `Hugging Face `_
 - `OpenAI `_
 
-To connect to other services, users can develop :doc:`custom model classes `.
+To connect to other services, users can develop `custom model classes `_.
 
 Installation
 -------------
@@ -319,14 +319,14 @@ From left to right, the results table shows:
 3. ``ask_chad``: The response from AskChad, which mirrors the tone of the user
 4. ``evaluation``: The evaluation score for the SARCASTIC metric, which flags the sarcastic response with a 1
 
-For a complete introduction to ARTKIT, please visit our :doc:`User Guide `
-and :doc:`Examples `.
+For a complete introduction to ARTKIT, please visit our `User Guide `_
+and `Examples `_.
 
 
 Contributing
 ------------
 
-Contributions to ARTKIT are welcome and appreciated! Please see the :doc:`Contributor Guide ` section for information.
+Contributions to ARTKIT are welcome and appreciated! Please see the `Contributor Guide `_ section for information.
 
 
 License