Commit

Add action to spell check docs and README
Westwooo committed Sep 23, 2024
1 parent 177c57b commit 6a6df3d
Showing 17 changed files with 189 additions and 31 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/.spellcheck.yml
@@ -0,0 +1,27 @@
matrix:
- name: Markdown
  expect_match: false
  aspell:
    lang: en
  dictionary:
    wordlists:
    - .github/workflows/.wordlist.txt
    output: wordlist.dic
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
      markdown_extensions:
      - markdown.extensions.extra:
  - pyspelling.filters.html:
      comments: false
      attributes:
      - alt
      ignores:
      - ':matches(code, pre)'
      - 'code'
      - 'pre'
      - 'blockquote'
  sources:
  - 'README.md'
  - 'docs/*.adoc'
  - 'docs/**/*.adoc'
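
Note on the `sources` patterns: the following is a minimal Python sketch, not part of the commit, that previews which files the three patterns would select. It uses the standard library `glob` module as a stand-in for the matcher the spell checker uses, so the real matching rules may differ. The function names and the demo file layout are hypothetical. Under `glob`'s recursive rules, `docs/**/*.adoc` also matches files directly inside `docs/`, so the sketch de-duplicates the results.

```python
import glob
import os
import tempfile

# Source patterns copied from the `sources` list in .spellcheck.yml.
SOURCES = ["README.md", "docs/*.adoc", "docs/**/*.adoc"]


def expand_sources(root, patterns=SOURCES):
    """Return the sorted, de-duplicated relative paths matched under root."""
    matched = set()
    for pattern in patterns:
        # recursive=True lets `**` match zero or more directory levels.
        for path in glob.glob(os.path.join(root, pattern), recursive=True):
            matched.add(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(matched)


def build_demo_tree(root):
    """Create a tiny hypothetical repository layout for the demonstration."""
    demo_files = [
        "README.md",
        "docs/intro.adoc",
        "docs/recipes/simple_rag.adoc",
        "docs/logo.png",  # not matched by any pattern
    ]
    for rel in demo_files:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w", encoding="utf-8") as handle:
            handle.write("placeholder\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_tree(tmp)
        for path in expand_sources(tmp):
            print(path)
```

Run from a checkout, `expand_sources(".")` gives a quick list of the files that fall under the spell check, assuming the checker's glob semantics are close to Python's.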
120 changes: 120 additions & 0 deletions .github/workflows/.wordlist.txt
@@ -0,0 +1,120 @@
// Actual words to ignore

aarch
adoc
analytics
Analytics
api
aws

capella
cb
cbenv
cbsh
cbshell
CBShell
CIDR
CIDRs
cli
CLI
config
Config
connstr
contentVector
couchbaselabs
couchbase
Couchbase
csv
CSV

darwin
datasets
dataverses
descriptionEmbedding
dotfile
dotfiles

EE
embeddings
env

fieldName
FileSize

github
gz

Homebrew
hostnames
html
http
https

ints
InVpc

json
JSON

kv

linux
llm
localdev
localhost
lookups

macOS
memcached
MiB
msvc

namespace
netlify
nowrap
nushell
Nushell
Nushell's

OpenAI
OpenSSL

pc
plaintext
png
pre
projectcapella

QL
quickstart
Quickstart

rustup

sectnums
sqlite
SRV
subcommands
subdoc

templating
tera
tls
TLS
toclevels
toml
toolchain

uments
unix
upsert
upserted
userguide

whoami
www

xattrs
xml

yaml
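
Note on maintaining the wordlist: several entries differ only in case (for example `analytics`/`Analytics` and `cli`/`CLI`). The Python sketch below is not part of the commit and uses hypothetical helper names. It lists such case variants and any exact duplicates so the file stays tidy as it grows. It assumes one entry per line and skips blank lines and the leading `//` line. How the spell checker itself treats that `//` line is not covered here.

```python
from collections import defaultdict


def parse_wordlist(text):
    """Return the entries, skipping blank lines and `//` lines (sketch assumption)."""
    words = []
    for line in text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("//"):
            words.append(entry)
    return words


def case_variants(words):
    """Map each lowercased word to its spellings when more than one spelling exists."""
    groups = defaultdict(set)
    for word in words:
        groups[word.lower()].add(word)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


def duplicates(words):
    """Return entries that appear more than once with identical spelling."""
    seen = set()
    repeated = set()
    for word in words:
        if word in seen:
            repeated.add(word)
        seen.add(word)
    return sorted(repeated)


if __name__ == "__main__":
    sample = "// Actual words to ignore\n\napi\ncli\nCLI\njson\nJSON\napi\n"
    entries = parse_wordlist(sample)
    print(case_variants(entries))
    print(duplicates(entries))
```

To check the real file, read `.github/workflows/.wordlist.txt` and pass its text to `parse_wordlist`.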
13 changes: 12 additions & 1 deletion .github/workflows/ci.yml
@@ -159,4 +159,15 @@ jobs:
      - uses: hustcer/setup-nu@main
        with:
          version: "*"
      - run: nu docs/sample_config/prompt_tests.nu
      - run: nu docs/sample_config/prompt_tests.nu

  check-spelling:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Check Spelling
        uses: rojopolis/[email protected]
        with:
          config_path: .github/workflows/.spellcheck.yml
          task_name: Markdown
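
Note on reproducing the `check-spelling` job locally: the config above uses `pyspelling` filters, so the same check can probably be run outside CI with the `pyspelling` command-line tool. The Python sketch below is not part of the commit. It assumes `pyspelling` and `aspell` are installed locally and that `-c` (config file) and `-n` (task name) mirror the `config_path` and `task_name` inputs of the action. The function names are hypothetical.

```python
import shutil
import subprocess

CONFIG_PATH = ".github/workflows/.spellcheck.yml"  # value of `config_path` in ci.yml
TASK_NAME = "Markdown"  # value of `task_name` in ci.yml


def build_command(config_path=CONFIG_PATH, task_name=TASK_NAME):
    """Build the pyspelling invocation: -c selects the config, -n the task."""
    return ["pyspelling", "-c", config_path, "-n", task_name]


def run_spellcheck():
    """Run the check if pyspelling is on PATH; return its exit code, else None."""
    if shutil.which("pyspelling") is None:
        print("pyspelling not found; install it (and aspell) first")
        return None
    return subprocess.run(build_command()).returncode


if __name__ == "__main__":
    run_spellcheck()
```

Running this from the repository root before pushing should surface the same misspellings the CI job would report, provided the local `aspell` dictionary matches the one used in CI.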
4 changes: 2 additions & 2 deletions README.md
@@ -93,8 +93,8 @@ On top of [nushell](https://www.nushell.sh/) built-in commands, the following co
- `cb-env project` - Sets the active cloud project based on its name
- `cb-env scope` - Sets the active scope based on its name
- `cb-env timeouts` - Sets the default timeouts
- `clouds` - Lists all clusters on the active Capella organisation
- `clusters`- Lists all clusters on the active Capella organisation
- `clouds` - Lists all clusters on the active Capella organization
- `clusters`- Lists all clusters on the active Capella organization
- `clusters create` - Creates a new cluster against the active Capella organization
- `clusters drop` - Deletes a cluster from the active Capella organization
- `clusters get` - Gets a cluster from the active Capella organization
8 changes: 4 additions & 4 deletions docs/commands.adoc
@@ -1,7 +1,7 @@
== Couchbase Commands

The following sections discuss the individual couchbase specific commands in greater detail. Remember, you can always mix and match
them with built-in other shell commands as well as executables from your environment.
The following sections discuss the individual Couchbase specific commands in greater detail. Remember, you can always mix and match
them with built-in other shell commands as well as executable programs from your environment.

include::commands/buckets.adoc[]

@@ -52,7 +52,7 @@ You can retrieve a document with `doc get`:
```

To distinguish the actual content from the metadata, the content is nested in the `content` field.
If you want to have everything at the toplevel, you can pipe to the `flatten` command:
If you want to have everything at the top level, you can pipe to the `flatten` command:

[options="nowrap"]
```
@@ -288,7 +288,7 @@ The answering of questions with supplied context can be used to easily implement

=== `version`

The `version` command lists the version of the couchbase shell.
The `version` command lists the version of the Couchbase shell.

```
> version
2 changes: 1 addition & 1 deletion docs/commands/query.adoc
@@ -5,7 +5,7 @@ The query commands can be used to explore/create indexes and execute queries.

==== `query`

Takes a n1ql statement and executes it against the active cluster.
Takes a N1QL statement and executes it against the active cluster.

```
👤 Charlie 🏠 local in 🗄 travel-sample._default._default
2 changes: 1 addition & 1 deletion docs/commands/vector.adoc
@@ -309,7 +309,7 @@ Embedding batch 1/1
The resulting document is the same as the original, but with a new field `contentVector` which contains the result of embedding the content field with the <<_cb_env_llm,active llm>>.
The name of the field that the embedding will be written to will default to the name of the original field with "Vector" appended.
This default behaviour can be overwritten with the `vectorField` flag.
This default behavior can be overwritten with the `vectorField` flag.
The resulting document is formatted with an id and content column which allows it to be piped into a `doc upsert` command to store it in the connected couchbase cluster.
```
4 changes: 2 additions & 2 deletions docs/exporting-data.adoc
@@ -12,7 +12,7 @@ If you want to only store the document body then you can use `doc get <id> | get

===== To JSON

From KeyValue
From key-value
```
> doc get airport_3719 --bucket travel-sample
╭───┬──────────────┬────────────────────────────────────┬─────────────────────┬───────┬─────────╮
@@ -152,7 +152,7 @@ To Multiple Documents

===== To CSV

From KeyValue
From key-value

[options="nowrap"]
```
2 changes: 1 addition & 1 deletion docs/importing-data.adoc
@@ -3,7 +3,7 @@
Couchbase Shell supports loading data from a variety of formats and sources.

The simplest way to import data is using `doc import` as covered in <<_loading_data_into_the_shell,Loading data into the shell>>.
These recipes will cover more advanced usecases.
These recipes will cover more advanced use cases.

==== A Note On Data format

12 changes: 6 additions & 6 deletions docs/intro.adoc
@@ -1,16 +1,16 @@
== Introduction

Couchbase Shell is fully featured, so it does not only contain commands related to couchbase but is actually built on top of a
general purpose shell called https://www.nushell.sh/[nushell]. This allows you to interact with the file system or any other
Couchbase Shell is fully featured, so it does not only contain commands related to Couchbase but is actually built on top of a
general purpose shell called https://www.nushell.sh/[Nushell]. This allows you to interact with the file system or any other
command available on your machine, making it a great tool for both operational and development tasks on top of Couchbase.

The following introduction only touches on the basic concepts to make you productive quickly. We recommend also checking out the
great https://www.nushell.sh/book[nushell documentation] so you can get the most out of it.
great https://www.nushell.sh/book[Nushell documentation] so you can get the most out of it.

=== Navigating the Shell

Commands take inputs and produce output in a structured manner, most often represented as tables. Note how both the generic `ls`
command and the couchbase-specific `buckets` command both produce a table as their output:
command and the Couchbase-specific `buckets` command both produce a table as their output:

```
> ls
@@ -166,7 +166,7 @@ If we ran a `doc get` it would fetch the doc from travel-sample.inventory.landma
=== Loading Data into the Shell

If you want to import data into Couchbase, or just load it into the shell for further processing, there are different commands available to help you.
Once the data is loaded into the shell it can be sent to one of the couchbase save commands like `doc upsert` and `doc import`.
Once the data is loaded into the shell it can be sent to one of the Couchbase save commands like `doc upsert` and `doc import`.
Depending on the structure of the data, and the command used, you may also need to tweak it a little bit so it can be properly stored.

==== Doc import
@@ -259,7 +259,7 @@ In our case we use `from json`:
```

TIP: look at the many different import formats `from` supports, including csv, xml, yaml and even sqlite. With this simple tool
at hand you are able to load many different data formats quickly and import them into couchbase!
at hand you are able to load many different data formats quickly and import them into Couchbase!

We cannot use this format directly with commands like `doc upsert` as the command expects two "columns" in the data - id and content.
This means that we have to perform some translation from the above format to one that `doc upsert` understands.
4 changes: 2 additions & 2 deletions docs/quickstart.adoc
@@ -41,7 +41,7 @@ image::mac-binary-unsigned.png[macOS Warning,600]

==== Homebrew

If running on macOS you can install via the https://formulae.brew.sh/formula/couchbase-shell[homebrew] formula:
If running on macOS you can install via the https://formulae.brew.sh/formula/couchbase-shell[Homebrew] formula:

```
$ brew install couchbase-shell
@@ -99,7 +99,7 @@ To start experimenting with data operations load some sample data onto the clust
╰───┴─────────┴───────────────┴─────────╯
```

Now you can try running n1ql queries using the <<_query,query>> command.
Now you can try running N1QL queries using the <<_query,query>> command.

```
👤 Administrator 🏠 default
2 changes: 1 addition & 1 deletion docs/recipes.adoc
@@ -3,7 +3,7 @@
:sectnums:

Welcome to the recipes section of the Couchbase Shell `cbsh` documentation.
Here you can find how powerful tasks can be performed by a combination of pipelined statements using `cbsh`.
Here you can find how powerful tasks can be performed by a combining `cbsh` statements.

include::recipes/register_cluster.adoc[]

6 changes: 3 additions & 3 deletions docs/recipes/managing_multiple_clusters.adoc
@@ -62,7 +62,7 @@ To focus on the free memory that each cluster has, we can https://www.nushell.sh
╰────┴────────────┴─────────────╯
```
We can reformat the tables to make the the data more readable, but nushell's understanding of various data types allows us to reformat the values within the table.
We can reformat the tables to make the the data more readable, but Nushell's understanding of various data types allows us to reformat the values within the table.
For example we could convert the `memory_free` values from bytes to gigabytes:
[options="nowrap"]
@@ -85,10 +85,10 @@ For example we could convert the `memory_free` values from bytes to gigabytes:
╰───┴─────────────┴─────────────╯
```
We do this by iterating over each node and https://www.nushell.sh/commands/docs/update.html[updating] the value in the `memory_free` column by multiplying the current value by nushell's inbuilt https://www.nushell.sh/book/types_of_data.html#file-sizes[File Size] datatype.
We do this by iterating over each node and https://www.nushell.sh/commands/docs/update.html[updating] the value in the `memory_free` column by multiplying the current value by Nushell's inbuilt https://www.nushell.sh/book/types_of_data.html#file-sizes[File Size] datatype.
We can take this one step further and use the values returned to calculate new metrics about our clusters.
When performing a healthcheck it's be useful to know the memory utilization for each cluster.
When performing a health check it's be useful to know the memory utilization for each cluster.
There are two columns that can be used to calculate this: `memory_free` and `memory_total`.
[options="nowrap"]
2 changes: 1 addition & 1 deletion docs/recipes/moving_data.adoc
@@ -35,7 +35,7 @@ The first thing to do is to recreate all of the buckets that we have on the `loc
```
Here we simply get all of the buckets, then iterate over the list with https://www.nushell.sh/commands/docs/each.html[each] and create buckets with the same name and ram quota, specifying the `remote` cluster with the https://couchbase.sh/docs/#_the_clusters_flag[--clusters] flag.
Since the value for the ram quote is returned in bytes from `buckets` we convert it to MiB by dividing by nushell's 1MB https://www.nushell.sh/book/types_of_data.html#file-sizes[FileSize] datatype.
Since the value for the ram quota is returned in bytes from `buckets` we convert it to MiB by dividing by Nushell's 1MB https://www.nushell.sh/book/types_of_data.html#file-sizes[FileSize] datatype.
We can check that this has worked by running the `buckets` command against the remote cluster:
[options="nowrap"]
2 changes: 1 addition & 1 deletion docs/recipes/similarity_search.adoc
@@ -121,5 +121,5 @@ Embedding batch 1/1
```
Here we have done another similarity search using the same index, but our source vector is the result of embedding the phrase "physical exercise".
One important detail to remeber is that the embedding generated from `vector enrich-text` must have the same dimension as those over which the index was created, otherwise `vector search` will return no results.
One important detail to remember is that the embedding generated from `vector enrich-text` must have the same dimension as those over which the index was created, otherwise `vector search` will return no results.
See https://couchbase.sh/docs/#_vector_enrich_text[vector enrich-text] for how to specify the dimension of the generated embeddings.
6 changes: 3 additions & 3 deletions docs/recipes/simple_rag.adoc
@@ -1,7 +1,7 @@
== Simple RAG

Couchbase Shell's https://couchbase.sh/docs/#_vector_commands[vector commands] along with https://couchbase.sh/docs/#_ask[ask] can be used to implement simple Retrieval Augmented Generation, more commonly know as RAG.
In this process similarity search is used over chunks of a larger body of text to contextualise questions sent to a Large Language model to improve the answers given.
In this process similarity search is used over chunks of a larger body of text to contextualize questions sent to a Large Language model to improve the answers given.
For this demo we will use a text version of the Couchbase Shell docs as the source text for our chunks of data we have this stored locally as a text file.
```
@@ -147,7 +147,7 @@ The we use the question to generate an embedding which we then pipe to https://c
This returns the vector docs with the most semantically similar chunks to our question.
Using the returned doc ids we can use the https://couchbase.sh/docs/#_subdoc_get[subdoc get] command to retrieve the chunks.
These chunks can then be piped directly into `ask` where they will be used to contextualise the question:
These chunks can then be piped directly into `ask` where they will be used to contextualize the question:
```
👤 Charlie 🏠 remote in ☁️ RagChunks._default._default
@@ -175,4 +175,4 @@ Remember to consult the available flags and options for more customization and f
```
This allows `ask` to produce a much more accurate and informative answer using the context it was given.
Changing the size of the chunks, number og neighbours returned as well as the dimension of the embeddings can all have an impact on the result of RAG, and `cbsh` should help experimenting with these variables quick and easy.
Changing the size of the chunks, number of neighbors returned as well as the dimension of the embeddings can all have an impact on the result of RAG, and `cbsh` should help experimenting with these variables quick and easy.