Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RowArrayAdapter #416

Draft
wants to merge 39 commits into
base: main
Choose a base branch
from

Conversation

ayjayt
Copy link
Contributor

@ayjayt ayjayt commented Mar 11, 2024

Hi,

This is the start of an experimental feature that allow developers of visual chart libraries to, instead of copying data into a c2m handle, give c2m a read-only view into the memory object of the original visual chart library.

The benefits are

a) Reducing memory (RAM) usage by an order of magnitude
b) Clearly delineating which code controls the data
c) Relatively small addition (<<6kB full size)

Concept

For example, to store data, plotly uses a dictionary of arrays { x:[], y:[] }, and c2m uses an array of dictionaries [ {x:, y:}, {x:, y:} ]. The current solution is to create a new c2m style object containing a copy of all data, but this adapter allows us to create a class that will return a c2m style object upon request without copying and transforming the entire plotly object.

Challenges

The only downside is that c2m expects full access to the native arrays, but we can't override the index operator []. Luckily, all arrays support .at() as a synonym, and we can override that. But whenever c2m wants to index into a row, it has to use .at(), not [], since the adapter is not a real array, it is just array-like access to another object.

Testing

My solution for testing, since the adapter should work anywhere a (number | SupportedDataPointType)[] does, is to randomly wrap test data in the adapter w/o changing any test parameters. Tests where this is necessary are then parameterized to be run N times w/ frequency of data-wrapping increasing. Every possible mixture of types should work with the adapter. This testing could be pared down with maturity.

TODO

  • utils.ts
  • c2mChart.ts
  • search for more row access
  • continuous mode will probs be solved w/ a b-tree index instead of sorting the original array. Better for live data anyway. Investigate.

ayjayt added 10 commits March 10, 2024 15:06
The whole point of this is to convert datatypes to the one c2m expects.
No clue why I was returning an array instead of a SupportDataPointType.
When specifying the adapter, you now specify exactly what type of
datapoint it will return. This makes it easier for typescript to reason
around things.
1) Rows can't be accessed with [], they must be accessed with .at()

All arrays support .at(), and only .at() can be overridden

2) Tests should function the same whether using an array for data or an
   array adapter:

   The tests are now parameterized (_adapter_uilities.ts) to be run 11
   times w/ p-values [0, .1, .2, ..., 1], where each data array has p chances
   to be replaced with an array adapter. Each could be run slightly
   different, and every combination of real array/adapter array should
   pass and provide the correct error. There are specific cases where
   this isn't true, and it is commented.
@ayjayt
Copy link
Contributor Author

ayjayt commented Mar 11, 2024

Linter error because c2mChart.ts hasn't been adjusted to account for the new type

@julianna-langston
Copy link
Owner

Thanks for contributing! This use case has given me a lot to think about.

My one concern is that whoever implements their own adapter would need to keep up with all the methods we use against the data. It looks like you implemented findIndex (which we use in validate.ts), and will probably also implement map, forEach, filter, includes, reduce, indexOf, and flat. In addition to adding a workaround for Math.max and Math.min, we'd need work arounds for Object.keys() and Object.values(). However, if we started using another Array method, like reduceRight, then anyone implementing an Adapter would also need to update that Array method, and would need to keep up with any workarounds we use for Math/Object prototypes. This leads me to a point where using a new array method would become a breaking change for anyone with an adapter.

I think that this could potentially present a large maintenance problem moving forward. Do you have any thoughts about handling that?

@ayjayt
Copy link
Contributor Author

ayjayt commented Mar 13, 2024

Thank you so much for the analysis and response!

It could stay a hidden feature for quite a while (in a separate branch), and I'd be willing to maintain the experiment for a year to see if it works, and I'm writing plotly's integration to support both adapter and non-adapter versions.

You are correct about implementing those methods, that is my plan.

Breaking can be seen as a feature as well, not a bug, especially for large integrations like w/ plotly, I'll explain:

It is IMO always better to break instead of propagating undefined behavior. I want plotly to always own data and deny c2m w/ failure if c2m does something it doesn't understand.

  1. If data is copied, but then modified, undefined behavior will propagate to the user experience since plotly and c2m will have potentially divergent datasets, and it's unlikely these errors will ever be detected, users will just get bad data.
  2. With or without the adapter, some new features in c2m need to be accounted for in its integrations w/ other packages like plotly. Again, the benefit is that the adapter provides a single point of failure to find any breaking changes instead of propagating them. But the challenges:
    1. c2m release schedule needs to be made independent of any project's implementation of c2m, including c2m's test adapter. Would think about that towards the end.
    2. I'm sure there's a way to let the developers of a large project know they need to make changes to support new c2m features, or else they stick with an old version. Could be based partly on adapter failure.

I'd like to continue with the investigation while releasing other solutions for a full integration with plotly in the meantime, I think it'll get done w/in two weeks.

It's also a good exercise for me to learn more about the c2m internals while potentially releasing something useful.

But otherwise, c2m simply can't use [] to access rows. AFAIK all other behaviors can be shimmed (including for of).

ayjayt added 12 commits March 16, 2024 20:15
The array adapter interface accepts a type <T>, just like any other
array which wants to know what it is an array of. Previously, there had
been enforcements of SupportedDataPointType as the type of T, which
still isn't a terrible idea, but a) probably won't matter unless users
are using typescript importing types and b) makes testing validation
inconvenient because certain tests validate vanilla number arrays
(number[]) instead of SupportedDataPointType[], as that is a shortcut to
SimpleDataPoint (a supported type).
@ayjayt
Copy link
Contributor Author

ayjayt commented Mar 18, 2024

On track for full integration w/ plotly!

Quick question, @julianna-langston, I noticed that in utils.ts the calculateAxisMaximum and calculateAxisMinimum have no option for calculatings BoxDataPoint's. Is that intentional?

@julianna-langston
Copy link
Owner

fyi - I went through all the cases I could find of array[index], and replaced them with array.at(index). See commit: f69195e

That change is currently live in 1.16.2

@ayjayt
Copy link
Contributor Author

ayjayt commented Apr 16, 2024

:-O Much help! Thanks so much! I needed some positive vibes

Technical note for myself:

I'm writing in the technical comments that "RowArrayAdapter" doesn't support heterogeneous data,

[
 { x: 0, y: 0 },
 { x: 0, y2: 0 }
]

From a technical perspective it could, although it's not a need for plotly, for whom it is technically impossible.

In usesAxis, I branch isRowArrayAdapter instead of shimming find which now seems unnecessary as find may be necessary to shim later and isn't really any different than forEach.

ayjayt added 4 commits April 16, 2024 10:19
into pikul-feature-array-adapter

Note, this merge required massaging. es-lint complained more, maybe
because the merge included an update, and I had to bring the
RowArrayAdapter types into c2mChart.ts to satisfy it wrt to
convertDataRow(), which I didn't have to do before the merge. I didn't
see any _code_ in the merge that would have changed this behavior, so I
assume it's a dependency issue.

There were a couple conflicts in utils.ts where I had been working, but
I will make a note in the pull request julianna-langston#416
This reverts commit ceb36c7, reversing
changes made to 5d6a470.
@ayjayt
Copy link
Contributor Author

ayjayt commented Apr 16, 2024

Hi @julianna-langston! Was that conversion [] --> .at() done automatically or by hand?

The thing is, .at() is a read-only version of [], so if it's used where write access is wanted, it will throw a ReferenceError.
edit: here is an example (not an error throwing example) comment

I reverted it for now and I'm going to leave some notes here while we sort this. Apologies for the complexities.


  1. Read-only access is fine for data in rows but C2M needs could have if it wants write access for everything else. ie. c2m would never change an actual number but it might change some of its metadata on a group.

Unrelated notes that I will need once this is sorted:

  1. New es-lint version is stricter and c2mChart.ts now needs information about types, easy enough:
import type { RowArrayAdapter } from "./rowArrayAdapter";

/**
 * ValidUserInputRows are the types the user can submit as data rows.
 * Note that number will be converted to SimpleDataPoint[],
 * so the effective types we accept is more restrictive.
 */
type ValidUserInputRows =
    | RowArrayAdapter<SupportedDataPointType>
    | (number | SupportedDataPointType)[];

which can probably be eliminate later.

  1. do checkForNumberInput

@julianna-langston
Copy link
Owner

I did the conversion by hand. I un-did the .at() in assignment (oops), and used .at() in the checkForNumberInput. Does this work for you?
#450

As for c2m changing numbers, it shouldn't be. Do you have an example where c2m changes the data provided to it? If you're seeing side effects, I would definitely consider that a bug and try to fix it as soon as I can.

@ayjayt
Copy link
Contributor Author

ayjayt commented Apr 18, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants