Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use modules properly and allow composability #44

Closed
wants to merge 7 commits into from

Conversation

jdesboeufs
Copy link
Contributor

@jdesboeufs jdesboeufs commented Nov 20, 2017

Hi,

I needed to detect encoding in a small frontend tool.
This library is awesome, but too huge for my use case.
I started to investigate how to cut it down.

Sometimes, it is not necessary to be able to detect all encodings available in jschardet.
In my case, we develop a tool for French people, so we only need to support Windows-1252/Latin-1 and UTF-8.

This PR is a code reorganization, using modules the right way (with their own dependency tree).
It's a first step before going further!

For now, it breaks compatibility for those who use actual _debug and Constants.MINIMUM_THRESHOLD. I updated the README.

This reorganization make possible writing custom detectors.
It's at the moment very quick and dirty, but if we work together I'm sure we can do a better jschardet 2.0 :)

See example custom code:
jdesboeufs/jschardet@use-modules...jdesboeufs:fr-cut

Bundle Size Size (min) Size (min+gz)
Full 455 KB 328 KB 117 KB
Fr 37 KB 11 KB 2.9 KB 🎉 💃

See also my independant PR #42 then #43.

@jdesboeufs jdesboeufs changed the title Use modules properly Use modules properly and allow composability Nov 20, 2017
@aadsm
Copy link
Owner

aadsm commented Nov 26, 2017

This is awesome and super needed!!! Thank you so much for doing it, I always wanted to organize it this way but could never find the time.
I’ll add a few comments <3.

@aadsm aadsm self-requested a review November 26, 2017 23:43
var consts = jschardet.Constants;

jschardet.HZ_cls = [
exports.HZ_cls = [
Copy link
Owner

@aadsm aadsm Nov 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HZ_cls and a few more are never used outside of this scope so we don’t really need to add it to exports. Not sure why I added it to jschardet at the time, but we can avoid it now.

@@ -27,16 +27,19 @@
* 02110-1301 USA
*/

!function(jschardet) {
var CodingStateMachine = require('./codingstatemachine')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also use this reorganization to make the filenames camel case, so they match their symbols: CodingStateMachine.js.

@@ -39,13 +39,13 @@ Options
```javascript
// See all information related to the confidence levels of each encoding.
// This is useful to see why you're not getting the expected encoding.
jschardet.Constants._debug = true;
jschardet.setLogger();
Copy link
Owner

@aadsm aadsm Nov 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is not clear of its purpose. I had to read the setLogger code to figure out what it was doing.
My first reaction was to think that this wouldn’t do any logging because you were setting the logger to nothing.
Let’s add two functions: {enable, disable}Logging().

var setLogger = require('./logger').setLogger;

exports.VERSION = "1.4.1";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of this.


// Default minimum accepted confidence level is 0.20 but sometimes this is not
// enough, specially when dealing with files mostly with numbers.
// To change this to 0 to always get something or any other value that can
// work for you.
jschardet.Constants.MINIMUM_THRESHOLD = 0;
jschardet.detect(str, { minimumThreshold: 0 });
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

return u.result;
}
exports.UniversalDetector = UniversalDetector;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t need to export this I believe.

exports.log = function () {};

exports.setLogger = function setLogger(fn) {
exports.enabled = true;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of exporting this and add the check everywhere, we can just make the check on the log function itself (at line 10).

Copy link
Owner

@aadsm aadsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about a compile mode that will bundle only the required modules?
You could start by having a config that specifies which encodings to use (to avoid having to modify or comment code like you did in your branch).

@danielgindi
Copy link
Contributor

@aadsm is there an ETA on this? as this is very important to allow static compilation, like with rollup. The circular dependency with init.js currently is a major problem.

@aadsm
Copy link
Owner

aadsm commented Jan 16, 2019

@jdesboeufs I had forgotten about this, I'll try to take a look over the weekend.

@aadsm
Copy link
Owner

aadsm commented Jan 22, 2019

I made some changes and I just published a new version with this.

@aadsm aadsm closed this Jan 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants