
Change Log

All notable changes to Fili will be documented here. Changes are accumulated as new paragraphs at the top of the current major version. Each change has a link to the pull request that makes the change and to the issue that triggered the pull request if there was one.

Current

Added:

  • Adds FlagFromTagDimension

    • FlagFromTagDimension is a virtual dimension that exposes a flag-based interface to API users, but is actually based on the presence or absence of a tag value in an underlying multivalued dimension.
    • This implementation is based on two underlying physical columns: a filtering column, which can be efficiently filtered against using the default Druid filter serialization, and a grouping dimension containing a comma-separated string of tag values, which is parsed to determine the presence of the desired tag value and then converted to the appropriate truth value.
    • The filtering behavior is supported through the new FilterOptimizable interface and associated request mapper.
  • Add FilterOptimizable interface

    • Adds the FilterOptimizable interface, which indicates that the implementing object has the ability to optimize a Collection of ApiFilter objects.
    • Adds FilterOptimizingRequestMapper, which checks whether any of the filtered dimensions can optimize their filters and performs those optimizations.
  • Add ImmutableSearchProvider interface and MapSearchProvider

    • Adds the ImmutableSearchProvider interface, which is a marker interface indicating that the SearchProvider implementation is immutable.
    • Adds MapSearchProvider which is an implementation of ImmutableSearchProvider based on a constant map.
  • Add support to DataCache for key-specific expirations

    • Adds a new method boolean set(String key, T value, int expiration) that allows customers to set the expiration for a key when it is added to the cache.
    • The default implementation delegates to boolean set(String key, T value) (discarding the expiration), so this won't affect any customers who have their own DataCache.
    • The memcache-backed implementation implements the new set, and the old set delegates to it, passing in the configured EXPIRATION constant.
  • Add config parameter to control lookback on Druid dimension loader

    • Add config parameter bard__druid_dim_loader_lookback_period to control the window of time used when loading.
  • Add ApiFilters to LogicalTable

    • Add ApiFilters to LogicalTable class. These filters function as a view on the underlying physical tables by restricting access to only a subset of the data present on the logical table.
    • These filters are merged with ApiFilters from the API request during Druid query building, and on the TablesApiRequestImpl for requests to the tables servlet.
    • Small patch: the ApiFilters contract was breaking downstream application tests, so it was switched to supporting Optional.
    • Second small patch: Fix null pointer errors with TablesApiRequestImpl
  • Make current macro align on the end of network day

    • Added BardFeatureFlag.CURRENT_TIME_ZONE_ADJUSTMENT, which determines whether a timezone-based adjustment is needed.
    • Added BardFeatureFlag.ADJUSTED_TIME_ZONE, which specifies the timezone to which the macro is adjusted.
    • If the CURRENT_TIME_ZONE_ADJUSTMENT flag is enabled, the macro is aligned on the end of the UTC day.
  • Create a TagExtractionFunctionFactory to transform comma-separated list values into a Boolean dimension

    • Create an extraction function to transform a comma-separated list of values into a boolean dimension value.
  • Add Partial Data Feature Flags to separate query planning and data protection

    • BardFeatureFlag.PARTIAL_DATA_PROTECTION activates removal of time buckets based on availability.
    • BardFeatureFlag.PARTIAL_DATA_QUERY_OPTIMIZATION activates the use of PartialData when query planning.
    • BardFeatureFlag.PARTIAL_DATA still activates both capabilities.
    • If any of these flags is active, partial data answers are included in responses.
  • Add system config to disable requiring metrics in API queries

    • Added the system config require_metrics_in_query which toggles whether or not metrics should be required in queries
      • this setting is turned ON by default
    • This property is controlled through the feature flag BardFeatureFlag.REQUIRE_METRICS_QUERY
  • Add more BoundFilterBuilding validation and hooks

    • Added minimum and maximum arguments to FilterOperation
    • Added validation on number of arguments to the bound filter builder
    • Added hook for normalizing BoundFilterBuilder arguments
  • Force update of cardinality to SearchIndexes

    • SearchProvider now has method int getDimensionCardinality(boolean refresh), where refresh indicates the cardinality count should be refreshed before being returned.
      • default implementation just defers to existing method int getDimensionCardinality()
      • LuceneSearchProvider overrides the default and refreshes the cardinality count if refresh is true
  • Added aliases to API filter operations

    • Filter ops now have aliases that match the relevant ops and aliases for havings.
  • Added filename parameter to API query

    • If the filename parameter is present in the request, the response is treated as a download with the provided filename. The download format depends on the value of the format parameter.
    • Filename parameter is currently only available to data queries.
  • Ability to add Dimension objects to DimensionSpecs as a non-serialized config object

    • DimensionSpec and relevant subclasses now have a constructor that takes a Dimension, as well as a getter for the Dimension.
  • Added expected start and end dates to PhysicalTableDefinition

    • New constructors on PhysicalTableDefinition and ConcretePhysicalTableDefinition that take expected start and end dates
    • New public getters on PhysicalTableDefinition for expected start and end date
  • Added expected start and end dates to availability

    • Add methods for getting expected start and end dates given a datasource constraint to the Availability interface.
      • start and end dates are optional, with an empty optional indicating no expected start or end date.
      • the new methods default to returning an empty optional.
    • The expected start and end dates are not enforced: if an availability has intervals outside of the expected range, those intervals are NOT suppressed.
    • BaseCompositeAvailability reports its expected start and end dates as the earliest start date and latest end date of its composed availabilities.
      • Having no expected start or end date supersedes any configured start or end date, so if ANY of the composed availabilities has no start or end date, an empty optional is reported.
    • Add a constructor to StrictAvailability that takes start and end dates, which allows direct configuration of expected start and end dates.
  • Fili can now route to one of several Druid webservices based on custom routing logic

    • This allows customers to put Fili in front of multiple Druid clusters, and then use custom logic to decide which cluster to query for each request.
    • We introduce a new interface DruidWebServiceSelector that wraps the routing logic, and pass an instance to the AsyncWebServiceRequestHandler for it to use.
  • Add Druid Bound filter support to Fili

    • Added the DruidBoundFilter class to support Druid's Bound filter.
  • Add static Factory build methods for BoundFilter

    • Added static factory methods for building lowerBound, upperBound, strictLowerBound and strictUpperBound Bound filters.
  • Add insertion order aware method for Stream Utils

    • Added orderedSetMerge, which merges two sets in the order provided.
  • Add DimensionRow transformation support with ResultSetMapper

    • Added helper constructor to DimensionRow
    • Created MemoizingDimensionMappingResultSetMapper to support field transform use case
  • Added LogicalTable name metadata interface and BaseTableLoader methods to accept it

    • LogicalTable accepts LogicalTableName as a constructor parameter
    • BaseTableLoader.loadLogicalTablesWithGranularities accepts LogicalTableNames to pass to new LogicalTable constructor
    • Changed default retention for LogicalTable to null rather than P1Y
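FlagFromTagDimension and the TagExtractionFunctionFactory entry both turn a comma-separated tag list into a truth value. The plain-Java sketch below illustrates only the parsing step. TagFlagSketch, toFlag, and the configurable true/false strings are hypothetical names. The sketch does not show how Fili pushes this logic into a Druid extraction function.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Fili's API): turn a comma-separated tag list into a flag value.
 */
final class TagFlagSketch {

    private TagFlagSketch() { }

    /** Returns trueValue if tag is one of the comma-separated entries in tagList, else falseValue. */
    static String toFlag(String tagList, String tag, String trueValue, String falseValue) {
        if (tagList == null || tagList.isEmpty()) {
            return falseValue; // an empty tag list cannot contain the tag
        }
        boolean present = Arrays.stream(tagList.split(","))
                .map(String::trim)      // tolerate spaces around the commas
                .anyMatch(tag::equals); // whole-entry match, so "bluegreen" does not match "blue"
        return present ? trueValue : falseValue;
    }

    public static void main(String[] args) {
        System.out.println(toFlag("red,blue,green", "blue", "Y", "N")); // prints Y
        System.out.println(toFlag("red,bluegreen", "blue", "Y", "N"));  // prints N
    }
}
```

The sketch matches whole entries rather than substrings because a plain substring check would report "blue" as present in "bluegreen".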

Changed:

Deprecated:

Removed:

Fixed:

Known Issues:

Contract changes:

v0.10.48 - 2018/10/04

0.10 Highlights

Extensibility improvements

A number of classes which were either not extensible or very difficult to extend have been restructured.

Response building, extraction functions, web logging, and exception handling in DataServlet are now injectable.

Extensive deprecation cleanup

A lot of tech debt has been paid down in the form of removing deprecated code and moving code off deprecated methods. Old sketch support has been retired in favor of the official community open source version: (https://datasketches.github.io/docs/Theta/ThetaSketchFramework.html)

Some deprecations have been removed because the migration path off of those deprecated methods seemed less useful than simply supporting the older (simpler) contract.

General cleanup

Some packages have been rationalized together or apart. Some concrete classes became interfaces and vice-versa.

Externalizing filter building code

Generating ApiFilters and Druid Filters has been externalized to make it easier to reimplement with solutions that do not rely on regular expressions.

Supporting Druid in-filters

A general performance improvement from using Druid's in filter (as opposed to long chains of single-value selector filters).
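The two filter shapes differ as follows. The sketch builds the JSON by hand using Druid's documented in, selector, and or filter types. InFilterSketch and its methods are hypothetical, and a real serializer would use a JSON library rather than string concatenation. The sketch shows only the payload shapes and makes no claim about the size of the performance difference.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of two equivalent Druid filter payloads. */
final class InFilterSketch {

    private InFilterSketch() { }

    /** Minimal JSON string escaping (backslash and double quote only). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** A single Druid "in" filter covering every value. */
    static String inFilter(String dimension, List<String> values) {
        String vals = values.stream().map(InFilterSketch::quote).collect(Collectors.joining(","));
        return "{\"type\":\"in\",\"dimension\":" + quote(dimension) + ",\"values\":[" + vals + "]}";
    }

    /** The older shape: an "or" of one "selector" filter per value. */
    static String orOfSelectors(String dimension, List<String> values) {
        String fields = values.stream()
                .map(v -> "{\"type\":\"selector\",\"dimension\":" + quote(dimension)
                        + ",\"value\":" + quote(v) + "}")
                .collect(Collectors.joining(","));
        return "{\"type\":\"or\",\"fields\":[" + fields + "]}";
    }

    public static void main(String[] args) {
        List<String> values = List.of("US", "CA", "MX");
        System.out.println(inFilter("country", values));
        System.out.println(orOfSelectors("country", values));
    }
}
```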

Added:

Changed:

Deprecated:

Fixed:

Known Issues:

v0.9.137 - 2018/04/13

0.9 Highlights

Fili Security Added!

Released a security module for Fili data security filters. Created ChainingRequestMapper and a set of mappers for gatekeeping on security roles and whitelisting dimension filters.

Added by @michael-mclawhorn in yahoo#405
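The chaining idea behind ChainingRequestMapper is sketched below. Each mapper either returns a possibly rewritten request or throws to reject it, and the chain feeds each mapper's output to the next one. SecuredRequest, SecurityMapper, RoleGate, and DimensionWhitelist are hypothetical names. They do not mirror the signatures of Fili's request mappers, and the sketch requires Java 16+ for records.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical request model: caller roles plus dimension filters (dimension -> allowed values). */
record SecuredRequest(Set<String> roles, Map<String, Set<String>> filters) { }

/** Hypothetical mapper contract: return a (possibly rewritten) request, or throw to reject it. */
interface SecurityMapper {
    SecuredRequest apply(SecuredRequest request);
}

/** Runs mappers in order, feeding each one the output of the previous one. */
record ChainingMapper(List<SecurityMapper> mappers) implements SecurityMapper {
    public SecuredRequest apply(SecuredRequest request) {
        SecuredRequest current = request;
        for (SecurityMapper mapper : mappers) {
            current = mapper.apply(current);
        }
        return current;
    }
}

/** Gatekeeper: rejects the request unless the caller holds the required role. */
record RoleGate(String requiredRole) implements SecurityMapper {
    public SecuredRequest apply(SecuredRequest request) {
        if (!request.roles().contains(requiredRole)) {
            throw new SecurityException("Missing role: " + requiredRole);
        }
        return request;
    }
}

/** Whitelist: narrows the filter on one dimension to the allowed values. */
record DimensionWhitelist(String dimension, Set<String> allowed) implements SecurityMapper {
    public SecuredRequest apply(SecuredRequest request) {
        Set<String> effective = new HashSet<>(allowed);
        Set<String> requested = request.filters().get(dimension);
        if (requested != null) {
            effective.retainAll(requested); // keep only values that are both requested and allowed
        }
        Map<String, Set<String>> filters = new HashMap<>(request.filters());
        filters.put(dimension, effective);
        return new SecuredRequest(request.roles(), filters);
    }
}
```

Putting the gatekeeper first means unauthorized requests are rejected before any filter rewriting happens. If a whitelist intersection is empty, the filter set is also empty, and the sketch leaves the meaning of that case to downstream code.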

DataApiRequestFactory layer

Downstream projects now have more flexibility to construct DataApiRequest by using injectableFactory. An additional constructor for DataApiRequestImpl unpacks the config resources bundle to make it easier to override dictionaries.

Added by @michael-mclawhorn in yahoo#603

Make Field Accessor PostAggregation able to reference post aggregations in addition to aggregations

Druid allows post aggregation trees to reference columns that are themselves post aggregation trees, but it does not protect against ordering problems between them. This change makes it possible to send such a query by using a field accessor to reference another query expression. Using this capability may carry some risk.

Added by @michael-mclawhorn in yahoo#543

Etag Cache

Druid versions released after February 23rd, 2017 support HTTP ETags. When a Druid query includes an If-None-Match header, Druid computes a hash of the response to use as its ETag, so that each unique response has a corresponding unique ETag, and includes that ETag in the response header. If the If-None-Match header of a query carries an ETag that matches the response of the query, Druid returns HTTP status 304 Not Modified to indicate that the response is unchanged. Otherwise, Druid executes the query and responds normally, with a new ETag attached to the response header.

This new feature was designed by @garyluoex. For more info, see @garyluoex's design at yahoo#255
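The cache-side bookkeeping for this ETag flow can be sketched as follows. The cache remembers the last ETag and body for each query, sends that ETag as If-None-Match, and reuses the cached body when Druid answers 304. EtagCache and DruidResponse are hypothetical names, and the HTTP transport is omitted. For Fili's actual design, refer to yahoo#255.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical fact-store answer: HTTP status, ETag header value, and body. */
record DruidResponse(int status, String etag, String body) { }

/** Hypothetical ETag-aware cache keyed by the serialized Druid query. */
final class EtagCache {

    private record Entry(String etag, String body) { }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Value to send in the If-None-Match header, if this query has been answered before. */
    Optional<String> ifNoneMatch(String query) {
        return Optional.ofNullable(entries.get(query)).map(Entry::etag);
    }

    /** 304 reuses the cached body; 200 stores the new ETag and body and returns the body. */
    String resolve(String query, DruidResponse response) {
        if (response.status() == 304) {
            Entry cached = entries.get(query);
            if (cached == null) {
                throw new IllegalStateException("304 received without a cached entry");
            }
            return cached.body();
        }
        if (response.status() == 200) {
            entries.put(query, new Entry(response.etag(), response.body()));
            return response.body();
        }
        throw new IllegalStateException("Unexpected status " + response.status());
    }
}
```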

More robust Lucene Search Provider and Key Value Store

Lucene Search Provider can re-open in a bug-free way and close more cleanly

Added by @garyluoex in yahoo#551 and yahoo#521

Extraction Function on selector filter

Fili now accommodates Druid's deprecation of ExtractionFilter by using a selector filter with an extraction function instead. An extraction function can be set on a dimensional filter, and it defaults to the dimension's extraction function if one exists.

Added by @garyluoex in yahoo#617
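The defaulting rule can be sketched as a Druid selector filter with an optional extractionFn. The filter's own extraction function takes precedence. Otherwise the dimension's extraction function is used, and if neither exists the field is omitted. SelectorFilterSketch is hypothetical, extraction functions are passed in as already-serialized JSON, and the registeredLookup example uses a made-up lookup name.

```java
import java.util.Optional;

/** Hypothetical builder for a Druid selector filter with an optional extraction function. */
final class SelectorFilterSketch {

    private SelectorFilterSketch() { }

    /** Minimal JSON string escaping (backslash and double quote only). */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** The filter's extraction function wins; otherwise fall back to the dimension's, if any. */
    static String selector(String dimension, String value,
                           Optional<String> filterFn, Optional<String> dimensionFn) {
        Optional<String> fn = filterFn.or(() -> dimensionFn);
        return "{\"type\":\"selector\",\"dimension\":" + quote(dimension)
                + ",\"value\":" + quote(value)
                + fn.map(f -> ",\"extractionFn\":" + f).orElse("")
                + "}";
    }

    public static void main(String[] args) {
        Optional<String> lookup =
                Optional.of("{\"type\":\"registeredLookup\",\"lookup\":\"country_names\"}");
        System.out.println(selector("country", "United States", Optional.empty(), lookup));
    }
}
```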

More controllable RequestLog

Exposes the LogInfo objects stored in the RequestLog via RequestLog::retrieveAll, making it easier for customers to implement their own scheme for logging the RequestLog.

Added by @archolewa in yahoo#574

Druid lookup metadata load status check

Fili now supports checking Druid lookup status as one of its health checks, making it easy to identify any failed lookups.

Added by @QubitPi in yahoo#620

Add ability to use custom rate limiting schemes

While backward compatibility is guaranteed, Fili now allows users to rate limit (with a new rate limiter) based on criteria other than the default ones.

Added by @efronbs in yahoo#591
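A custom limiting criterion can be expressed as a key derived from each request, for example the user, the table, or a combination. The sketch below caps concurrent in-flight requests per key. RequestLimiter and KeyedLimiter are hypothetical names and do not reproduce the signature of Fili's rate limiter interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical contract: acquire before serving a request, release when it completes. */
interface RequestLimiter<R> {
    boolean tryAcquire(R request);
    void release(R request);
}

/** Caps concurrent requests per key, where the key is any criterion derived from the request. */
final class KeyedLimiter<R> implements RequestLimiter<R> {

    private final Function<R, String> keyFn;
    private final int limitPerKey;
    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    KeyedLimiter(Function<R, String> keyFn, int limitPerKey) {
        this.keyFn = keyFn;
        this.limitPerKey = limitPerKey;
    }

    @Override
    public boolean tryAcquire(R request) {
        AtomicInteger count = inFlight.computeIfAbsent(keyFn.apply(request), k -> new AtomicInteger());
        if (count.incrementAndGet() > limitPerKey) {
            count.decrementAndGet(); // over the limit: undo the increment and reject
            return false;
        }
        return true;
    }

    @Override
    public void release(R request) {
        AtomicInteger count = inFlight.get(keyFn.apply(request));
        if (count != null) {
            count.decrementAndGet();
        }
    }
}
```

Under concurrency, a request can be rejected while another thread is briefly over the limit and undoing its increment. However, no more than limitPerKey requests per key ever hold a permit at the same time.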

Support Time Format Extraction Function in Fili

Druid's TimeFormatExtractionFunction has been added to Fili, so API users can use TimeFormatExtractionFunction when querying Druid through Fili.

Added by @QubitPi in yahoo#611

Dimension load strategy indicator

To let clients know whether a dimension's values are browsable and searchable, storage strategy metadata has been added to dimensions. A browsable and searchable dimension is denoted by LOADED, whereas the opposite is denoted by NONE. This is useful for UIs backed by Fili when deciding whether to send dimension-related queries.

Added by @michael-mclawhorn, @garyluoex and @QubitPi in yahoo#575, yahoo#589, yahoo#558, yahoo#578

Query Split Logging

Include metrics in logging to allow better evaluation of the impact of caching on split queries. Previously there was only a binary flag (BardQueryInfo.cached), which was set inconsistently for split queries. Three new metrics are now added:

  1. Number of split queries satisfied by cache
  2. Number of split queries actually sent to the fact store. (not satisfied by cache)
  3. Number of weight-checked queries

Added by @QubitPi in yahoo#537
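Counting the three quantities per request, instead of setting a single cached flag, can distinguish fully cached, partially cached, and uncached requests. In the sketch below, SplitQueryStats and its methods are hypothetical. The atomic counters assume that split queries may complete on different threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-request counters for the three split-query metrics listed above. */
final class SplitQueryStats {

    private final AtomicInteger cacheHits = new AtomicInteger();    // 1. satisfied by cache
    private final AtomicInteger factQueries = new AtomicInteger();  // 2. sent to the fact store
    private final AtomicInteger weightChecks = new AtomicInteger(); // 3. weight-checked queries

    void recordCacheHit() { cacheHits.incrementAndGet(); }
    void recordFactQuery() { factQueries.incrementAndGet(); }
    void recordWeightCheck() { weightChecks.incrementAndGet(); }

    int cacheHits() { return cacheHits.get(); }
    int factQueries() { return factQueries.get(); }
    int weightChecks() { return weightChecks.get(); }

    /** Counts can express a partially cached request, which a single boolean flag cannot. */
    String cacheStatus() {
        if (cacheHits.get() == 0) {
            return "NONE";
        }
        return factQueries.get() == 0 ? "FULL" : "PARTIAL";
    }
}
```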

Configurable Metric Long Name

Logical metrics are now more configurable: besides the metric name, the metric long name, description, etc. can be configured. MetricInstance is now created by accepting a LogicalMetricInfo, which contains all of these fields in addition to the metric name.

Added by @QubitPi in yahoo#492

Search provider can hot-swap index and key value store can hot-swap store location

LuceneSearchProvider can hot-swap its Lucene index. It moves the old index directory to a different location, moves the new index into a directory with the old name, and deletes the old index directory from the file system. KeyValueStore also supports hot-swapping the key value store location.

Added by @QubitPi in yahoo#522
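The three-step directory swap can be sketched with java.nio.file. The sketch assumes that both directories are on the same file system, so each Files.move is a rename, and that no directory with the ".old" suffix already exists. IndexHotSwap is a hypothetical name. The sketch does not show how the real provider closes and reopens Lucene readers around the swap.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical directory-level hot swap (Lucene reader handling omitted). */
final class IndexHotSwap {

    private IndexHotSwap() { }

    static void swap(Path liveDir, Path newDir) throws IOException {
        Path retired = liveDir.resolveSibling(liveDir.getFileName() + ".old");
        Files.move(liveDir, retired); // 1. move the old index out of the way
        Files.move(newDir, liveDir);  // 2. the new index takes over the old directory name
        deleteRecursively(retired);   // 3. delete the old index from the file system
    }

    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse path order visits children before their parent directories.
            List<Path> ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : ordered) {
                Files.delete(p);
            }
        }
    }
}
```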

Uptime Status Metric

A metric showing how long Fili has been running is available.

Added by @mpardesh in yahoo#518

Consolidate UI & Non-UI broker configurations

ui_druid_broker and non_ui_druid_broker are no longer used separately. Instead, a single druid_broker replaces the two. For backwards compatibility, Fili checks whether druid_broker is set. If not, Fili uses non_ui_druid_broker and then ui_druid_broker.

Added by @mpardesh in yahoo#489. Amended by @gab-umich in yahoo#933.
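The backwards-compatible lookup order can be sketched as a first-match search over the three keys. In the sketch, a plain Map stands in for Fili's configuration, the keys are shown without any property prefix, and blank values are treated as unset. All three choices are assumptions. BrokerConfigSketch is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical resolver for the broker URL using the fallback order described above. */
final class BrokerConfigSketch {

    private BrokerConfigSketch() { }

    /** Checked in order: the new single key first, then the two legacy keys. */
    static final List<String> BROKER_KEYS =
            List.of("druid_broker", "non_ui_druid_broker", "ui_druid_broker");

    static Optional<String> resolveBroker(Map<String, String> config) {
        return BROKER_KEYS.stream()
                .map(config::get)                          // null when the key is absent
                .filter(v -> v != null && !v.isBlank())    // treat blank as unset
                .findFirst();
    }
}
```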

Credits

Thanks to everyone who contributed to this release!

@michael-mclawhorn Michael Mclawhorn @garyluoex Gary Luo @archolewa Andrew Cholewa @QubitPi Jiaqi Liu @asifmansoora Asif Mansoor Amanullah @efronbs Ben Efron @deepakb91 Deepak Babu @tarrantzhang Tarrant Zhang @kevinhinterlong Kevin Hinterlong @mpardesh Monica Pardeshi @colemanProjects Neelan Coleman @onlinecco @dejan2609 Dejan Stojadinović

Added:

Changed:

Deprecated:

Fixed:

Known Issues:

Removed:

v0.8.69 - 2017/06/06

The main changes in this version are changes to the Table and Schema structure, including a major refactoring of Physical Table. The concept of Availability was split off from Physical Table, allowing Fili to better reason about availability of columns in Data Sources in ways that it couldn't easily do before, like in the case of Unions. As part of this refactor, Fili also gains first-class support for queries using the Union data source.

Full description of changes to Tables, Schemas, Physical Tables, Availability, PartialDataHandler, etc. tbd

This was a long and winding journey this cycle, so the changelog is not nearly as tight as we'd like (hopefully we'll come back and consolidate it for this release), but all of the changes are in there. Along the way, we also addressed a number of other small concerns. Here are some of the highlights beyond the main changes around Physical Tables:

Fixes:

  • Unicode characters are now properly sent back to Druid
  • Druid client now follows redirects

New Capabilities & Enhancements:

  • Can sort on dateTime
  • Can use Druid query response for final verification of response partiality
  • Class Scanner Spec can discover dependencies, making its dynamic equality testing easier to use
  • There's an example application that shows how to slurp configuration from an existing Druid instance
  • Druid queries return a Future instead of void, allowing for blocking requests if needed (though use sparingly!)
  • Support for extensions defining new Druid query types

Performance upgrades:

  • Lazy DruidFilters
  • Assorted log level reductions
  • Lucene "total results" 50% speedup

Deprecations:

  • DataSource::getDataSources no longer makes sense, since UnionDataSource only supports one table now
  • BaseTableLoader::loadPhysicalTable. Use loadPhysicalTablesWithDependency instead
  • LogicalMetricColumn isn't really a needed concept

Removals:

  • PartialDataHandler::findMissingRequestTimeGrainIntervals
  • permissive_column_availability_enabled feature flag, since the new Availability infrastructure now handles this
  • Lots of things on PhysicalTable, since that system was majorly overhauled
  • SegmentMetadataLoader, which had been deprecated for a while and relies on no longer supported Druid features

Added:

Changed:

Deprecated:

Fixed:

Removed:

  • Refactor Physical Table Definition and Update Table Loader

    • Removed deprecated PhysicalTableDefinition constructor that takes a ZonelessTimeGrain. Use ZonedTimeGrain instead
    • Removed BaseTableLoader::buildPhysicalTable. Table building logic has been moved to PhysicalTableDefinition
  • Move UnionDataSource to support only single tables

    • DataSource no longer accepts Set<Table> in a constructor
  • CompositePhysicalTable Core Components Refactor

    • Removed deprecated method PartialDataHandler::findMissingRequestTimeGrainIntervals
    • Removed permissive_column_availability_enabled feature flag support and corresponding functionality in PartialDataHandler. Permissive availability is instead handled via table configuration, and continued usage of the configuration field generates a warning when Fili starts.
    • Removed getIntersectSubintervalsForColumns and getUnionSubintervalsForColumns from PartialDataHandler. Availability now handles these responsibilities.
    • Removed getIntervalsByColumnName, resetColumns and hasLogicalMapping methods in PhysicalTable. These methods were either part of the availability infrastructure, which changed completely, or the responsibilities have moved to PhysicalTableSchema (in the case of hasLogicalMapping).
    • Removed PartialDataHandler::getAvailability. Availability (on the PhysicalTables) has taken its place.
    • Removed SegmentMetadataLoader because the endpoint this relied on had been deprecated in Druid. Use the DataSourceMetadataLoader instead.
      • Removed SegmentMetadataLoaderHealthCheck as well.
  • Major refactor for availability and schemas and tables

    • Removed ZonedSchema (all methods moved to child class ResultSetSchema)
    • PhysicalTable no longer supports mutable availability
      • Removed addColumn, removeColumn, getWorkingIntervals, and commit
      • Other mutators no longer exist, availability is immutable
      • Removed getAvailableIntervals. Availability::getAvailableIntervals replaces it.
    • Removed DruidResponseParser::buildSchema. That logic has moved to the ResultSetSchema constructor.
    • Removed redundant buildLogicalTable methods from BaseTableLoader

v0.7.37 - 2017/04/04

This patch is to back-port a fix for getting Druid to handle international / UTF character sets correctly. It is included in the v0.8.x stable releases.

Fixed:

v0.7.36 - 2017/01/30

This release is a mix of fixes, upgrades, and interface clean-up. The general themes for the changes are around metric configuration, logging and timing, and adding support for tagging dimension fields. Here are some of the highlights, but take a look in the lower sections for more details.

Fixes:

  • Deadlock in LuceneSearchProvider
  • CORS support when using the RoleBasedAuthFilter

New Capabilities & Enhancements:

  • Dimension field tagging
  • Controls around max size of Druid response to cache
  • Logging and timing enhancements

Deprecations / Removals:

  • RequestLog::switchTiming is deprecated because it is difficult to use correctly
  • Metric configuration has a number of deprecations as part of the effort to make configuration easier and less complex

Changes:

  • There was a major overhaul of Fili's dependencies to upgrade their versions

Added:

Changed:

Deprecated:

Fixed:

Removed:

v0.6.29 - 2016/11/16

This release is focused on general stability, with a number of bugs fixed, and also adds a few small new capabilities and enhancements. Here are some of the highlights, but take a look in the lower sections for more details.

Fixes:

  • Dimension keys are now properly case-sensitive
    • Because this is a breaking change, the fix has been wrapped in a feature flag. For now, this defaults to the existing broken behavior, but this will change in a future version, and eventually the fix will be permanent.
  • all-grain queries are no longer split
  • Closed a race condition in the LuceneSearchProvider where readers would get an error if an update was in progress
  • Correctly interpreting List-type configs from the Environment tier as a true List
  • Stopped recording synchronous requests in the ApiJobStore, which is only intended to hold async requests

New Capabilities & Enhancements:

  • Customizable logging format
  • X-Request-Id header support, letting clients set a request ID that will be included in the Druid query
  • Support for Druid's In filter
  • Native support for building DimensionRows from AVRO files
  • Ability to set headers on Druid requests, letting Fili talk to a secure Druid
  • Better error messaging when things go wrong
  • Better ability to use custom Druid query types

Added:

Changed:

Deprecated:

Fixed:

v0.1.x - 2016/09/23

This release focuses on stabilization, especially of the Query Time Lookup (QTL) capabilities, and the Async API and Jobs resource. Here are the highlights of what's in this release:

  • A bugfix for the DruidDimensionLoader
  • A new default DimensionLoader
  • A bunch more tests and test upgrades
  • Filtering and pagination on the Jobs resource
  • A userId field for default Job resource representations
  • Package cleanup for the jobs-related classes

Added:

Deprecated:

Changed:

  • Removed physicalName lookup for metrics in TableUtils::getColumnNames to remove spurious warnings
    • Metrics are not mapped like dimensions are. Dimensions are aliased per physical table and metrics are aliased per logical table.
    • A logical metric is mapped to one or many physical metrics, so using the same lookup logic for dimensions and metrics doesn't make sense.

Jobs:

  • HashPreResponseStore moved to test root directory.

    • The HashPreResponseStore is really intended only for testing, and does not have capabilities (i.e. TTL) that are needed for production.
  • The TestBinderFactory now uses the TestAsynchronousWorkflowsBuilder

    • This allows the asynchronous functional tests to add countdown latches to the workflows where necessary, allowing for thread-safe tests.
  • Removed JobsApiRequest::handleBroadcastChannelNotification

    • That logic does not really belong in the JobsApiRequest (which is responsible for modeling a response, not processing it), and has been consolidated into the JobsServlet.
  • ISSUE-17 Added pagination parameters to PreResponse

    • Updated JobsServlet::handlePreResponseWithError to update ResultSet object with pagination parameters
  • Enrich jobs endpoint with filtering functionality

    • The default job payload generated by DefaultJobPayloadBuilder now has a userId
  • Removed timing component in JobsApiRequestSpec

    • Rather than setting an async timeout and then sleeping, the JobsApiRequestSpec test "handleBroadcastChannelNotification returns an empty Observable if a timeout occurs before the notification is received" now verifies that the returned Observable terminates without sending any messages.
  • Reorganizes asynchronous package structure

    • The jobs package is renamed to async and split into the following subpackages:
      • broadcastchannels - Everything dealing with broadcast channels
      • jobs - Everything related to jobs, broken into subpackages
        • jobrows - Everything related to the content of the job metadata
        • payloads - Everything related to building the version of the job metadata to send to the user
        • stores - Everything related to the databases for job data
      • preresponses - Everything related to PreResponses, broken into subpackages
        • stores - Everything related to the databases for PreResponse data
      • workflows - Everything related to the asynchronous workflow

Query Time Lookup (QTL)

  • QueryTimeLookup Functionality Testing

    • AbstractBinderFactory now uses TypeAwareDimensionLoader instead of KeyValueStoreDimensionLoader
  • Fix Dimension Serialization Problem with Nested Queries

    • Modified DimensionToDefaultDimensionSpec serializer to serialize Dimension to apiName if it's not in the inner-most query
    • Added Util::hasInnerQuery helper in serializer package to determine if query is the innermost query or not
    • Added tests for DimensionToDefaultDimensionSpec

General:

Fixed: