refactor!: Adapt to apify-client v3 #719
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## master #719 +/- ##
==========================================
- Coverage 86.87% 86.36% -0.52%
==========================================
Files 48 48
Lines 2942 2918 -24
==========================================
- Hits 2556 2520 -36
- Misses 386 398 +12
### Summary

This is a major refactoring that introduces fully typed Pydantic models throughout the client library. The models are generated from the OpenAPI specifications. All API responses now return typed objects instead of raw dictionaries. This follows up on apify/apify-docs#2182.

### Issues

- Closes: #21
- Closes: #481

### Packages

- Add direct dependency on `Pydantic`.
- Removes the dependency on `apify-shared`.
- Add dev dependency [datamodel-code-generator](https://koxudaxi.github.io/datamodel-code-generator/) for model generation.

### Key changes

- Uses [datamodel-code-generator](https://koxudaxi.github.io/datamodel-code-generator/) tool configured via `pyproject.toml` to generate Pydantic models based on the [OpenAPI specs](https://docs.apify.com/api/openapi.json).
- Refactors the whole codebase to adopt the new generated models.
- All resource clients now return typed Pydantic models (`Actor`, `Task`, `Run`, etc.).
- Adds response wrappers for validating and extracting API response data.
- Updates list methods to return typed pagination models.
- Documentation examples now use typed attribute access.
- Updates the SDK to use the new typed client.
  - See the corresponding PR in `apify/apify-sdk-python` for details - apify/apify-sdk-python#719.
  - It will be merged later.

### Architecture

- Get rid of 3/4/5 levels of inheritance.
- Get rid of inline imports because of circular dependencies.
  - I had to utilize `ClientRegistry` to be able to achieve that (because of resource clients-siblings imports).

### Breaking changes

- Client methods now return Pydantic models instead of dicts.
- Access patterns change from dict-style (`result['key']`) to attribute-style (`result.key`).

### Test plan

- Updated test concurrency to 16 workers.
- A lot of new tests were implemented - coverage ~95%.
  - Unit tests - do not call production API, only for testing utils or other functionality using mocks.
  - Integration tests - call production API.
- Thanks to the new tests, I was able to do a lot of fixes in the OpenAPI specs.

### Next steps

- Explore the generation of resource clients using [openapi-python-client](https://github.com/openapi-generators/openapi-python-client).
- Fully automate model updates based on changes in [apify-api/openapi](https://github.com/apify/apify-docs/tree/master/apify-api/openapi).
- This will be released as part of the Apify client v3.0.
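The change in access patterns listed under "Breaking changes" can be illustrated with a minimal sketch. It uses a stdlib dataclass as a stand-in for a generated Pydantic model, so `RunStub` and its fields are hypothetical and are not the real `apify_client` models.

```python
from dataclasses import dataclass


@dataclass
class RunStub:
    """Hypothetical stand-in for a typed response model (not the real `Run`)."""

    id: str
    status: str


# Before: the client returned a raw dictionary.
raw_result = {'id': 'abc123', 'status': 'SUCCEEDED'}
old_status = raw_result['status']  # dict-style access

# After: the client returns a typed object.
typed_result = RunStub(**raw_result)
new_status = typed_result.status  # attribute-style access
```

With a typed object, a misspelled field name is flagged by a type checker instead of surfacing as a `KeyError` at runtime.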
Build the SDK on apify-client v3 and drop the apify-shared dependency: typed model responses (Run, pricing info, webhook representations), Literal string aliases instead of StrEnum classes, the new tiered timeout system, and a slimmed-down @dataclass Webhook.

Collapse the SDK's standalone pricing-info models into thin subclasses of the apify-client models that relax only the fields the platform's APIFY_ACTOR_PRICING_INFO env var omits, so Run.pricing_info from the API flows through unchanged and the converter is removed. Configuration.actor_pricing_info keeps its discriminated-union shape (no public API change), and event_price_usd is now correctly optional so tier-priced pay-per-event Actors no longer fail env-var validation. Document these changes in the v4 upgrading guide.
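As a rough illustration of the "Literal string aliases instead of StrEnum classes" point, the sketch below defines a literal alias and a runtime membership check. The alias name `WebhookEventName`, its two values, and the helper `is_known_event` are assumptions made for this example, not the SDK's actual definitions.

```python
from typing import Literal, get_args

# Hypothetical alias; the values are illustrative, not the SDK's actual set.
WebhookEventName = Literal['ACTOR.RUN.SUCCEEDED', 'ACTOR.RUN.FAILED']


def is_known_event(value: str) -> bool:
    """Check a plain string against the allowed literal values at runtime."""
    return value in get_args(WebhookEventName)
```

Callers pass plain strings such as `'ACTOR.RUN.SUCCEEDED'` rather than importing an enum class, while a type checker still rejects values outside the alias.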
    max_retries=8,
    min_delay_between_retries_millis=500,
    timeout_secs=360,
    min_delay_between_retries=timedelta(milliseconds=500),
It is a default value, the same as timeout_secs. Maybe we can skip it?
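For context on the parameter change discussed here, a minimal sketch of the two styles follows. The parameter names are taken from the diff; the variable `delay_in_millis` is introduced only for illustration, and nothing here asserts what the client's actual defaults are.

```python
from datetime import timedelta

# Old style: the unit is encoded in the parameter name.
min_delay_between_retries_millis = 500

# New style: the unit is carried by the value itself.
min_delay_between_retries = timedelta(milliseconds=500)

# Dividing two timedeltas yields a plain number, here the delay in milliseconds.
delay_in_millis = min_delay_between_retries / timedelta(milliseconds=1)
```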
    from crawlee._utils.byte_size import ByteSize
    from crawlee._utils.file import json_dumps
    from crawlee.storage_clients._base import DatasetClient
    from crawlee.storage_clients.models import DatasetItemsListPage, DatasetMetadata
Do we need these manually created models?
Why not use the full client models, like Dataset from the client (apify-client-python/src/apify_client/_models.py)?
    if metadata is None:
        raise ValueError('Failed to retrieve key-value store metadata.')

    return ApifyKeyValueStoreMetadata(
Do we need these manually created models? (same as for dataset)
    _ensure_context = ensure_context('active')


    # --- SDK-side Actor pricing-info models ---------------------------------------------------------------
This comment style looks out of place
    )

    return ActorRun.model_validate(api_result)
    if run is None:
Can it be None?
Type hint says it is always Run
https://github.com/apify/apify-client-python/blob/master/src/apify_client/_resource_clients/actor.py#L236
    ENCRYPTED_STRING_VALUE_PREFIX = 'ENCRYPTED_VALUE'
    """Prefix for encrypted string values in Actor input."""

    ENCRYPTED_JSON_VALUE_PREFIX = 'ENCRYPTED_JSON'
What is this used for now?
Is not using it on line 17 intentional?
    EVENT_LISTENERS_TIMEOUT = timedelta(seconds=5)
    """Timeout for waiting on event listeners to finish during Actor exit."""

    BASE64_REGEXP = '[-A-Za-z0-9+/]*={0,3}'
I think the regexp was more readable with this variable instead of repeating it 3 times inline.
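To make the readability argument concrete, here is a small sketch of composing a larger pattern from the named constant. `BASE64_REGEXP` is copied from the diff; the overall shape of `ENCRYPTED_VALUE_PATTERN` (a prefix followed by two colon-separated base64 segments) is a hypothetical example and may differ from the SDK's real expression.

```python
import re

BASE64_REGEXP = '[-A-Za-z0-9+/]*={0,3}'

# Hypothetical pattern shape: a prefix, then two base64 segments separated by colons.
ENCRYPTED_VALUE_PATTERN = re.compile(
    f'^ENCRYPTED_VALUE:({BASE64_REGEXP}):({BASE64_REGEXP})$'
)

match = ENCRYPTED_VALUE_PATTERN.match('ENCRYPTED_VALUE:cGFzcw==:dmFsdWU=')
```

Because the base64 fragment is defined once, a later change to the accepted alphabet only has to be made in one place.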
    api_url = os.getenv(_API_URL_ENV_VAR)

    return ApifyClientAsync(apify_token, api_url=api_url)
    if api_url is not None:
On line 243, the same thing is done using a different pattern. Maybe pick one and stick to it.
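One possible way to use a single pattern in both places is to build the optional keyword arguments in one helper, sketched below. The helper `build_client_kwargs` and the environment variable name `EXAMPLE_API_URL` are hypothetical; the real value of `_API_URL_ENV_VAR` is not shown in the diff.

```python
from collections.abc import Mapping

# Hypothetical variable name; the real constant's value is not shown in the diff.
_API_URL_ENV_VAR = 'EXAMPLE_API_URL'


def build_client_kwargs(env: Mapping[str, str]) -> dict[str, str]:
    """Return only the keyword arguments that were explicitly configured."""
    kwargs: dict[str, str] = {}
    api_url = env.get(_API_URL_ENV_VAR)
    if api_url is not None:
        # Pass `api_url` only when set, so the client's own default applies otherwise.
        kwargs['api_url'] = api_url
    return kwargs
```

Both call sites could then construct the client with `**build_client_kwargs(os.environ)` instead of each handling the unset case differently.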
### Description

apify-python-client v3, which introduces fully typed API clients generated from OpenAPI specifications.

### Issues

RemainingTime literal to inherit #697
PricingModel #853

### Testing

apify-python-client v3.