ElectronJS/src/taskGrid/FlowChart.html
Tip: Save successful!
Tip: The task version you loaded is not the latest, which may cause compatibility issues.
Tip: The save name does not conform to MySQL table naming conventions, please try again!
Set Anchor / Cut Element / Copy Element / Del Element / Cancel
Drag a button above onto the flowchart to insert it.
The following are alternative XPath expressions, in addition to the default generated XPath, that can all locate the same element (although not entirely accurate, they may also locate other elements besides the intended one, so they are provided here for reference only). Each line contains an XPath expression (you can use the pre-installed XPath Helper extension for debugging):{{XPaths}}
Tip: Hover over the smiley face to view hints, double-click on an action in the flowchart to trial run (can only run when the webpage is fully loaded), right-click on an action to see more options.
Option Name:
Use link inside the Loop
Links (one link per line; the entire workflow will be executed once for each link):
Link (only one link can be put here):
Maximum wait time for page load (in seconds):
Whether to scroll down after execution: No scrolling / Scroll one screen / Scroll to the end / Keep scrolling until the page data does not change
Scroll Times (scroll times and the wait time after scrolling have no effect when the scrolling type is set to no scrolling):
Wait time after scrolling (in seconds):
Click here to expand/collapse advanced operations
Set Cookies after page loaded:
Click here to get cookies of current page
Element is inside iframe
Use element located by xpath relative to the loop
XPath (or use "point(10,10)" to click the web page at coordinates (10, 10), which is useful when you need to click a blank area to dismiss a pop-up dialog): ☺
Click here to view other equivalent XPath expressions
The final XPath of this element when the task is running:
Click here to expand/collapse advanced operations
Execute a JavaScript script before clicking on this element:
Maximum wait time for script execution (0 represents unlimited wait time):
Execute a JavaScript script after clicking on this element:
Maximum wait time for script execution (0 represents unlimited wait time):
Maximum wait time for page load after clicking (in seconds):
Click Type (including double-click): Selenium / JavaScript / Double-click
Open link in new tab: Yes / No
Whether to scroll down after clicking: No Scrolling / Scroll one screen / Scroll to the end / Keep scrolling until the page data does not change
Scroll Times:
Wait time after scrolling (in seconds):
Maximum file download wait time (in seconds):
Way to handle pop-up windows after clicking: No pop-up window / Accept pop-up window / Reject pop-up window (only for Confirm pop-up window)
New Field
Clear other field existing values before extracting
This operation will generate a new row of data ☺
Field Name | Example Value | Operations |
| | {{params.parameters[i-1]["exampleValues"][0]["value"]}} | Modify / Delete / Up / Down |
Current parameter name (click a field's "Modify" option to switch to that parameter)
{{params.parameters[paraIndex]["name"]}}
Use relative XPath
Element is inside iframe
XPath (Field["FieldName"] and eval("your code") can be used in any XPath): ☺
Click here to view other equivalent XPath expressions
Final XPath of this field when the task is running: {{getFinalXPath(params.parameters[paraIndex]['relativeXPath'], params.parameters[paraIndex]['relative'])}}
Trial Run (only tests the first matched element)
Click here to expand/collapse advanced operations
Execute a JavaScript script before extracting data from this element:
Maximum wait time for script execution (0 represents unlimited wait time):
Execute a JavaScript script after extracting data from this element:
Maximum wait time for script execution (0 represents unlimited wait time):
Extract Type: Text (include child element) / Text (exclude child element) / innerHTML / outerHTML / Background Image Address / Webpage URL / Webpage Title / Constant String / Element Screenshot / OCR Results / Properties of elements / Return value of JavaScript code (for this element, starting with 'return') / System command return value / Value of a Python expression (the "eval" operation) / Selected value of the current select box / Selected text of the current select box
Constant String:
Attribute Name:
Code (Use Field["FieldName"] to input the latest value of a field):
Maximum wait time for script execution (0 represents unlimited wait time):
Node Type: Ordinary Node / Link Text / Link Address / Form Value / Image Address
Whether to download image after extracting the image address: No / Yes
Parameter type conversion (for Excel and Database): Text (for single values estimated to exceed 10,000 in length, please choose Large Text) / Integer (up to 9 digits) / Floating Number (Decimal) / Large Text (single value length exceeding 10,000 but less than 1,000,000) / Date Time / Date / Time / Small Text (single value length less than 50) / Extra Large Text (single value length exceeding 1,000,000) / Large Integer (more than 9 digits)
Default value when this element cannot be found:
Wrap content to new line (set when collecting long articles and wanting to wrap): No / Yes
Whether to save this field (choose 'No' if you only want to treat this field as a variable and not save it): Yes / No
Parameter Description:
Element is inside iframe
Use text from the loop (If unchecked, the text entered each time will be the text from the "Input Value" text box below. If checked, it will use the text set within the loop.)
Index value (0 represents using the entire current loop text. If greater than 0, it represents the index of the text item separated by "~" within the current loop. For example, if the current loop text value is A~B, index value 1 represents inputting A, 2 represents inputting B, and 0 represents inputting A~B)
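The index rule above can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the helper name `pick_loop_text` is hypothetical, and the "~" separator is an assumption inferred from the A~B example.

```python
def pick_loop_text(loop_text: str, index: int, sep: str = "~") -> str:
    """Return the whole loop text for index 0, otherwise the index-th item (1-based)."""
    if index == 0:
        return loop_text
    # Assumption: items inside the loop text are separated by "~"
    parts = loop_text.split(sep)
    return parts[index - 1]


print(pick_loop_text("A~B", 0))  # A~B
print(pick_loop_text("A~B", 1))  # A
print(pick_loop_text("A~B", 2))  # B
```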
Input value (Use Field["FieldName"] to input the latest extracted/returned value of a field or custom operation. Use <enter> or <ENTER> to simulate pressing the Enter key; use JS("return JS code") and eval("Python code") to substitute the return values of JavaScript and Python code):
XPath: ☺
Click here to view other equivalent XPath expressions
Click here to expand/collapse advanced operations
Execute JavaScript script on this element before entering text into it:
Maximum wait time for script execution (0 represents unlimited wait time):
Execute JavaScript script on this element after entering text into it:
Maximum wait time for script execution (0 represents unlimited wait time):
Action is inside iframe
Action Mode: Execute JavaScript script (Start with "return " if you want to get return value) / Execute operating system-level command / Execute JavaScript script for the current element inside the loop / Exit Current Loop (the "Break" operation) / Skip Current Loop (the "Continue" operation) / Run Python code on current environment (the "exec" operation) / Get value of a Python expression (the "eval" operation) / Pause program execution (such as when the captcha box appears) / Exit Program / Refresh page / Send Email / Clear all field values / Generate new data row
Code (Use Field["FieldName"] to input the latest value of a field):
Please read the instructions first and then write the specific code in the input box above (not in this box). To execute a large amount of code, you can simply write "outside:myCode.py" and the program will read and execute the code within myCode.py under the EasySpider directory.
Be aware that statements containing exec and eval operations and XPath cannot be tested on the current page, and can only be run when they are actually called upon in a task.
This option is an advanced feature that allows direct manipulation of the running browser using Python code. You can also customize variables in the entire execution environment and perform operations such as modifying and assigning values. Here are some examples:
1. Use `self.browser` to refer to the current browser being operated. You can directly use Selenium's API to perform operations, such as `self.browser.find_element(By.CSS_SELECTOR, "body").send_keys(Keys.END)` to scroll to the bottom.
2. Define a global variable: `self.myVar = 1`
3. Manipulate the above-defined global variable: `self.myVar = self.myVar + 1`
4. Print the above-defined global variable: `print(self.myVar)`
5. Assign the value extracted from a field to a custom variable: `self.myVar = self.outputParameters["Field Name"]`
If you want to record your custom variable as a field, please select the next option, "Get value of a Python expression (the "eval" operation)"
6. To import and use a third-party library that is not bundled with the program, first install it locally with a tool such as pip, then add the path of the installed library before the import, for example:
(1) In the system command line execute the following command to install the library:
pip install emotlib
(2) Write the following code in the code box:
import sys
sys.path.append("D:/Python38/Lib/site-packages") # Assume the emotlib library exists in this path
import emotlib # Now you can use the emotlib library
print(emotlib.emoji()) # Use one of its functions.
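Examples 2 to 5 above can be tried outside the program with a small stand-in object. The sketch below is only an approximation under stated assumptions: `FakeTask` and `run_code` are hypothetical names, and the real object behind `self` (with its `browser` attribute) is not reproduced here.

```python
class FakeTask:
    """Hypothetical stand-in for the object that `self` refers to in the code box."""

    def __init__(self):
        # Assumption: extracted field values are kept in a dict keyed by field name
        self.outputParameters = {"Field Name": "42"}

    def run_code(self, code: str) -> None:
        # exec runs statements; exposing `self` lets assignments persist on the object
        exec(code, {"self": self})


task = FakeTask()
task.run_code("self.myVar = 1")                  # define a global variable
task.run_code("self.myVar = self.myVar + 1")     # manipulate it
task.run_code('self.fromField = self.outputParameters["Field Name"]')
print(task.myVar)      # 2
print(task.fromField)  # 42
```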
Please read the instructions first and then write the specific code in the input box above (not in this box). To execute a large amount of code, you can simply write "outside:myCode.py" and the program will read and execute the code within myCode.py under the EasySpider directory.
This option is an advanced feature that directly returns the value of a Python expression; elsewhere, use Field["FieldName"] to refer to the return value of this operation. Here are some examples:
1. Return relevant values of the current browser object. Use `self.browser` to refer to the current browser being operated. You can directly use Selenium's API to perform operations, such as `self.browser.find_element(By.CSS_SELECTOR, "body").text` to return the text on the current page.
2. Return the value of a custom global variable: `self.myVar`
3. Return the result of a conditional statement: `self.myVar == 1`
4. Determine whether the value extracted from a certain field is equal to the value of a certain variable: `self.outputParameters["field name"] == self.myVar`
Please note that this feature does not support assigning values to variables. In other words, you cannot write something like `self.myVar = 1`. If you want to perform assignment operations, please select the previous option, "Run Python code on current environment (the "exec" operation)"
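The difference between the two options follows from Python's built-in `eval` and `exec`: `eval` accepts only expressions, so an assignment is rejected with a `SyntaxError`, while `exec` accepts statements. A minimal sketch, using a hypothetical `FakeTask` object in place of the program's real `self`:

```python
class FakeTask:
    """Hypothetical stand-in for the object that `self` refers to."""


task = FakeTask()
task.myVar = 1
env = {"self": task}

# An expression is fine for eval
print(eval("self.myVar == 1", env))  # True

# An assignment is a statement, so eval refuses to compile it
try:
    eval("self.myVar = 2", env)
except SyntaxError:
    assignment_rejected = True
else:
    assignment_rejected = False
print(assignment_rejected)  # True

# exec accepts the same assignment
exec("self.myVar = 2", env)
print(task.myVar)  # 2
```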
Whether to record the output/return value of the execution as a field: No / Yes
Convert parameter type to: Text (for single values estimated to exceed 10,000 in length, please choose Large Text) / Integer (up to 9 digits) / Floating Number (Decimal) / Large Text (single value length exceeding 10,000 but less than 1,000,000) / Date Time / Date / Time / Small Text (single value length less than 50) / Extra Large Text (single value length exceeding 1,000,000) / Large Integer (more than 9 digits)
Clear values of other parameters before recording as a field: No / Yes
This operation will generate a new row of data: Yes / No
Maximum wait time for script execution (0 represents unlimited wait time):
This operation can pause program execution, such as when a captcha box appears, and it will not continue until you manually press and hold the pause/continue shortcut key (default: key p).
This operation can refresh the current page.
This operation can send emails, for example, to notify by email when a web scraping task is completed.
SMTP email server host:
Email server port:
Sender email username:
Sender email password (Be careful not to leak the task file if a password is set!):
Recipient email address:
Email subject:
Email content:
This action can clear all field values, such as when used before starting a web scraping task to clear all values.
This action can generate a new row of data, for example, when a web scraping task is designed to hold off generating rows and to generate one new row only after all fields have been extracted.
Element is inside iframe
Use index value from within the loop (if unchecked, the setting will be the value of the "Set Value" text box below. If checked, it will use the index value set within the loop's dropdown box.)
Relative index value of the loop value (0 represents using the entire current loop's text. If greater than 0, it represents the item of the text separated by "~" within the current loop. For example, if the current loop's text value is 2~3 and you enter 2 here, it means taking the second item of the text, which is the value 3, so the dropdown box is set to the third item.)
XPath:
Click here to view other equivalent XPath expressions
Option switch Mode: Switch to the next option / Switch options by index (0 is the first option) / Switch options by option value / Switch options by option text
Set value (not applicable for "Switch to the next option" mode)
Element is inside iframe
Use element located by xpath relative to the loop
XPath:
Click here to view other equivalent XPath expressions
The final XPath of this element when the task is running:
Operation is in iframe
Loop Type: Single Element / Unfixed Element List / Fixed Element List / Text List / Weblink List / Return value of JavaScript command (start with 'return ') / Return value of system command / Return value of Python code under current environment
XPath: ☺
(Testing feature) Click here to view other possible XPath expressions
XPath List:
Content List (Use Field["FieldName"] to input the latest value of a field):
Code (Use Field["FieldName"] to input the latest value of a field):
Please read the instructions first and then write the specific code in the input box above (not in this box). To execute a large amount of code, you can simply write "outside:myCode.py" and the program will read and execute the code within myCode.py under the EasySpider directory.
Loop based on the expression value of Python code. Here are some examples:
1. Return relevant values of the current browser object. Use `self.browser` to refer to the current browser being operated. You can directly use Selenium's API to perform operations, such as `self.browser.find_element(By.CSS_SELECTOR, "body").text=="123"`, which checks whether the text of the current page is exactly "123" (to check whether the page contains "123", write `"123" in self.browser.find_element(By.CSS_SELECTOR, "body").text` instead).
2. Return the value of a custom global variable: `self.myVar`
3. Return the result of a conditional statement: `self.myVar > 1`
4. Determine whether the value extracted from a certain field is equal to the value of a certain variable: `self.outputParameters["field name"] == self.myVar`
If the expression returns a value greater than 0 or evaluates to True, the loop continues; otherwise, it stops.
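The continue/stop rule can be sketched as follows. This is an illustration under assumptions rather than the program's actual loop: `FakeTask` and `keep_looping` are hypothetical names, and treating non-numeric results as "stop" is a guess made only for this example.

```python
class FakeTask:
    """Hypothetical stand-in for the object that `self` refers to."""

    def __init__(self):
        self.myVar = 3


def keep_looping(expression: str, task) -> bool:
    """Evaluate the expression and apply the 'True or greater than 0' rule."""
    value = eval(expression, {"self": task})
    # bool is a subclass of int in Python, so True counts as 1 > 0 here
    return isinstance(value, (int, float)) and value > 0


task = FakeTask()
iterations = 0
while keep_looping("self.myVar > 1", task):
    task.myVar -= 1  # the loop body changes state until the expression turns False
    iterations += 1
print(iterations)  # 2
```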
Maximum wait time for script execution (0 represents unlimited wait time):
Maximum number of loop iterations (0 represents an infinite loop until no more elements are found or no changes in page content are detected):
Exit the loop when the content of the following elements on the page does not change (effective when the count is 0. If it is a multi-layer nested iframe, it is recommended to write an XPath for an element that only exists within the iframe page you want to extract, such as /html/body/div[@class='LeftSide_menu']):
(Advanced Operation) Define the loop exit condition using code/script; alternatively, add a Custom Action and select the "Exit Loop" option:
Do not set script (even if a script is written below, it will not be executed) / JavaScript script (start with 'return ') / Operating system-level command
Maximum wait time for script execution (0 represents unlimited wait time):
Skip the first few loops (enter 2 to skip the first 2 loops):
Waiting time in seconds after a history record rollback:
Whether to scroll down after execution: No Scrolling / Scroll one screen / Scroll to the end / Keep scrolling until the page data does not change
Scroll Times:
Wait time after scrolling (in seconds):
The conditions are evaluated from left to right, which means if the condition in the leftmost branch is satisfied, the operations within that branch are executed. Otherwise, the condition in the next branch from left to right is evaluated, and so on. Clicking on a branch while designing tasks allows for dynamic debugging in the browser to verify if the branch satisfies the condition (not applicable to system commands and Python Eval operations).
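The left-to-right rule amounts to a first-match search over the branches. A minimal sketch, where `first_satisfied` and the branch names are hypothetical and each condition is modeled as a plain Python callable:

```python
from typing import Callable, List, Optional, Tuple

# A branch is modeled as (name, condition); both are illustrative only
Branch = Tuple[str, Callable[[], bool]]


def first_satisfied(branches: List[Branch]) -> Optional[str]:
    """Return the name of the leftmost branch whose condition holds, or None."""
    for name, condition in branches:
        if condition():
            return name  # later branches are not evaluated
    return None


branches = [
    ("left", lambda: False),
    ("middle", lambda: True),
    ("right", lambda: True),
]
print(first_satisfied(branches))  # middle
```

Even though the "right" branch would also be satisfied, only "middle" runs, because evaluation stops at the first match.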
Operation is in iframe
Condition Type: No Condition / Text inside current page / Element inside current page / Text inside current loop / Element inside current loop / Return value of JavaScript command (start with 'return ') / Return value of system command / Return value of JavaScript command for the current loop item / Return value of Python code under current environment
Text/Element XPath to Include: ☺
Code/Script Content:
Please read the instructions first and then write the specific code in the input box above (not in this box). To execute a large amount of code, you can simply write "outside:myCode.py" and the program will read and execute the code within myCode.py under the EasySpider directory.
Use the expression value of Python code to determine whether a condition is satisfied. Here are some examples:
1. Return relevant values of the current browser object. Use `self.browser` to refer to the current browser being operated. You can directly use Selenium's API to perform operations, such as `self.browser.find_element(By.CSS_SELECTOR, "body").text=="123"`, which checks whether the text of the current page is exactly "123" (to check whether the page contains "123", write `"123" in self.browser.find_element(By.CSS_SELECTOR, "body").text` instead).
2. Return the value of a custom global variable: `self.myVar`
3. Return the result of a conditional statement: `self.myVar == 1`
4. Determine whether the value extracted from a certain field is equal to the value of a certain variable: `self.outputParameters["field name"] == self.myVar`
If the expression returns a value greater than 0 or evaluates to True, the operations within this branch will be executed; otherwise, they will be skipped.
Maximum wait time for script execution (0 represents unlimited wait time):
Code/Script Content (Click here for more examples):
Maximum wait time for script execution (0 represents unlimited wait time):
Wait for the following elements to appear before executing:
In which iframe is the element located? Set to 0 if the element is not inside an iframe:
Maximum waiting time for element appearance (in seconds):
Wait seconds after execution (can set decimal values, e.g., 0.5):
Wait Type: Fixed wait (if set to 10 seconds, it waits exactly 10 seconds) / Random wait (if set to 10 seconds, it waits a random time between 10 × 0.5 and 10 × 1.5 seconds)
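The two wait types can be sketched like this. The function name `wait_seconds` and the `wait_type` strings are hypothetical, and a uniform distribution over the 0.5× to 1.5× range is an assumption, since the text only gives the bounds.

```python
import random


def wait_seconds(base: float, wait_type: str = "fixed") -> float:
    """Return how long to wait: exactly `base`, or a random value around it."""
    if wait_type == "random":
        # Assumption: uniformly distributed between base * 0.5 and base * 1.5
        return random.uniform(base * 0.5, base * 1.5)
    return base


print(wait_seconds(10))            # 10
print(wait_seconds(10, "random"))  # some value between 5.0 and 15.0
```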
Task Name:
Task Description:
Export Data Format (Excel/CSV/TXT/Database): CSV (Recommended for collecting long articles) / XLSX (Excel file; CSV format is recommended when a single cell exceeds 500 characters) / TXT / JSON / MySQL Database (recommended for large amounts of data)
Export File Name/Database Table Name (can use ../ to represent a relative path to change the file save location; the keyword "current_time" will be replaced with the timestamp when the task is executed):
Data Write Mode (the export file/database table name above must be fixed; effective when the same task ID is executed multiple times): Append (If the file exists, append to it) / Overwrite (If the file exists, overwrite it) / Rename on Write (renames file if it already exists)
Remove duplicates after execution (note that this function will be executed at the end of the task, and leaving the task midway will not perform deduplication): No / Yes (Note that the file name above should be a fixed name rather than 'current_time', and the execution ID for each task execution should be the same)
To modify the input parameters of each operation during execution, read the following Excel (.xlsx) file. Please click the "Read Input Parameters from Excel File" button when calling the task to view the file format:
Browser Emulation Type: Desktop / Mobile
Whether to maximize the browser window: No / Yes
Save Data Every N Rows (specify N below; the larger the value, the faster the scraping speed, but there is a risk of data loss if the program exits unexpectedly):
Whether to resume execution from the last saved position when the task is restarted after an unexpected exit (the record interval of the number of collected items is the value set above): No / Yes (Requires running the same task ID and the same file name, please execute from the command line and specify the ID)
Wait time for the browser to close after the task is executed (in seconds); the temporary user data directory will be automatically deleted after the browser is closed:
Maximum Display Length of Data in Console Preview:
Record log when executing the task: Yes / No
Pause/Continue Task Shortcut Key:
Save as / Save